Tuesday, May 7, 2013

SQL Server 'Merge' functionality - A cool feature


I recently had the requirement to write a program in C# that will read a DB table and then update another DB table based on certain criterion. The requirement further stated that if a set of data is present in the former/first talbe then update the later/second table, if not then insert.

This can be done in serveral ways -
  • In the Stored procedure itself looping through the records from the first table and comparing them with the second one.
  • Use a cursor. (lesser efficient) 
  • Read one table in the C# code and then send one row at a time to another stored procedure to match (least efficient way)...
I had used bulk insert into SQL server using C# code serveral times earlier but I was looking for something that will help me to do bulk update as I was hoping to have millions of records that was going to take a lot of time. Thats when I came across this wonderful new feature of SQL Server - Merge also called Upsert.
Using merge you can do insert/update/delete all at the same time no matter what is the size of the table, in a snap of a finger. And I have used it and it really completed within few seconds. The process used to take several minutes earlier and I could save a lot of processing time.
The basic syntax goes like this.

 MERGE INTO TARGET_TABLE USING SOURCE_TABLE ON (condition)
   WHEN MATCHED THEN
   UPDATE SET column1 = value1 [, column2 = value2 ...]
   WHEN NOT MATCHED THEN
   INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...
  • rows present in the Source but missing from the Target do run the action > specifically the NOT MATCHED action
  • rows missing from the Source and present in Target are ignored > no action is performed on the Target.
One of the most important advantage of MERGE statement is all the data is read and processed in one shot. In previous versions three different statement has to be written to process three different activity (INSERT, UPDATE or DELETE), however using MERGE statement all update activity can be done in one pass of database tables. This is quite an improvement in performance of database query.

Example:
Try this example out:
CREATE TABLE Students
(
  StudentID INTEGER PRIMARY KEY,
  StudentName VARCHAR(25)
)

GO

INSERT INTO Students
    VALUES(1,'Sam')
INSERT INTO Students
    VALUES(2,'Ria')
INSERT INTO Students
    VALUES(3,'John')
INSERT INTO Students
    VALUES(4,'James')
INSERT INTO Students
    VALUES(5,'Rossi')
GO 


CREATE TABLE StudentCapability
(
    StudentID INTEGER REFERENCES Students,
    StudentMarks INTEGER
)
GO


INSERT INTO StudentCapability
    VALUES(1,10)
INSERT INTO StudentCapability
    VALUES(2,7)
INSERT INTO StudentCapability
    VALUES(3,3)
GO 

 

Now the Merge Syntax:
 
 MERGE StudentCapability AS sc
             USING (SELECT StudentID,StudentName FROM Students) AS s
      ON sc.StudentID = s.StudentID

  WHEN MATCHED AND sc.StudentMarks <= 7 THEN DELETE

  WHEN MATCHED THEN UPDATE SET sc.StudentMarks = sc.StudentMarks + 2

  WHEN NOT MATCHED THEN
      INSERT(StudentID,StudentMarks)
      VALUES(s.StudentID,1)

 OUTPUT $action, Inserted.*,Deleted.*; -- WILL SHOW YOU THE AFFECTED ROWS


The output should look like this:


$action          StudentID        StudentsMarks     StudentID      StudentsMarks 
UPDATE               1                                12                          1                             10
DELETE               NULL                        NULL                   2                            7
DELETE               NULL                        NULL                   3                            3
INSERT               4                                  1                            NULL                   NULL
INSERT               5                                  1                            NULL                   NULL

[

And when you are done may be it will be useful:

        Drop table StudentCapability
        Drop table Students]

Note:
  • The semicolon after to end the Merge statement. It is mandatory.
  • When a MATCH clause is used along with conditions, it has to be specified first amongst all other WHEN MATCH clause. 
  • After the MERGE statement has been executed, we should compare previous result and the new result to verify if our conditions are fully met. The OUTPUT statement will help achieve that.
Hope this will help someone reduce performance time to a great deal. It did for me.