Replicate/Migrate data from one database/table to another – part 1

This is about a very simple though very useful and frequently used requirement to replicate/merge data from one database/table to another.

There can be various scenarios wherein we do get data from multiple sources into one database/table and then we might be performing some data cleansing, transformation & enrichment processes on these data and then eventually forming a final record-set. This may be happening into some staging area (or into a database not exposed to external world). This is a good practice for multiple reasons but a very basic advantage is to segregate the heavy operations from the data table being accessed by the external users/applications.

So, once all is prepared here locally we need to replicate/migrate it to the external database/table. This comprises of inserting new records, updating the existing records and deleting (hard or soft based upon the requirement) onto the target table.

Requirement: Let me take a very simple requirement to depict the solution around wherein we have a source/staging table which has got some new employees added, a few attributes of some of the existing employees are updated and a few employees have left. We need to merge this to target table.

(Data cleansing, transformation & enrichment etc processes are out of scope here and for the sake of simplicity, we are assuming that both tables are in the same database otherwise if in the different database, we can use the fully qualified name of the table like database.schema.tablename)

SourceTable: dbo.T_SourceEmployee

Target Table: dbo.T_DestinationEmployee

PrimayKey:     EmployeeID

We will talk about the solution/approach first then will try with actual SQL queries.

Approach: So the basic idea is to, 1. Reach out to the records in the source table which don’t exist into the target table, 2. Find out the records which are there in the target table but don’t exist into the source table anymore and 3. Figure out if anything has changed for the records which do exist into both source and the target table

Now, what should be the order of executing these above steps? The best approach is to

  1. Update
  2. Insert
  3. Delete

Reason for this order of execution is that, if we insert first then we would need to operate on for larger record-set for further operations unnecessarily. We can perform the delete operation at the beginning though i.e. delete, update & then insert.

Technical solution: Now, there can be several coding techniques to achieve this outcome. We will discuss one of them and in the next we will talk about how to deal with the shortcomings of this first approach.

This is the approach of making use of JOINS.

  1. UPDATE

UPDATE trg
SET Designation = src.Designation,
ManagerName = src.ManagerName
FROM dbo.T_DestinationEmployee trg
INNER JOIN dbo.T_SourceEmployee src
ON trg.EmployeeID = src.EmployeeID
WHERE src.Designation <> trg.Designation –values likely to be changing
OR src.ManagerName <> trg.ManagerName

2. INSERT

–New records
INSERT INTO dbo.T_DestinationEmployee
(
EmployeeID
,EmployeeName
,DateOfJoining
,Designation
,ManagerName
)
SELECT
src.EmployeeID
,src.EmployeeName
,src.DateOfJoining
,src.Designation
,src.ManagerName
FROM dbo.T_SourceEmployee src
LEFT JOIN dbo.T_DestinationEmployee trg
ON trg.EmployeeID = src.EmployeeID
WHERE trg.EmployeeID IS NULL


3. 
DELETE

–Delete no more valid records
DELETE trg
FROM dbo.T_DestinationEmployee trg
LEFT JOIN dbo.T_SourceEmployee src
ON trg.EmployeeID = src.EmployeeID
WHERE src.EmployeeID IS NULL

There are a few shortcomings to this above approach/solution, which we would be talking of in the forthcoming articles – .

—-Scripts to try with
CREATE TABLE dbo.T_SourceEmployee
(
EmployeeID INT PRIMARY KEY
,EmployeeName VARCHAR(255)
,DateOfJoining DATE
,Designation VARCHAR(255)
,ManagerName VARCHAR(255)
)
GO
CREATE TABLE dbo.T_DestinationEmployee
(
EmployeeID INT PRIMARY KEY
,EmployeeName VARCHAR(255)
,DateOfJoining DATE
,Designation VARCHAR(255)
,ManagerName VARCHAR(255)
)
GO
INSERT INTO dbo.T_SourceEmployee
VALUES(‘101’, ‘John’, ’01Jan2014′, ‘Associate’, ‘Paul’)
,(‘102’, ‘Richard’, ’01Feb2014′, ‘Manager’, ‘Simon’)
,(‘103’, ‘Ridham’, ’10Feb2014′, ‘Software Engineer’, ‘Shankar’)
GO
UPDATE dbo.T_SourceEmployee
SET Designation = ‘Sr. Associate’
WHERE EmployeeID = 101
GO

 

Advertisements
This entry was posted in SQL Server, SQL Server 2008, Uncategorized. Bookmark the permalink.

2 Responses to Replicate/Migrate data from one database/table to another – part 1

  1. Pingback: Replicate/Migrate data from one database/table to another – Part 2 | Tech Updates

  2. Pingback: Replicate/Migrate data from one database/table to another – Part 3 | Tech Updates

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s