Sunday, May 23, 2010

Implementing Distinct UNION Component in SSIS 2008

SSIS 2008 comes with dataflow component called UNION ALL, which is used for combining the result set from two different input sources and provides one output.
For example, suppose you want to fetch books list from two different FLAT File source and have to insert into one table [bookLis].
To implement this you can add two FLAT File source in DataFlow task, combine them using UNION ALL component and finally result from UNION ALL component can be passed on to OLEDB Destination (as indicated in below image).
















This works fine for until you dont have duplicates in input FLAT Files. Take the scenario when you have got some duplicate records in two FLAT Files and you want to insert only UNIQUE records in table. Unfortunately UNION ALL component does not provide you facility to output distinct records. So how can we implement this business requirement in SSIS.

There is one solution to the problem. we can make use of SORT component to remove duplicate records from the output produced by UNION ALL. Add SORT component to DataFlow task having input from UNION ALL component. Edit Sort component to specify sorting order on the basis of column which uniquely identifies a book record (say BookID). This will sort the result set generated by UNION ALL. Now this SORT component also provides a option to remove duplicate based on the sorted column, so if you check this option the result generated by SORT component will not have any duplicate records.
You can pass output from SORT component to your OLEDB Destination for final save. (as indicated in below images)



































I hope that you find this post useful. Feel free to provide your comments.
Thanks for reading, Cheers.