Sometimes you run into a situation where you want to allow users to update data in a simple fashion, and incorporate the data into a data warehouse. For example the client’s ERP system maintains the data element called Departments but doesn’t have the ability to store Department Groups for reporting since it is not relevant to the ERP system. Using a SharePoint list as a data source can be an easy solution. Users can update it very simply, your ETL package can add or update new Departments from the ERP system via the data warehouse, and you can pull the user entered data back into the data warehouse. There are some drawbacks to it, such as lack of data integrity checks on the SharePoint side. But if you build your ETL package right, you can correct any integrity issues which might be created by user error.
Here is what I will demonstrate, using a SharePoint list to manage the applying Department Groups to Departments which are maintained in the accounting system:
PART 1 – DOWNLOAD AND INSTALL SSIS TOOLBOX ITEMS
PART 2 – USE THE SHAREPOINT LIST AS A DATA SOURCE
PART 3 – INSERT RECORDS INTO THE SHAREPOINT LIST
PART 4 – UPDATE RECORDS INTO THE SHAREPOINT LIST
PART 5 – DELETE RECORDS INTO THE SHAREPOINT LIST
PART 6 – BUILD DATA INTEGRITY INTO YOUR PACKAGE
PART 1 – DOWNLOAD AND INSTALL SSIS TOOLBOX ITEMS
1. From the server where you will be building the SSIS solution, open your browser and navigate to this URL. http://sqlsrvintegrationsrv.codeplex.com/releases/view/17652
2. Download, run and follow the instructions to install SharePointListAdaptersSetup.msi (2008) or SharePointListAdaptersSetupForSqlServer2005.msi
3. Close BIDS if it is open
4. Open BIDS from the Tools menu select Choose Toolbox Items. Go to the SSIS Data Flow Items tab. – **See SQL Server 2012 Update at the end of this post.
5. Add a checkmark to the SharePoint List Destination and SharePoint List Source items.
6. Open up or create a new Integration Services solution. Go to the Data Flow tab.
7. Open up the Toolbox
8. Drag SharePoint List Source from the General section…
…to the Data Flow Sources section.
9. Drag SharePoint List Destination from the General section…
…to the Data Flow Destinations section.
Now you are ready to use a SharePoint List as a source or destination in your packages.
PART 2 – USE THE SHAREPOINT LIST AS A DATA SOURCE
1. Create a Data Flow in your package.
2. Go to the Data Flow tab and drag the SharePoint List Source into the data flow as your source.
3. Double click on the source. Scroll down to the bottom where it says SiteURL.
4. Paste the URL for the SharePoint site into the SiteURL (exclude anything from the word Lists onward in the URL)
Example: if your SharePoint site URL for Lists is
http://ca-shpt01/Reporting/Lists/Department%20Groups/AllItems.aspx
then only type
5. Type the SiteListName. For example if your List name is Departments,
then type “Departments”.
6. If you are wanting to import data from a particular view of this list in SharePoint, other than the default view, “MyView” for example:
then add the name of the view in the SiteListViewName, otherwise leave it blank.
7. Click the Refresh button at the bottom of the box to load the SharePoint List metadata into your Data Flow.
8. Then click on the Column Mappings tab to see the columns available from your SharePoint list.
9. Drag a Data Flow Destination onto your data flow and proceed as normal to pull data into your destination from the SharePoint list.
You’ll need to know a little about SharePoint lists to understand what data lives in which column. You’ll need the ID column if you will be doing any updating from the data warehouse into the SharePoint list. The Title is whatever item is the linked item in the SharePoint list – meaning if a user clicks on one of the items in the SharePoint list, which field is set up as the hyperlink to the item (usually the first column visible in the list).
Because there is often a push and a pull of data to and from the SharePoint list I tend to create a table in the data warehouse specifically for the data from the SharePoint list and I let the source data dictate the table structure by clicking on the New button for the “Name of the Table or view”. In this way, when the data gets pushed back to SharePoint there are no data type issues.
PART 3 – INSERT RECORDS INTO THE SHAREPOINT LIST
Keep in mind that the SharePoint list does not manage any data integrity, so if you add duplicate records with the same linked “Title”, Department Name in this case, you will get duplicate records in your list. Be sure to configure your data source to insert only new records, not existing records. Here we will add any new Departments that have been added to the ERP system (if any) and push them up to SharePoint.
1. Add a Data Flow to your package.
2. Go to the Data Flow tab and drag an OLE DB source as your source. Configure your source to pull any new Departments that you want to insert into the SharePoint list. In this example SP_Department is the data which has been imported from the SharePoint list, and DIM_Department holds the Departments imported from the ERP system:
SELECT �
DP.Department_Code,
DP.Department_Description
FROM SP_Department as SP RIGHT OUTER JOIN
dbo.DIM_Department as DP ON SP.Department_Code = DP.Department_Code�
WHERE SP.ID IS NULL
3. Drag the SharePoint List Destination into the data flow as your destination.
4. Double click on the SharePoint List Destination and configure the SiteUrl, SiteListName and SiteListViewName the same way you did in Part 2 steps 4-6 above. In addition, ensure that the BatchType is set to Modification. “Modification” is used for Updates and Inserts, “Deletion” is of course for Deletions.
5. Click on the Refresh button in the bottom left of the Advanced Editor window and select the Input Columns tab. Map the Source columns to the Destination columns. Since we are INSERTING records, be sure NOT to map to the ID field. Doing so would have the data flow expect an update rather than an insert.
Note that the Department_Description is being mapped to the Title field, which is the linked SharePoint field described in Part 2 – item 9, above.
6. Run your data flow task and check the SharePoint list for the added records.
PART 4 – UPDATE RECORDS INTO THE SHAREPOINT LIST
Here we want to update the Department descriptions in SharePoint, in case a user has decided to edit it. The only item the user should be editing is the Department Group.
1. Add a Data Flow to your package.
2. Add an OLE DB Source to the Data Flow tab and configure the source with the correct query to update the Department descriptions for Departments that already exist in the SharePoint list. For example:
SELECT �
SP.ID,
DP.Department_Code,
DP.Department_Description
FROM SP_Department as SP INNER JOIN
dbo.DIM_Department as DP ON SP.Department_Code = DP.Department_Code
3. Drag the SharePoint List Destination into the data flow as your destination.
4. Double click on the SharePoint List Destination and configure the SiteUrl, SiteListName and SiteListViewName the same way you did in Part 2 steps 4-6 above. Esure that the BatchType is set to Modification.
5. Click on the Refresh button in the bottom left of the Advanced Editor window and select the Input Columns tab. Map the Source columns to the Destination columns. Since we are UPDATING records, BE SURE TO map to the ID field. Without the ID the data flow will implement an insert rather than an update.
6. Run your data flow task and check the SharePoint list for the updated Department Descriptions.
PART 5 – DELETE RECORDS INTO THE SHAREPOINT LIST
Here we want to delete any Department records from the SharePoint list that a user may have entered in error. The user training indicated to users that Departments should only be added from the ERP system and not by users.
1. Add a Data Flow to your package.
2. Add an OLE DB Source to the Data Flow tab and configure the source with the correct query to select the Departments for deletion from the SharePoint list. The only field you need in the result is the ID field which maps to the ID in the SharePoint List. If the ID is in this query it will get deleted from the SharePoint list. In this example we will remove any duplicate records, and remove any Departments that don’t exist in the ERP system:
–DELETE DUPLICATE Department Codes. Take the oldest record.
SELECT ID FROM SP_Department
WHERE Department_Code IN (
SELECT DP.Department_Code FROM SP_Department as SP LEFT OUTER JOIN
dbo.DIM_Department as DP ON SP.Department_Code = DP.Department_Code�
GROUP BY DP.Department_Code HAVING COUNT (sp.id) > 1)
AND ID NOT IN
(SELECT MIN(ID) FROM SP_Department
WHERE Department_Code IN (
SELECT DP.Department_Code FROM SP_Department as SP LEFT OUTER JOIN
dbo.DIM_Department as DP ON SP.Department_Code = DP.Department_Code�
GROUP BY DP.Department_Code HAVING COUNT (sp.id) > 1))
UNION
–DELETE Department Codes that don’t exist in ERPSELECT SP.ID FROM SP_Department as SP LEFT OUTER JOIN
dbo.DIM_Department as DP ON SP.Department_Code = DP.Department_Code
WHERE DP.Department_Code IS NULL
3. Drag the SharePoint List Destination into the data flow as your destination.
4. Double click on the SharePoint List Destination and configure the SiteUrl, SiteListName and SiteListViewName the same way you did in Part 2 steps 4-6 above. In addition, ensure that the BatchType is set to Deletion.
5. Click on the Refresh button in the bottom left of the Advanced Editor window and select the Input Columns tab. Map the ID input column to the ID destination column.
6. Run your data flow task and check the SharePoint list for the deleted Departments.
PART 6 – BUILD DATA INTEGRITY INTO YOUR PACKAGE
Putting this all together will enforce the data integrity for this simple list. Here is what this package looks like.
Here is what each step does.
1. Truncate Tables: Truncates these tables –
DIM_Department
DIM_Department_Group
SP_Department
2. Populate DIM_Department – populates the DIM_Department table from the ERP system.
3. Retrieve Dept Group from SP – Populates the DIM_Department_Group table from another SharePoint List. This is used as a drop down in the Department SP list for users to select the correct department.
4. Retrieve SP_Depts – Populates the SP_Department table from the SharePoint list. See Part 2 above.
5. Update SP Dept Descriptions – updates any modified Department descriptions. See Part 4 above.
6.Add Missing Depts to SP – adds any new or missing Departments to the SharePoint list. See Part 3 above.
7. Delete invalid Depts from SP – deletes any invalid Departments from the SharePoint list. See Part 5 above.
8. Truncate SP_Depts – Truncate the SP_Department table in preparation for reload of corrected data.
9. Retrieve Updated SP_Depts – Populate SP_Department table from the updated SharePoint list. See Part 2 above.
10. Update Dept Group Codes – Updates the DIM_Department table with the Department_Group_ID.
UPDATE DIM_Department
SET Department_Group_ID = SP.Department_Group_ID
FROM SP_Department AS SP INNER JOIN
DIM_Department as DP on DP.Department_Code = SP.Department_Code
This may seem like a lot of work, but it actually doesn’t take that much time once you get the hang of it. Of course if you have more complex lists or require a lot of lists to manage this kind of data, you will want to look into other tools. This is a good solution to the occasional one-off with a fairly simple data requirement, where the client doesn’t have another tool available to handle it.
Update April 24, 2012
If you are populating date fields, see this post. http://thedataqueenblog.azurewebsites.net/2012/04/populating-date-fields-in-sharepoint-using-ssis/
Update May 2, 2012
SQL Server 2012: All components that are installed on the local machine now automatically appear in the new SSIS Toolbox. When you install additional components, right-click inside the toolbox and then click Refresh Toolbox to add the components. The SharePoint List Destination and the SharePoint List Source appeared in the “Common” folder in the SSIS Toolbox when I refreshed the toolbox.