Maintaining Data Integrity of a SharePoint List with SSIS

I like using InfoPath to maintain data integrity as it has a much finer control than SSIS.  I will blog about that later.  But it can be very useful to stick with SSIS to maintain the data integrity on a simple list which synchronizes data between a database and a SharePoint list.  For example if users do not have an available interface to define groups for their customers, they can use a SharePoint list to house the customers and define the groups. 

In this example the customers are coming from an ERP system and so the account names and ids must be correctly maintained in the SharePoint list.  It’s the typical three tasks required for maintenance: insert, update and delete.  Any new customers should automatically be added to the list, any descriptions which have changed should be updated, and any customers who have been incorrectly added to the list should be deleted.   I find the simplest way to manage this is to have a separate table in the data warehouse strictly for the data coming from the SharePoint list. This way you can match up to your ERP tables to see what data management is needed.

Create your SharePoint list 

Have the field sizes match the ERP field sizes, to ensure you don’t run into any size conflicts.  Best practices for creating a SharePoint list is to name your list and your field names without any spaces in them.  Then go back and change them to add any spaces that you might want for readability.  SharePoint doesn’t like spaces and will add strange characters internally to fill the gaps.  When you go to do your SSIS package it makes things less readable.  So, in this example a column in the list will be called “Account Status”.  So I first name it “AccountStatus”, and then go back and change it to “Account Status”.  Internally it will still be called “AccountStatus” without the space.

image

Here is my list for this example.  It has 4 columns, Debtor Name, Debtor ID, Account Status and Debtor Group.  All of these columns except Debtor Group will be maintained by the SSIS package.  Users will key in the Debtor Group.

Populate the data warehouse with the most current data from business system

This example assumes a data warehouse situation. Create a package in your SSIS project.  Keep in mind where in your project you need to run this package.  It should come after the Debtor table has been updated from your ERP system.  It’s best to put the two related items in the same package, to ensure that the order of operations does not get lost in a master package somewhere.  So first step is to add the SSIS tasks which populate your Debtor table.  Put them in a sequence container.

image

 

Insert, Update and Delete data in SharePoint list

There are series of 8 SSIS tasks which will “synchronize”, at a particular point in time, your SharePoint list with the data warehouse.  To do this create a Sequence Container called “Update Debtor Group SharePoint list, and add and connect these 8 consecutive tasks. 

1. Delete from SP_Debtor_Group:   Add an Execute SQL taks to your container called “Delete from SP_Debtor_Group”.  I always prefix any working SharePoint tables with SP for ease of identification.  We haven’t created this table yet, but it will be automatically created in the next step.  For now just add the task  which will execute the sql statement “Delete from SP_Debtor_Group”

2. Populate SP_Debtor_Group: Add a Data Flow task to your container called “Populate SP_Debtor_Group”.  This will populate your new table with whatever data currently exists in the list in SharePoint.  Right now there is probably no data in the list, but let’s create the step.  Add a SharePoint List Source to the Data Flow work surface,  Edit the Properties of the source for SiteUrl and SiteListName to connect to your Debtor Group SharePoint list. (For instructions on how to do this, view my blog post here.)    Add an OLE DB Destination and connect to your data warehouse, set Data Access mode to “Table or view – fast load”, and click on the NEW button beside the “Name of the table or the view” to create the new table.  The initial statement will look like this, but edit it to give the table the chosen name, and move or delete fields as required.

image

Keep in mind that you need to keep the ID field in order to do any updates in SharePoint, and the Title field is whatever is the field which shows up as a hyperlink in your SharePoint list, in this case Debtor Name.  I prefer to keep the field types as whatever SharePoint has allowed, since when we are sending data back up to SharePoint it’s easiest to have it already in the correct data type.  If I need the data type to be something different in the data warehouse I will do the conversion before populating the data warehouse table.  Your Create Table script should look like this.

image

 

Click on the Mappings tab and map the Input Columns from the SharePoint list to the Destination Columns in your new table.

image

3. Delete duplicate records from SharePoint: A business decision was made to delete the newest record from SharePoint in cases where duplicates exist. The only reason duplicates might exist is if a user added it, and this SharePoint List is only supposed to be updated by the SSIS packages.   Add a Data Flow Task to your container, called “Delete duplicates from SharePoint”. Add an OLE DB Source to your Data Flow work surface.

SELECT MAX(ID) as ID FROM SP_Debtor_Group
WHERE DebtorID IN
(
SELECT DebtorID
  FROM SP_Debtor_Group
  GROUP BY DebtorID
  HAVING COUNT(*)>1)
  GROUP BY DebtorID

We only need the ID in order to identify which records to delete. Write the appropriate query to identify those records which are duplicates.  Add the query to your OLE DB Source. Place a SharePoint List Destination on your work surface. Edit the Properties of the destination for SiteUrl and SiteListName. Be sure to change the BatchType to “Deletion”. Click the Refresh button. Go to the Column Mappings tab and ensure that the ID is mapped on the Input Column and the Destination Column. Click OK.

4. Delete invalid records from SharePoint: Add a Data Flow Task to your container, called “Delete from SharePoint”.  Add an OLE DB Source to your Data Flow work surface. 

SELECT SP.ID
FROM SP_Debtor_Group as SP LEFT OUTER JOIN
DIM_Debtor as DB ON  SP.DebtorID = DB.Debtor_ID
WHERE DB.Debtor_ID IS NULL

We only need the ID in order to identify which records to delete.  Write the appropriate query to identify those records.  Add the query to your OLE DB Source.  Place a SharePoint List Destination on your work surface.  Edit the Properties of the destination for SiteUrl and SiteListName.  Be sure to change the BatchType to “Deletion”.  Click the Refresh button.  Go to the Column Mappings tab and ensure that the ID is mapped on the Input Column and the Destination Column.  Click OK.

5. Update descriptions in SharePoint: Add a Data Flow task to your container. Call it “Update SharePoint descriptions”. Add an OLE DB Source to your Data Flow surface. Write the SQL statement that you would like to use to ensure that any descriptions are update appropriately. Remember that DebtorGroup is owned by the users, so we don’t want to overwrite anything in SharePoint. We won’t update that field. And we need the SharePoint ID field in our query in order to identify which SharePoint record to update. The fields we would like to update are Debtor and Account Status, to ensure that any changes are uploaded to SharePoint. Here is the query.

SELECT SP.ID,
DB.Account_Name as Debtor,
DB.Account_Status as AccountStatus
FROM SP_Debtor_Group as SP INNER JOIN
DIM_Debtor as DB ON SP.DebtorID = DB.Debtor_ID

Drag a SharePoint List Destination onto your Data Flow surface. Connect the OLE DB Source to the destination. Edit the destination with the appropriate SiteUrl and SiteListName and be sure the BatchType is Modification, since we will be modifying the SharePoint records. Click the Refresh button. Then select the Column Mappings tab. Map the Input and Destination columns. Remember that Title is whichever field has a hyperlink in your list (Debtor) and that you have to include the ID column in order to identify which record to update.

image

6. Insert new records into SharePoint: Add a Data Flow Task to your container, called “Add records to SharePoint”. Add an OLE DB Source to your Data Flow work surface.

SELECT
DB.Debtor_ID as DebtorID,
DB.Account_Name as Debtor,
DB.Account_Status as AccountStatus,
DB.Account_Name as DebtorGroup

FROM SP_Debtor_Group as SP RIGHT OUTER JOIN
DIM_Debtor as DB ON SP.DebtorID = DB.Debtor_ID
WHERE SP.DebtorID IS NULL

Write a query to populate new records into SharePoint. In this case the Debtor Group should default to the Debtor Name until the users can update it. Add the query to your OLE DB Source. Place a SharePoint List Destination on your work surface. Edit the Properties of the destination for SiteUrl and SiteListName. Be sure that the BatchType is “Modification”. Click the Refresh button. Go to the Column Mappings tab and ensure that the fields are correctly mapped on the Input Column and the Destination Column, especially Title. Click OK.

7. Delete from SP_Debtor_Group again: Add another Execute SQL taks to your container called “Delete from SP_Debtor_Group again”, to refresh the data in your data warehouse table with the newly cleansed data from SharePoint.  Add the task which will execute the sql statement “Delete from SP_Debtor_Group”

8. Populate SP_Debtor_Group again: Add a Data Flow task to your container called “Populate SP_Debtor_Group”. This will populate your table with the cleansed data from SharePoint.  Add a SharePoint List Source to the Data Flow work surface, Edit the Properties of the source for SiteUrl and SiteListName to connect to your Debtor Group SharePoint list.Click Refresh.  Add an OLE DB Destination to the Data Flow surface.  Select the SP_Debtor_Group table as the destination.

Click on the Mappings tab and map the Input Columns from the SharePoint list to the Destination Columns in your new table.

image

 

These 8 steps will periodically clean the data in your SharePoint list.  It will depend how often you run the package, how clean the data will stay.  And of course how careful the users are.  It may seem cumbersome, but these steps can save a lot of headaches on bad data when using the end result in reporting.

One thought on “Maintaining Data Integrity of a SharePoint List with SSIS”

  1. What i do not understood is actually how you’re no longer really much more well-favored than you may be now.

    You’re so intelligent. You already know thus
    significantly with regards to this matter, produced me for my part believe it from numerous various
    angles. Its like women and men don’t seem to be fascinated until it’s something to accomplish with Lady
    gaga! Your individual stuffs great. Always handle it
    up!

Leave a Reply

Your email address will not be published. Required fields are marked *