Moving SharePoint Documents to the File System

You’ll want to read my previous post Moving SharePoint List Attachments to the File System, to get all the details and requirements for setting up and running these SSIS script tasks.

This is an SSIS Package code which will iterate through the document library to get some relevant information about the documents, and then move specified documents from a document library to the file system.

I will just explain the two script tasks steps, as the rest will be specific to your task.

image

Populate SP_ExpenseAttachments Sript Task

This code iterate through the document library to get some relevant information about the documents

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.IO;
using Microsoft.SharePoint;
using System.Data.SqlClient;
using System.Net;

namespace ST_573f63e769424529b4c14ec196d01e4f.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        /*
        The execution engine calls this method when the task executes.
        To access the object model, use the Dts property. Connections, variables, events,
        and logging features are available as members of the Dts property as shown in the following examples.

        To reference a variable, call Dts.Variables["MyCaseSensitiveVariableName"].Value;
        To post a log entry, call Dts.Log("This is my log text", 999, null);
        To fire an event, call Dts.Events.FireInformation(99, "test", "hit the help message", "", 0, true);

        To use the connections collection use something like the following:
        ConnectionManager cm = Dts.Connections.Add("OLEDB");
        cm.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;";

        Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.

        To open Help, press F1.
    */

        public void Main()
        {
            // Read the Library document info and write it to a SQL table

            string SharePointSite = (string)Dts.Variables["SPSite"].Value;
            SPSite mySite = new SPSite(SharePointSite);
            SPWeb myWeb = mySite.OpenWeb();
            SPList myList = myWeb.Lists["ExpenseAttachments"];
            SPDocumentLibrary myLibrary = (SPDocumentLibrary)myList;
            SPListItemCollection collListItems = myLibrary.Items;

            foreach (SPListItem myListItem in collListItems)
           {
               String ItemId = myListItem.ID.ToString();
               String attachmentAbsoluteURL = SharePointSite + "/" + myListItem.File.Url;

                String attachmentname = myListItem.File.Name;

                //Set up SQL Connection

                string sSqlConn = Dts.Variables["SqlConn"].Value.ToString();
                SqlConnection sqlConnection1 = new SqlConnection(sSqlConn);
                SqlCommand cmd = new SqlCommand();
                SqlDataReader reader;
                cmd.CommandType = CommandType.Text;
                cmd.Connection = sqlConnection1;
                sqlConnection1.Open();

                cmd.CommandText = "INSERT INTO SP_ExpenseAttachments (WorkflowName,DocumentLibrarySharePointID,AttachmentName,AttachmentURL) VALUES ('Expense','" + ItemId + "','" + attachmentname + "','" + attachmentAbsoluteURL + "')";

                reader = cmd.ExecuteReader();
                sqlConnection1.Close();

                    }

                    Dts.TaskResult = (int)ScriptResults.Success;
                }
            }
        }
Read Attachment information and move Expense attachments

This code accepts a document id from a variable, populates some relevant information about the document into a SQL table and copies and renames the document to the file system.

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.IO;
using Microsoft.SharePoint;
using System.Data.SqlClient;
using System.Net;

namespace ST_573f63e769424529b4c14ec196d01e4f.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        /*
        The execution engine calls this method when the task executes.
        To access the object model, use the Dts property. Connections, variables, events,
        and logging features are available as members of the Dts property as shown in the following examples.

        To reference a variable, call Dts.Variables["MyCaseSensitiveVariableName"].Value;
        To post a log entry, call Dts.Log("This is my log text", 999, null);
        To fire an event, call Dts.Events.FireInformation(99, "test", "hit the help message", "", 0, true);

        To use the connections collection use something like the following:
        ConnectionManager cm = Dts.Connections.Add("OLEDB");
        cm.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;";

        Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.

        To open Help, press F1.
    */

        public void Main()
        {
            // Read the document info and write it to a SQL table

            string SharePointSite = (string)Dts.Variables["SPSite"].Value;
            SPSite mySite = new SPSite(SharePointSite);
            SPWeb myWeb = mySite.OpenWeb();
            SPList myList = myWeb.Lists["ExpenseAttachments"];
            SPDocumentLibrary myLibrary = (SPDocumentLibrary)myList;
            SPListItemCollection collListItems = myLibrary.Items;

            int ItemID = (int)Dts.Variables["ItemID"].Value;
            String sItemID = ItemID.ToString();

            SPListItem myListItem = myList.GetItemById(ItemID);
            String attachmentAbsoluteURL = SharePointSite + "/" + myListItem.File.Url;

                String attachmentname = myListItem.File.Name;

                //Set up SQL Connection

                string sSqlConn = Dts.Variables["SqlConn"].Value.ToString();
                SqlConnection sqlConnection1 = new SqlConnection(sSqlConn);
                SqlCommand cmd = new SqlCommand();
                SqlDataReader reader;
                cmd.CommandType = CommandType.Text;
                cmd.Connection = sqlConnection1;
                sqlConnection1.Open();

                cmd.CommandText = "INSERT INTO SP_Attachments  (WorkflowName, DocumentLibrarySharePointID, AttachmentName, AttachmentURL, Moved, NewFileName) VALUES ('Expense','" + ItemID +"','" + attachmentname + "','" + attachmentAbsoluteURL + "','" + 0 + "','E' + RIGHT('00000000000' + CAST(" + ItemID + " as VARCHAR),11)" + ")";

                reader = cmd.ExecuteReader();
                sqlConnection1.Close();

                string MRI = (string)Dts.Variables["MRI_File_Location"].Value;
                DirectoryInfo dir = new DirectoryInfo(MRI);

                if (dir.Exists)
                {

                    // Create the filename for local storage using 
                    String FileExt = attachmentname.Substring(attachmentname.Length-4);
                    String ItemNum = "00000000000" + sItemID;
                    String ItemName = ItemNum.Substring(sItemID.Length, 11);
                    String FileName = "\E" + ItemName + FileExt;
                    FileInfo file = new FileInfo(dir.FullName + FileName);

                    if (!file.Exists)
                    {
                        if (attachmentAbsoluteURL.Length != 0)
                        {
                            // download the file from SharePoint or Archive file system to local folder 
                            WebClient client = new WebClient();

                            //download the file from SharePoint 

                            client.Credentials = System.Net.CredentialCache.DefaultCredentials;
                            client.DownloadFile(attachmentAbsoluteURL, file.FullName);

                        }
                        //Mark record as Moved
                        sqlConnection1.Open();
                        DateTime Now = DateTime.Now;
                        cmd.CommandText = "UPDATE SP_Attachments SET Moved = 1, Moved_Date = '" + Now + "' WHERE WorkflowName = 'Expense' and DocumentLibrarySharePointID = '" + ItemID + "'";
                        reader = cmd.ExecuteReader();
                        sqlConnection1.Close();

                    }

                    Dts.TaskResult = (int)ScriptResults.Success;
                }
            }
        }
    }

Moving SharePoint List Attachments to the File System

You can use a Script Task in SSIS to move SharePoint list attachments to the file system.  This C# code references the Microsoft.SharePoint assembly. It’s very important to note that the the SharePoint attachments have to be on the same server that the package is running on.  Thes means that the package can only run on SSIS installed on the SharePoint box where the attachments are.  You will need to install SSIS and the corresponding msdb database on your SharePoint server if it isn’t already installed.

This is what the final package looks like:

image

Most of the these tasks are self explanatory and you’ll need to set up your own tables and logic to accomplish the goals of your package.  You’ll want a table that tells you which items have attachments.  See this post for details on how to import data from a SharePoint list.  Attachments is one of the fields you can import, which is simply a bit that says whether or not the list item has any attachments.

These are the variables used in the package:

image

For Each Loop:

image

 

image

Variables used in the script task:

image

References needed in the script task:

image

 

Here is the C# code:

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.IO;
using Microsoft.SharePoint;
using System.Data.SqlClient;
using System.Net;

namespace ST_573f63e769424529b4c14ec196d01e4f.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        /*
        The execution engine calls this method when the task executes.
        To access the object model, use the Dts property. Connections, variables, events,
        and logging features are available as members of the Dts property as shown in the following examples.

        To reference a variable, call Dts.Variables["MyCaseSensitiveVariableName"].Value;
        To post a log entry, call Dts.Log("This is my log text", 999, null);
        To fire an event, call Dts.Events.FireInformation(99, "test", "hit the help message", "", 0, true);

        To use the connections collection use something like the following:
        ConnectionManager cm = Dts.Connections.Add("OLEDB");
        cm.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;";

        Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.

        To open Help, press F1.
    */

        public void Main()
        {
            // Read the attachments info and write it to a SQL table

            string SharePointSite = (string)Dts.Variables["SPSite"].Value;
            SPSite mySite = new SPSite(SharePointSite);
            //SPSite mySite = new SPSite("http://primenetdev/forms");
            SPWeb myweb = mySite.OpenWeb();
            SPList myList = myweb.Lists["Fitness Reimbursement Authorization"];

            int ItemID = (int)Dts.Variables["ItemID"].Value;
            SPListItem myListItem = myList.GetItemById(ItemID);
            int i = 1;
            foreach (String attachmentname in myListItem.Attachments)
            {
                //                MessageBox.Show("Each attachment");
                String attachmentAbsoluteURL =
                myListItem.Attachments.UrlPrefix // gets the containing directory URL
                + attachmentname;

                //Set up SQL Connection

                string sSqlConn = Dts.Variables["SqlConn"].Value.ToString();

                SqlConnection sqlConnection1 = new SqlConnection(sSqlConn);

                SqlCommand cmd = new SqlCommand();

                SqlDataReader reader;

                cmd.CommandType = CommandType.Text;

                cmd.Connection = sqlConnection1;

                sqlConnection1.Open();
                //If its the first attachement, name it 12 digits ending in the Item ID
                if ((i.Equals(1)))
                {
                    cmd.CommandText = "INSERT INTO SP_Attachments  (WorkflowName, ItemSharePointID, AttachmentName, AttachmentURL, Moved, NewFileName) VALUES ('Fitness','" + ItemID + "','" + attachmentname + "','" + attachmentAbsoluteURL + "','" + 0 + "','F' + RIGHT('00000000000' + CAST(" + ItemID + " as VARCHAR),11)" + ")";
                }
                //Otherwise append an attachment id
                else
                {
                    cmd.CommandText = "INSERT INTO SP_Attachments  (WorkflowName, ItemSharePointID, AttachmentName, AttachmentURL, Moved, NewFileName) VALUES ('Fitness','" + ItemID + "','" + attachmentname + "','" + attachmentAbsoluteURL + "','" + 0 + "','F' + RIGHT('00000000000' + CAST(" + ItemID + " as VARCHAR),11) + CAST(" + i + "as VARCHAR))";
                }
                reader = cmd.ExecuteReader();
                sqlConnection1.Close();

                string MRI = (string)Dts.Variables["MRI_File_Location"].Value;
                DirectoryInfo dir = new DirectoryInfo(MRI);

                if (dir.Exists)
                {

                    // Create the filename for local storage using 

                    String ItemNum = "00000000000" + ItemID.ToString();
                    String ItemName = ItemNum.Substring(ItemID.ToString().Length, 11);
                    String FileName = "\F" + ItemName + i;
                    //If its the first attachement, name it 12 digits ending in the Item ID, otherwise append which attachement it is
                    if ((i.Equals(1)))
                    {
                        FileName = "\F" + ItemName;
                    }
                    FileInfo file = new FileInfo(dir.FullName + FileName);
                    i = i + 1;

                    if (!file.Exists)
                    {

                        if (attachmentAbsoluteURL.Length != 0)
                        {
                            // download the file from SharePoint or Archive file system to local folder 

                            WebClient client = new WebClient();

                            //if (Strings.Left(fileUrl, 4).ToLower() == "http") {
                            //download the file from SharePoint 

                            client.Credentials = System.Net.CredentialCache.DefaultCredentials;

                            client.DownloadFile(attachmentAbsoluteURL, file.FullName);

                        }
                        //Mark record as Moved
                        sqlConnection1.Open();
                        DateTime Now = DateTime.Now;
                        cmd.CommandText = "UPDATE SP_Attachments SET Moved = 1, Moved_Date = '" + Now + "' WHERE ItemSharePointID = '" + ItemID + "'";
                        reader = cmd.ExecuteReader();
                        sqlConnection1.Close();
                        //            MessageBox.Show("End");

                    }

                    Dts.TaskResult = (int)ScriptResults.Success;
                }
            }
        }
    }
}

Importing Empty Fields from Active Directory

Further to the series of posts on importing data from Active Directory, I’ve run into a new issue.  For this client I built the exact same solution as described here Getting Around Active Directory Paging on SSIS Import, but got this lovely error message: “Index was out of range. Must be non-negative and less than the size of the collection.” It turns out there were empty values in some of the single-value fields.  I hadn’t run into this previously, but I found a neat solution.

In the original solution I outlined how to create a simple SSIS script task in C# to import single value fields from Active Directory. I’ve added to this code to create a solution to import empty single-value fields.

A For Each statement for single-value fields has been added  to the script to check if the field is empty before setting the variable value. Even though there is only one possible value for a single value field, the For Each statement still works nicely to check if it’s empty.  Here is the code snippet of the For Each statement:

//If the property is null, set the variable to blank, else set it to the value in the property string Mail = ""; ResultPropertyValueCollection valueCollectionMail = results.Properties["Mail"]; foreach (String sField in valueCollectionMail) { //Replace any single quotes with two single quotes for SQL Statement

Mail = sField.Replace("'", "''"); }

Here is the complete code.  for more details on how to create the SSIS package and set up the references for the script task, please see Getting Around Active Directory Paging on SSIS Import.

        public void Main()
 
{
 
 //Set up the AD connection;
 
using (DirectorySearcher ds = new DirectorySearcher())
 
{
 
//Edit the filter for your purposes;
 
ds.Filter = "(&(objectClass=user))";
 
ds.SearchScope = SearchScope.Subtree;
 
ds.PageSize = 1000;
 
//This will page through the records 1000 at a time;
 
//Set up SQL Connection
 
string sSqlConn = Dts.Variables["SqlConn"].Value.ToString();
 
SqlConnection sqlConnection1 = new SqlConnection(sSqlConn);
 
SqlCommand cmd = new SqlCommand();
 
SqlDataReader reader;
 
cmd.CommandType = CommandType.Text;
 
cmd.Connection = sqlConnection1;
 
//Read all records in AD that meet the search criteria into a Collection
 
using (SearchResultCollection src = ds.FindAll())
 
{
 
//For each record object in the Collection, insert a record into the SQL table
 
foreach (SearchResult results in src)
 
{
    string sAMAccountName = results.Properties["sAMAccountName"][0].ToString();
    string objectClass = results.Properties["objectClass"][0].ToString();

    //If the property is null, set the variable to blank, otherweise set it to the value in the property
    string Mail = "";
    ResultPropertyValueCollection valueCollectionMail = results.Properties["Mail"];
    foreach (String sField in valueCollectionMail)
    {
        Mail = sField.Replace("'", "''"); //Replace any single quotes with two single quotes for SQL Statement
    }

    //If the property is null, set the variable to blank, otherweise set it to the value in the property
    string displayName = "";
    ResultPropertyValueCollection valueCollectiondisplayName = results.Properties["displayName"];
    foreach (String sField in valueCollectiondisplayName)
    {
        displayName = sField.Replace("'", "''"); //Replace any single quotes with two single quotes for SQL Statement
    }

 
sqlConnection1.Open();

cmd.CommandText = "INSERT INTO AD_Users (sAMAccountName, objectClass, Mail, displayName) VALUES ('" + sAMAccountName + "','" + objectClass + "','" + Mail + "','" + displayName +"')";
 
reader = cmd.ExecuteReader();
 
sqlConnection1.Close();
 
} } } }

 

Here are links to the other posts in the Active Directory series:

Importing Data from Active Directory using SSIS Data Flows

How to Query Multi-Value Fields in Active Directory using SSIS

How to Query Multi Value fields in Active Directory using SSIS

Apparently what’s even more difficult than importing data from AD is figuring out how to import multi-value objects from Active Directory.  “Description” is an example of a standard AD multi-value field.  My client had many custom multi-value fields added to AD and needed to import the data from these fields into tables in a database.  You can accomplish this easily this by adding a bit of code to the C# code importing the single value attributes as outlined in my previous post Getting Around AD Paging on SSIS Import

This C# code is much simpler than trying to import each multi-value field using a Data Flow task.  Using Data Flow tasks can be done but it has some tricky problems like importing only those records with values in the multi-value field, working around paging, and how to deal with apparently empty objects that your query returns even though you specified that it only return those objects with values.  It’s also quite a bit slower as you need to populate variables and pass those variables to loops to iterate thru the multi-values for one account at a time.

Here is the code for importing one multi-value attribute into a table.  This code should be placed at an appropriate spot within the  “foreach (SearchResults” loop outlined in the Getting Around AD Paging on SSIS Import post.

 

string propertyName = “Description”; //or whichever multi-value field you are importing

ResultPropertyValueCollection valueCollection = results.Properties[propertyName];

//Iterate thru the collection for the user and insert each value from the multi-value field into a table

foreach (String sMultiValueField in valueCollection)

{

string sValue = sMultiValueField.Replace(“‘”, “””); //Replace any single quotes with double quotes

sqlConnection1.Open();

cmd.CommandText =

“INSERT INTO User_Descriptions (sAMAccountName,Description) VALUES (‘” + sAMAccountName + “‘,'” + sValue + “‘)”;

reader = cmd.ExecuteReader();

sqlConnection1.Close();

}

The nice thing about this code is that you can iterate through any records, even if the multi-value field is empty.  It won’t fail, it just won’t return a record.  This means you can add this same chunk of code multiple times edited for several different multi-value fields within the same script task, and have all your tables updated using the same script.  The package is very easy to maintain, with no package variables, no complex package logic, just a simple script.  Very elegant!

Get around Active Directory Paging on SSIS import

I have a client who is importing certain users from Active Directory.  The paging on their AD is set to 20,000 records.  When trying to pull data using a SQL statement, the query fails because it hits the maximum number of records and is unable to return more.   You could work around a problem like this by editing your query filter to ensure that you always retrieve fewer than 20,000 records at a time, for example using the whenCreated field.  However, there is no guarantee that whatever filter you use will always limit your return value to a maximum of 20,000 records.  And you now need to build a loop construct to retrieve all the records since you want more than 20,000 records.

This is much easier to solve than you might think, judging from the number of forum questions out there on the subject (and how long it took me to piece it together).   Here are the steps.

Create an SSIS package.

Add a string variable, scoped to the package, called SqlConn.  Populate it with the connection string to the database you want to populate with the AD records.

Add a script task to your package.  Open the script task, making sure that the ScriptLanguage is C# and not VB.

image

Click on the Edit Script button.  On the right hand side you should see the Project Explorer window.  Right click on the name of the Project File at the top of the tree and select Add Reference.

SNAGHTML6315ef17

On the .NET tab scroll down and find System.DirectoryServices. Select it and click OK.

image

Make sure you see the reference appear in the References folder in the Project Explorer window.

image

Add these statements at the beginning of your script.

using System.DirectoryServices;

using System.Data.SqlClient;

Paste this script to replace the public void Main().  Edit the ds.Filter and Insert string values to meet your table requirements.  Be sure to only select single value attributes of the object.   If you try to use this method to import multi-value attributes such as “Description” from AD it won’t work.  I’ll be writing about that next.

public void Main()

{

//Set up the AD connection;

using (DirectorySearcher ds = new DirectorySearcher())

{

//Edit the filter for your purposes;

ds.Filter = “(&(objectClass=user)(|(sAMAccountName=A*)(sAMAccountName=D0*)))”;

ds.SearchScope = SearchScope.Subtree;

ds.PageSize = 1000;

//This will page through the records 1000 at a time;

//Set up SQL Connection

string sSqlConn = Dts.Variables[“SqlConn”].Value.ToString();

SqlConnection sqlConnection1 = new SqlConnection(sSqlConn);

SqlCommand cmd = new SqlCommand();

SqlDataReader reader;

cmd.CommandType = CommandType.Text;

cmd.Connection = sqlConnection1;

//Read all records in AD that meet the search criteria into a Collection

using (SearchResultCollection src = ds.FindAll())

{

//For each record object in the Collection, insert a record into the SQL table

foreach (SearchResult results in src)

{

string sAMAccountName = results.Properties[“sAMAccountName”][0].ToString();

string objectCategory = results.Properties[“objectCategory”][0].ToString();

//Replace any single quotes in the string with two single quotes for sql INSERT statement

objectCategory = objectCategory.Replace(“‘”, “””);

sqlConnection1.Open();

cmd.CommandText = “INSERT INTO Users (sAMAccountName, objectCategory) VALUES (‘” + sAMAccountName + “‘,'” + objectCategory + “‘)”;

reader = cmd.ExecuteReader();

sqlConnection1.Close();

} } } }

 

That’s it.  This will iterate through all of the objects in Active Directory, regardless of paging size set on Active Directory.

To learn how to import multi-value fields from AD, read this post:

How to Query Multi-Value Fields from Active Directory using SSIS