Importing Empty Fields from Active Directory

Further to the series of posts on importing data from Active Directory, I’ve run into a new issue.  For this client I built the exact same solution as described here Getting Around Active Directory Paging on SSIS Import, but got this lovely error message: “Index was out of range. Must be non-negative and less than the size of the collection.” It turns out there were empty values in some of the single-value fields.  I hadn’t run into this previously, but I found a neat solution.

In the original solution I outlined how to create a simple SSIS script task in C# to import single value fields from Active Directory. I’ve added to this code to create a solution to import empty single-value fields.

A For Each statement for single-value fields has been added  to the script to check if the field is empty before setting the variable value. Even though there is only one possible value for a single value field, the For Each statement still works nicely to check if it’s empty.  Here is the code snippet of the For Each statement:

//If the property is null, set the variable to blank, else set it to the value in the property string Mail = ""; ResultPropertyValueCollection valueCollectionMail = results.Properties["Mail"]; foreach (String sField in valueCollectionMail) { //Replace any single quotes with two single quotes for SQL Statement

Mail = sField.Replace("'", "''"); }

Here is the complete code.  for more details on how to create the SSIS package and set up the references for the script task, please see Getting Around Active Directory Paging on SSIS Import.

        public void Main()
 
{
 
 //Set up the AD connection;
 
using (DirectorySearcher ds = new DirectorySearcher())
 
{
 
//Edit the filter for your purposes;
 
ds.Filter = "(&(objectClass=user))";
 
ds.SearchScope = SearchScope.Subtree;
 
ds.PageSize = 1000;
 
//This will page through the records 1000 at a time;
 
//Set up SQL Connection
 
string sSqlConn = Dts.Variables["SqlConn"].Value.ToString();
 
SqlConnection sqlConnection1 = new SqlConnection(sSqlConn);
 
SqlCommand cmd = new SqlCommand();
 
SqlDataReader reader;
 
cmd.CommandType = CommandType.Text;
 
cmd.Connection = sqlConnection1;
 
//Read all records in AD that meet the search criteria into a Collection
 
using (SearchResultCollection src = ds.FindAll())
 
{
 
//For each record object in the Collection, insert a record into the SQL table
 
foreach (SearchResult results in src)
 
{
    string sAMAccountName = results.Properties["sAMAccountName"][0].ToString();
    string objectClass = results.Properties["objectClass"][0].ToString();

    //If the property is null, set the variable to blank, otherweise set it to the value in the property
    string Mail = "";
    ResultPropertyValueCollection valueCollectionMail = results.Properties["Mail"];
    foreach (String sField in valueCollectionMail)
    {
        Mail = sField.Replace("'", "''"); //Replace any single quotes with two single quotes for SQL Statement
    }

    //If the property is null, set the variable to blank, otherweise set it to the value in the property
    string displayName = "";
    ResultPropertyValueCollection valueCollectiondisplayName = results.Properties["displayName"];
    foreach (String sField in valueCollectiondisplayName)
    {
        displayName = sField.Replace("'", "''"); //Replace any single quotes with two single quotes for SQL Statement
    }

 
sqlConnection1.Open();

cmd.CommandText = "INSERT INTO AD_Users (sAMAccountName, objectClass, Mail, displayName) VALUES ('" + sAMAccountName + "','" + objectClass + "','" + Mail + "','" + displayName +"')";
 
reader = cmd.ExecuteReader();
 
sqlConnection1.Close();
 
} } } }

 

Here are links to the other posts in the Active Directory series:

Importing Data from Active Directory using SSIS Data Flows

How to Query Multi-Value Fields in Active Directory using SSIS

How to Query Multi Value fields in Active Directory using SSIS

Apparently what’s even more difficult than importing data from AD is figuring out how to import multi-value objects from Active Directory.  “Description” is an example of a standard AD multi-value field.  My client had many custom multi-value fields added to AD and needed to import the data from these fields into tables in a database.  You can accomplish this easily this by adding a bit of code to the C# code importing the single value attributes as outlined in my previous post Getting Around AD Paging on SSIS Import

This C# code is much simpler than trying to import each multi-value field using a Data Flow task.  Using Data Flow tasks can be done but it has some tricky problems like importing only those records with values in the multi-value field, working around paging, and how to deal with apparently empty objects that your query returns even though you specified that it only return those objects with values.  It’s also quite a bit slower as you need to populate variables and pass those variables to loops to iterate thru the multi-values for one account at a time.

Here is the code for importing one multi-value attribute into a table.  This code should be placed at an appropriate spot within the  “foreach (SearchResults” loop outlined in the Getting Around AD Paging on SSIS Import post.

 

string propertyName = “Description”; //or whichever multi-value field you are importing

ResultPropertyValueCollection valueCollection = results.Properties[propertyName];

//Iterate thru the collection for the user and insert each value from the multi-value field into a table

foreach (String sMultiValueField in valueCollection)

{

string sValue = sMultiValueField.Replace(“‘”, “””); //Replace any single quotes with double quotes

sqlConnection1.Open();

cmd.CommandText =

“INSERT INTO User_Descriptions (sAMAccountName,Description) VALUES (‘” + sAMAccountName + “‘,'” + sValue + “‘)”;

reader = cmd.ExecuteReader();

sqlConnection1.Close();

}

The nice thing about this code is that you can iterate through any records, even if the multi-value field is empty.  It won’t fail, it just won’t return a record.  This means you can add this same chunk of code multiple times edited for several different multi-value fields within the same script task, and have all your tables updated using the same script.  The package is very easy to maintain, with no package variables, no complex package logic, just a simple script.  Very elegant!

Get around Active Directory Paging on SSIS import

I have a client who is importing certain users from Active Directory.  The paging on their AD is set to 20,000 records.  When trying to pull data using a SQL statement, the query fails because it hits the maximum number of records and is unable to return more.   You could work around a problem like this by editing your query filter to ensure that you always retrieve fewer than 20,000 records at a time, for example using the whenCreated field.  However, there is no guarantee that whatever filter you use will always limit your return value to a maximum of 20,000 records.  And you now need to build a loop construct to retrieve all the records since you want more than 20,000 records.

This is much easier to solve than you might think, judging from the number of forum questions out there on the subject (and how long it took me to piece it together).   Here are the steps.

Create an SSIS package.

Add a string variable, scoped to the package, called SqlConn.  Populate it with the connection string to the database you want to populate with the AD records.

Add a script task to your package.  Open the script task, making sure that the ScriptLanguage is C# and not VB.

image

Click on the Edit Script button.  On the right hand side you should see the Project Explorer window.  Right click on the name of the Project File at the top of the tree and select Add Reference.

SNAGHTML6315ef17

On the .NET tab scroll down and find System.DirectoryServices. Select it and click OK.

image

Make sure you see the reference appear in the References folder in the Project Explorer window.

image

Add these statements at the beginning of your script.

using System.DirectoryServices;

using System.Data.SqlClient;

Paste this script to replace the public void Main().  Edit the ds.Filter and Insert string values to meet your table requirements.  Be sure to only select single value attributes of the object.   If you try to use this method to import multi-value attributes such as “Description” from AD it won’t work.  I’ll be writing about that next.

public void Main()

{

//Set up the AD connection;

using (DirectorySearcher ds = new DirectorySearcher())

{

//Edit the filter for your purposes;

ds.Filter = “(&(objectClass=user)(|(sAMAccountName=A*)(sAMAccountName=D0*)))”;

ds.SearchScope = SearchScope.Subtree;

ds.PageSize = 1000;

//This will page through the records 1000 at a time;

//Set up SQL Connection

string sSqlConn = Dts.Variables[“SqlConn”].Value.ToString();

SqlConnection sqlConnection1 = new SqlConnection(sSqlConn);

SqlCommand cmd = new SqlCommand();

SqlDataReader reader;

cmd.CommandType = CommandType.Text;

cmd.Connection = sqlConnection1;

//Read all records in AD that meet the search criteria into a Collection

using (SearchResultCollection src = ds.FindAll())

{

//For each record object in the Collection, insert a record into the SQL table

foreach (SearchResult results in src)

{

string sAMAccountName = results.Properties[“sAMAccountName”][0].ToString();

string objectCategory = results.Properties[“objectCategory”][0].ToString();

//Replace any single quotes in the string with two single quotes for sql INSERT statement

objectCategory = objectCategory.Replace(“‘”, “””);

sqlConnection1.Open();

cmd.CommandText = “INSERT INTO Users (sAMAccountName, objectCategory) VALUES (‘” + sAMAccountName + “‘,'” + objectCategory + “‘)”;

reader = cmd.ExecuteReader();

sqlConnection1.Close();

} } } }

 

That’s it.  This will iterate through all of the objects in Active Directory, regardless of paging size set on Active Directory.

To learn how to import multi-value fields from AD, read this post:

How to Query Multi-Value Fields from Active Directory using SSIS

Importing data from Active Directory Using SSIS

This is a neat little trick.  My client wanted to filter a report by employee manager, but didn’t have a good way to maintain the list of employees and their managers.  There were a few possibilities, but we wanted to use a system that already maintained the relationship between employee and manager.  We decided to pull the information out of Active Directory, using an SSIS package.

Addendum Sept. 12, 2012:  The approach outlined below works fine for a simple data set where the query will not return more records than the AD paging file size will allow (typically 1000 records), and you are only querying single value attributes.  However, I recommend you use this Script Task approach instead of using a Data Flow.  It is more powerful and flexible yet still simple method of importing any size data set from Active Directory, regardless of paging file size.  Read the step-by-step here:  Get around Active Directory Paging on SSIS import

Find the Active Directory Server

The first thing I needed to know was which server Active Directory resided on.  I didn’t want to have wait for I.T. to get back to me with that answer.  There is a very easy way to find this out if you are logged into the same domain.  Just run a command prompt and use the command

ping %USERDNSDOMAIN%

This will return the active directory server name and IP address.

Create the ADO.NET Connection in SSIS

Create your SSIS solution and your package.  Now create the connection to the Active Directory server.  There is a little trick to this.  In order to use the connection in a data flow you need to create an ADO.NET connection.  During the create process, in the Provider drop down, change the Provider to OLE DB Provider for Microsoft Directory Services.  Then enter your server name and test as normal.  If you were to create an OLE DB connection instead of ADO.NET and change the provider, it would not work for Data Flow connections.  It must be an ADO.NET connection type. 

image

Use ADO NET Source in your Data Flow

On the data flow tab, drag an ADO.NET Source onto the design surface.  Open up the ADO.NET Source Editor and choose the connection you created in the previous step.  Choose “SQL Command” for Data Access mode.  Now you will write a query to pull the desired information from Active Directory.  Your query will look something like this:

SELECT displayName, Mail, Title, physicalDeliveryOfficeName,  telephoneNumber, Manager, name, sAMAccountName
  FROM ‘LDAP://yourservername’
WHERE objectClass=’user’

or this, depending on how you like to structure your LDAP queries.

<LDAP://yourservername>; (objectClass=user); displayName, Mail, Title, physicalDeliveryOfficeName, telephoneNumber, Manager, name, sAMAccountName

You must put single quotes around the server.  For more information on what attributes are available to be imported from Active Directory I found this to be helpful:  http://www.kouti.com/tables/userattributes.htm

You can test your query in SSMS like this:

SELECT displayName, Mail, Title, physicalDeliveryOfficeName, telephoneNumber, Manager, name, sAMAccountName

FROM Openquery (adsi, ‘SELECT displayName, Mail, Title, physicalDeliveryOfficeName, telephoneNumber, Manager, name, sAMAccountName

FROM “LDAP://yourservername” WHERE objectClass = ”user” ‘)

For this SQL Query to work you will need to add your AD server as a linked servier on your SQL server and name it ADSI.   In SSMS open up Server Objects and right click on Linked Servers.  Select New Linked Server.   On the General tab select “OLE DB Provider for Microsoft Directory Services” as the Provider.  Type in your AD server name in the Product Name and the Data source.  Leave the other fields blank and click OK. 

When running your query, if you get an error like this:  “Cannot fetch a row from OLE DB provider “ADsDSOObject” for linked server “adsi”,  there is a good chance your query is returning more rows than the AD paging file size allows.  In that case you can do one of two things: 1) filter your query further to return fewer rows (try filtering on a specific sAMAccountName to be sure that the query syntax isn’t the problem) or 2) use a C# script which is not affected by paging file size as outlined here Get around Active Directory Paging on SSIS import .

Data Formatting

Each field will be of NTEXT type.  You can leave them as NTEXT if this is an acceptable format for your destination, or you can add Derived Columns which convert the values in an expression, like this:

(DT_WSTR,50)sAMAccountName

image

 

 

Add your Destination

Add your data flow destination and map the fields as you wish.  In this case I created a table in the data warehouse to receive the Active Directory data.

You can now finish your package and deploy it.  This is a simple but very powerful method.  I found it very useful and I know I’m going to be using it again.

Addendum Sept. 12, 2012: This approach works fine for a simple data set where the query will not return more records than the AD paging file size will allow (typically 1000 records), and you are only querying single value attributes. However, I recommend you use this Script Task approach instead of using a Data Flow. It is more powerful and flexible yet still simple method of importing any size data set from Active Directory, regardless of paging file size. Read the step-by-step here: Get around Active Directory Paging on SSIS import

 Credits

In preparing this post, I found the following articles to be useful:

http://social.technet.microsoft.com/wiki/contents/articles/processing-active-directory-information-in-ssis.aspx

http://www.kouti.com/tables/userattributes.htm