Batch GeoCoding in SSIS

I’ve had to do some geocoding of addresses. It used to be free to process a lot of addresses, but now that this is a more mainstream activity you generally have to purchase an Enterprise License.  This can cost upwards of $10,000.  If you have a very small number of addresses you can use a free service like Bing Maps, which currently allows you to process 50 records at a time, and run 5 jobs per 24 hours.  This means a maximum of 250 per day.  If you want to check out how this works, here’s how.

Install the free Codeplex SSIS Batch Geocoder into your Visual Studio environment

http://ssisbatchgeocoder.codeplex.com/releases/view/66866

Get a Bing Maps Key

You need to have or set up a Microsoft account to get a key.  Follow the instructions to get a Basic Key.  If you have a large number of addresses you will want to look into purchasing an Enterprise License.

http://www.microsoft.com/maps/

Set up an SSIS project

Create a Visual Studio SSIS project.  If you can’t see the toolbox, in the Visual Studio Menu choose View – Other Windows – SSIS Toolbox.  If you can’t see the Variables window, in the Visual Studio Menu choose View – Other Windows – Variables.

Add these two string variables to your project: BingMapsKey and JobDescription. Set the value of the BingMapsKey variable to the key you obtain from Bing Maps.  Set the JobDescription variable to “Geolookup from address”

Add a Data Flow Task.  Open up the Data Flow Task. On the Data Flow Task add these four items:

image

Configure the OLE DB Source to connect to the table that contains the addresses.

Configure the Derived Columns like this:

image

Configure the SSIS Batch Geocoder like this, using the two variables as the Bing Maps Key and the Job Description, and mapping any other relevant columns from your data.

image

Configure the OLE DB Destination to wherever you want your output to reside.  Choose whichever output fields meet your needs.  The latitude and longitude output can be stored in a field of SQL data type “Geography”.

In preparing this post I found these articles helpful.

http://ssisbatchgeocoder.codeplex.com/documentation

http://blog.programmableweb.com/2012/06/21/7-free-geocoding-apis-google-bing-yahoo-and-mapquest/

How To Set and Use Variables in SSIS Execute SQL Task

I couldn’t find any step-by-step resources on this so I thought I’d blog it.  This is done in SQL Server 2008 R2. In this scenario I will use a SQL Statement to populate an SSIS variable, and then use that variable in another SQL Statement.

Create your variables

If the variables window isn’t visible already, from the menu select View –> Other Windows –> Variables.

SNAGHTML1fb1c4da

This will bring up your variables window on the left hand side of your screen.  You should be able to tab between it and the Toolbox.  You may have to pin it to your screen first.

image

 

I have already created some variables in the screenshot above.  To create a variable, first decide on the scope of the variable, whether it is the entire package, a container or a specific task.  I normally scope all of my variables to the package, since you can’t see the variables in the window unless you have selected the scoped object.  I find it easier to manage variables having them all in one place.  I have not had any compelling reason not to do it this way, although I’m sure there are cases where you may want them scoped differently. 

Before creating the variable you must select the item for it’s scope. In this case select the canvas of the Control Flow which will permit you to scope the variable to the entire package.  Then create the variable by clicking on the Add Variable button at the top of the Variables window.

image

 

Give your variable a Name, a Data Type and a Value.  In this case we are working with EndDate, and although you would want to set it as Date, there are some compatibility issues when using the Date data type for a variable and then pulling into a SQL parameter.  I have found it is best to set the date as a string and then manipulate it from there.  Set your EndDate attributes as shown, with the Scope being the name of your package.

image

Populate an SSIS variable using a SQL statement

From the Toolbox, drag an Execute SQL Task onto your Control Flow.  Double click on it and set the following properties:

On the General tab:

  1. Set the ResultSet to Single row
  2. Select your Connection to the database where you will run your SQL Statement
  3. Enter your SQL Statement

image

On the Result Set tab:

  1. In the Result Name key in the name of the field being return in your SQL Statement.  In this case I named the field EndDate
  2. In the Variable Name select the user variable you created.

image

Use the Variable in a SQL Statement

From the Toolbox, drag an Execute SQL Task onto your Control Flow. Double click on it and set the following properties:

On the General tab:

  1. Select your Connection to the database where you will run your SQL Statement
  2. Enter your SQL Statement, placing a ? where each variable is to be used.

image

In this case I need to add two Parameters to the Parameter Mapping tab, since I have used two question marks in my SQL Statement.  Even if you are using the same Variable multiple times, you still need to enter them separately on the Parameter Mapping tab, once for each time they appear in the SQL Statement.  This is the only way to identify which parameter is being used for which question mark.  Parameters are made much easier in 2012.

Here’s the tricky bit which isn’t at all self explanatory.

On the Parameter Mapping tab: 

  1. Select the user variable in the Variable Name drop down which corresponded to the first variable (?) being used in your SQL Statement.
  2. Set the Data Type to NVARCHAR, since we are using a string variable. (This is the spot where the data type compatibility didn’t mesh, which is why we opted for string).  
  3. Set the ParameterName to 0.  This indicates the order in which the parameter is appearing in the SQL statement.  You can enter them in the pane in any order you want, but the ParameterName must contain numbers starting from 0 to however many parameters you have minus one.  So if you have two parameters you must have one ParameterName = 0 and one ParameterName = 1.  You can reverse the order, but they must both exist or the task will fail.

image