10/22/2011

The Baker’s Dozen: 13 Transact SQL Programming Tips - Part I


Even with all the new features in the Microsoft SQL Server Business Intelligence (BI) stack, sometimes the only way to accomplish a task is with good old-fashioned T-SQL code. (Fortunately, “code” is the name of this great magazine!) In this latest installment of The Baker’s Dozen, I’ll present 13 T-SQL programming tips that could help you accomplish different database tasks.
What’s on the Menu?
Getting right to the point, here are the 13 items on the menu for this article:
  • A T-SQL example that allocates data in one table based on percentages from another table, where you might learn an unexpected lesson about data types.
  • The Baker’s Dozen Spotlight: A T-SQL example that uses the MERGE statement in a data warehousing scenario.
  • A T-SQL example that performs the equivalent of a MEDIAN function.
  • A T-SQL example that queries SQL Server system tables to retrieve specifics about snapshots.
  • A T-SQL example to demonstrate the difference between identity columns and GUID columns.
  • A T-SQL example to show different ways to perform queries using partial text searches, and how to use SQL Server tools to evaluate performance.
  • The Baker’s Dozen Potpourri: An example of T-SQL Ranking and Aggregation and the dangers of trying to accomplish everything in one query.
  • An example of using PIVOT where the spreading element is dynamic.
  • Determining a Percentile over a range of values.
  • A potential “gotcha” when performing a WHERE with an OUTER JOIN.
  • Manually setting an identity column.
  • Performing a rollback on a TRUNCATE.
  • Creating a Date Dimension.
The Demo Database for the Examples
With just a few exceptions, the examples use the AdventureWorks2008R2 database. You can find AdventureWorks2008R2 on the CodePlex site. If you’re still using SQL Server 2008 and not 2008R2, the examples will still work - you’ll just need to change any 2008R2 references to 2008.
Tip 1: Performing an Allocation
Suppose you receive budget data at the region level, and you need to allocate it downward to the market or account level, based on each market/account’s percentage share of the region. This scenario occurs in data warehouse situations where a developer must allocate a measure based on some weighted percentage. The developer needs to be careful, not only to implement the allocation method correctly, but also to ensure that the sum of the allocated numbers equals the original sum of the measure being allocated.
Listing 1 shows an example, using the AdventureWorks2008R2 database. The example uses the tables Purchasing.PurchaseOrderHeader and Purchasing.PurchaseOrderDetail, and allocates the freight from the order header table down to the product line items in the order detail table, for each purchase order. Stated simply, if order A has $100 in freight, and order A has two line items (1 and 2) with line item order dollars of $500 and $1,500 respectively, then line item 1 would receive $25 of the freight and line item 2 would receive $75 of the freight. This is because line item 1 had 25% of the total order dollars and line item 2 had 75% of the total order dollars.
This certainly seems simple enough - determine each line item allocation ratio (LineItemRatio) by taking the line item dollars (OrderQty * UnitPrice) and dividing by the SubTotal of order dollars in the order header table.
(OrderQty * UnitPrice) / POH.SubTotal as LineItemRatio
After that, you would take the LineItemRatio and multiply by the Freight, to determine the allocated freight for each line item. Then you’d sum all the allocated freights (which would sum to $1,583,807.6975) and compare that to the sum of freight in the order header table ($1,583,978.2263). But that’s off by roughly $170!
Now, one might argue that the difference is insignificant - after all, $170 is roughly a hundredth of a percent of the 1.5 million in total freight. However, it’s possible that accountants might require (with good reason) that the numbers either match or are within a few pennies. So are we off by $170 because of simple rounding?
It’s actually a bit more complicated. The culprit here is the money data type. The UnitPrice in the order detail table is stored as a money data type, which has a fixed scale of four decimal positions. Therefore, any ratio that we derive from a money data type will also contain four decimal positions. This means that the sum for allocated freights (for any one order) will differ from the original order freight by several cents or possibly a few dollars. When you aggregate that difference across thousands of orders, you have the explanation for the difference of $170.
The solution is to cast the result of the numerator (OrderQty * UnitPrice) as a decimal or a floating point data type, or simply multiply the numerator by 1.0 to force a cast, like so:
(OrderQty * UnitPrice) * 1.0 / POH.SubTotal as LineItemRatio

Cast((OrderQty * UnitPrice) as float) / POH.SubTotal as LineItemRatio
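To see the truncation in isolation, here is a quick test (the dollar values are hypothetical). It works because the decimal data type outranks money in SQL Server’s data type precedence, so multiplying by 1.0 promotes the whole calculation to decimal:

DECLARE @LineDollars money = 500.00
DECLARE @SubTotal money = 1725.00

-- money / money stays money, so the ratio is cut off at four decimals
SELECT @LineDollars / @SubTotal AS MoneyRatio

-- multiplying by 1.0 promotes the calculation to decimal,
-- preserving the full ratio
SELECT @LineDollars * 1.0 / @SubTotal AS DecimalRatio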
Figure 1 shows a partial result with the allocation ratio going well beyond four decimals. When we apply this logic, the difference across all orders is down to 0.000077 of a penny! I challenge anyone to come closer than that!

Figure 1: Results of allocation.


Listing 1: T-SQL code to perform an allocation
use AdventureWorks2008R2
go

select sum(Freight) as HeaderFreightSum
  from Purchasing.PurchaseOrderHeader

select *, LineItemRatio * Freight as AllocatedFreight
  from (select POH.PurchaseOrderID,
               cast(POH.OrderDate as DATE) as OrderDate,
               SubTotal, Freight, LineTotal, POD.ProductID,
               (OrderQty * UnitPrice) * 1.0 / POH.SubTotal
                                      as LineItemRatio
          FROM Purchasing.PurchaseOrderHeader POH
          JOIN Purchasing.PurchaseOrderDetail POD ON
               POH.PurchaseOrderID = POD.PurchaseOrderID) TempAlias
 ORDER BY PurchaseOrderID
COMPUTE sum(LineItemRatio * Freight)

Using the Visual Studio New Project Dialog Box


Continuing on our odyssey exploring the features of Visual Studio 2010, we turn our attention to the New Project dialog box. You noticed a difference no doubt, but may not be aware of just how much it has changed. Sit back, relax, open up Visual Studio 2010 and follow along as we dive into the details.
Press CTRL + SHIFT + N to bring up the New Project dialog box shown in Figure 1. How many times have you come here only to do exactly the same thing you did in Visual Studio 2008? Most people do. Look again at this dialog and notice, in particular, the left-hand side that shows project template organization.

Figure 1: The New Project dialog box in Visual Studio 2010.
Recent Templates
Much like Pavlov’s famous experiment, when we see the New Project dialog we immediately just dig through the Installed Templates, find our project template of choice, and move on. Stop and ask yourself a simple question: “How many templates do I actually use?” I’ve asked thousands of developers the number of templates they use every day, week, month, and year. The answers are pretty consistent across the board: hardly any of them use more than 3-5 project templates in any given year.
It makes sense when you think about it. If you are a web developer then you will most likely stick with one of the web project templates over and over again. Granted you may use that template several hundred times but it is only one project template. So why should you dig through a sea of installed templates just to get to the one that you use all the time? The simple answer is you shouldn’t. Figure 2 shows the new Recent Templates section.

Figure 2: Recent Templates.
The concept is both simple and elegant: just show the last five most recently used templates. That’s it. No more digging through all the installed templates to find the ones you use most often. I’ve found that most people, once they learn of this feature, just stay in this area most of the time except for the rare times when they need a template not listed here already.
Searching Project Templates
Even with the Recent Templates section you will still need to occasionally dig into areas like the Installed Templates to find one that you need. Don’t go blindly looking for the templates, just search for them instead! Figure 3 shows the new search area that you can use to find project templates.

Figure 3: Search Templates.
The usage is pretty straightforward but requires a little orientation. Let’s say you are searching for web projects. Simply type in the word web and see the results shown in Figure 4.

Figure 4: Search Templates result.
Unfortunately it shows all languages and we just want to see the Visual Basic templates. No problem! Just type vb (case doesn’t matter) anywhere in the search to have it filter by language as shown in Figure 5.

Figure 5: Search Templates result filtered by language.
All languages have a shorthand syntax (C#, C++, and F#, for example) to make it easy to search for them. There is one problem with this example however: it isn’t really necessary. You could pretty much get the same result just by expanding the Visual Basic node and going to the Web section in the New Project dialog box as shown in Figure 6.
Granted it’s not an exact match. Visual Web Part isn’t in the list in Figure 6 but it’s pretty close. So while there is some value to using search with installed templates, I believe the real value is when you are dealing with a set of unknown templates.

Figure 6: Web section in Visual Basic project templates.
Online Templates
By now hopefully you have heard of the Visual Studio Gallery which can be found at:
Essentially this is a collection of Microsoft and community-created content to enhance your Visual Studio experience. Fortunately we just use Online Templates in the New Project dialog box as shown in Figure 7 to get access to the project templates from the Visual Studio Gallery.

Figure 7: Online Templates.
As you can see the templates here are organized into broad categories under the Templates node. Lucky for us we don’t need to root around in here looking for specific templates. Just search for your desired template! This is where the Search really shines in my opinion.
Let’s put C# WPF (again, case doesn’t matter) in the search area and see the result we get, shown in Figure 8.

Figure 8: Online Templates search result.
Now we have a filtered list of templates that mention WPF or are related to WPF in some way. With that said, be careful about filtering by language in this area. There tends to be a lot of false positives when narrowing down by language. In this example I encountered three non-C# templates in the search result.
Sorting Templates
Last on the list of new features in the New Project dialog box is the Sort By drop-down box. The options here change depending on context. Figure 9 shows the options for Installed Templates available to us.

Figure 9: Sort Options for Installed Templates.
There are many people who don’t like the default sort order for templates. Personally I never had a problem with it but if you want to change the order you can now sort ascending or descending as well.
Also, based on your context, this list will change. Figure 10 shows the sort options for Online Templates.

Figure 10: Sort Options for Online Templates.
More, and different, choices present themselves so we can arrange the templates by the most appropriate information. Generally speaking, I tend to stick with Highest Ranked or Most Downloads as a good indicator of the better templates to use.
Final Thoughts
Clearly there are many new features for you to explore in the Visual Studio 2010 New Project dialog box. It’s time to go beyond what you were used to in prior versions of Visual Studio and leverage these great productivity enhancements. Enjoy!
Mr. Zain Naboulsi Jr

10/20/2011

Passing Variables to and from an SSIS task

by

In which Patrick Index casts a jaundiced eye on SSIS, and decides that, for all its faults, it has a number of uses as an ETL tool. In the first of a series of articles 'from the trenches', Patrick describes how to pass variables to, and from, an SSIS task.

Passing Variables to and from an SSIS task

Like it? SSISht!

Like it or loathe it, SSIS is here to stay. I suppose it’s nice and graphical, and it is also aesthetically pleasing when you finally get a screen full of green tasks – I tend to leave my screen displaying for a while when this happens, so that everyone can see it whilst I go and make a coffee. SSIS is much richer than DTS. Additionally you quite often see jobs for SSIS specialists; it would seem that companies are using it as their de-facto ETL tool, standing apart from SQL Server.
SSIS is, by its very nature, frustrating to work with because it is a mish-mash of dissimilar development environments, and I don’t find that the syntax is always intuitive.
There doesn’t seem to be a great deal of material on the web and it can be hard to find good examples to use as guidelines. So, in the spirit of building up a knowledge base, and hopefully persuading Tony to set up a dedicated section on Simple-Talk for SSIS, I have constructed an example to demonstrate passing variables into and out of an ‘Execute SQL Task’ and a ‘Script Task’.

Passing Variables to and from an ‘Execute SQL Task’ and a ‘Script Task’

The two tasks do fundamentally the same thing, which is to try and date-stamp a file.  The final variable value “FullPath” could then be easily used by a File System Task to copy/move or delete a file perhaps.
I suppose most SQL Server developers would be more comfortable knocking up this fairly trivial code in SQL, but the difficulty is in passing and catching the input variables in the task.   This example demonstrates the problem.
I have set up a package with three String variables called
  • FileName, which has a data type of String and an initial value of “Import.txt”
  • FolderName, which has a data type of String and an initial value of “c:\”
  • FullPath, which has a data type of String and no initial value
… and an ‘Execute SQL Task’ and a ‘Script Task’.
The package is called, rather imaginatively, “Package3”. The scope of the variables is at the package level.  One thing to note when you set up variables (choose SSIS-VARIABLES from the top menu) is to make sure you have clicked on the package and not a task when you create or add a variable.  If you create a variable while being clicked on a task (therefore with task scope) then the variable will disappear from the list when you click up to the package.  Sorry to be an old dog but I initially found this a bit confusing.
The simplest way to inspect your variables is to set a break-point on the task (right click on the task and choose EDIT BREAKPOINTS) for the OnPostExecute event of the task.  This will then allow you to  inspect the value of the variable after the task has completed its execution.  The red dots on the tasks indicate that there are already breakpoints set up on the task.

Doing it the 'Execute SQL Task' way

In the ‘Execute SQL Task Editor’ in the ‘Parameter Mapping’ section, (double-click on the task and choose Parameter mapping), I have set it up so that the two variables i.e. User::FolderName and User::FileName are added to the dialogue box.  Each has a Direction of “Input” which seems logical and a data type of VARCHAR.  The parameter names that I have used are just the ordinal positions of 0 and 1, which is what the context help suggests.  In other words, the value of User::FolderName is the first input parameter and User::FileName is the second input parameter.  The parameter lengths are 50. In other words, we are dealing with a varchar(50) parameter.  The initial default values for these, when you set them up, are -1 which tells me nothing I am afraid.
For the Result Set options on the Execute SQL Task, I have put in the aliased name of the output column from my query, which, giving my poetic instincts full rein, I have called FullPathFromQuery, and I want to pass this to my variable User::FullPath.
‘So what about the query?’ you may ask.  Well, if we go back to the General option on the Execute SQL Task Editor, you can see the code and I will list it again here
DECLARE @YesterdaysDate varchar(8)
DECLARE @Filename varchar(50)
DECLARE @Folder varchar(50)
DECLARE @Etc varchar(50)
SET @Folder = ?
SET @Filename = ?
--SET @Etc = ?
SELECT @YesterdaysDate = CONVERT(varchar(8),DATEADD(dd,-1,getdate()),112)
SELECT @Folder + SUBSTRING(@Filename,1,CHARINDEX('.txt',@Filename)-1) + '_' + @YesterdaysDate + '.txt' AS FullPathFromQuery
For such trivial code you would not want to set up a stored procedure I suspect, so the option of passing parameters to a stored procedure is not really there.
The only way to pick up these input variable values is to use question marks (“?”) in the order that they are passed. This query as it stands will clearly not parse in a query analyser window, so you can only really test your code by running it in the task: this is not ideal.
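One crude but effective workaround is to keep a second copy of the statement with the question marks swapped for hard-coded test values (here I reuse the package defaults of “c:\” and “Import.txt” from earlier), which you can run in a query window:

DECLARE @YesterdaysDate varchar(8)
DECLARE @Filename varchar(50)
DECLARE @Folder varchar(50)
SET @Folder = 'c:\'           -- was: SET @Folder = ?
SET @Filename = 'Import.txt'  -- was: SET @Filename = ?
SELECT @YesterdaysDate = CONVERT(varchar(8),DATEADD(dd,-1,getdate()),112)
SELECT @Folder + SUBSTRING(@Filename,1,CHARINDEX('.txt',@Filename)-1)
     + '_' + @YesterdaysDate + '.txt' AS FullPathFromQuery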
You must also set the ResultSet option to be “Single row”.
If you run this to the breakpoint you can see that the variable User::FullPath has been assigned the value C:\Import_20080805.txt (I ran this on 6th Aug 2008), so the code must be working.

Using a Script Task instead

Perhaps a simpler way to do this is just to use the Script Task.  The trivial code again demonstrates how to pick up and assign values to and from variables in the VB.NET code.  You need to tell the task which variables it is going to use by adding them to the ReadOnlyVariables and ReadWriteVariables options in the Script Task editor window, and also be aware that the variable names are case-sensitive.
The code which shows the passing and assignment of the variables in the script is shown below.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

      Public Sub Main()
           

        Dim strFolder As String
        Dim strFilename As String
        Dim strTomorrow As String
        Dim strNewFullPath As String

        'do path in script
        strFolder = Dts.Variables("FolderName").Value.ToString
        strFilename = Dts.Variables("FileName").Value.ToString()
        strTomorrow = CStr(Format(Today().AddDays(+1), "yyyyMMdd"))
        strNewFullPath = strFolder & strFilename.Substring(0, strFilename.IndexOf(".txt")) & "_" & strTomorrow & ".txt"

        'display new value of path
        'MsgBox(strNewFullPath)

        Dts.Variables("FullPath").Value = strNewFullPath
        Dts.TaskResult = Dts.Results.Success

      End Sub

End Class
If you put a breakpoint on the task the value of the variable can be inspected to give tomorrow’s date, and it should look like this…

So which approach is best? 

People tell me that using question marks was how it was in DTS, but I have to say I didn’t find setting up the Execute SQL Task particularly intuitive.  The script task for me seems like the simplest solution of the two, surprisingly.
So the example is not going to win a Nobel Peace Prize for complexity but it does demonstrate the awkwardness of SSIS in the real world.  Hopefully the article will encourage readers to publish their experiences using the various tasks and we can build up a knowledge base of real world examples for everyone to reference on the simple-talk site.  I have recently been wrestling with the XML task and data source which will be the subject of my next article.

Working with Property Expressions in SQL Server Integration Services

by

Property Expressions in SSIS packages are assigned to a property at run time. This allows parameters to be passed to a package every time they are run, making them far more versatile. Despite their obvious usefulness, they are not well known. Hopefully, Bob's article puts things right.
SQL Server Integration Services (SSIS) provides a number of advanced features that support the efficient development and deployment of SSIS packages. One of these features is the property expression, which lets you dynamically update a property value at runtime. For example, you can use a property expression to define the connection string of an OLE DB connection manager. That way, the connection to the data source can be generated dynamically at run time, letting you connect to different data sources without needing to reconfigure your package each time the data source changes.
In this article, I explain how to use property expressions in your SSIS packages and provide an example package that includes several property expressions. This package—PropertyExpressions.dtsx—includes a simple control flow and data flow that retrieve metadata from a local instance of SQL Server. Figure 1 shows the package’s control flow.
Figure 1: Control Flow in the PropertyExpressions.dtsx SSIS package
The control flow components perform the following tasks:
  • Execute SQL task: Queries the sys.databases catalog view in the master database to retrieve a list of database names and add them to the Databases variable.
  • Foreach Loop container: Loops through the list of databases in the Databases variable and, with each loop, assigns the current database name to the Database variable. The Foreach Loop container runs the following control flow tasks with each loop:
    • Execute SQL task: Queries the sys.databases catalog view in the master database to retrieve the database ID for the current database (in the Database variable).
    • Data Flow task: Retrieves a list of tables in the current database.
Notice also in Figure 1 that the PropertyExpressions.dtsx package includes two connection managers:
  • SqlSrv2005: Connects to the master database in the local instance of SQL Server 2005.
  • DatabaseTables: Connects to the C:\Data\master - tables.csv file.
Now let’s take a look at the data flow associated with the Data Flow task (shown in Figure 2). The data flow includes two components:
  • OLE DB source: Queries the sys.tables catalog view in the current database (in the Database variable) to retrieve the name, created date, and modified date for each table in the current database.
  • Flat File destination: Inserts the table-related data into a .csv file specific to the current database.
Figure 2: Data Flow in the PropertyExpressions.dtsx SSIS package
Figure 3: User-Defined Variables in the PropertyExpressions.dtsx SSIS package
To support the various tasks in the PropertyExpressions.dtsx package, I created three variables (shown in Figure 3). Notice that I’ve defined the Databases variable as an object, the Database variable as a string, and the DatabaseID variable as an integer.
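The article doesn’t show the statements behind these components, but based on the descriptions above they amount to simple catalog-view queries, roughly like the following (my reconstruction, not the package’s actual code):

-- Execute SQL task: full result set of database names (Databases variable)
SELECT name FROM sys.databases

-- Execute SQL task inside the Foreach Loop: ID of the current database
SELECT database_id FROM sys.databases WHERE name = ?

-- OLE DB source: table metadata sent to the Flat File destination
SELECT name, create_date, modify_date FROM sys.tables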

As you can see, PropertyExpressions.dtsx is a very basic SSIS package, but it contains all the components necessary to demonstrate how to use property expressions. Now let’s take a look at how that is done.
Note: I created the PropertyExpressions.dtsx package in SSIS 2005. However, I upgraded the package to SSIS 2008 to ensure that the property expressions features are implemented the same in both versions. I noticed no difference between the two.

Note also that this article assumes that you’re familiar with how to create an SSIS package. For more information about any of the specific components used in the PropertyExpressions.dtsx package, refer to SQL Server Books Online. In addition, you can download the SSIS 2005 version of the PropertyExpressions.dtsx package here.

Creating Property Expressions

The process of creating a property expression is very straightforward. You can add an expression to any read/write property in most of a package’s objects, which includes the package itself, the containers and tasks in the control flow, connection managers, event handlers, and log providers. As you’ll see later in the article, you can also add expressions to some of the properties in data flow components, but this functionality is much more limited than the control flow.

Accessing a Component’s Properties

Figure 4: SqlSrv2005 connection manager properties
The first step in adding a property expression is to access a component’s properties. Each component supports a set of properties that you can view in the Properties window in Business Intelligence Development Studio (BIDS). Figure 4 shows the properties for the SqlSrv2005 connection manager in the PropertyExpressions.dtsx package.
If the Properties window is not displayed, right-click the component whose properties you want to view, and then click Properties. If the Properties window is already open, simply select the component and the properties will be displayed in the window.
Notice in Figure 4 that the SqlSrv2005 connection manager’s properties include the Expressions property. By default, no expressions are defined on a component’s properties. If a property expression has been defined, you can view it by clicking the plus (+) sign to the left of the Expressions property. Any expressions that have been defined are then displayed immediately beneath the Expressions listing.
In this case, I have defined an expression on the InitialCatalog property, which determines which database to connect to when accessing the specified instance of SQL Server. I’ll go into more detail about the expression itself later in the article.

Building your Property Expressions

To add an expression to a property on a specific component, click the ellipsis button to the right of the Expressions property in the Properties window. This launches the Property Expressions Editor (shown in Figure 5). The Property Expressions Editor lists any property expressions that have been defined on the selected component and lets you modify those expressions as well as add or delete expressions.

Figure 5: Property Expressions Editor in SSIS
The Property column in the Property Expressions Editor lists properties for which property expressions have been defined. For example, Figure 5 lists the InitialCatalog property, which has been defined with an expression. When you add a new property expression, you must first select a property from the drop-down list in the Property column (in the first available blank row). The list includes only those properties on which expressions can be defined and have not yet been defined.
Note: A read-write property on any given component can be associated with only one expression. In other words, you cannot define two separate expressions on the same property in the same component.
After you select the property from the Property drop-down list, you can then define an expression (or edit an existing one) directly in the Expression column.
If you need help in building an expression, click the ellipsis button to the right of the Expression column. This launches Expression Builder (shown in Figure 6).

Figure 6: Expression Builder dialog box in SSIS
The Expression Builder dialog box provides several tools to help you create your expressions:
  • The upper-left window lists the system and user-defined variables that can be used in your expressions.
  • The upper-right window lists the functions and operators that you can include in your expressions.
  • The Expression window provides a workspace for creating your expressions. You can drag variables, functions, and operators from the two upper windows into the workspace.
  • The Evaluate Expression button launches the logic necessary to evaluate the expression in the Expression window. If the expression can be evaluated, the results of that evaluation are displayed in the Evaluated value window. If the expression cannot be evaluated, a message window appears and displays details about the expression. Sometimes the message is useful in determining why your expression could not be evaluated. Often, however, it is not.
The expression shown in Figure 6 is the one I created on the InitialCatalog property of the SqlSrv2005 connection manager. To create the expression, I simply dragged the Database user-defined variable to the Expression window. Expression Builder automatically added the necessary at (@) symbol, brackets, and User namespace, which are required when referencing a variable within a property expression.
That was all I needed to do to create the expression. Because I used the Database variable and the variable is populated with a new database name each time the Foreach Loop container runs, the SqlSrv2005 connection manager will connect to the current database whenever it is called from within the container.
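For reference, the finished expression on the InitialCatalog property is nothing more than the variable reference itself:

@[User::Database]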
I also created the following property expression on the ConnectionString property of the DatabaseTables connection manager:
"C:\\Data\\" + @[User::Database] + " - tables.csv"
The expression uses the Database variable in order to include the database name in the file name. To do this, I concatenated two strings with the variable. Note that the string values must be enclosed in double quotes. For the backslashes in the path name, I had to escape each one with an additional backslash. As a result, each time the data flow runs (for each database) a file is created for that database. For example, the path name and file name for the file created for the AdventureWorks database will be C:\Data\AdventureWorks - tables.csv.

Creating Property Expressions on Data Flow Components

The process of creating a property expression is the same for any type of property that supports an expression. In addition, the properties for most objects remain consistent for each instance of that object. For example, no matter how many instances of an Execute SQL task you add to the control flow, the properties will be the same.
This is not the case for the Data Flow task. Because you can add components to the data flow, the properties available to the task can vary. However, not all of those properties—few as a matter of fact—support property expressions. For specifics about which data flow component properties support expressions, see the topic “Data Flow Properties that Can Be Set by Using Expressions” in SQL Server Books Online.
To configure expressions on a property in a data flow component, you take the same steps as you would for any other property in the Data Flow task. If a component has been added to the data flow and the component includes properties that can be configured with expressions, those properties are displayed along with the base properties in the Property drop-down list in the Property Expressions Editor (see Figure 5).
For example, the data flow in the PropertyExpressions.dtsx package includes a Flat File Destination component that inserts data into a .csv file created for each database. The component includes the Header property, which can be configured with an expression. As a result, that property is included in the list of available properties in the Property Expressions Editor, when viewed for the Data Flow task. This means you can define an expression on that property.
For instance, suppose you want to include a note at the beginning of each .csv file that identifies the name of the database. You can define an expression similar to the following:
"Tables in the " + @[User::Database] + " database"
In this case, I simply concatenate two strings with the current value in the Database variable. As each .csv file is created, the note is added to the file.

Using Enumerated Constants in Property Expressions

Some properties in SSIS objects require a value from an enumeration’s member list. This means that, although the property value is displayed in BIDS with a friendly name (the enumerated constant), the value is actually stored as a numeral. For example, an SSIS package supports the CheckpointUsage property. In BIDS, the property value will be displayed as Never, IfExists, or Always. However, the values are actually stored as 0, 1, and 2, respectively.
When you define an expression on a property that uses enumerated constants, you must use the numeric value in your expression rather than the friendly name. The challenge to this, of course, is that most people don’t have these numbers memorized. For some properties that use enumerated constants, such as the ResultSetType property in the Execute SQL task, you can find the numeric values in the topic “Using Enumerated Constants in Property Expressions” in SQL Server Books Online. For properties not included in this topic, you have to dig a little deeper. For example, I had to refer to Microsoft TechNet to find the numeric values for the IsolationLevel property of the Data Flow task ( DtsContainer.IsolationLevel Property ).
After you’ve identified the correct numeric value for a specific enumerated constant, you can use that value in your expressions. For example, I created the following expression on the IsolationLevel property of the Data Flow task in the PropertyExpressions.dtsx package:
@[User::DatabaseID] < 5 ? 256 : 65536
In this expression, I compare the value of the DatabaseID variable to the number 5. The question mark-colon (? :) symbols serve as a conditional operator that returns one of two expressions based on the evaluation of a Boolean expression. The Boolean expression in this case is @[User::DatabaseID] < 5. If the DatabaseID value is less than 5, the expression evaluates to true; otherwise it evaluates to false. If true, the IsolationLevel property is set to 256 (ReadUncommitted); otherwise the property is set to 65536 (RepeatableRead).
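If it helps to see the logic in more familiar terms, the expression behaves like the following T-SQL CASE (an illustration only; the property expression itself must be written in the SSIS expression language):

DECLARE @DatabaseID int
SET @DatabaseID = 3 -- hypothetical test value
SELECT CASE WHEN @DatabaseID < 5
            THEN 256    -- ReadUncommitted
            ELSE 65536  -- RepeatableRead
       END AS IsolationLevelValue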
Note: I use the DatabaseID value to determine which databases are system databases. By default, the first four databases created in SQL Server 2005 and 2008 are master, model, msdb, and tempdb. This lets me set the transaction level differently for the primary system databases compared to other databases.

Moving Ahead with Property Expressions

The example property expressions shown up to this point have been fairly basic, yet the concepts demonstrated in those expressions should represent many of the elements you’re likely to use in your expressions. However, you can create more advanced expressions by using the various types of variables, operators, and functions available to SSIS expressions. For more details about SSIS expressions, see the topic “Integration Services Expression Concepts” in SQL Server Books Online. There you will be directed to additional information and examples. In the meantime, you should now have the information you need to get started adding property expressions to any of your SSIS packages.

Working with Precedence Constraints in SQL Server Integration Services

by

In SSIS, tasks are linked by precedence constraints. A task will only execute if the condition that is set by the precedence constraint preceding the task is met. By using these constraints, it is possible to ensure different execution paths depending on the success or failure of other tasks. This means that you can use tasks with precedence constraints to determine the workflow of an SSIS package. We challenged Rob Sheldon to provide a straightforward, practical example of how to do it.
The control flow in a SQL Server Integration Services (SSIS) package defines the workflow for that package. Not only does the control flow determine the order in which executables (tasks and containers) will run, the control flow also determines under what conditions they’re executed. In other words, certain executables will run only when a set of defined conditions are met.
You configure the workflow by using precedence constraints. Precedence constraints link the individual executables together and determine how the workflow moves from one executable to the next. Figure 1 shows the control flow for the PrecedenceConstraints.dtsx package. The precedence constraints are the green and red arrows (both solid and dotted) that connect the tasks and container to each other. (There can also be a blue arrow, as you’ll learn later in the article.)
Figure 1: Control Flow in the PrecedenceConstraints.dtsx SSIS package
As you would expect, the arrows define the direction of the workflow as it moves from one executable to the next. For example, after the first Execute SQL task runs, the precedence constraints direct the workflow to the next Execute SQL task and the Sequence container. One or both of these executables will run, depending on how the precedence constraints have been configured.
When a precedence constraint connects executables, the originating executable (the first to run) is referred to as the precedence executable. Multiple precedence constraints can originate from the precedence executable. In the first Execute SQL task in Figure 1, two precedence constraints originate from that precedence executable.
The task or container that is on the downstream end of the precedence constraint is referred to as the constrained executable. The constrained executable will run only if the conditions defined on the precedence constraint are met. If the conditions are not met, the constrained executable will not run. As a result, by configuring the precedence constraints, you can create complex workflows, while minimizing the need to configure duplicate tasks and containers.
Note that I created the PrecedenceConstraints.dtsx package in SSIS 2005. However, I upgraded the package to SSIS 2008 to ensure that precedence constraints are implemented the same in both versions. You can download the SSIS 2005 version of the PrecedenceConstraints.dtsx package in the speech bubble.
If you want to run the PrecedenceConstraints.dtsx package, you should first run the following Transact-SQL code against the AdventureWorks sample database:
IF EXISTS(
  SELECT table_name FROM information_schema.tables
  WHERE table_name = 'Employees')
DROP TABLE Employees
GO
CREATE TABLE Employees
(
  EmployeeID INT PRIMARY KEY,
  FirstName NVARCHAR(50) NOT NULL,
  LastName NVARCHAR(50) NOT NULL,
  Jobtitle NVARCHAR(50) NOT NULL
)
GO
IF EXISTS(
  SELECT table_name FROM information_schema.tables
  WHERE table_name = 'EmployeeLog')
DROP TABLE EmployeeLog
GO
CREATE TABLE EmployeeLog
(
  LogID INT IDENTITY PRIMARY KEY,
  LogFile VARCHAR(50) NULL,
  LogDateTime DATETIME NOT NULL DEFAULT GETDATE()
)
The code creates the Employees table and EmployeeLog table, both of which are necessary to run the PrecedenceConstraints.dtsx package. The package itself archives employee data, logs information about the package execution, truncates the Employees table if necessary, retrieves data through the HumanResources.vEmployee view, and loads it into the Employees table. The package uses different types of precedence constraints to control the workflow for these various operations. We’ll look at the workflow and precedence constraints in closer detail as we work through the article.

Defining Workflow by Success or Failure

By default, when a precedence constraint connects two executables, the constrained executable will run after the precedence executable successfully runs. However, if the precedence executable fails, that part of the workflow is interrupted, and the constrained executable does not run. You can override this behavior by setting the Value property on the precedence constraint. The property supports three options:
  • Success: The precedence executable must run successfully for the constrained executable to run. This is the default value. The precedence constraint is set to green when the Success option is selected.
  • Failure: The precedence executable must fail for the constrained executable to run. The precedence constraint is set to red when the Failure option is selected.
  • Completion: The constrained executable will run after the precedence executable runs, whether the precedence executable runs successfully or whether it fails. The precedence constraint is set to blue when the Completion option is selected.
If you refer back to Figure 1, you’ll see that the two precedence constraints originate from the Data Flow task, one green and one red. If the Data Flow task runs successfully, the first Send Mail task will run because the green precedence constraint connects to that Send Mail task. Because the Value property of the precedence constraint is set to Success, the precedence constraint will evaluate to true and the workflow will continue along that path.
However, if the Data Flow task fails, the red precedence constraint will evaluate to true because its Value property is set to Failure. As a result, the second Send Mail task will run. This way, you can configure each Send Mail task differently so that you’re sending a unique email based on whether the Data Flow task succeeds or fails.
Note: The two Send Mail tasks and the SMTP connection manager are included here for demonstration purposes only. If you want to test these tasks, you will need to point the connection manager to an actual SMTP server and configure the tasks appropriately. Otherwise, you should disable the tasks or they will fail when you try to run the package. (Even if they do fail, however, the package will still run and load the data as expected.)

Defining Workflow by Expressions

Although defining workflow by execution outcome (success, failure, or completion) can be useful, the workflow logic is still limited to that outcome. However, you can further refine your workflow by adding expressions to the precedence constraints. Any expression you add must be a valid SSIS expression and must evaluate to true or false.
To add an expression, double-click the precedence constraint to open the Precedence Constraint Editor dialog box, as shown in Figure 2. (The editor shown in this figure is the one for the precedence constraint that connects the first and second Execute SQL tasks.)
Figure 2: Defining an expression in the Precedence Constraint Editor dialog box
When adding an expression to a precedence constraint, the first step you must take is to select one of the following options from the Evaluation operation drop-down list:
  • Constraint: The precedence constraint is evaluated solely on the option selected in the Value property. For example, if you select Constraint as the Evaluation operation option and select Success as the Value option (the default settings for both properties), the precedence constraint will evaluate to true only if the precedence executable runs successfully. When the precedence constraint evaluates to true, the workflow continues and the constrained executable runs. (When the Constraint option is selected, the Expression property is greyed out.)
  • Expression: The precedence constraint is evaluated based on the expression defined in the Expression text box. If the expression evaluates to true, the workflow continues and the constrained executable runs. If the expression evaluates to false, the constrained executable does not run. (When the Expression option is selected, the Value property is greyed out.)
  • Expression and Constraint: The precedence constraint is evaluated based on both the Value property and the expression. Both must evaluate to true for the constrained executable to run. For example, in the PrecedenceConstraints.dtsx package, the first Execute SQL task must run successfully and the expression must evaluate to true for the precedence constraint to evaluate to true and the constrained executable to run.
  • Expression or Constraint: The precedence constraint is evaluated based on either the Value property or the expression. At least one of these properties must evaluate to true for the constrained executable to run.
After you’ve selected an option from the Evaluation operation list (and set the Value property, if appropriate), your next step is to define the expression. The expression I’ve used in this case (shown in Figure 2) determines whether the @EmployeeCount variable equals 0 (@EmployeeCount == 0).
To better understand how this works, let’s take a quick look at the first Execute SQL task. The task runs the following SELECT statement to retrieve the number of rows in the Employees table:
SELECT COUNT(*) FROM Employees
The task then assigns the statement’s results (a scalar integer value) to the @EmployeeCount variable, which I defined when I set up the SSIS package.
The precedence constraint expression then uses the variable value to determine whether it equals 0. If it does, the expression evaluates to true. If not, it evaluates to false. Because the precedence constraint’s Value property is also set to Success, the first Execute SQL task must run successfully and the expression must evaluate to true for the precedence constraint as a whole to evaluate to true. If it does, the second Execute SQL task runs.
The precedence constraint that connects the first Execute SQL task to the Sequence container uses similar logic. However, the expression itself is slightly different:
@EmployeeCount > 0
In this case, the @EmployeeCount value must be greater than 0 for the expression to evaluate to true. If it does evaluate to true and the first Execute SQL task runs successfully, the Sequence container and the tasks within it will run.
Notice that the expressions in the two precedence constraints that originate from the first Execute SQL task are mutually exclusive. That is, only one of the two expressions can ever evaluate to true during a specific execution. That does not mean that you cannot have multiple precedence constraints with expressions that all evaluate to true during a single execution, but it does mean that you want to use caution when implementing expressions to ensure that they reflect exactly the logic you’re trying to implement in your workflow. In this case, I want to ensure that only one precedence constraint evaluates to true during a single execution.
Note: SSIS expressions are an entity unto themselves and unique to SSIS packages and their components. It is beyond the scope of this article to get into the details of expressions, but it is important to get them right. Be sure to refer to the topic “Integration Services Expression Reference” in SQL Server Books Online if you have any questions about SSIS expressions.

Defining Workflow by Logical AND or Logical OR

Two other important configuration options in a precedence constraint are the Logical OR and Logical AND settings. These settings apply only to constrained executables and only if those executables have more than one precedence constraint directed to it. For example, the Data Flow task in the PrecedenceConstraints.dtsx package (shown in Figure 1) has two precedence constraints pointing to it: one from the second Execute SQL task and one from the Sequence container.
If you refer back to Figure 2, you’ll find the following two options at the bottom of the Precedence Constraint Editor dialog box:
  • Logical AND: All precedence constraints that point to the constrained executable must evaluate to true in order for that executable to run. This is the default option. If it is selected, the arrow is solid.
  • Logical OR: Only one precedence constraint that points to the constrained executable must evaluate to true in order for that executable to run. If this option is selected, the arrow is dotted.
In the case of the precedence constraints that point to the Data Flow task, it is the second option that is selected, as indicated by the dotted lines. As a result, the workflow can originate from either the second Execute SQL task or from the Sequence container, and only one of the precedence executables has to run successfully.
The Logical OR and the Logical AND options let you define multiple execution paths, yet share those elements that are common to each path, such as the Data Flow task in the PrecedenceConstraints.dtsx package. The logical AND/OR settings, along with the Value property and the use of expressions, provide you with the options necessary to define intricate workflows while minimizing duplicate efforts.
Despite its simplicity, the PrecedenceConstraints.dtsx package described in this article demonstrates all these elements and should provide you with the foundation you need to use precedence constraints to their fullest. However, you can find additional information about precedence constraints in the topic “Setting Precedence Constraints on Tasks and Containers” in SQL Server Books Online.

Adding the Script Task to Your SSIS Packages

by

Script tasks are a great way of extending SSIS functionality, when none of the built-in components are quite right for the task you need to perform. But how to go about creating a script task? No worries, once again Robert Sheldon is on hand to provide easy instructions on how to do it.

One of the most effective ways to extend your SQL Server Integration Services (SSIS) control flow is to use a Script task to write custom code that performs tasks you cannot perform with the built-in components. For example, you can use the Script task to access Active Directory information or to create package-specific performance counters. You can also use the Script task to combine functions that might normally require multiple tasks and data flow components.
In this article, I demonstrate how to implement a Script task into the control flow of a basic SSIS package. The purpose of the package is to retrieve data from a comma-separated values (CSV) file, insert the data into a SQL Server table (though no data is actually loaded), and then delete the file. The package includes a Script task that will first determine if the file has been updated since the last data load and, if so, whether the file currently contains data. You can download the SSIS package file, along with the two data files (one with data, and one without) from the speech-bubble at the head of the article. You'll have to rename the data file that you use to PersonData.CSV.
NOTE: The Script task is different from the Script component. The Script component can be used as a source, transformation, or destination to enhance the data flow, but it cannot be used in the control flow, just like the Script task is not available to the data flow. However, many of the basic concepts I cover in this article apply to the Script component.

Setting Up Your SSIS Package

Before adding the Script task to your SSIS package, you should add and configure any components that are necessary to support the task, such as variables, connections managers, and other tasks. For the example in this article, however, I created a CSV file named PersonData.csv before adding any components to the package. To make it simpler for you to create the file, I’m including the bcp command I used to generate the file on my system:
bcp "SELECT TOP 100 * FROM AdventureWorks2008R2.Person.Person ORDER BY BusinessEntityID" queryout C:\DataFiles\PersonData.csv -c -t, -S localhost\SqlSrv2008R2 –T
Notice that the command retrieves Person data from the AdventureWorks2008R2 database and adds that data to the C:\DataFiles\PersonData.csv file. I also created a second bcp command to create an empty file with the same name. I did this in order to fully test the SSIS package. In the SELECT statement in the second command, I include a WHERE clause that references a nonexistent BusinessEntityID value:
bcp "SELECT * FROM AdventureWorks2008R2.Person.Person WHERE BusinessEntityID = 123456789" queryout C:\DataFiles\PersonData.csv -c -t, -S localhost\SqlSrv2008R2 –T
Because I’m referencing a nonexistent BusinessEntityID value, the command creates an empty CSV file. You can use either file to test your SSIS package, should you decide to try the example I’m demonstrating in this article.
NOTE: I tested the SSIS package against both files. However, because the files share the same name, I had to create them one at a time, deleting the original, as necessary, after I ran the SSIS package.
The Script task that I’ll be demonstrating will reference two SSIS user-defined variables, so after I created the CSV file, I added the following two variables to my package:
  • IsEmpty: A Boolean variable with a package-level scope. The variable will be used by the Script task to specify whether the source CSV file contains data. I’ve set the initial value to False, but the Script task will set the final value, so you can set the initial value to either True or False.
  • LastUpdate: A DateTime variable with a package-level scope. I’ve set the value as an arbitrary date that precedes the date that I created the CSV files. In theory, the LastUpdate variable stores the timestamp of the last time the package ran and updated the database. In reality, this date would probably come from a table or some other system that logged the updates. For this article, however, it serves our purposes to set a hard-coded date.
My next step was to create a Flat File connection manager named PersonData. The connection manager connects to the C:\DataFiles\PersonData.csv file. Other than naming the connection manager and providing a description, I retained the default settings for its other options.
NOTE: This article assumes that you know how to run bcp commands as well as add and configure SSIS components, such as variables, connection managers, and tasks. If you’re not familiar how to use bcp or work with these components, you should first review the relevant topics in SQL Server Books Online or in another source.
The two variables and Flat File connection manager are the only SSIS components necessary to support the Script task that I’m going to demonstrate. So let’s look at how to add and configure that task.

Adding the Script Task to Your Package

After you add the Script task to your SSIS package, you can configure it by opening the Script Task Editor. On the editor’s General page, you should provide a name and description for the task. (I named the task Check file status.) Next, go to the editor’s Script page to configure the script-related properties, as shown in Figure 1.
Figure 1: Script page of the Script Task Editor
The first property that you need to set is ScriptLanguage. You can create your scripts in one of two languages: Visual Basic 2008 or Visual C# 2008. I used C# for the script that I created.
The next property on the Script page is EntryPoint. This is the method (specific to the selected script language) that the SSIS runtime calls as the entry point into your code. The Main method, in most cases, should work fine. However, if you choose another method, it must be in the ScriptMain class of the Visual Studio Tools for Applications (VSTA) project.
The next two properties on the Script page are ReadOnlyVariables and ReadWriteVariables. As the names imply, you enter the name of any SSIS variables you want to use in your script. (Separate the names with commas for multiple variables of either type.) For instance, I added the LastUpdate variable to the ReadOnlyVariables property and the IsEmpty variable to the ReadWriteVariables property. As a result, my C# script will be able to retrieve the date from the LastUpdate variable and set the file status in the IsEmpty variable.
That’s all there is to configuring the Script task properties in preparation for creating the script itself, so once you’ve configured the properties, click the Edit Script button on the editor’s Script page to open the VSTA integrated development environment (IDE) window, shown in Figure 2. All script modifications are made in the VSTA development environment.
Figure 2: Default C# code in the VSTA IDE window
As Figure 2 shows, when you first open the VSTA window, you’ll see the default C# script, which includes the language necessary to work with the Main method of the ScriptMain class. Because Figure 2 shows only part of the script, I’ve included the entire default code here for your convenience:
/*
   Microsoft SQL Server Integration Services Script Task
   Write scripts using Microsoft Visual C# 2008.
   The ScriptMain is the entry point class of the script.
*/

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_96fb03801a81438dbb2752f91e76b1d5.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        /*
            The execution engine calls this method when the task executes.
            To access the object model, use the Dts property. Connections, variables, events,
            and logging features are available as members of the Dts property as shown in the following examples.

            To reference a variable, call Dts.Variables["MyCaseSensitiveVariableName"].Value;
            To post a log entry, call Dts.Log("This is my log text", 999, null);
            To fire an event, call Dts.Events.FireInformation(99, "test", "hit the help message", "", 0, true);

            To use the connections collection use something like the following:
            ConnectionManager cm = Dts.Connections.Add("OLEDB");
            cm.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;";

            Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.
           
            To open Help, press F1.
      */

        public void Main()
        {
            // TODO: Add your code here
            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}
For the most part, you need to be concerned only with adding code to the Main method, specifically, to the section marked with the comment // TODO: Add your code here. (Comments are either preceded by double slashes for a single line or enclosed in slash-asterisk pairs, /* and */, for multiple lines.) Usually, the only exception to where you enter code is at the beginning of the script, where you include the necessary using statements to define the relevant namespaces. For instance, the script includes the using System; statement so we can access classes in the System namespace, such as those that support events, interfaces, and data types.
NOTE: A full explanation of how to use the C# language within a Script task is beyond the scope of this article. For specifics about the language, refer to a more complete resource, such as MSDN.
Other than the using statements and the Main method, you should, for the most part, leave the rest of the code alone, except for perhaps deleting comments. (Unless you’re a C# pro; then have at it.) Now let’s look at how to modify the script to check the status of our flat file.

Writing Your C# Script

The first step I often take when working with the code in the Script task is to get rid of the comments. In this case, I removed the comments before the Main method. You can also delete the opening comments, but I left them in just to provide a few reminders about the environment in which we’re working. So let’s look at how I’ve modified the script, and then I’ll explain the changes I’ve made. The following code shows how I’ve updated the original script and expanded the Main method:
/*
   Microsoft SQL Server Integration Services Script Task
   Write scripts using Microsoft Visual C# 2008.
   The ScriptMain is the entry point class of the script.
*/

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.IO; //added to support file access

namespace ST_5bd724e0deb3452e8646db6ec63913b0.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion


        public void Main()
        {
            // Define C# variable to reference SSIS user variable.
            DateTime LastLoggedUpdate = (DateTime)(Dts.Variables["LastUpdate"].Value);

            // Define variable for connection string.
            string PersonDataConnection = (string)Dts.Connections["PersonData"].AcquireConnection(null);

            // Create file object based on file connection.
            FileInfo PersonDataFile = new FileInfo(PersonDataConnection);

            // Retrieve properties from file object.
            DateTime LastModified = PersonDataFile.LastWriteTime;
            long PersonFileSize = PersonDataFile.Length;

            // If the file was modified since the last logged update,
            // set IsEmpty variable and set the task result to Success.
            // Otherwise, fail the task.
            if(LastModified > LastLoggedUpdate)
            {
                if(PersonFileSize > 0)
                {
                    Dts.Variables["IsEmpty"].Value = false;
                }
                else
                {
                    Dts.Variables["IsEmpty"].Value = true;
                }
                Dts.TaskResult = (int)ScriptResults.Success;
            }
            else
            {
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }
    }
}
Let’s start with the using statements at the beginning of the script. You might have noticed that I added the using System.IO; statement. The System.IO namespace gives us access to the classes we need to retrieve information about the flat file. I did not modify the script outside the Main method in any other way (except for deleting comments), so now let’s look at that method.
The first item I added after the Main method declaration is a DateTime variable named LastLoggedUpdate:
// Define C# variable to reference SSIS user variable.
DateTime LastLoggedUpdate = (DateTime)(Dts.Variables["LastUpdate"].Value);
I’ve set the variable value to equal that of the SSIS LastUpdate variable that I defined on the package. To retrieve the value of the SSIS variable, I use the Dts object’s Variables property, which returns a Variable object. I then specify the name of the variable, enclosed in double quotes and brackets, and tack on the Value property (available through the Variable object). This lets me retrieve the variable’s actual value. Note that I also cast the LastUpdate value to the DateTime data type by preceding the Dts.Variables expression with the data type name in parentheses, just as I do when I declare the LastLoggedUpdate variable. I can now reference the LastLoggedUpdate variable within the Main method, and it will return the value currently stored in the LastUpdate SSIS variable.
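If you’re worried about a mistyped or missing variable name, the Variables collection also exposes a Contains method that lets you test for the variable before reading it. Here’s a minimal defensive-read sketch (my addition; the article’s script assumes the variable exists):
// Defensive read (not in the original script): verify the variable
// exists before accessing it, and fail the task if it doesn't.
DateTime LastLoggedUpdate;
if (Dts.Variables.Contains("LastUpdate"))
{
    LastLoggedUpdate = (DateTime)Dts.Variables["LastUpdate"].Value;
}
else
{
    Dts.TaskResult = (int)ScriptResults.Failure;
    return;
}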
Next I declare a string variable named PersonDataConnection to hold the connection string I retrieve through the PersonData connection manager:
// Define variable for connection string.
string PersonDataConnection = (string)Dts.Connections["PersonData"].AcquireConnection(null);
Notice that my declaration once again begins with the data type, followed by the name of the variable. I then set the variable’s value to equal the connection string, which I retrieve by using the Dts object’s Connections property. This is followed by the name of the connection manager, enclosed in double quotes and brackets, and then by the AcquireConnection method. The method takes one argument: the handle to a transaction type. In this case, we can pass null, which indicates that the container supports transactions but is not going to participate in them. In other words, you don’t need to worry about this argument; just pass in null for this type of connection. Notice also that I’m explicitly converting the connection object to a string before assigning it to the PersonDataConnection variable.
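For a Flat File connection manager, AcquireConnection(null) returns the connection string, which is simply the path to the file. If you’d rather verify the returned object before using it, a pattern like the following would work; this is my own variation, not the article’s code:
// AcquireConnection returns an object; for a Flat File connection manager,
// that object is the file path as a string.
object rawConnection = Dts.Connections["PersonData"].AcquireConnection(null);
string PersonDataConnection = rawConnection as string;
if (PersonDataConnection == null)
{
    // Not the connection type we expected, so fail the task.
    Dts.TaskResult = (int)ScriptResults.Failure;
    return;
}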
The next variable I declare is PersonDataFile, which is defined with type FileInfo:
// Create file object based on file connection.
FileInfo PersonDataFile = new FileInfo(PersonDataConnection);
In this case, the variable’s value is based on a new instance of the FileInfo class. Because the FileInfo constructor takes the PersonDataConnection variable (the file path) as its argument, the new object represents the PersonData.csv file, and you can use the methods and properties available to the FileInfo class to access information about that file. That means you can access those properties and methods through the PersonDataFile variable, which is what I do in the next two variable declarations:
// Retrieve properties from file object.
DateTime LastModified = PersonDataFile.LastWriteTime;
long PersonFileSize = PersonDataFile.Length;
First, I declare a DateTime variable named LastModified and set its value to equal the value of the LastWriteTime property of the PersonDataFile variable, which is a FileInfo object. This will provide me with a timestamp of the last time the file was modified. I declare the second variable with the long data type and name the variable PersonFileSize. I then set the variable value to equal that of the file object’s Length property.
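One caveat worth noting: if the file doesn’t exist, reading the Length property throws a FileNotFoundException. If you want to guard against that, you could check the FileInfo object’s Exists property before reading these values. A minimal sketch of such a guard (my addition, not part of the article’s script) follows:
// Guard (not in the original script): reading Length throws a
// FileNotFoundException if the file is missing, so check Exists first.
if (!PersonDataFile.Exists)
{
    Dts.TaskResult = (int)ScriptResults.Failure;
    return;
}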
After I’ve declared the necessary variables, I’m ready to implement the logic needed to check the status of the PersonData.csv file. In the next section of code, I include two if…else statements, one embedded in the other:
// If the file was modified since the last logged update,
// set IsEmpty variable and set the task result to Success.
// Otherwise, fail the task.
if(LastModified > LastLoggedUpdate)
{
     if(PersonFileSize > 0)
     {
          Dts.Variables["IsEmpty"].Value = false;
     }
     else
     {
          Dts.Variables["IsEmpty"].Value = true;
     }
     Dts.TaskResult = (int)ScriptResults.Success;
}
else
{
     Dts.TaskResult = (int)ScriptResults.Failure;
}
Let’s start by looking at the outer if…else construction. Essentially, what it’s saying is, “If the last modified date is more recent than the last time data was loaded into the database, run the code in the if section. Otherwise, skip to the end of the script and show the Script task as having failed.”
The if statement begins by specifying the condition that determines whether to run the code in the if section or in the else section. If the condition evaluates to True (in this case, if the LastModified date is more recent than the LastLoggedUpdate date), the code in the rest of the if section runs. If the condition does not evaluate to True, the code in the if section does not run and the code in the else section runs instead, which sets the Dts object’s TaskResult property to Failure. (The TaskResult property tells the runtime whether the task succeeded or failed.)
The embedded if…else construction checks whether the value in the PersonFileSize variable is greater than 0; in other words, it checks whether the file contains any data. If the file does contain data, the code in the if section runs; otherwise, the code in the else section runs. As a result, if the file contains data, the SSIS IsEmpty variable is set to False. If the file contains no data, the variable is set to True. Notice that after the embedded if…else construction, I set the value of the TaskResult property to show that the task ran successfully.
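Incidentally, because the embedded if…else simply maps a numeric test to a Boolean value, you could collapse it into a single assignment. This is purely a stylistic alternative to the code shown above, not a change the article makes:
// Stylistic alternative to the embedded if...else: assign the Boolean directly.
Dts.Variables["IsEmpty"].Value = (PersonFileSize == 0);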
That’s all there is to the script. Normally, you would also include code to handle exceptions, but what I’ve shown you here should provide you with an overview of the script’s basic elements.
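If you did want to add basic exception handling, one common pattern is to wrap the body of the Main method in a try…catch block and report any error through the Dts object’s Events property before failing the task. The following is a minimal sketch of that pattern, not part of the article’s script:
public void Main()
{
    try
    {
        // ... the file-checking logic shown above ...
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception ex)
    {
        // Report the error to the SSIS runtime, then fail the task.
        Dts.Events.FireError(0, "Check file status", ex.Message, string.Empty, 0);
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
}
Once you’re done with the script, you can close the VSTA window and then click OK to close the Script Task Editor. Be sure to save your changes.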

Adding Other Tasks to Your Package

After I completed configuring the Script task and writing the C# script, I added a Data Flow task to the control flow. The data flow should, in theory, retrieve the data from the PersonData.csv file and insert it into a SQL Server database. However, for the purposes of this exercise, the Data Flow task serves only as a placeholder. It will still run like any other Data Flow task, but no data will actually be moved. Even so, you can still verify whether your control flow is set up correctly.
Next, I connected a precedence constraint from the Script task to the Data Flow task. I then added a File System task to the control flow and configured it to delete the PersonData.csv file. Next I connected a precedence constraint from the Script task to the File System task and one from the Data Flow task to the File System task. I then configured the two precedence constraints connecting to the File System task with the Logical OR option, which means that only one constraint must evaluate to True for the task to run. (By default, all constraints connected to a task must evaluate to True for the task to run.) Figure 3 shows what the control flow looked like after I added all the components.
Figure 3: Adding a Data Flow task and File System task to the control flow
Notice how the precedence constraints connecting to the File System task are dotted lines. This indicates that the constraints have been configured with the Logical OR option. Also notice that an expression is associated with each of the precedence constraints leading out of the Script task (as indicated by the fx label). Both constraints are configured so that the tasks down the line will run only if the Script task runs successfully and the expression evaluates to True. I defined the following expression on the precedence constraint that connects to the Data Flow task:
@[User::IsEmpty] == false
This means that the IsEmpty variable must be set to False in order for the expression to evaluate to True. The expression defined on the precedence constraint that leads from the Script task to the File System task is as follows:
@[User::IsEmpty] == true
This means, of course, that the IsEmpty variable must be set to True for the expression to evaluate to True. And that about does it for setting up the SSIS package. The only other step I took was to add a breakpoint to the Script task, which I’ll explain in the following section.

Running Your SSIS Package

Before I ran the SSIS package, I added a breakpoint on the OnPostExecute event to the Script task. As a result, when I ran the package, it stopped running as it was about to complete the Script task. Figure 4 shows what the package looks like when it stopped running.
Figure 4: Using a breakpoint to view variable values
When the package stopped running, I added a watch (shown in the bottom pane in Figure 4) on each of the two variables I created early on. The watches show the variable values at the time the package reached the breakpoint. Notice that the IsEmpty variable is set to False. Had the PersonData.csv file contained no data, the variable would have been set to True.
Next, I resumed running the package until it had executed all applicable tasks. As Figure 5 shows, every control flow task ran. That’s because the IsEmpty variable was set to False, so the Data Flow task ran, followed by the File System task.
Figure 5: Running the SSIS package when IsEmpty is false
If the IsEmpty variable had evaluated to True, the Data Flow task would not have run, which is what happened when I added an empty file to the C:\DataFiles folder. This time around, only the Script task and File System task ran, as shown in Figure 6.
Figure 6: Running the SSIS package when IsEmpty is true
Notice that the value of the IsEmpty variable shown in the Watch window is set to True. As a result, the file would have been deleted, but no attempts would have been made to load data into the database.

And It Doesn’t End There

In the example above, the SSIS package behaved in two different ways, depending on whether the file contained data. But there is a third scenario: the file has not been updated since the last data load. If that happens, the Script task fails and the package stops running, which is what we’d expect given the way the script in the Script task is written. Another thing the example doesn’t reflect is what would happen if the script threw an exception. Given that I’ve included no exception handling, I would again expect the task to fail. All this points to the fact that the example I’ve shown you here is only a simple script that contains relatively few elements. A script can be far more complex and take many more actions than what I’ve demonstrated here. However, you should now have enough information to get started creating your own scripts and using the Script task to extend your control flow so you can perform whatever tasks need to be performed.