10/20/2011

Working with Precedence Constraints in SQL Server Integration Services

by

In SSIS, tasks are linked by precedence constraints.  A task will only execute if  the condition that is set by the precedence constraint preceding the task is met. By using these constraints, it is possible to ensure different execution paths depending on the success or failure of other tasks. This means that you can use  tasks with precedence constraints to determine the workflow of an SSIS package. We challenged Rob Sheldon to provide a straightforward  practical example of how to do it.
The control flow in a SQL Server Integration Services (SSIS) package defines the workflow for that package. Not only does the control flow determine the order in which executables (tasks and containers) will run, the control flow also determines under what conditions they’re executed. In other words, certain executables will run only when a set of defined conditions are met.
You configure the workflow by using precedence constraints. Precedence constraints link the individual executables together and determine how the workflow moves from one executable to the next. Figure 1 shows the control flow for the PrecedenceConstraints.dtsx package. The precedence constraints are the green and red arrows (both solid and dotted) that connect the tasks and container to each other. (There can also be blue arrow, as you’ll learn later in the article.)
Figure 1: Control Flow in the PrecedenceConstraints.dtsx SSIS package
As you would expect, the arrows define the direction of the workflow as it moves from one executable to the next. For example, after the first Execute SQL task runs, the precedence constraints direct the workflow to the next Execute SQL task and the Sequence container. One or both of these executables will run, depending on how the precedence constraints have been configured.
When a precedence constraint connects executables, the originating executable (the first to run) is referred to as the precedence executable. Multiple precedence constraints can originate from the precedence executable. In the first Execute SQL task in Figure 1, two precedence constraints originate from that precedence executable.
The task or container that is on the downstream end of the precedence constraint is referred to as the constrained executable. The constrained executable will run only if the conditions defined on the precedence constraint are met. If the conditions are not met, the constrained executable will not run. As a result, by configuring the precedence constraints, you can create complex workflows, while minimizing the need to configure duplicate tasks and containers.
Note that I created the PrecedenceConstraints.dtsx package in SSIS 2005. However, I upgraded the package to SSIS 2008 to ensure that precedence constraints are implemented the same in both versions. You can download the SSIS 2005 version of the PrecedenceConstraints.dtsx package in the speech bubble.
If you want to run the PrecedenceConstraints.dtsx package, you should first run the following Transact-SQL code against the AdventureWorks sample database:
IF EXISTS(
  SELECT table_name FROM information_schema.tables
  WHERE table_name = 'Employees')
DROP TABLE Employees
GO
CREATE TABLE Employees
(
  EmployeeID INT PRIMARY KEY,
  FirstName NVARCHAR(50) NOT NULL,
  LastName NVARCHAR(50) NOT NULL,
  Jobtitle NVARCHAR(50) NOT NULL
)
GO
IF EXISTS(
  SELECT table_name FROM information_schema.tables
  WHERE table_name = 'EmployeeLog')
DROP TABLE EmployeeLog
GO
CREATE TABLE EmployeeLog
(
  LogID INT IDENTITY PRIMARY KEY,
  LogFile VARCHAR(50) NULL,
  LogDateTime DATETIME NOT NULL DEFAULT GETDATE()
)
The code creates the Employees table and EmployeeLog table, both of which are necessary to run the PrecedenceConstraints.dtsx package. The package itself archives employee data, logs information about the package execution, truncates the Employees table if necessary, retrieves data through the HumanResources.vEmployee view, and loads it into the Employees table. The package uses different types of precedence constraints to control the workflow for these various operations. We’ll look at the workflow and precedence constraints in closer detail as we work through the article.

Defining Workflow by Success or Failure

By default, when a precedence constraint connects two executables, the constrained executable will run after the precedence executable successfully runs. However, if the precedence executable fails, that part of the workflow is interrupted, and the constrained executable does not run. You can override this behavior by setting the Value property on the precedence constraint. The property supports three options:
  • Success: The precedence executable must run successfully for the constrained executable to run. This is the default value. The precedence constraint is set to green when the Success option is selected.
  • Failure: The precedence executable must fail for the constrained executable to run. The precedence constraint is set to red when the Failure option is selected.
  • Completion: The constrained executable will run after the precedence executable runs, whether the precedence executable runs successfully or whether it fails. The precedence constraint is set to blue when the Completion option is selected.
If you refer back to Figure 1, you’ll see that the two precedence constraints originate from the Data Flow task, one green and one red. If the Data Flow task runs successfully, the first Send Mail task will run because the green precedence constraint connects to that Send Mail task. Because the Value property of the precedence constraint is set to Success, the precedence constraint will evaluate to true and the workflow will continue along that path.
However, if the Data Flow task fails, the red precedence constraint will evaluate to true because its Value property is set to Failure. As a result, the second Send Mail task will run. This way, you can configure each Send Mail task differently so that you’re sending a unique email based on whether the Data Flow task succeeds or fails.
Note: The two Send Mail tasks and the SMTP connection manager are included here for demonstration purposes only. If you want to tests these tasks, you will need to point the connection manager to an actual SMTP server and configure the tasks appropriately. Otherwise, you should disable the tasks or they will fail when you try to run the package. (Even if they do fail, however, the package will still run and load the data as expected.)

Defining Workflow by Expressions

Although defining workflow by execution outcome (success, failure, or completion) can be useful, the workflow logic is still limited to that outcome. However, you can further refine your workflow by adding expressions to the precedence constraints. Any expression you add must be a valid SSIS expression and must evaluate to true or false.
To add an expression, double-click the precedence constraint to open the Precedence Constraint Editor dialog box, as shown in Figure 2. (The editor shown in this figure is the one for the precedence constraint that connects the first and second Execute SQL tasks.)
Figure 2: Defining an expression in the Precedence Constraint Editor dialog box
When adding an expression to a precedence constraint, the first step you must take is to select one of the following options from the Evaluation operation drop-down list:
  • Constraint: The precedence constraint is evaluated solely on the option selected in the Value property. For example, if you select Constraint as the Evaluation operation option and select Success as the Value option (the default settings for both properties), the precedence constraint will evaluate to true only if the precedence executable runs successfully. When the precedence constraint evaluates to true, the workflow continues and the constrained executable runs. (When the Constraint option is selected, the Expression property is greyed out.)
  • Expression: The precedence constraint is evaluated based on the expression defined in the Expression text box. If the expression evaluates to true, the workflow continues and the constrained executable runs. If the expression evaluates to false, the constrained executable does not run. (When the Expression option is selected, the Value property is greyed out.)
  • Expression and Constraint: The precedence constraint is evaluated based on both the Value property and the expression. Both must evaluate to true for the constrained executable to run. For example, in the PrecedenceConstraints.dtsx package, the first Execute SQL task must run successfully and the expression must evaluate to true for the precedence constraint to evaluate to true and the constrained executable to run.
  • Expression or Constraint: The precedence constraint is evaluated based on either the Value property or the expression. At least one of these properties must evaluate to true for the constrained executable to run.
After you’ve selected an option from the Evaluation operation list (and set the Value property, if appropriate), you’re next step is to define the expression. The expression I’ve used in this case (shown in Figure 2), determines whether the @EmployeeCount property equals 0 (@EmployeeCount == 0).
To better understand how this works, let’s take a quick look at the first Execute SQL task. The task runs the following SELECT statement to retrieve the number of rows in the Employees table:
SELECT COUNT(*) FROM Employees
The task then assigns the statement’s results (a scalar integer value) to the @EmployeeCount variable, which I defined when I set up the SSIS package.
The precedence constraint expression then uses the variable value to determine whether it equals 0. If it does, the expression evaluates to true. If not, it evaluates to false. Because the Value property precedence constraint is also set to Success, the first Execute SQL task must run successfully and the expression must evaluate to true for the precedence constraint as a whole to evaluate to true. If it does, the second Execute SQL task runs.
The precedence constraint that connects the first Execute SQL task to the Sequence container uses similar logic. However, the expression itself is slightly different:
@EmployeeCount > 0
In this case, the @EmployeeCount value must be greater than 0 for the expression to evaluate to true. If it does evaluate to true and the first Execute SQL task runs successfully, the Sequence container and the tasks within it will run.
Notice that the expressions in the two precedence constraints that originate from the first Execute SQL task are mutually exclusive. That is, only one of the two expressions can ever evaluate to true during a specific execution. That does not mean that you cannot have multiple precedence constraints with expressions that all evaluate to true during a single execution, but it does mean that you want to use caution when implementing expression to insure that they reflect exactly the logic you’re trying to implement in your workflow. In this case, I want to ensure that only one precedence constraint evaluates to true during a single execution.
Note: SSIS expressions are an entity unto themselves and unique to SSIS packages and their components. It is beyond the scope of this article to get into the details of expressions, but it is important to get them right. Be sure to refer to the topic “Integration Services Expression Reference” in SQL Server Books Online if you have any questions about SSIS expressions.

Defining Workflow by Logical AND or Logical OR

Two other important configuration options in a precedence constraint are the Logical OR and Logical AND settings. These settings apply only to constrained executables and only if those executables have more than one precedence constraint directed to it. For example, the Data Flow task in the PrecedenceConstraints.dtsx package (shown in Figure 1) has two precedence constraints pointing to it: one from the second Execute SQL task and one from the Sequence container.
If you refer back to Figure 2, you’ll find the following two options at the bottom of the Precedence Constraint Editor dialog box:
  • Logical AND: All precedence constraints that point to the constrained executable must evaluate to true in order for that executable to run. This is the default option. If it is selected, the arrow is solid.
  • Logical OR: Only one precedence constraint that points to the constrained executable must evaluate to true in order for that executable to run. If this option is selected, the arrow is dotted.
In the case of the precedence constraints that point to the Data Flow task, it is the second option that is selected, as indicated by the dotted lines. As a result, the workflow can originate from either the second Execute SQL task or from the Sequence container, and only one of the precedence executables has to run successfully.
The Logical OR and the Logical AND options let you define multiple execution paths, yet share those elements that are common to each path, such as the Data Flow task in the PrecedenceConstraints.dtsx package. The logical AND/OR settings, along with the Value property and the use of expressions, provide you with the options necessary to define intricate workflows while minimizing duplicate efforts.
Despite its simplicity, the PrecedenceConstraints.dtsx package described in this article demonstrates all these elements and should provide you with the foundation you need to use precedence constraints to their fullest. However, you can find additional information about precedence constraints in the topic “Setting Precedence Constraints on Tasks and Containers” in SQL Server Books Online.

No comments:

Post a Comment