10/13/2011

VS 2010 with Multi-Monitor Support

Using Multiple Monitors

VS 2008 hosts all documents/files/designers within a single top-level window – which unfortunately means that you can’t partition the IDE across multiple monitors.
VS 2010 addresses this by now allowing editors, designers and tool-windows to be moved outside the top-level window and positioned anywhere you want, and on any monitor on your system. This allows you to significantly improve your use of screen real-estate, and optimize your overall development workflow.
Taking advantage of the multi-monitor feature is really easy to-do. Simply click on a document tab or tool-window and drag it to either a new location within the top-level IDE window – or outside of the IDE to any location on any monitor you want:

You can later drag the document/window back into the main window if you want to re-dock it (or right click and choose the re-dock option).
Visual Studio remembers the last screen position of documents when saved – which means that you can close projects and re-open them and have the layout automatically startup where you last saved it.

Some Multi-Monitor Scenarios

Below are some screen-shots of a few of the scenarios multi-monitor enables (obviously there are many more I’m not covering). Pretend each window in the screenshots below is on a different monitor to get the full idea…
Code source file support:
Demonstrates how code files can be split up across multiple monitors. Below I’ve kept a .aspx file in the main IDE window and then moved a code-behind file and a separate class file to a separate screen:

Tool window support:
Demonstrates how any tool window/pane within VS10 can be split across multiple monitors. Below I’ve moved the test runner tool windows to a separate screen:

Designer support:
Demonstrates how a designer within VS can be split across multiple monitors. Below I’ve moved the WPF/Silverlight WYSWIYG designer and the property grid to a separate screen (the code behind file is still in the main window). Note how the VS10 property grid now supports inline color editors, databinding, styles, brushes, and a whole bunch more for WPF and Silverlight applications (I’ll cover this in later blog posts):

Summary

If you work on a system that has multiple monitors connected to it, I think you are going to find the new multi-monitor support within VS10 a big productivity boost.
If you don’t already have multiple monitors connected to your computer, this might be a good excuse to get some… :-)
Hope this helps,

10/12/2011

Implementing Checkpoints in an SSIS Package

Ref: Robert Sheldon

Like any other application, an SSIS package can fail. Once the problem has been resolved, you'll usually restart the package, but if you implement SSIS checkpoints, then you have another option: you can restart from the point of failure.

A SQL Server Integration Services (SSIS) package can fail for many reasons . A data source might be offline, a server might go down, the network might be unavailable, or countless other problems might arise. Whatever the reason, the end result is the same—your SSIS package stops running.
In some cases, this might not be too big a deal; you can simply restart the package after the problem has been resolved and everything is back to normal. However, if your package is loading terabytes of data or performing numerous other operations, a failure can represent a significant hit on your schedule and resources.
One way to help protect against such a happening is to implement checkpoints in your SSIS package. Checkpoints let you restart a package from the point of failure. For instance, if your package fails after the retrieved data has been staged, you can rerun the package starting at the point of failure without having to retrieve and stage the data again. In other words, the package starts right where it left off. By using checkpoints, you can avoid having to repeat tasks that involve large files or reload or re-aggregate large amounts of data.

How Checkpoints Work

When you configure your SSIS package to use checkpoints, SSIS maintains a record of the control flow executables that have successfully run. In addition, SSIS records the current values of user-defined variables.
SSIS stores the checkpoint information in an XML file whose name and location you specify when you configure your package to use checkpoints. When the package runs, SSIS takes the following steps to implement checkpoints:

Creates the XML file.
Records the current values of user-defined variables.
Records each executable that successfully runs.
Takes one of the following two actions:
- Saves the file if the package stops running on an executable configured to use checkpoints.
- Deletes the file if the entire package runs successfully.

If you configure a package to use checkpoints and that package fails during execution, SSIS will reference the checkpoint file if you try to rerun the package. SSIS will first retrieve the current variable values as they existed prior to package failure and, based on the last successful executable to run, start running the package where it left off, that is, from the point of failure onward, assuming you’ve addressed the issue that caused the failure in the first place.

Configuring Checkpoints

To implement checkpoints in your package, you must configure several properties at the package level:

CheckpointFileName: Specifies the full path and filename of your checkpoint file.
CheckpointUsage: Specifies when to use checkpoints. The property supports the following three options:
- Never: A checkpoint file is not used.
- IfExists: A checkpoint file is used if one exists. This option is the one most commonly used if enabling checkpoints on a file.
- Always: A checkpoint file must always be used. If a file doesn’t exist, the package fails.
SaveCheckpoints: Specifies whether the package saves checkpoints. Set to True to enable checkpoints on the package.

After you’ve configure these three package properties, you must set the FailPackageOnFailure property to True on each container or task that should participate in the checkpoint process. An executable that is enabled to use checkpoints serves as a starting point should the package fail and you want to rerun it. For example, suppose you set the FailPackageOnFailure property to True on a Data Flow task. If the package fails on that task and you restart the package, the package will start there. However, if the package fails at a task for which the FailPackageOnFailure property is set to False, the XML file is not saved and checkpoints are not used when you restart the package.
You should consider carefully which executables to include in the checkpoint process. For example, a Foreach Loop container can make using checkpoints a challenge. If you configure the container itself to use checkpoints, SSIS has no way of knowing which tasks inside the container have already run at the time of failure. And if you configure the tasks within the container to use checkpoints, variable values are not always recorded properly in the XML file at the time of failure. For that reason, many SSIS developers recommend against using checkpoints for loop iterations.
You must also be careful using checkpoints when running parallel operations in which all operations must succeed or fail as a logical unit, such as when you have parallel data flows that must succeed or fail as a single operation. You could, of course, put everything within a Sequence container and configure the container to use checkpoints, but if the container then represents the bulk of the control flow, you’ve gained little advantage because most of the work will have to be repeated anyway. Only if there are subsequent tasks, and you configure checkpoints on those tasks, do you have anything to gain.
Ideally, if you plan to implement checkpoints in your package, you should take them into account when you design your package. For example, you might want to separate the part of the data flow that stages data from that which loads data so you can use checkpoints more effectively. Let’s look at a simple example to demonstrate how this works.

Checkpoints in Action

To demonstrate how to use checkpoints, I created a basic SSIS package in SQL Server Business Intelligence Development Studio (BIDS). Then I added the following two components:

OLE DB connection manager. The connection manager establishes a connection with the AdventureWorks2008R2 sample database on a local instance of SQL Server 2008 R2.
User-defined variable. I created an int variable named number and set the initial value to 1.

After I configured these two components, I added several tasks to the control flow, which are shown in Figure 1. The control flow, in theory, creates the People table, retrieves person data, sets the variable value, loads the person data, and then deletes the table. In reality, the package doesn’t really retrieve or load data, but it contains enough tasks for the package to run successfully and be able to demonstrate how checkpoints work.

Figure 1: Setting up the control flow in an SSIS package

The first task I added is an Execute SQL task. The task uses the OLE DB connection manager to connect to the AdventureWorks2008R2 database. I added the following Transact-SQL code to the task in order to create the People table:

IF OBJECT_ID('dbo.People') IS NOT NULL
DROP TABLE dbo.People
GO
CREATE TABLE dbo.People
(
   PersonID INT PRIMARY KEY,
   FirstName VARCHAR(50) NOT NULL,
   LastName VARCHAR(50) NOT NULL
)

The next task in the control flow is a Data Flow task. Although I labeled the task to indicate that it retrieves person data, the task does nothing at all. If this were an actual package, the data flow would load the person data into a staging table or raw files. However, because checkpoints work only at the executable level, it doesn’t matter whether we configure the data flow, just as long as we can control whether the task succeeds or fails when the package runs. More on that later.
Following the Data Flow task, I added a Script task. The task does nothing but modify the value of the number user-defined variable. When I configured the task’s properties, I added number to the Script Task editor as a read/write variable. Then, to the Main method of the script itself, I inserted the following C# line of code:

Dts.Variables["Number"].Value = 2;

As you can see, the code simply sets the value of the number variable to 2. Although this serves no functional purpose in the current package, it does let me verify whether the current variable value is being properly recorded in the checkpoint file.
After the Script task, I added another Data Flow task that supposedly loads the person data into the People table. Again, this is just a dummy task that takes no action, but it is an executable that can either succeed or fail when the package runs.
The final component I added to the control flow is another Execute SQL task. The task uses the following Transact-SQL code to drop the People table:

IF OBJECT_ID('dbo.People') IS NOT NULL

DROP TABLE dbo.People

That’s all there is to the package. It’s simple, does very little, but provides everything we need to test checkpoints. If you were to run the package now, the package should run with no problem and each component should turn to green. So now let’s set up the package to use checkpoints. The first step then is to configure the three package properties described earlier:

CheckpointFileName: I specified a path and filename for my checkpoint file.
CheckpointUsage: I selected the IfExists value.
SaveCheckpoints: I selected the True value.

Next, I set the FailPackageOnFailure property to True for the last three components in the control flow (the Script task, second Data Flow task, and second Execute SQL task). Now, if the package fails on any of these three components, the checkpoint file will be saved and will include a list of all the executables that had run successfully up to the point of failure.
Because I did not configure the first Data Flow task to use checkpoints, checkpoints will not be used if that task fails, and you have to run the package from the beginning. If your data retrieval and staging process includes a significant amount of data, you might consider breaking that component into separate data flows and then implementing checkpoints on the individual components, but you should do this only if you can separate the data into discrete units that do not have to succeed or fail as a whole.
If you were to run the package as we currently have it configured, all the tasks should still successfully execute. So we need to make the package fail on one of the three tasks configured to use checkpoints. The easiest way to do this is to set the ForceExecutionResults property value to Failure on the task we want to fail. (This is a great trick to remember whenever you’re testing your SSIS packages.)
In this case, I configure the property on the second Data Flow task so it will fail. Now when I run the package, the package will stop at the second Data Flow task, as shown in Figure 2. Notice that the first three tasks ran successfully, as I would have expected, and the Data Flow task has failed.

Figure 2: Running an SSIS package configured to fail

Because the package failed at a task configured to use checkpoints, the XML checkpoint file is retained in the designated folder. If you view the file’s content, you should see a set of XML elements and attributes similar to those shown in Figure 3.

Figure 3: Viewing the XML checkpoint file after package failure

Notice first that the file contains a <DTS:Variables> element and embedded within that is a single <DTS:Variable> element. The <DTS:Variable> element describes the number user-defined variable I added to the package when I first created it. One of the elements within the <DTS:Variable> element is <DTS:VariableValue>, and as you can see, the value has been set to 2, which was the current value when the package failed (after the Script task had sent the value to 2). This confirms that SSIS has properly maintained the variable value.
Next in the XML file, you’ll find three <DTS:Container> elements. Each element corresponds to one of the tasks that ran successfully. (The GUID identifies the executable associated with the XML element.)
Because our SSIS package has been configured to use checkpoints, SSIS will reference the XML file if you rerun the package. SSIS will see that the three tasks ran successfully and start at the next task in the control flow.
However, before rerunning the package, you must address the problem that caused the package failure. In this case, that means reconfiguring the ForceExecutionResults property on the second Data Flow task by setting the property value to None, the default value. Now when you rerun the package, it will use the checkpoint file and start where it left off, as shown in Figure 4.
Description: Macintosh HD:Users:bob:Documents:DataFiles:Writing:Freelance:Current:SimpleTalk:Active:Checkpoints:ST_checkpoints_fig04.jpg

Description: Macintosh HD:Users:bob:Documents:DataFiles:Writing:Freelance:Current:SimpleTalk:Active:Checkpoints:ST_checkpoints_fig04.jpg

Figure 4: Rerunning an SSIS package that uses checkpoints

As you can see, the first three control flow components do not run because the checkpoint file shows that they have already run. So the package starts running at the second Data Flow task, the task that had originally failed.
Once the package completes running successfully, SSIS deletes the XML checkpoint file. If you were to rerun the package again, SSIS would start at the beginning.

Implementing Checkpoints

Clearly, checkpoints can be a handy tool if you want to avoid having to rerun components that manage large amounts of data. But you have to be careful not to implement them in ways that result in data not being managed correctly. For example, you should be wary implementing checkpoints on a Foreach Loop container or if you want to process multiple components as a unit. And you shouldn’t implement checkpoints unnecessarily. For instance, in our example above, implementing checkpoints on the first Execute SQL task would have provided little benefit. That said, checkpoints can be quite useful when working with large datasets. So consider using them when necessary, but use them wisely.

Using Generic handler to create an appointment

I have some articles regarding using generic handler to generate files or something else for downloading. In this topic I used it to create an appointment.

Sample code:

10/10/2011

Debugging Data Flow in SQL Server Integration Services

Ref: Robert Sheldon

SQL Server Business Intelligence Development Studio (BIDS) provides several tools you can use to troubleshoot the data flow of a SQL Server Integration Services (SSIS) package. The tools let you sample a subset of data, capture data flow row counts, view data as it passes through data paths, redirect data that generates errors, and monitor package execution. You can use these tools for any package that contains a data flow, regardless of the data’s source or destination or what transformations are being performed.
The better you understand the debugging tools, the more efficiently you can troubleshoot your data flow. In this article, I demonstrate how each debugging tool works. To do so, I set up a test environment that includes a comma-separated text file, a table in a SQL Server database, and an SSIS package that retrieves data from the text file and inserts it into the table. The text file contains data from the Person.Person table in the AdventureWorks2008R2 database. To populate the file, I ran the following bcp command:

bcp "SELECT TOP 10000 BusinessEntityID, FirstName, LastName FROM AdventureWorks2008R2.Person.Person ORDER BY BusinessEntityID" queryout C:\DataFiles\PersonData.txt -c -t, -S localhost\SqlSrv2008R2 –T

After I created the file, I manipulated the first row of data in the file by extending the LastName value in the first row to a string greater than 50 characters. As you’ll see later in the article, I did this in order to introduce an error into the data flow so I can demonstrate how to handle such errors.
Next I used the following Transact-SQL script to create the PersonName table in the AdentureWorks2008R2 database:

USE AdventureWorks2008R2
GO
IF OBJECT_ID('dbo.PersonName') IS NOT NULLDROP TABLE dbo.PersonName
GO
CREATE TABLE dbo.PersonName(
NameID INT PRIMARY KEY,
FullName NVARCHAR(110) NOT NULL
)

After I set up the source and target, I created an SSIS package. Initially, I configured the package with the following components:

A connection manager to the AdventureWorks2008R2 database.
A connection manager to the text file with the source data.
An Execute SQL task that truncates the PersonName table.
A Data Flow task that retrieves data from the text file, creates a derived column, and inserts the data into the PersonName table.

Figure 1 shows the data flow components I added to the package, including those components related to troubleshooting the data flow.

NOTE: You can download the SSIS package from the speech bubble at the top of the article.

Figure 1: Setting up the data flow in the sample SSIS package

The data flow components specific to processing the Person data are the OLE DB Source, Derived Column, and OLE DB Destination components. The Derived Column transformation concatenates the first and last names into a single column named FullName. The other components in the data flow are specific to debugging and are discussed in detail in the rest of the article.

Working with a Data Sample

When you’re developing an SSIS package that retrieves large quantities of data, it can be helpful to work with only a subset of data until you’ve resolved any issues in the data flow. SSIS provides two data flow components that let you work with a randomly selected subset of data. The Row Sampling Transformation component lets you specify the number of rows you want to include in your random data sample, and the Percentage Sampling Transformation component lets you specify the percentage of rows.
Both components support two data outputs: one for the sampled data and one for the unsampled data. Each component also lets you specify a seed value so that the samples are the same each time you run the package. (The seed value is tied to the operating system’s tick count.) When you don’t specify a seed value, the data sample is different each time you run the data flow.
If you refer back to Figure 1, you’ll see that I added a Row Sampling Transformation component right after the Flat File Source component. Figure 2 shows the Row Sampling Transformation Editor. Notice that I configured the component to retrieve 1000 rows of sample data, but I did not specify a seed value.

Figure 2: Selecting a data sample from the data flow

If you want, you can name the outputs for the sample and non-sample data. In this case, I’ve left the default names and used the Sampling Selected Output data path to connect to the next component in the data flow. Now the data flow will include only the random 1000 rows.

Verifying Row Counts

When data passes through a data flow, the SSIS design surface displays the number of rows passing along each data path. The count changes as data moves through the pipeline. After the package has finished executing, the number displayed is the total number of rows that passed through the data path in the last buffer. If there were multiple buffers, the final number would not provide an accurate count.
However, you can add a Row Count Transformation component to the data flow. The transformation provides a final count that adds together the rows from all buffers and stores the final count in a variable. This can be useful when you want to ensure that a particular point in the data flow contains the number of rows you would expect. You can then compare that number to the number of rows in your source or destination.
To retrieve the row count from the variable, you can use whatever method you like. For instance, you can create an event handler that captures the variable value and saves it to a table in a SQL Server database. How you retrieve that value is up to you. The trick is to use the Row Count Transformation component to capture the total rows and save them to the variable.
In my sample SSIS package, I created a string variable named RowCount, then, after the Derived Column component, I added a Row Count Transformation component. Figure 3 shows the component’s editor. The only step I needed to take to configure the editor was to add the variable name to the VariableName property.

Figure 3: Verifying the row counts of data passing along a data path

When the package runs, the final count from that part of the data flow will be saved to the RowCount variable. I verified the RowCount value by adding a watch to the control flow, but in an actual development environment, you’d probably want to retrieve the value through a mechanism such as an event viewer, as mentioned above, so you have a record you can maintain as long as necessary.

Adding Data Viewers to the Data Path

When troubleshooting data flow, it can be useful to view the actual data as it passes through a data path. You can do this by adding one or more data viewers to your data flow. SSIS supports several types of data viewers. The one most commonly used is the grid data viewer, which displays the data in tabular format. However, you can also create data viewers that display histograms, scatter plot charts, or column charts. These types of data viewers tend to be useful for more analytical types of data review, but for basic troubleshooting, the grid data viewer is often the best place to start.
To create a grid data viewer, open the editor for the data path on which you want to view the data, then go to the Data Viewers page, as shown in Figure 4.

Figure 4: Editing the properties of the data flow path

The Data Flow Path editor is where you add your data viewers, regardless of the type. To add a data viewer, click the Add button to launch the Configure Data Viewer dialog box, shown in Figure 5. Here you select the type of viewer you want to create and provide a name for that viewer.

Figure 5: Creating a grid data viewer on a data path

After you select the Grid option from the Type list and provide a name, go to the Grid tab, shown in Figure 6. This is where you determine what columns you want to include in the grid. At this point, we’re interested only the BusinessEntityID and FullName columns because those are the columns in our target table.

Figure 6: Configuring a grid data viewer

After you specify the columns to include in the grid, click OK. You’ll be returned to the Data Flow Path Editor. The new grid data viewer should now be displayed in the Data Viewers list. In addition, a small icon is added next to the data path (shown in Figure 1).
When you debug a package in which a data viewer has been defined, the package will stop running at the viewer’s data path and a window will appear and display the data in that part of the data flow. Figure 7 shows the grid data viewer I configured on my data flow.

Figure 7: Viewing sample data through a grid data viewer

Notice that the data viewer displays the BusinessEntityID and FullName values for each row. You can scroll down the list, detach the viewer from the data flow, resume the data flow, or copy the data to the clipboard. The data itself and the ultimate outcome of the package are unaffected.

Configuring Error-Handling on the Components

Many data flow components let you specify how to handle data that might generate an error. By default, if data causes an error, the component fails; however, you can configure some components to redirect problem rows. For instance, if you refer back to Figure 1, you’ll see that the Flat File Source has an additional data path output, which is red. You can use the red data path to capture any bad rows outputted by the component, when the component is properly configured.
I connected the red data path to a Flat File Destination component so I can store rows the generate errors to a text file. When you connect an error output to another component, the Configure Error Output dialog box appears, as shown in Figure 8. Notice that for each column, you can configure what action to take for either errors or truncations. An error might be something like corrupt data or an incorrect data type. A truncation occurs if a value is too long for the configured type. By default, each column is configured to fail the component whether there is an error or truncation.

Figure 8: Configuring a data flow component to redirect rows

You can override the default behavior by specifying that the row be redirected. In this case, I chose to redirect all columns whether there was an error or truncation. To do so, I changed the Error and Truncation options for each row and column to Redirect row. Next, I configured the Flat File Destination component with a new data source that points to a text file that will be used to capture the outputted rows, if there are any errors or truncations. As you’ll recall from earlier in the article, I modified the last name in the first row of the source file by making the last name too long. As a result, I would expect the first row to fail and be redirected to the new error file.
When you configure the destination component and connection manager, you’ll notice that one column is created for the outsourced row, one column for the numeric error code, and one column for the identifier of the source column that generates the error. When a row is redirected to the error output, it is saved to the error file, along with the error number and column identifier. The values in the redirected row are separated by commas, but treated as one value.

Monitoring Package Execution

The final tools for troubleshooting the data flow are related to the package execution and SSIS design surface. When a package is running, you can watch the data flow to see what is happening with each component. Row counts are displayed next to the data paths and the components change colors as they’re being executed. By observing these colors, you can observe the state of execution:

White. Component has not yet been executed.
Yellow. Component is currently extracting, transforming, or loading data.
Green. Component has completed its operation.
Red. Component generated errors and package execution stopped.

Of course, if a component turns red, you have a problem. But sometimes a component will turn yellow and hang there. In which case, you still have a problem. However, if everything is running fine, the components will first turn yellow and then green, as shown in Figure 9.

Figure 9: Viewing the data flow progress on the design surface

Notice that the number of rows that passed through the data paths during the last buffer show up on the design surface. As you can see, one row has been redirected to the error file. Also, there are 9,999 rows in the data path that leads to the Row Sampling transformation, but only 1,000 rows after the transformation.
If an execution is not successful (red or hanging yellow), you should refer to the Progress tab for information about the package execution. There you can find details about each component and the data that is flowing through those components. Figure 10 shows the Progress tab after I finished running my package.

Figure 10: Viewing the Progress tab during package execution

Notice that the Progress tab shows details about the Data Flow task and its data pipeline. The details shown here are only part of the displayed information. You need to scroll down to view the rest. However, as you can see, there are several warning messages, along with all the information messages. In this case, the warning messages indicate that the unsampled data is not being used, as we already knew. But some warnings can be useful information to have. In addition, the Progress tab also displays error messages, along with all the other events that are fired during execution.

The Data Flow Debugging Tools

You might not need to use all the tools that SSIS provides for debugging your data flow, but whatever tools you do implement can prove quite useful when trying to troubleshoot an issue. By working with data samples, monitoring row counts, using data viewers, configuring error-handling, and monitoring package execution, you should be able to pinpoint where any problems might exist in your data flow. From there, you can take the steps necessary to address those problems. Without the SSIS troubleshooting tools, locating the source of the problem can take an inordinate amount of time. The effort you put in now to learn how to use these tools and take advantage of their functionality can pay off big every time you run an SSIS package.