12 June 2012
Annette continues her popular series for SSIS beginners by showing how a data flow task can be used in a package to move data from a SQL Server database to an Excel file, and insert an additional column into the Excel file that’s based on derived data.
In my previous article, "SSIS Basics: Setting Up Your Initial Package", I showed you how to create an SSIS package and configure connection managers, data sources, and data source views. In this article, I will show you how to use some of those data connections to retrieve data from a SQL Server database and load the data into an Excel file. I will also show you how to add a computed column based on data derived from the data flow. In addition, I will demonstrate how to run the package.
SSIS supports many control flow items that manage a package’s workflow, but the one I think to be the most important and most often used is the Data Flow task. For this reason, I focus on that task in this article. In future articles, I’ll cover other control flow items.
Note:
If you want to try out the examples in this article, you’ll need to create an OLE DB connection manager that points to the AdventureWorks database and a Flat File connection manager that points to an Excel file. You can find details about how to set up these connection managers in my previous article, which is referenced above. I created the examples shown in this article and the last one on a local instance of SQL Server 2008 R2.
If you want to try out the examples in this article, you’ll need to create an OLE DB connection manager that points to the AdventureWorks database and a Flat File connection manager that points to an Excel file. You can find details about how to set up these connection managers in my previous article, which is referenced above. I created the examples shown in this article and the last one on a local instance of SQL Server 2008 R2.
Adding a Data Flow Task
Our goal in creating this package is to move data from a SQL Server database to an Excel file. As part of that goal, we also want to insert an additional column into the Excel file that’s based on derived data.To carry out our goal, we must add a Data Flow task to our control flow. The task lets us retrieve data from our data source, transform that data, and insert it into our destination, the Excel file. The Data Flow task is one of the most important and powerful components in SSIS and as such has it’s own workspace, which is represented by the Data Flow tab in SSIS Designer, as shown in Figure 1.
Configuring the Data Flow
You configure a Data Flow task by adding components to the Data Flow tab. SSIS supports three types of data flow components:- Sources: Where the data comes from
- Transformations: How you can modify the data
- Destinations: Where you want to put the data
To add components to the Data Flow task, you need to open the Toolbox if it’s not already open. To do this, point to the View menu and then click ToolBox, as shown in Figure 4.
Adding an OLE DB Source
The first component we’re going to add to the data flow is a source. Because we’re going to be retrieving data from a SQL Server database, we’ll use an OLE DB source. To add the component, expand the Data Flow Sources category in the Toolbox. Then drag an OLE DB source from to the Data Flow window. Your data flow should now look similar to Figure 7.There are several other features about the OLE DB source noting:
- A database icon is associated with that source type. Other source types will show different icons.
- A reversed red X appears to the right of the name. This indicates that the component has not yet been properly configured.
- Two arrows extend below the component. These are called data paths. In this case, there is one green and one red. The green data path marks the flow of data that has no errors. The red data path redirects rows whose values are truncated or that generate an error. Together these data paths enable the developer to specifically control the flow of data, even if errors are present.
Next, you must select one of the following four options from the Data access mode drop-down list:
- Table or view
- Table name or view name variable
- SQL command
- SQL command from variable
CREATE VIEW uvw_GetEmployeePayRate
AS
SELECT H.EmployeeID ,
RateChangeDate ,
Rate
FROM HumanResources.EmployeePayHistory H
JOIN ( SELECT EmployeeID ,
MAX(RateChangeDate) AS [MaxDate]
FROM HumanResources.EmployeePayHistory
GROUP BY EmployeeID
) xx ON H.EmployeeID = xx.EmployeeID
AND H.RateChangeDate = xx.MaxDate
GO
After you ensure that Table or view is selected in the Data access mode drop-down list, select the uvw_GetEmployeePayRate view from the Name of the table or the view drop-down list. Now go to the Columns page to select the columns that will be returned from the data source. By default, all columns are selected. Figure 9 shows the columns (EmployeeID, RateChangeDate, and Rate) that will be added to the data flow for our package, as they appear on the Columns page. Now click on the Error Output page (shown in Figure 10) to view the actions that the SSIS package will take if it encounters errors.
Now return to the Connection Manager page and click the Preview button to view a sample dataset in the Preview Query Results window, shown in Figure 11. Previewing the data ensures that what is being returned is what you are expecting.
Adding a Derived Column Transformation
The next step in configuring our data flow is to add a transformation component. In this case, we’ll add the Derived Column transformation to create a column that calculates the annual pay increase for each employee record we retrieve through the OLE DB source.To add the component, expand the Data Flow Transformations category in the Toolbox window, and drag the Derived Column transformation (shown in Figure 12) to the Data Flow tab design surface.
- Objects you can use as a starting point. For example you can either select columns from your data flow or select a variable. (We will be working with variables in a future article.)
- Functions and operators you can use in your derived column expression. For example, you can use a mathematical function to calculate data returned from a column or use a date/time function to extract the year from a selected date.
- Workspace where you build one or more derived columns. Each row in the grid contains the details necessary to define a derived column.
To select the column, expand the Columns node, and drag the Rate column to the Expression column of the first row in the derived columns grid, as shown in Figure 15.
You can change the Derived Column Name value by simply typing a new name in the box. For this example, I’ve renamed the column NewPayRate.
For the Derived Column value, you can choose to add a new column to your data flow (which is the default value, <add as new column>) or to replace one of the existing columns in your data flow. In this instance, we’ll add a new column, but there may be times when overwriting a column is required.
The data type is automatically created by the system and can’t be changed at this stage.
Our next step is to refine our expression. Currently, because only the Rate column is included in the expression, the derived column will return the existing values in that column. However, we want to calculate a new pay rate. The first step, then, is to add an operator. To view the list of available operators, expand the list and scroll through them. Some of the operators are for string functions and some for math functions.
To increase the employee’s pay rate by 5%, we’ll use the following calculation:
[Rate] * 1.05
To do this in the Expression box, either type the multiplication operator (*), or drag it from the list of operators to our expression (just after the column name), and then type 1.05, as shown in Figure 17. Once you are happy with the expression, click on OK to complete the process. You will be returned to the Data Flow tab. From here, you can rename the Derived Column transformation to clearly show what it does. Again, there are two data paths to use to link to further transformations or to connect to destinations.
Adding an Excel Destination
Now we need to add a destination to our data flow to enable us to export our results into an Excel spreadsheet.To add the destination, expand the Data Flow Destinations category in the Toolbox, and drag the Excel destination to the SSIS Designer workspace, as shown in Figure 18.
Next, right-click the Excel destination, and click Edit. This launches the Excel Destination Editor dialog box, shown in Figure 20. On the Connection Manager page, under OLE DB connection manager, click on the New button then under Excel File Path click on the Browse button and select the file you created in the previous article and click on OK, then under Name of the Excel Sheet select the appropriate sheet from the file.
Running an SSIS Package in BIDS
Now all we need to do is execute the package and see if it works. To do this, click the Execute button. It’s the green arrow on the toolbar, as shown in Figure 24.The Data Flow tab also shows the number of rows that are processed along each step of the way. That number is displayed next to the data path. For our example package, 290 rows were processed between the Employees source and the PayRate transformation, and 290 rows were processed between the transformation and the Excel destination. Figure 25 shows the data flow after the three components ran successfully. Note that the number of processed rows are also displayed.
Summary
In this article of the "SSIS Basics" series, I’ve shown you how to add the data flow to your SSIS package in order to retrieve data from a SQL Server database and load it into an Excel file. I’ve also shown you how to add a derived column that calculates the data to be inserted into the file. In addition, I’ve demonstrated how to run the package.In future articles, I plan to show you how to deploy the package so you can run it as part of a scheduled job or call in other ways. In addition, I’ll explain how to use variables in your package and pass them between tasks. I also aim to cover more control flow tasks and data flow components, including those that address conditional flow logic and for-each looping logic. There is much much more that can be done using SSIS, and I hope over the course of this series to cover as much information as possible
No comments:
Post a Comment