10/22/2011

The Baker’s Dozen: 13 Transact SQL Programming Tips - Part I


Even with all the new features in the Microsoft SQL Server Business Intelligence (BI), sometimes the only way to accomplish a task is with good old fashioned T-SQL code. (Fortunately, “code” is the acronym for this great magazine!) In this latest installment of The Baker’s Dozen, I’ll present 13 T-SQL programming tips that could help you accomplish different database tasks.
What’s on the Menu?
Getting right to the point, here are the 13 items on the menu for this article:
  • A T-SQL example that allocates data in one table based on percentages from another table, where you might learn an unexpected lesson about data types.
  • The Baker’s Dozen Spotlight: A T-SQL example that uses the MERGE statement in a data warehousing scenario.
  • A T-SQL example that performs the equivalent of a MEDIAN function.
  • A T-SQL example that queries SQL Server system tables to retrieve specifics about snapshots.
  • A T-SQL example to demonstrate the difference between identity columns and GUID columns.
  • A T-SQL example to show different ways to perform queries using partial text searches, and how to use SQL Server tools to evaluate performance.
  • The Baker’s Dozen Potpourri: An example of T-SQL Ranking and Aggregation and the dangers of trying to accomplish everything in one query.
  • An example of using PIVOT where the spreading element is dynamic.
  • Determining a Percentile over a range of values.
  • A potential “gotcha” when performing a WHERE with an OUTER JOIN.
  • Manually setting an identity column.
  • Performing a rollback on a TRUNCATE.
  • Creating a Date Dimension.
The Demo Database for the Examples
With just a few exceptions, the examples use the AdventureWorks2008R2 database. You can find AdventureWorks2008R2 on the CodePlex site. If you’re still using SQL Server 2008 and not 2008R2, the examples will still work - you’ll just need to change any 2008R2 references to 2008.
Tip 1: Performing an Allocation
Suppose you receive budget data at the region level, and you need to allocate it downward to the market or account level, based on each market/account’s percentage of share of the region. This scenario occurs in data warehouse scenarios where a developer must allocate a measure based on some weighted %. The developer needs to be careful, not only to implement the allocation method correctly, but also to ensure that the sum of the allocated numbers equal the original sum of the measure being allocated.
Listing 1 shows an example, using the AdventureWorks2008R2 database. The example uses the tables Purchasing.PurchaseOrderHeader and Purchasing.PurchaseOrderDetails, and allocates the freight from the order header table down to the product line items in the order detail table, for each purchase order. Stated simply, if order A has $100 in freight, and order A has two line items (1 and 2) with line items order dollars of $500 and $1,500 respectively, then line item 1 would receive $25 of the freight and line item 2 would receive $75 of the freight. This is because line item 1 had 25% of the total order dollars and line item 2 had 75% of the line item dollars.
This certainly seems simple enough - determine each line item allocation ratio (LineItemRatio) by taking the line item dollars (OrderQty * UnitPrice) and dividing by the SubTotal of order dollars in the order header table.
(OrderQty * UnitPrice)    / POH.SubTotal as LineItemRatio 
After that, you would take the LineItemRatio and multiply by the Freight, to determine the allocated freight for each line item. Then you’d sum all the allocated freights (which would sum to $1,583,807.6975) and compare that to the sum of freight in the order header table ($1,583,978.2263). But that’s off by roughly $170!
Now, one might argue that the difference is insignificant - after all, $170 is roughly a hundredth of a percent of the 1.5 million in total freight. However, it’s possible that accountants might require (with good reason) that the numbers either match or are within a few pennies. So are we off by $170 because of simple rounding?
It’s actually a bit more complicated. The culprit here is the money data type. The UnitPrice in the order detail table is stored as a money data type, which has a fixed scale of four decimal positions. Therefore, any ratio that we derive from a money data type will also contain four decimal positions. This means that the sum for allocated freights (for any one order) will differ from the original order freight by several cents or possibly a few dollars. When you aggregate that difference across thousands of orders, you have the explanation for the difference of $170.
The solution is to cast the result of the numerator (OrderQty * UnitPrice) as a decimal or a floating point data type, or simply multiply the numerator by 1.0 to force a cast, like so:
(OrderQty * UnitPrice) * 1.0     / POH.SubTotal as LineItemRatio  Cast( (OrderQty * UnitPrice) as float)     / POH.SubTotal as LineItemRatio 
Figure 1 shows a partial result with the allocation ratio going well beyond four decimals. When we apply this logic, the difference across all orders is down to 0.000077 of a penny! I challenge anyone to come closer than that!
Click for a larger version of this image. 
 

Figure 1: Results of allocation.


Listing 1: T-SQL code to perform an allocation
use AdventureWorks2008R2goselect sum(Freight) as HeaderFreightSum from                      Purchasing.PurchaseOrderHeader
select *, LineItemRatio * Freight as AllocatedFreight from    (select POH.PurchaseOrderID,             cast(POH.OrderDate as DATEas OrderDate,             SubTotal, Freight, LineTotal, POD.ProductID,             (OrderQty * UnitPrice) * 1.0  / POH.SubTotal                                    as LineItemRatio           FROM Purchasing.PurchaseOrderHeader  POH            JOIN Purchasing.PurchaseOrderDetail POD ON                poh.PurchaseOrderID =                 pod.PurchaseOrderID ) TempAliasORDER BY PurchaseOrderID COMPUTE sum( LineItemRatio * Freight)

No comments:

Post a Comment