Even
with all the new features in the Microsoft SQL Server Business
Intelligence (BI), sometimes the only way to accomplish a task is with
good old fashioned T-SQL code. (Fortunately, “code” is the acronym for
this great magazine!) In this latest installment of The Baker’s Dozen, I’ll present 13 T-SQL programming tips that could help you accomplish different database tasks.
What’s on the Menu?
Getting right to the point, here are the 13 items on the menu for this article:
- A T-SQL example that allocates data in one table based on percentages from another table, where you might learn an unexpected lesson about data types.
- The Baker’s Dozen Spotlight: A T-SQL example that uses the MERGE statement in a data warehousing scenario.
- A T-SQL example that performs the equivalent of a MEDIAN function.
- A T-SQL example that queries SQL Server system tables to retrieve specifics about snapshots.
- A T-SQL example to demonstrate the difference between identity columns and GUID columns.
- A T-SQL example to show different ways to perform queries using partial text searches, and how to use SQL Server tools to evaluate performance.
- The Baker’s Dozen Potpourri: An example of T-SQL Ranking and Aggregation and the dangers of trying to accomplish everything in one query.
- An example of using PIVOT where the spreading element is dynamic.
- Determining a Percentile over a range of values.
- A potential “gotcha” when performing a WHERE with an OUTER JOIN.
- Manually setting an identity column.
- Performing a rollback on a TRUNCATE.
- Creating a Date Dimension.
With
just a few exceptions, the examples use the AdventureWorks2008R2
database. You can find AdventureWorks2008R2 on the CodePlex site. If
you’re still using SQL Server 2008 and not 2008R2, the examples will
still work - you’ll just need to change any 2008R2 references to 2008.
Tip 1: Performing an Allocation
Suppose
you receive budget data at the region level, and you need to allocate
it downward to the market or account level, based on each
market/account’s percentage of share of the region. This scenario occurs
in data warehouse scenarios where a developer must allocate a measure
based on some weighted %. The developer needs to be careful, not only to
implement the allocation method correctly, but also to ensure that the
sum of the allocated numbers equal the original sum of the measure being
allocated.
Listing 1 shows an example, using the AdventureWorks2008R2 database. The example uses the tables Purchasing.PurchaseOrderHeader and Purchasing.PurchaseOrderDetails,
and allocates the freight from the order header table down to the
product line items in the order detail table, for each purchase order.
Stated simply, if order A has $100 in freight, and order A has two line
items (1 and 2) with line items order dollars of $500 and $1,500
respectively, then line item 1 would receive $25 of the freight and line
item 2 would receive $75 of the freight. This is because line item 1
had 25% of the total order dollars and line item 2 had 75% of the line
item dollars.
This certainly seems simple enough - determine each line item allocation ratio (LineItemRatio) by taking the line item dollars (OrderQty * UnitPrice) and dividing by the SubTotal of order dollars in the order header table.
(OrderQty * UnitPrice) / POH.SubTotal as LineItemRatio
After that, you would take the LineItemRatio and
multiply by the Freight, to determine the allocated freight for each
line item. Then you’d sum all the allocated freights (which would sum to
$1,583,807.6975) and compare that to the sum of freight in the order
header table ($1,583,978.2263). But that’s off by roughly $170!
Now,
one might argue that the difference is insignificant - after all, $170
is roughly a hundredth of a percent of the 1.5 million in total freight.
However, it’s possible that accountants might require (with good
reason) that the numbers either match or are within a few pennies. So
are we off by $170 because of simple rounding?
It’s actually a bit more complicated. The culprit here is the money data type. The UnitPrice in
the order detail table is stored as a money data type, which has a
fixed scale of four decimal positions. Therefore, any ratio that we
derive from a money data type will also contain four decimal positions.
This means that the sum for allocated freights (for any one order) will
differ from the original order freight by several cents or possibly a
few dollars. When you aggregate that difference across thousands of
orders, you have the explanation for the difference of $170.
The solution is to cast the result of the numerator (OrderQty * UnitPrice) as a decimal or a floating point data type, or simply multiply the numerator by 1.0 to force a cast, like so:
(OrderQty * UnitPrice) * 1.0 / POH.SubTotal as LineItemRatio Cast( (OrderQty * UnitPrice) as float) / POH.SubTotal as LineItemRatio
Figure 1 shows
a partial result with the allocation ratio going well beyond four
decimals. When we apply this logic, the difference across all orders is
down to 0.000077 of a penny! I challenge anyone to come closer than
that!
Figure 1: Results of allocation.
No comments:
Post a Comment