8/06/2011

JOIN statement tips

I've read The SQL Server Query Optimizer from simple-talk.com. I collected two important notes for writing query.

How the Query Optimizer Works

The Query Processing Process 
- Parsing and binding – the query is parsed and bound. Assuming the query is valid, the output of this phase is a logical tree, with each node in the tree representing a logical operation that the query must perform, such as reading a particular table, or performing an inner join. This logical tree is then used to run the query optimization process, which roughly consists of the following two steps;

- Generate possible execution plans – using the logical tree, the Query Optimizer devises a number of possible ways to execute the query i.e. a number of possible execution plans. An execution plan is, in essence, a set of physical operations (an index seek, a nested loop join, and so on), that can be performed to produce the required result, as described by the logical tree;

- Cost-assessment of each plan – While the Query Optimizer does not generate every possible execution plan, it assesses the resource and time cost of each plan it does generate. The plan that the Query Optimizer deems to have the lowest cost of those it’s assessed is selected, and passed along to the Execution Engine;

- Query execution, plan caching – the query is executed by the Execution Engine, according to the selected plan. The plan may be stored in memory, in the plan cache. 

Best practice of JOIN statement

We have the left-deep tree could be:
JOIN( JOIN( JOIN(A, B), C), D)
And the bushy tree could be:
JOIN(JOIN(A, B), JOIN(C, D))

- Left-deep trees are also called linear trees or linear processing trees, and you can see how their shapes lead to that description. Bushy trees, on the other hand, can take any arbitrary shape, and so the set of bushy trees actually includes the sets of both left-deep and right-deep trees.
Left-deep and bushy trees

- Table below shows how the number of possible join orders increases as we increase the number of tables, for both left-deep and bushy trees, and I’ll explain how it’s calculated in a moment.
Tables Left-Deep Trees Bushy Trees
1 1 1
2 2 2
3 6 12
4 24 120
5 120 1,680
6 720 30,240
7 5040 665,280
8 40,320 17,297,280
9 362,880 518,918,400
10 3,628,800 17,643,225,600
11 39,916,800 670,442,572,800
12 479,001,600 28,158,588,057,600

- The number of left-deep trees is calculated as n!, or n factorial, where n is the number of tables in the relation. A factorial is the product of all positive integers less than or equal to n; so, for example, for a 5-table join, the number of possible join orders is 5! = 5 x 4 x 3 x 2 x 1 = 120.

- The number of possible join orders for a bushy tree is more complicated, and can be calculated as:
(2n-2)!/(n-1)!

- The important point to remember here is that the number of possible join orders grows very quickly as the number of tables increase, as highlighted by Table 2. For example, in theory, if we had a 6-table join, a query optimizer would potentially need to evaluate 30,240 possible join orders. Of course, we should also bear in mind that this is just the number of permutations for the join order. On top of this, the Query Optimizer also has to evaluate a number of possible physical join operators, data access methods (e.g. Table Scan, Index Scan or Index Seek), as well as optimize other parts of the query, such as aggregations, subqueries and so on.

- So how does the Query Optimizer analyze all these possible plan combinations? The answer is: it doesn’t. Performing an exhaustive evaluation of all possible combinations, for a complex query, would take too long to be useful, so the Query Optimizer must find a balance between the optimization time and the quality of the resulting plan. Rather than exhaustively evaluate every single combination, the Query Optimizer tries to narrow the possibilities down to the most likely candidates, using heuristics (some of which we’ve already touched upon) to guide the process.

Design Pattern - Chain of Responsibility

Definition
Avoid coupling the sender of a request to its receiver by giving more than one object a chance to handle the request. Chain the receiving objects and pass the request along the chain until an object handles it.

UML Class Diagram

Chain of Responsibility: when and where use it
- An important goal of object-oriented design is to keep objects loosely coupled by limiting their dependencies and keeping the relationships between objects specific and minimal.  Loosely coupled objects have the advantage that they are easier to maintain and easier to change compared to systems where there is tight coupling between objects (i.e. hard-coded references to specific classes).  

- The Chain of Responsibility design pattern offers an opportunity to build a collection of loosely coupled objects by relieving a client from having to know which objects in a collection can satisfy a request by arranging these objects in a chain.  This pattern requires a way to order the search for an object that can handle the request. This search is usually modeled according to the specific needs of the application domain.  Note that Chain-of-Responsibility is not commonly used in business application development. 

Sample
- Object class

- Handler interface

- Concrete Handler




- Invoke class

Ref: Gang of Four tutorial