7/22/2011

The EXCEPT and INTERSECT Operators in SQL Server

Ref:

The UNION,  EXCEPT and INTERSECT operators of SQL enable you to combine more than one SELECT statement  to form a single result set. The UNION operator returns all rows. The INTERSECT operator returns all rows that are in both result sets. The EXCEPT operator returns the rows that are only in the first result set but not in the second. Simple? Rob Sheldon explains all, with plenty of examples.
Quite often, you’re faced with the task of comparing two or more tables, or query results, to determine which information is the same and which isn't. One of the most common approaches to doing such a comparison is to use the UNION or UNION ALL operator to combine the relevant columns of the results that you want to compare. As long as you adhere to the restrictions placed on either of those operators, you can combine data sets whether they come from different databases or even different servers. With the UNION operator, you end up with a result containing every distinct row from the two results combined. However, it becomes more difficult to use UNION to return only the common data that is held in both results, or the different data that exists in one table but not the other(s). To get the results you need, you must use UNION ALL with a GROUP BY clause, though the logic isn’t always obvious on how to do this. And it isn’t any easier to use a JOIN operator to get the result you want. .
Enter the INTERSECT and EXCEPT operators. Beginning with SQL Server 2005, you can use these operators to combine queries and get the results you need. For instance, you can use the INTERSECT operator to return only values that match within both data sets, as shown in the following illustration .
The illustration shows how the INTERSECT operator returns data that is common to both results set; the common data is represented by the area where the two circles intersect. The illustration also shows how the EXCEPT operator works; only data that exists in one of the data sets outside the intersecting area is returned. For instance, if Set A is specified to the left of the EXCEPT operator, only values that are not in Set B are returned. In the illustration above, that would be the data in the left circle, outside the section where the two data sets intersect. The following bullets sum up which operator to use to return different combinations of data:
  • To return the data in Set A that doesn’t overlap with B, use A EXCEPT B.
  • To return only the data that overlaps in the two sets, use A INTERSECT B.
  • To return the data in Set B that doesn’t overlap with A, use B EXCEPT A.
  • To return the data in all three areas without duplicates, use A UNION B.
  • To return the data in all three areas, including duplicates, use A UNION ALL B.
  • To return the data in the non-overlapping areas of both sets, use (A UNION B) except (A INTERSECT B), or perhaps (A EXCEPT B) UNION (B EXCEPT A)
The differences between the INTERSECT and EXCEPT operators and how to use each of them will become clearer as we work through the examples in the article. Just to give you a basic idea of how they work, we’ll start with a rather unrealistic example. To demonstrate those, however, we must first create two test views (using SQL Server 2005–compatible syntax). The first view contains a single column that describes what you might have had for lunch:
CREATE VIEW Lunch
AS
  SELECT 'Beer' AS item
  UNION SELECT 'Olives'
  UNION SELECT 'Bread'
  UNION SELECT 'Salami'
  UNION SELECT 'Calamari'
  UNION SELECT 'Coffee';
GO
The second view also contains a single column and describes what you might have had for dinner:
CREATE VIEW Dinner
AS
  SELECT 'Wine' AS item
  UNION SELECT 'Olives'
  UNION SELECT 'Bread'
  UNION SELECT 'Steak'
  UNION SELECT 'Aubergines'
  UNION SELECT 'Salad'
  UNION SELECT 'Coffee'
  UNION SELECT 'Apple';
GO
Now we can use these two views to demonstrate how to use the UNION, INTERSECT, and EXCEPT operators. I’ve also included a couple examples that use the JOIN operator to demonstrate the differences.
The first example uses the UNION operator to join the two views in order to return everything you’ve eaten today:
SELECT item FROM Lunch
UNION
SELECT item FROM Dinner;
Now we return the same data by using a full outer join:
SELECT DISTINCT COALESCE(Lunch.item, Dinner.item) AS item
FROM Lunch
  FULL OUTER JOIN Dinner
  ON Dinner.item = Lunch.item
Notice that the join requires more complex syntax; however, both statements return the same results, as shown in the following table:
item
Apple
Aubergines
Beer
Bread
Calamari
Coffee
Olives
Salad
Salami
Steak
Wine

Now let’s look at how you would return only the food you ate (or drank) for lunch, but did not have for dinner:
SELECT item FROM Lunch
EXCEPT
SELECT item FROM Dinner;
In this case, I used the EXCEPT operator to return the lunch-only items. I could have achieved the same results using the following left outer join:
SELECT Lunch.item
FROM Lunch
  LEFT OUTER JOIN Dinner
  ON Dinner.item = Lunch.item
WHERE dinner.item IS NULL;
Once again, you can see that the join is more complex, though the results are the same, as shown in the following table:
Item
Beer
Calamari
Salami

If you wanted to return those items you had for dinner but not lunch, you can again use the EXCEPT operator, but you must reverse the order of the queries, as shown in the following example:
SELECT item FROM Dinner
EXCEPT
SELECT item FROM Lunch;
Notice that I first retrieve the data from the Dinner view. To use the left outer join, you would again have to reverse the order of the tables:
SELECT dinner.item
FROM dinner
  LEFT OUTER JOIN Lunch
  ON Dinner.item = Lunch.item
WHERE Lunch.item IS NULL;
As expected, the results are the same for both SELECT statements:
item
Apple
Aubergines
Salad
Steak
Wine

In the next example, I use the INTERSECT operator to return only the food that was eaten at both meals:
SELECT item FROM Dinner
INTERSECT
SELECT item FROM Lunch;
As you can see, I simply connect the two queries with the INTERSECT operator, as I did with the EXCEPT operator. You can achieve the same results by using an inner join:
SELECT Dinner.item
FROM Dinner
  INNER JOIN Lunch
  ON Dinner.item = Lunch.item;
As the following results show, the only items you had at both meals were olives, bread, and coffee:
item
Bread
Coffee
Olives

Now let’s look at how you would return a list of food that you ate at one of the meals, but not both meals, in other words, the food you ate other than bread, olives, and coffee. In the following statement, I use a UNION operator to join two SELECT statements:
SELECT item
FROM
  (
    SELECT item FROM Lunch
    EXCEPT SELECT item FROM Dinner
  ) Only_Lunch
UNION
SELECT item
FROM
  (
    SELECT item FROM Dinner
    EXCEPT SELECT item FROM Lunch
  ) Only_Dinner; --Items you only ate once in the day.
Notice that first statement retrieves only the food you ate for lunch, and the second statement retrieves only the food ate for dinner. I achieve this in the same way I did in previous examples—by using the EXCEPT operator. I then used the UNION operator to join the two result sets. You can achieve the same results by using a full outer join:
SELECT COALESCE(Dinner.item, Lunch.item) AS item
FROM Dinner 
  FULL OUTER JOIN Lunch
  ON Dinner.item = Lunch.item
WHERE Dinner.item IS NULL OR Lunch.item IS NULL;
In both examples, the statements return the following results:
item
Apple
Aubergines
Beer
Calamari
Salad
Salami
Steak
Wine
From this point on, I developed the examples on a local instance of SQL Server 2008 and the AdventureWorks2008 database. Each example uses either the INTERSECT or EXCEPT operator to compare data between the Employee and JobCandidate tables, both part of the HumanResources schema. The comparison is based on the BusinessEntityID column in each table. The BusinessEntityID column in the Employee table is the primary key. In the JobCandidate table, the BusinessEntityID column is a foreign key that references the BusinessEntityID column in the Employee table. The column in the JobCandidate table also permits null values.
NOTE:
You can run these examples against the AdventureWorks database on an instance of SQL Server 2005. However, you must first change the BusinessEntityID column name to EmployeeID, and you must change the JobTitle column name to Title.

Working with the INTERSECT Operator

As I mentioned above, when you use the INTERSECT operator to combine queries, the data returned is only that which matches between the two data sets. That means, with regard to the Employee and JobCandidate tables, the result set will include only rows in which the BusinessEntityID value in the Employee table matches the BusinessEntityID value in the JobCandidate table.
In the following example, I create two queries that retrieve data from the Employee and JobCandidate tables and use the INTERSECT operator to combine those queries:
SELECT BusinessEntityID
FROM HumanResources.Employee
INTERSECT
SELECT BusinessEntityID
FROM HumanResources.JobCandidate;
The first SELECT statement, as you can see, retrieves the BusinessEntityID column from the Employee table, and the second SELECT statement retrieves the BusinessEntityID column from the JobCandidate table. The INTERSECT operator combines the two queries.
When you use the INTERSECT operator to combine queries (or EXCEPT, for that matter), the number of columns must be the same in both queries and the columns must be in the same order. In addition, the corresponding columns between the queries must be configured with compatible data types. The example above meets these conditions because each query returns only one column of the same data type (INT). When the INTERSECT operator is used to combine these the two queries, the statement returns the following results:
BusinessEntityID
212
274
As it turns out, the Employee table and JobCandidate table have only two BusinessEntityID values in common. If you were to examine the data in the JobCandidate table, you would find that the query results here are consistent with the data in that table. The table contains only 13 rows, and the BusinessEntityID column is NULL for all but two of the rows. The BusinessEntityID values in those rows are 212 and 274. And, as you would expect, the Employee table also contains a row with a BusinessEntityID value of 212 and a row with a value of 274.
Certainly, as the above example indicates, using the INTERSECT operator to combine the results of two queries together is a straightforward process. The key, as I’ve stated, is to make sure the SELECT lists in the two queries are in sync with each other.
However, that also points out one of the limitations of using the INTERSECT operator to combine queries—and that is, you cannot include columns in one query that are not included of the second query. If you do include multiple matching columns in each query, all the column values must match in order for a row to be returned. For example, suppose you’re retrieving data from two tables that each include columns for employee IDs, first names, and last names. If you want to match the two tables based on those three columns, the three values in the first table must match the three values in the second table for a row to be returned. (At this point, you might be asking yourself what you’re doing with all that redundant data in your database.)
Instead of taking this approach, you may decide to compare the IDs in both tables, but pull the first and last name from only one of the tables. Or you might decide that you want to pull information from one table that is not stored in the other table. However, because columns must correspond between the two queries when using an INTERSECT operator to combine them, you have to find a way to work around this limitation. One of the easiest ways to do that is to put your INTERSECT construction into a common table expression (CTE) and then join the expression to one of the tables to pull the additional data. For instance, the following example includes a CTE that contains the same INTERSECT construction you saw in the example above:
WITH
  cteCandidates (BusinessEntityID)
  AS
  (
    SELECT BusinessEntityID
    FROM HumanResources.Employee
    INTERSECT
    SELECT BusinessEntityID
    FROM HumanResources.JobCandidate
  )
SELECT
  c.BusinessEntityID,
  e.LoginID,
  e.JobTitle
FROM
  HumanResources.Employee AS e
  INNER JOIN cteCandidates AS c
    ON e.BusinessEntityID = c.BusinessEntityID
ORDER BY
  c.BusinessEntityID;
Notice that I’ve created a CTE named cteCandidates. As you would expect, the CTE returns the BusinessEntityID values that are contained in both the Employee and JobCandidate tables. In the primary SELECT statement, I then join the Employee table to the CTE in order to also retrieve the LoginID and JobTitle values from the Employee table. Because I put the INTERSECT join in the CTE, the statement can now return the following results:
BusinessEntityID LoginID JobTitle
212 adventure-works\peng0 Quality Assurance Supervisor
274 adventure-works\stephen0 North American Sales Manager
As you can see, I’ve gotten around the limitations of the INTERSECT operator and am now returning additional information from one of the tables. I could have also joined the CTE to a different table in order to include additional information. For example, I might have joined what I have here to the Person table to retrieve the employee’s first and last names. The point is, the CTE let’s you be quite flexible when working with the INTERSECT operator; you can still determine which rows match but also return all the data you need, regardless of the source table.

Working with the EXCEPT Operator

Once you’ve mastered the INTERSECT operator, you have, for the most part, mastered the EXCEPT operator. As I stated earlier, the EXCEPT operator returns only those rows that do not have a match in the joined table. The important point to remember when using the EXCEPT operator is that it makes a difference which side of the operator you place the query. Only data from the query to the left of the operator (before the operator) is returned. Let’s look at an example to demonstrate how this works.
In the following statement, I again combine two queries, one that retrieves data from the Employee table and one that retrieves data from the JobCandidate table:
SELECT BusinessEntityID FROM HumanResources.Employee
EXCEPT
SELECT BusinessEntityID
FROM HumanResources.JobCandidate;
This statement is nearly identical to the INTERSECT construction you saw in the first two examples, except, of course, for the use of the EXCEPT operator. However, because the query to the left of the operator is retrieving data from the Employee table, the final result set will include data only from that table, and not the JobCandidate table.
The Employee table, as it turns out, contains 290 rows. As you’ll recall from the previous examples, the two rows in the table with the BusinessEntityID values of 212 and 274 match the two rows in the JobCandidate table that also have BusinessEntityID values of 212 and 274. That means, these two rows should be excluded from the result set of the query above, which is exactly what happens. The query returns 288 rows that have BusinessEntityID values of 1 through 290. However, IDs 212 and 274 are not included in those results.
Now let’s look at what happens when you reverse the order of the queries, as I’ve done in the following example:
SELECT BusinessEntityID FROM HumanResources.JobCandidate
EXCEPT
SELECT BusinessEntityID
FROM HumanResources.Employee;
Notice that the query that retrieves data from the JobCandidate table now comes first, that is, sits to the left of the EXCEPT operator. The results from this statement, as you would expect, are quite different from the previous example. All that is returned is a single NULL value. In other words, according to the results, the JobCandidate table contains no BusinessEntityID values that are not contained in the Employee table. This is, of course, exactly the case.
As with the CTE example above, which uses the INTERSECT operator, you can also use CTEs with EXCEPT operators. But as the last example points out, if your CTE returns no data, your main query will also return no data (at least if you’re using an inner join), but that’s true with either operator. Overall, in fact, you’ll find there’s little difference between the INTERSECT and EXCEPT operators, in terms of how you use them. The difference, of course, is in the results. INTERSECT returns rows common to both queries, and EXCEPT returns only rows in the left query. Both operators, however, are useful additions to the types of joins that the UNION and UNION ALL operators provide. You can find more details about the INTERSECT and EXCEPT operators by referring to the topic “EXCEPT and INTERSECT (Transact-SQL)” in SQL Server Books Online. There you’ll find additional information about each operator and additional examples of how to use them.

No comments:

Post a Comment