Article Author: Robin Dewson
Introduction
With the inclusion of .NET within SQL Server 2005, many people thought that T-SQL would become semi-redundant, and no further updates would be applied. The functionality that was missing could perhaps be completed through the use of .NET. However, T-SQL code is still the heart and lungs of SQL Server. Set-based processing, in which SQL Server can build query plans, is by far the best option for many queries. With SQL Server 2005, T-SQL has gained some functionality that was previously missing, making it a good choice over its competitors. T-SQL has expanded its horizons with capabilities for pivoting data and ranking rows of data based on ranking criteria, as well as other new functionality required to work with the new XML data type. In this article, you’ll see how each of these enhancements works.
Common Table Expressions (CTEs)
You may have had times at which you’ve had to split a query in two in order to get the required results from a set of data. Typically, the first part of the query creates and populates some sort of temporary table and the second part uses the results from the first. This may occur when you wish to create some sort of grouping of information, and then wish to use that grouping for further analysis. As an example, you might create a temporary table that sums up each day’s worth of transactions within a bank account. The second part of the query takes the temporary table and uses it to calculate the daily interest accrued. Another example would be when you’ve created a derived table to populate a column for output via another query, and then had to complete a similar query on a different column created within the derived table for a different column within the final output result set.
A common table expression ( CTE ) is a bit like a temporary table. It’s transient, lasting only as long as the query requires it. Temporary tables are available for use during the lifetime of the session of the query running the code, or until they are explicitly dropped. The creation and use of temporary tables is a two- or three-part phase: table creation, population, and use. A CTE is built in the same code line as the SELECT , INSERT , UPDATE or DELETE statements that use it. (When I discuss the APPLY operator, I’ll demonstrate how to build a CTE within a user-defined function.)
The best way for you to understand a CTE is for me to demonstrate an example with some code. There are several new SQL Server demonstration databases that you can install as part of the SQL Server installation process. One of these is AdventureWorks , an example cycle production company. Within the AdventureWorks database, there are a number of products held in the Production.Product table. For this example, let’s say you want to know the maximum list price of stock you’re holding over all the product categories. Using the old code style, this would be a two-part process, as follows:
SELECT p.ProductSubcategoryID, s.Name,SUM AS ListPrice INTO #Temp1 FROM Production.Product p JOIN Production.ProductSubcategory s ON s.ProductSubcategoryID = p.ProductSubcategoryID WHERE p.ProductSubcategoryID IS NOT NULL GROUP BY p.ProductSubcategoryID, s.Name
SELECT ProductSubcategoryID,Name,MAX FROM #Temp1 GROUP BY ProductSubcategoryID, Name
HAVING MAX = (SELECT MAX FROM #Temp1)
DROP TABLE #Temp1
However, with CTEs, this becomes a bit simpler and more efficient. In the preceding code snippet, you’ve created a temporary table. This table has no index on it and therefore SQL Server will complete a table scan operation on it. In contrast, the following code snippet uses the raw tables, and therefore a query plan can be used. Also, there is no creation of a temporary table, which would have used up processing time.
The CTE is built up using the WITH statement, which defines the name of the CTE you’ll be returning and the columns contained within it. The columns returned will take the data types placed into it from the SELECT statement within the brackets. This table is built up, returned, and passed immediately into the following SELECT statement outside of the WITH block.
WITH ProdList (ProductSubcategoryID,Name,ListPrice) AS
(
SELECT p.ProductSubcategoryID, s.Name,SUM AS ListPrice FROM Production.Product p JOIN Production.ProductSubcategory s ON s.ProductSubcategoryID = p.ProductSubcategoryID WHERE p.ProductSubcategoryID IS NOT NULL GROUP BY p.ProductSubcategoryID, s.Name
)
SELECT ProductSubcategoryID,Name,MAX FROM ProdList GROUP BY ProductSubcategoryID, Name HAVING MAX = (SELECT MAX FROM ProdList)
PIVOT and UNPIVOT
I’m sure there have been several times when you’ve taken a result set of data, copied and pasted the results, and placed them into Excel to perform a PIVOT for the results. SQL Server 2005 now comes with the ability not only to PIVOT the data, but to also UNPIVOT the data as well.
PIVOT
The following code section shows how to list a product, each of the discounts applied to that product, and how much of that product/discount combination has been sold from AdventureWorks. As you would expect, the output produces one row for each grouping.
SELECT productID,UnitPriceDiscount,SUM FROM Sales.SalesOrderDetail WHERE productID IN (776,711,747) GROUP BY productID,UnitPriceDiscount ORDER BY productID,UnitPriceDiscount
Through the use of the PIVOT operator, it’s possible to pivot the table so that the discount produces the rows and the products produce the columns, therefore making a great cross-reference table. The code to generate the preceding results as a pivot table is as follows:
SELECT pt.Discount,ISNULL(711,0.00) As Product711, ISNULL(747,0.00) As Product747,ISNULL(776,0.00) As Product776
FROM
(SELECT sod.LineTotal, sod.ProductID, sod.UnitPriceDiscount as Discount FROM Sales.SalesOrderDetail sod) so
PIVOT
(
SUM
FOR so.ProductID IN (776, [711], [747])
) AS pt
ORDER BY pt.Discount
Let’s work through this code. First of all, you need to create a subquery that contains the columns of data that the PIVOT operator can use for its aggregation (it will also be used later for displaying in the output). No filtering has been completed at this point-any columns not used in the aggregation will be ignored. The code generates a CTE with an alias of so . From this table, you then instruct SQL Server to pivot the columns while completing an aggregation on a specific column. You also tell SQL Server which data you want in the new columns created from the PIVOT . In this code, you’re asking to use the ProductID column and three products, thus producing three columns-one for each product. This is the equivalent to using GROUP BY for the aggregation, and it’s also the equivalent of filtering data from the so CTE. However, within the so CTE there’s also a third column, UnitPriceDiscount . Without this, the output from the PIVOT would produce one row with three columns-one for each product. By including this column, you’re informing SQL Server that this is also a grouping column and that you want a row for each UnitPriceDiscount .
This then produces a temporary result set which you alias-in this case as pt . It’s possible to complete a SELECT * instead of SELECT pt.Discount so that you can see all the data that is returned. ProductID no longer exists, as you’ve now produced three columns of the specific product, and one row for each UnitPriceDiscount within the Sales.SalesOrderDetail table. When there’s no data to SUM , then a NULL value is placed in that cell.
To summarize this, a subquery is created that holds the columns to aggregate ( LineTotal ), filter ( ProductID ), and optionally produce a row for each change of value with UnitPriceDiscount . You then aggregate the required column, filter the data, and produce a set of output that you can then use as any other table.
UNPIVOT
The reverse of PIVOT is of course UNPIVOT . Quite simply, it does what it says on the box, and will unpivot data by placing column data into rows. You can prove this by unpivoting the data just pivoted using the query in the last section. The code that follows will rebuild the pivot and place the data into a temporary table. From that temporary table you can unpivot back.
UNPIVOT has similarities to PIVOT in that you build a CTE-in this case calling it upl -which you then use as the basis of unpivoting.
USE AdventureWorks
go
SELECT pt.Discount,ISNULL(711,0.00) As Product711, ISNULL(747,0.00) As Product747,ISNULL(776,0.00) As Product776
INTO #Temp1
FROM
(SELECT sod.LineTotal, sod.ProductID, sod.UnitPriceDiscount as Discount FROM Sales.SalesOrderDetail sod) so
PIVOT
(
SUM
FOR so.ProductID IN (776, [711], [747])
) AS pt
ORDER BY pt.Discount
GO
SELECT ProductID,DiscountAppl FROM (SELECT product711, Product747, Product776 FROM #Temp1) up1
UNPIVOT ( DiscountAppl FOR ProductID IN (Product711, Product747, Product776)) As upv2
WHERE DiscountAppl <> 0
ORDER BY ProductID
GO
DROP TABLE #Temp1
APPLY
Within T-SQL, there have been INNER , OUTER , and CROSS JOIN operations that allow you to link two tables and provide a set of results based on the join operations. Now you have the ability to create user-defined functions that can return a table of data instead of returning a value based on an int , a nvarchar , etc. Because this is a function, the table doesn’t physically exist, and therefore the JOIN operator isn’t valid, as there’s no table to join onto. This is where the APPLY operator comes into play.
There are only two types of APPLY – CROSS and OUTER . Let’s take a look at them now.
CROSS APPLY
If you wish to return rows from a user-defined function that returns a table of data, then you would use CROSS APPLY . Take the following semi-fictitious example. Within AdventureWorks, you have products that have no product subcategories-the value in this column is NULL . This might happen if a new product line has been imported but not yet assigned a product category. The following query is used to produce a brochure listing with all the products you currently retail. As you can see, these unassigned products won’t be listed. This may be a fair reflection on what you want to achieve. However, using the query that follows, there’s no method to return information from the ProductSubcategory table so that the listing could be perhaps broken into each product’s subcategory. (Okay, bear with me here if you’re saying "Why doesn’t he use a JOIN ?" Well, yes I could use a JOIN , but what if the subquery is actually written as a table-valued UDF? This is where APPLY comes into its own. Recall that a table-valued UDF is not a true table that you can apply a JOIN criteria to.)
USE AdventureWorks
GO
SELECT ProductID,Name
FROM Production.Product
WHERE EXISTS (SELECT * FROM Production.ProductSubcategory WHERE ProductSubcategoryID = Production.Product.ProductSubcategoryID)
By creating a user-defined table function, you can then use a CROSS APPLY to return the relevant rows. Now, this is where I can redeem myself if you were thinking I was being awkward in the previous paragraph. The joy of having a user-defined table-valued function is that the same code can be used over and over again in multiple queries. Using the preceding code, you’d have to write the EXISTS or JOIN statements each time. Not a major chore, but when you have complex user-defined table-valued functions, then you may see where this is leading.
The user-defined table-valued function would look like the code that follows. You take in as a parameter the product subcategory ID, create the table you’ll be returning, and finally build up the data to then populate it within the table to return-nice and simple example.
CREATE FUNCTION fn_ProductSubCats(productSubCatId AS INT) RETURNSProdSubs TABLE
( prodSubName nvarchar(50)
)
AS
BEGIN WITH ProdSubCat(prodSubName) AS ( SELECT Name FROM Production.ProductSubCategory WHERE ProductSubcategoryID =productSubCatId ) INSERT INTOProdSubs SELECT * FROM ProdSubCat RETURN
END
GO
You can then demonstrate the CROSS APPLY with the following code, which places into the result set the product subcategory name that you were unable to obtain in the first example above. As each row is processed from the Production.Product table, the ProductSubcategoryID is passed in and then used to return a table of results.
SELECT pp.prodSubName,ProductID,Name
FROM Production.Product
CROSS APPLY fn_ProductSubCats(Production.Product.ProductSubcategoryID) AS pp
You would therefore still only return the products that had been assigned a product category.
OUTER APPLY
If you wished to return all rows from the Production.Product table, so that you could see what stock you have whether it has been assigned a product category or not, then you would use OUTER APPLY . This is similar to OUTER JOIN , in which a NULL is returned in the relevant column when there is no value.
The following code snippet shows the simple change that will return all 504 rows from the Production.Product table.
SELECT pp.prodSubName,ProductID,Name
FROM Production.Product
OUTER APPLY fn_ProductSubCats(Production.Product.ProductSubcategoryID) AS pp
Exception Handling
In the past, the best error-handling method in T-SQL was the use of ERROR . This wasn’t seen as the most developer-friendly technique, as it meant that you had to set a local variable immediately after the code you wanted to test, and then test the local variable before progressing. You would then use a GOTO statement to take your code to some sort of error-handling block. This could mean using ERROR repeatedly after many different lines of T-SQL code. This was cumbersome and tiresome, although necessary. And that was only if the error was one that could be trapped by SQL Server and handled without your code terminating (as you would find when trying to put alpha characters into a numeric column). It also meant you had to find some way of saying where the problem occurred.
It’s now possible to place one single error handling instruction around any block of code to trap any errors that occur in that block. If an error occurs – even one that in the past would terminate the batch-it can be trapped and your code can move and execute a specific error-handling set of code. Before going further, let’s take a look at an example. The following code has an error in the Birthdate WHERE filter (December is spelled incorrectly). If you execute this code, the batch terminates.
USE AdventureWorks
GO
SELECT BirthDate FROM HumanResources.Employee WHERE BirthDate < ’13 Dev 1945’
SELECT ‘A second statement’
GO
If you execute the preceding code, an error occurs, as you would expect. However, if you surround the code with a BEGIN TRY similar to the code that follows, then you can have more controlled error handling. The BEGIN TRY defines the start of the exception-handling block and END TRY defines the end. When an error occurs, it will automatically hit the BEGIN CATCH . This example just shows a simple SELECT statement for the error handler, but in your own applications you might instead call a generic error-handling routine, or write some T-SQL code to put the error in an error table, and so on.
BEGIN TRY
SELECT BirthDate FROM HumanResources.Employee WHERE BirthDate < ’13 Dev 1945’
SELECT ‘A second statement’
END TRY
BEGIN CATCH SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE()As ErrorDesc;
END CATCH;
SELECT ‘A second statement’
Note that there is nothing in T-SQL that corresponds to the C# or VB finally statement. Any statements that should be executed in all situations (whether or not an error occurred) should be placed after the BEGIN/END CATCH block, as I’ve done with the final SELECT statement in the preceding code.
Just as you can catch the number and error message, it’s also possible to capture other information, such as the severity of the error as defined with the message, the line number within your code, and the stored procedure that it occurred in-ideal for when you have a large procedure or a set of nested procedures.
Ranking Functions
With SQL Server 2005, it’s now possible to add the ability to rank rows of data to your T-SQL code. These ranking functions can only be used with the SELECT and ORDER BY statements. (Sadly, they can’t be used in a WHERE or GROUP BY clause, so this feature can’t be directly used to page results.)
<function_name>() ORDER BY <order_by_list>">OVER
There are four ranking functions, which I’ll discuss in detail:
- ROW_NUMBER . Allows you to provide sequential integer values to the result rows of a query.
- RANK . Provides an ascending, non-unique ranking number to a set of rows, giving the same number to a row of the same value as another. Numbers are skipped for the number of rows that have the same value.
- DENSE_RANK . Similar to RANK, but each row number returned will be 1 greater than the previous setting, no matter how many rows are the same.
- NTILE . Takes the rows from the query and places them into an equal (or as close to equal as possible) number of specified numbered groups, where NTILE returns the group number the row belongs to.
ROW_NUMBER
Using the ORDER BY statement in T-SQL will do as it says – order your data in the sequence required. Until now, it hasn’t been possible to generate an ascending sequence of numbers to give each row a unique number at the same time. ROW_NUMBER will now do this. This is ideal for giving your output a reference point. For example, "Please take a look at row 10 and you’ll see . . ."
The following is an example that shows how the ROW_NUMBER function can provide an ascending number for each row returned when inspecting the Employee view. Be aware, though, that you could run the same query that follows and return a different order of data. The function is non-deterministic, and since the ORDER BY within the OVER function doesn’t produce a unique sequence of data (because there might be several people with the same LastName , for example), then you would need to find some other way to achieve uniqueness, if getting the same order with each execution were mandatory.
USE AdventureWorks
GO
SELECT ROW_NUMBER() OVER AS RowNum, FirstName + ‘ ‘ + LastName FROM HumanResources.vEmployee WHERE JobTitle = ‘Production Technician – WC60’ ORDER BY LastName
It’s also possible to reset the sequence to give a unique ascending number within a section, or PARTITION of data, using the PARTITION BY option. The following example will reset the sequential number at each change of first letter in the LastName of the Employees .
USE AdventureWorks
GO
SELECT ROW_NUMBER() SUBSTRING(LastName,1,1">OVER ORDER BY LastName) AS RowNum, FirstName + ‘ ‘ + LastName FROM HumanResources.vEmployee WHERE JobTitle = ‘Production Technician – WC60’ ORDER BY LastName
RANK
Rank provides the ability to keep the same number for a row, where the specified sort produces the same value. An increasing count is kept internally so that on a change of value, the returned count will "jump." For example, if five values are the same, on the sixth row, you’ll see a ranking of 6 for that row and any subsequent row with the same value. It’s not the easiest function to put into words, so let’s use an example to demonstrate the function in action.
This example changes to the vEmployeeDepartment view, in which there are more rows with the same value. Here, I’m keeping the ROW_NUM function from before, so I can cross-check that the RANK function skips to the right number. When you run the query, you’ll find that there are five rows with a Department called Document Control . This means that with RANK , these rows will be assigned the number 1. When the Department changes to Engineering , the RANK will change to the value of 6.
USE AdventureWorks
GO
SELECT ROW_NUMBER() OVER AS RowNum, RANK OVER AS Ranking, FirstName + ‘ ‘ + LastName, Department FROM HumanResources.vEmployeeDepartment ORDER BY ROW_NUMBER() OVER
This time, the results should remain relatively static-they’re listed in the following code block.
RowNum Ranking Department
————————- —————————— —————————— ————————
1 1 Chris Norred Document Control
2 1 Tengiz Kharatishvili Document Control
3 1 Zainal Arifin Document Control
4 1 Sean Chai Document Control
5 1 Karen Berge Document Control
6 6 Terri Duffy Engineering
7 6 Roberto Tamburello Engineering
8 6 Michael Sullivan Engineering
9 6 Sharon Salavaria Engineering
10 6 Gail Erickson Engineering
11 6 Jossef Goldberg Engineering
12 12 Ken Snchez Executive
13 12 Laura Norman Executive
DENSE_RANK
DENSE_RANK is similar to RANK -however, instead of jumping numbers though, it keeps the ascendancy in sequence. The following code demonstrates this.
USE AdventureWorks
GO
SELECT ROW_NUMBER() OVER AS RowNum,
DENSE_RANK() OVER AS Ranking,
CONVERT,FirstName + ‘ ‘ + LastName), Department FROM HumanResources.vEmployeeDepartment ORDER BY ROW_NUMBER() OVER
The results can be seen below. Notice that this time, on change of Department , the Ranking becomes 2 when Document Control becomes Engineering , instead of moving to 6.
RowNum Ranking Department
————————- —————————— —————————— ——————-
1 1 Chris Norred Document Control
2 1 Tengiz Kharatishvili Document Control
3 1 Zainal Arifin Document Control
4 1 Sean Chai Document Control
5 1 Karen Berge Document Control
6 2 Terri Duffy Engineering
7 2 Roberto Tamburello Engineering
8 2 Michael Sullivan Engineering
9 2 Sharon Salavaria Engineering
10 2 Gail Erickson Engineering
11 2 Jossef Goldberg Engineering
12 3 Ken Snchez Executive
13 3 Laura Norman Executive
NTILE
There may be times when you need to split a set of work into different groups of people-for example, if you have a set of orders that need to be individually inspected before being dispatched. In such a case, NTILE could be used to split the set of results into equal (or almost equal) groups. The following code demonstrates NTILE in action:
USE AdventureWorks
GO
SELECT NTILE OVER AS NTile, FirstName + ‘ ‘ + LastName, Department FROM HumanResources.vEmployeeDepartment
This produces 10 groups of 29 rows each. A snippet of the output is shown in Figure 1, in which you’ll see the grouping alter between rows 29 and 30.

Figure 1. Results from running a query using NTILE
XML Support
XML support came about in SQL Server 2000 with the ability to use FOR XML within queries. This was somewhat limiting-XML documents, for example, had to be stored within a text data type. In this section, I’ll discuss the changes that have occurred in this area of SQL Server.
The XML Data Type
Like nvarchar , int , etc., there’s now a base data type called xml , which can hold XML documents or parts of XML documents. There are no differences between this data type and other data types. It can be used in constraints (both column and table) and can also have a default value placed in it as each row is created.
The FOR XML Clause
FOR XML is still with us, and like many areas of SQL Server 2005, it has been expanded and improved. If you wish to return data as an XML data type, you can now do this using the TYPE declaration after the FOR XML statement. There have also been enhancements for the AUTO mode, such as the ability to use the XML data type. RAW mode has also been enhanced to provide you with the ability to specify a row element name, and the ability to specify the root element for the resulting XML, as shown in the following query:
SELECT * FROM Sales.SalesOrderDetail WHERE SalesOrderDetailID < 6 FOR XML RAW, ROOT
The EXPLICIT mode now includes support for the CDATA directive and the elementxsinil column mode (which will place an element value of xsi:nil="true" in the XMLwhen a column to retrieve has a null value). The following code would put one of these element values in the first two rows of data, as the Color is NULL .
SELECT 99 AS Tag, NULL AS PARENT, Name AS [Product!99!Name], ProductNumber AS [Product!99!Number!ELEMENT], Color AS [Product!99!Color!ELEMENTXSINIL] FROM Production.Product WHERE ProductId BETWEEN 4 AND 318 FOR XML EXPLICIT
XQuery
Sometimes, you might wish to query data within the xml data type in a SQL Server set of T-SQL statements. This is where XQuery coding comes into play. Take a look at the example that follows: I define an xml data-type variable, and call it myXML</span> . I then populate this with a set of XML elements. You can then use an XQuery to list every element within the <span class='codeintext'>xml</span> data type or in the last <span class='codeintext'>SELECT</span> statement, but in this example I've simply listed every <span class='codeintext'>productID</span> returned, with all of the results in XML format.</p>
<!-- @START —>
DECLARE
myXML xml
SELECT myXML=(select *
FROM Production.Product
WHERE ProductId BETWEEN 4 AND 318
FOR XML AUTO,ELEMENTS)
SELECT myXML.query('/Production.Product/*')
SELECT myXML.query(’/Production.Product/ProductID’)
For each element node that you wish to go into, you have to follow the hierarchy within your XQuery, not forgetting that XML is case sensitive.
Conclusion
In this article, I’ve focused on the T-SQL enhancements within SQL Server. In my view, you’ll find that the most used, and by far the most useful of these enhancements is the exception handling, in which the acceptable but awkward ERROR (which still exists) has been superseded by the far superior TRY/CATCH error handling. With this and the other advancements, including rankings, pivoting, common table expressions, and XML, T-SQL certainly hasn’t been forgotten among all the other changes that have happened.

