We all appreciate that, in general, XML documents or fragments are held in strings as text markup. In SQL Server, XML variables and columns are instead tokenised to allow rapid access to the data within. This is fine, but can cause some odd problems, auch as 'entitization'. What, also, do you do if you need to preserve the formatting? As usual Rob Sheldon comes to our aid.
When you’re working with XML data, you might find that you want to convert a value from the XML data type to another type, or from another type to the XML type. SQL Server lets you do both, to a limited degree. You can convert XML data to any of the string or binary types, and you can convert any of the string or binary types to XML. The process you follow if fairly straightforward in either case. Mostly, you need to know when SQL Server will do implicit conversions or when you must explicitly cast data into another type. And you should be aware of a couple other subtleties when converting to or from XML data.
Converting String Data to XML
In SQL Server, you can convert data configured with any of the character or binary data types—such as
CHAR,
VARCHAR, and
VARBINARY—to the
XML data type. You can use the
CAST() or
CONVERT() function to explicitly cast the data to a different type, or you can let SQL Server implicitly convert the data. For instance, SQL Server will automatically convert a string value stored with the
NVARCHAR data type into an
XML value, as shown in the following example:
DECLARE @string NVARCHAR(MAX);DECLARE @xml XML;SET @string =
'<bookstore><book>Candide</book></bookstore>';SET @xml = @string;SELECT @xml;
In this set of statements, I first declare the
@string variable with the
NVARCHAR(MAX) data type and then the
@xml variable with the
XML data type. I assign a string value—an XML fragment—to the
@string data type, then set the value of
@xml to equal
@string. Because SQL Server can implicitly convert an
NVARCHAR value to an
XML value, the assignment is very straightforward, which I confirm by querying the
@xml variable. As expected, the
SELECT statement returns the following results:
<bookstore><book>Candide</book></bookstore>
The result set contains the XML fragment as it was originally assigned to the
@string variable. The same holds true for other character data types, in terms of performing implicit conversions. For instance, in the following example, I replace the
NVARCHAR(MAX) data type with
VARCHAR(100):
DECLARE @string VARCHAR(100);DECLARE @xml XML;SET @string =
'<bookstore><book>Candide</book></bookstore>';SET @xml = @string;SELECT @xml;
This example is identical to the preceding one, except for switching the character data types. As a result, the
SELECT statement returns the same XML fragment as before:
<bookstore><book>Candide</book></bookstore>
Again, SQL Server has implicitly converted the string data to XML. However, you can explicitly convert that data if you want. In the following example, I use the
CAST() function to convert the
@string value:
DECLARE @string VARCHAR(100);DECLARE @xml XML;SET @string =
'<bookstore><book>Candide</book></bookstore>';SET @xml = CAST(@string AS XML);SELECT @xml;
When you specify the
CAST() function, you must pass in an expression that references the data to be converted, in this case, the
@string variable, followed by the
AS keyword and the name of the target data type,
XML. All this is enclosed in parentheses, as I’ve done here. I then assign the value returned by the
CAST() function to the
@xml variable. As to be expected, the
SELECT statement returns the same results as the previous examples:
<bookstore><book>Candide</book></bookstore>
You can just as easily use the
CONVERT() function to achieve the same results. The main difference is how you structure the arguments that you pass into the function. For
CONVERT(), you must first specify the target data type (
XML), followed by the expression that references the source data (
@string), as shown in the following example:
DECLARE @string VARCHAR(100);DECLARE @xml XML;SET @string =
'<bookstore><book>Candide</book></bookstore>';SET @xml = CONVERT(XML, @string);SELECT @xml;
Notice that I follow the
XML data type argument with a comma, then I specify the
@string variable. Once again, the
SELECT statement returns the
XML fragment as it is assigned to the
@string variable:
<bookstore><book>Candide</book></bookstore>
You might be wondering why you would explicitly cast string or binary data to the
XML type if SQL Server handles such conversions implicitly. Let’s look at the
CAST() function first. One reason you might want to use
CAST()is if you plan to run your SQL script against a database management system in addition to SQL Server and that system doesn’t support implicit conversions. Because the
CAST() function conforms to ANSI specifications, you can use it with any database systems that conform to those standards.
That’s not the case with the
CONVERT() function, which is specific to Transact-SQL in SQL Server. If you want to run your script against another database system, you have to modify your code to conform to that system’s specifications. However, if you’re running your script only against SQL Server, you can take advantage of the
CONVERT() function’s support for additional options, which let you better refine the conversion process.
For instance, in the following example I use tabs between the parent and child elements in the XML fragment:
DECLARE @string VARCHAR(100);DECLARE @xml XML;SET @string =
'<bookstore> <book>Candide</book> </bookstore>';SET @xml = CONVERT(XML, @string);SELECT @xml;
The tabs serve only to add white space between the XML elements and do not affect the elements themselves. Then, as I do in the previous example, I use the
CONVERT() function to cast the string as
XML data. However, when I run the
SELECT statement, the data no longer includes the white space, as shown in the following results:
<bookstore><book>Candide</book></bookstore>
But I can preserve the white space in the original string by adding a third argument to the
CONVERT() function, as shown in the following example:
DECLARE @string VARCHAR(100);DECLARE @xml XML;SET @string =
'<bookstore> <book>Candide</book> </bookstore>';SET @xml = CONVERT(XML, @string, 1);SELECT @xml;
Notice that the function’s third argument is
1, which tells SQL Server to preserve the white space in the source data during the conversion process. Now the
SELECT statement returns the following results:
<bookstore><book>Candide</book></bookstore>
This time the tabs are intact. I could have used spaces instead and received similar results. Now let’s look at what happens when we add linefeeds to the XML fragment. In the following example, I modify the string to include an additional book along with linefeeds and tabs:
DECLARE @string VARCHAR(100);DECLARE @xml XML;SET @string = '<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>';SET @xml = CONVERT(XML, @string, 1);SELECT @xml;
Because I pass the third argument (
1) into the
CONVERT() function, as I did in the previous example, the conversion process preserves the white space and linefeeds, as shown in the results returned by the
SELECT statement:
<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>
Another thing worth noting about converting data to the
XML data type is that once you’ve converted the data, you can use the data type’s methods to retrieve data. For example, I modified the
SELECT statement in the previous example to include the
query() method:
DECLARE @string VARCHAR(100);DECLARE @xml XML;SET @string = '<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>';SET @xml = CONVERT(XML, @string, 1);SELECT @xml.query('/bookstore/book[2]');
As you can see, I pass in the second instance of the child element (
/bookstore/book[2]) as an argument to the
query() method. The
SELECT statement now returns the following results.
<book>Pride and Prejudice</book>
You can, of course, use the
query() method to perform more complex queries, and you can use other methods. The key is to save the string data to the
XML type and then call the method.
Converting XML to String Data
Although SQL Server implicitly converts string and binary data to the
XML data type, it doesn’t work the other way. All conversions from the
XML type to string and binary types must be done explicitly using the
CAST() or
CONVERT() function. If you try to perform an implicit conversion, your statement will fail. For instance, the following example tries to convert the XML fragment to
VARCHAR(100):
DECLARE @xml XML;DECLARE @string VARCHAR(100);SET @xml =
'<bookstore><book>Candide</book></bookstore>';SET @string = @xml;SELECT @string;
After I declare the
@xml and
@string variables, I assign the XML fragment to the
@xml variable. I then try to set the
@string variable to equal the
@xml variable, but when I try to run these statements, SQL Server returns the following error:
Implicit conversion from data type xml to varchar is not allowed. Use the CONVERT function to run this query.
The problem is easy enough to fix. In the following example, I include the
CAST() function to explicitly convert the
XML value to
VARCHAR(100):
DECLARE @xml XML;DECLARE @string VARCHAR(100);SET @xml =
'<bookstore><book>Candide</book></bookstore>';SET @string = CAST(@xml AS VARCHAR(100));SELECT @string;
Now the
XML value is converted into a string with no problem. When I retrieve the value of
@string, the
SELECT statement returns the XML fragment, as shown in the following results:
<bookstore><book>Candide</book></bookstore>
I could have just as easily used the
CONVERT() function to cast the
XML value to
VARCHAR:
DECLARE @xml XML;DECLARE @string VARCHAR(100);SET @xml =
'<bookstore><book>Candide</book></bookstore>';SET @string = CONVERT(VARCHAR(100), @xml);SELECT @string;
As to be expected, the conversion occurs without a hitch and the XML fragment is assigned to the
@string variable, which the
SELECT statement confirms. Now suppose you insert linefeeds and tabs into your XML fragment, as I do in the following example:
DECLARE @xml XML;DECLARE @string VARCHAR(100);SET @xml ='<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>';SET @string = CONVERT(VARCHAR(100), @xml, 1);SELECT @string;
Because the XML fragment contains these new elements, you might expect that you can simply add the third argument to the
CONVERT() function to preserve the tabs and linefeeds. However, although I take this approach, the value saved to the
@string variable does not preserve these elements, as shown in the results returned by the
SELECT statement:
<bookstore><book>Candide</book><book>Pride and Prejudice</book></bookstore>
The problem is not with how I convert the
@xml value, but with the way I assign the XML fragment to that variable. Notice that I assign the value simply by setting
@xml to equal the XML fragment, enclosed in single quotes. What is essentially happening here is that SQL Server is implicitly converting a string value to the
XML type, which means that the tabs and linefeeds are not being preserved during that assignment. The way to get around this is to use the
CONVERT() function to explicitly cast the string to the
XML type, as I do in the following example:
DECLARE @xml XML;DECLARE @string VARCHAR(100);SET @xml = CONVERT(XML,'<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>', 1);SET @string = CONVERT(VARCHAR(100), @xml, 1);SELECT @string;
As you can see, I include a third argument in the
CONVERT() function that specifies these elements be preserved. The
SELECT statement returns the following results:
<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>
Now the string includes the tabs and linefeeds as they exist in the original XML fragment. Let’s take a closer look at this process so you can better understand what’s happening behind the scenes when you convert XML data.
XML Entitization
As the last example demonstrates, you can use the
CONVERT() function to preserve characters that the XML parser normally ignores, such as spaces and tabs between elements. What this points to is that SQL Server handles certain components of an XML fragment—whether converting to or from the
XML type—differently from the basic elements, attributes, and their values.
To better understand what’s happening, take a look at another example. The following set of Transact-SQL statements are similar to the last example, except that I’ve included a second
SELECT statement to retrieve the
@xml value and I’ve modified the second instance of the
CONVERT() function so it doesn’t include the third argument (
1):
DECLARE @xml XML;DECLARE @string VARCHAR(100);SET @xml = CONVERT(XML,'<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>', 1);SELECT @xml;SET @string = CONVERT(VARCHAR(100), @xml);SELECT @string;
As you might expect, the new
SELECT statement returns the XML fragment with the linefeeds and tabs in place:
<bookstore>
<book>Candide</book>
<book>Pride and Prejudice</book>
</bookstore>
However, the second
SELECT statement returns several unexpected values:
<bookstore>
	<book>Candide</book>
	<book>Pride and Prejudice</book>
</bookstore>
NOTE: If I were to try to run this example without converting the original XML fragment, or converting the fragment without specifying the third argument in the CONVERT() function, the tabs and linefeeds would not be preserved, and the two SELECT statements would return the string without those tabs and linefeeds.
When the XML parser stores characters such as tabs and linefeeds within an XML document or fragment, it must do so in a way that preserves them throughout the parser’s normalization process, allowing these elements to be maintained whether being stored or retrieved or whether being converted to or from XML. Tabs and linefeeds are just two of the special characters that need to be preserved in this way, as the above example demonstrates, with tabs being saved as
	 and linefeeds saved as

. When you retrieve the
XML value directly, these special characters are automatically displayed in a readable format.
The process of preserving certain characters with these special symbols is known as
entitization. Unfortunately, the XML parser is not always consistent in entitizing characters, as demonstrated by the fact that only one linefeed is stored as

. Another issue is the way the parser handles special characters that appear in an element or attribute value. In such cases, SQL Server simply returns an error, rather than trying to entitize them. For instance, in the following example I replace the word “and” in the second book title with an ampersand (&):
DECLARE @xml XML;DECLARE @string VARCHAR(100);SET @xml = CONVERT(XML,'<bookstore>
<book>Candide</book>
<book>Pride & Prejudice</book>
</bookstore>', 1);SELECT @xmlSET @string = CONVERT(VARCHAR(100), @xml);SELECT @string;
When I run this statement, I receive the following error:
XML parsing: line 3, character 15, illegal name character
The XML parser doesn’t like characters such as ampersands, left brackets (<), and right brackets (>) within element and attribute values. To store XML that contains these characters, you must manually entitize them by replacing the character with their symbols. In this case, I replace the ampersand with
&:
DECLARE @xml XML;DECLARE @string VARCHAR(100);SET @xml = CONVERT(XML,'<bookstore>
<book>Candide</book>
<book>Pride & Prejudice</book>
</bookstore>', 1);SELECT @xml;SET @string = CONVERT(VARCHAR(100), @xml);SELECT @string;
The parser can now parse the XML fragment. In doing so, it preserves the entitized ampersand, as shown in the results returned by the first
SELECT statement:
<bookstore>
<book>Candide</book>
<book>Pride & Prejudice</book>
</bookstore>
As you can see, the second book title contains the entitized ampersand. Now SQL Server has no problem handling the element’s value when that value is stored, retrieved, or converted. The entitized ampersand is also preserved when you convert the
XML value to a string, which the second
SELECT statement confirms:
<bookstore>
	<book>Candide</book>
	<book>Pride & Prejudice</book>
</bookstore>
Of course, chances are pretty good you won’t want your string value to show the entitized characters. As before, you can try to get rid of them by adding the third argument to the second
CONVERT() function, as shown in the following Transact-SQL:
DECLARE @xml XML;DECLARE @string VARCHAR(100);SET @xml = CONVERT(XML,'<bookstore>
<book>Candide</book>
<book>Pride & Prejudice</book>
</bookstore>', 1);SELECT @xml;SET @string = CONVERT(VARCHAR(100), @xml, 1);SELECT @string;
The second
SELECT statement now returns the following results:
<bookstore>
<book>Candide</book>
<book>Pride & Prejudice</book>
</bookstore>
Almost there. The entitized tab and linefeed characters are returned as regular characters, but the ampersand within the element value is still entitized. To address this issue, you have to specifically remove it from your string. For example, in the following Transact-SQL I use the
REPLACE() function to replace the
& value with an ampersand:
DECLARE @xml XML;DECLARE @string VARCHAR(100);SET @xml = CONVERT(XML,'<bookstore>
<book>Candide</book>
<book>Pride & Prejudice</book>
</bookstore>', 1);SELECT @xml;SET @string =
REPLACE(CONVERT(VARCHAR(100), @xml, 1), '&', '&');SELECT @string;
For the first argument of the
REPLACE() function, I pass in the converted
@xml value, then specify the value to be replaced (
&) and the new value (
&). Now the second
SELECT statement returns the value we want:
<bookstore>
<book>Candide</book>
<book>Pride & Prejudice</book>
</bookstore>
As the results show, the string value no longer contains any entitized characters. As you work with XML data, you’ll get a better feel for when to manually entitize and de-entitize data. For more information about the entitization process in SQL Server, see the topic “
Serialization of XML Data” in SQL Server Books Online.
XML Data Conversion
For the most part, converting data from the
XML data type to another type or converting it from another type to the
XML type is a fairly painless process. Not surprisingly, there are a few
gotchyas along the way, entitization being one of them. For more details about converting XML data, see the topics “
Generating XML Instances” and “
CAST and CONVERT (Transact-SQL)” in SQL Server Books Online. Also worth checking out is how to incorporate the
FOR XML clause in a
SELECT statement to return results as XML. You can find details about that in my Simple-Talk article “
Using the FOR XML Clause to Return Query Results as XML.” As you can see, SQL Server provides a number of ways to work with XML data. This article, along with the first three in the series, should have provided you with a good overview of the various considerations to take into account when working with XML.