Contact  |  Site Map  |  Privacy      Call for more info
 
Latest articles

 

Performance of XML Indexes in MS SQL 2005 (and SQL Server 2008)

In this article we make a 'test drive' of XML indexes. We compare a traditional database schema with 2 master - Details tables versus one table where data is encoded in XML.

Master table is called Orders, and the details table is called OrderDetails. They are joined using the column id. You can download a script to create and populate all tables here

CREATE TABLE [dbo].[Orders] (

      [id]              [varchar](20)     NOT NULL ,

      [doc_number]      [int]             NOT NULL ,

      [doc_date]        [datetime]        NULL ,

      [branch]          [varchar](10)     NOT NULL ,

      [address]         [varchar](255)    NOT NULL ,

      [payment_date]    [datetime]        NULL ,

      [processed]       [int]             NULL

) ON [PRIMARY]

GO

CREATE TABLE [dbo].[OrderDetails] (

      [id]              [varchar](20)     NOT NULL ,

      [code]            [varchar](4)      NOT NULL ,

      [price]           [decimal](15,2)   NULL ,

      [quantity]        [int]             NULL ,

      [tax]             [decimal](2)      NOT NULL ,

      [exp_date]        [datetime]        NULL

) ON [PRIMARY]

GO

 

We populate table Orders with 94,300 rows and with about 1,200,000 rows in OrderDetails (approx. 100 rows in orderDetails in one Order). This is done by stored procedure RandTable2.

We also insert the same data into the table, where all information from the table Details is stored in one XML column 'x' (it is done by stored procedure FillXML_Attr):

 

CREATE TABLE [dbo].[OrdersXML_Attr] (

      [id]              [varchar](20)     NOT NULL ,

      [doc_number]      [int]             NOT NULL ,

      [doc_date]        [datetime]        NULL ,

      [branch]          [varchar](10)     NOT NULL ,

      [address]         [varchar](255)    NOT NULL ,

      [payment_date]    [datetime]        NULL ,

      [processed]       [int]             NULL ,

      [x]               [xml]             NULL

) ON [PRIMARY]

GO

 

Data is stored in the following format (attribute-style):

<Orders id="00000001239230086">
<
OrderDetails code="0475" price="7038058788495.00" quantity="4224" tax="70" exp_date="2000-12-18T00:00:00"
/>
<
OrderDetails code="0518" price="2453159225913.00" quantity="3236" tax="24" exp_date="2001-08-16T00:00:00"
/>
<
OrderDetails code="0585" price="5268848020769.00" quantity="53" tax="69" exp_date="2000-03-31T00:00:00"
/>
</Orders
>

There is also an alternative, data can be stored in Element-style format . We save that data in a table named OrdersXML_Elem (structure is identical to the table OrdersXML_Attr) , and the data in column 'x' looks like:

<Orders>
     <
id>00000001239175086</id>
     <
OrderDetails>
           <
code>0092</code>
           <
price>9315820827745.00</price>
           <
quantity>3447</quantity>
           <
tax>79</tax>
          <
exp_date>2001-07-11T00:00:00</exp_date>
     </
OrderDetails>
     <
OrderDetails>
          <
code>2359</code>
          <
price>7019946277559.00</price>
          <
quantity>1859</quantity>
          <
tax>43</tax>
          <
exp_date>2000-08-29T00:00:00</exp_date>
    </
OrderDetails>
</Orders>

Then we run some simple tests against these tables. First test retrieves all Orders containing items with code ?0123?. We made all test against the relational tables Orders/OrderDetails and against the tables with xml data. We performed these tests with one xml-index (VALUE) and with two indexes (VALUE+PATH).

-- Relational indexes
ALTER
TABLE OrdersXML_Attr ADD CONSTRAINT OrdersXML_Attr_XPK PRIMARY KEY (id)
GO
ALTER TABLE OrdersXML_Elem ADD CONSTRAINT OrdersXML_Elem_XPK PRIMARY KEY (id)
GO
CREATE UNIQUE INDEX Orders_XK1 on Orders (id)
GO
CREATE INDEX OrderDetails_XK1 on OrderDetails (id,code)
GO

-- XML indexes
CREATE PRIMARY XML INDEX OrdersXML_Attr_PRIMARY on OrdersXML_Attr (x)

GO

CREATE XML INDEX OrdersXML_Attr_VALUE ON OrdersXML_Attr(x)

USING XML INDEX OrdersXML_Attr_PRIMARY FOR VALUE

GO

CREATE PRIMARY XML INDEX OrdersXML_Elem_PRIMARY on OrdersXML_Elem (x)

GO

CREATE XML INDEX OrdersXML_Elem_VALUE ON OrdersXML_Elem(x)

USING XML INDEX OrdersXML_Elem_PRIMARY FOR VALUE

GO


Queries we used in our experiment:

(*) attribute-style query
select id, x.value('(/Orders/OrderDetails/@code)[1]','int') as o
  from OrdersXML_Attr
 where x is not null and
       x.exist('/Orders/OrderDetails [@code="0123"]')=1

(**) element-style query

select id, x.value('(/Orders/OrderDetails/code)[1]','int') as o

  from OrdersXML_Elem

 where x is not null and

       x.exist('/Orders/OrderDetails [code="0123"]')=1


(***) normal relational query

select distinct o.id, d.code, o.doc_number

  from Orders o, OrderDetails d

 where o.id = d.id and d.code = '0123'

 

In this table you can see results of our investigation. We provide 2 results in each cell, one for 'hot' execution (2nd or 3rd execution of the same query) and the second value for 'cold' execution (a very first executuon with void buffer cache, after execution of DBCC DROPCLEANBUFFERS)

Attribute style (*)

Element style(**)

Relational query (***)

index for VALUE

17.8 / 2266 ms

37054/ 59373 ms

1 / 250 ms

both indexes

30.6 / 3190 ms

37037/ 59936 ms


As you can see, results in attribute-style are 10-15 times worse the for the traditional, relational storage of data. Not so bad, at least, much better, then FULL TABLE SCAN. In any case we recommend to perform tests on your modelk data before you went too far. Bad surprise is degradation of performance when we use element-type style of storage. Absolute values are too big and grows proportionally of the table size so you must thing properly about using this type of storage. We will try to examine this situation in our future research.

It might be also interesting to compare the disk storage required to store 2 relational tables and data, occupied by XML:

 

Attribute style

OrdersXML_Attr

Element style

OrdersXML_Elem

Relational data

Orders/OrderDetails

Rows

94,300

94,300

94,300/1,224,974

Data space

394+3 Mb

(data+indexes)

135Mb in XML-text*

404+3 Mb

(data+indexes)

193 Mb in XML-text*

 8+3 / 76+127 Mb

 (*) we convert xml field 'x' to text using convert(varchar(max), x) and then count the length of all rows: sum(len(convert(varchar(max), x)))

Conclusion

As usual, XML is nbiot a silver bullet. It is not a good idea to replace everything with XML. But you can use it rwhere eally fits, where data structured are not stable, are badly defined, changes every day, come from external data sources etc.

 

Disclaimer:

In many articles you will find the execution times of different queries. Please do not interpret these results scientifically or as official benchmarks. You will not even find the configuration of the computer used for these experiments because it is rather unimportant.

What is important is that query A is running 2 times slower then query B. On your computer, it could be different: 1.5 or even 3 times slower. Read the full disclaimer