|
Performance of XML Indexes in MS SQL 2005 (and SQL Server 2008)
In this article we make a
'test drive' of XML indexes. We compare a traditional database schema with 2 master -
Details tables versus one table where data is encoded in XML.
Master table is called Orders, and
the details table is called OrderDetails. They are joined using the column
id. You can download a script to create and populate all tables
here
CREATE
TABLE [dbo].[Orders]
(
[id]
[varchar](20) NOT NULL ,
[doc_number]
[int]
NOT NULL ,
[doc_date]
[datetime] NULL ,
[branch]
[varchar](10) NOT NULL ,
[address]
[varchar](255) NOT NULL ,
[payment_date]
[datetime] NULL ,
[processed]
[int] NULL
) ON [PRIMARY]
GO
CREATE
TABLE [dbo].[OrderDetails]
(
[id]
[varchar](20) NOT NULL ,
[code]
[varchar](4) NOT NULL ,
[price]
[decimal](15,2) NULL ,
[quantity] [int]
NULL ,
[tax]
[decimal](2) NOT NULL ,
[exp_date]
[datetime] NULL
) ON [PRIMARY]
GO
We populate table Orders with 94,300 rows
and with about 1,200,000 rows in OrderDetails (approx. 100 rows in
orderDetails in one Order). This is done by stored procedure
RandTable2.
We also insert the same data into the table, where all
information from the table Details is stored in one XML column 'x' (it is done
by stored procedure FillXML_Attr):
CREATE
TABLE [dbo].[OrdersXML_Attr]
(
[id]
[varchar](20) NOT NULL ,
[doc_number]
[int]
NOT NULL ,
[doc_date]
[datetime] NULL ,
[branch]
[varchar](10) NOT NULL ,
[address]
[varchar](255) NOT NULL ,
[payment_date]
[datetime] NULL ,
[processed]
[int] NULL ,
[x]
[xml] NULL
) ON [PRIMARY]
GO
Data is stored in the following format
(attribute-style):
<Orders id="00000001239230086"> <OrderDetails code="0475"
price="7038058788495.00" quantity="4224" tax="70" exp_date="2000-12-18T00:00:00" /> <OrderDetails code="0518"
price="2453159225913.00" quantity="3236" tax="24" exp_date="2001-08-16T00:00:00" /> <OrderDetails code="0585"
price="5268848020769.00" quantity="53" tax="69" exp_date="2000-03-31T00:00:00" /> </Orders>
There is
also an alternative, data can be stored in Element-style format . We save
that data in a table named OrdersXML_Elem (structure is identical to the table
OrdersXML_Attr) , and the data in column 'x' looks
like:
<Orders> <id>00000001239175086</id> <OrderDetails>
<code>0092</code>
<price>9315820827745.00</price>
<quantity>3447</quantity>
<tax>79</tax>
<exp_date>2001-07-11T00:00:00</exp_date> </OrderDetails> <OrderDetails>
<code>2359</code>
<price>7019946277559.00</price>
<quantity>1859</quantity>
<tax>43</tax>
<exp_date>2000-08-29T00:00:00</exp_date> </OrderDetails> </Orders>
Then we run some simple tests against these tables. First
test retrieves all Orders containing items with code ?0123?. We made all
test against the relational tables Orders/OrderDetails and against the tables with xml
data. We performed these tests with one xml-index (VALUE) and with two
indexes (VALUE+PATH).
-- Relational
indexes ALTER TABLE OrdersXML_Attr ADD CONSTRAINT
OrdersXML_Attr_XPK PRIMARY KEY (id) GO ALTER TABLE OrdersXML_Elem ADD CONSTRAINT OrdersXML_Elem_XPK PRIMARY KEY (id) GO CREATE UNIQUE INDEX Orders_XK1 on Orders (id) GO CREATE INDEX OrderDetails_XK1 on OrderDetails
(id,code) GO
-- XML indexes CREATE
PRIMARY XML INDEX OrdersXML_Attr_PRIMARY on
OrdersXML_Attr (x)
GO
CREATE
XML INDEX
OrdersXML_Attr_VALUE ON OrdersXML_Attr(x)
USING
XML INDEX
OrdersXML_Attr_PRIMARY FOR VALUE
GO
CREATE
PRIMARY XML INDEX OrdersXML_Elem_PRIMARY on
OrdersXML_Elem (x)
GO
CREATE
XML INDEX
OrdersXML_Elem_VALUE ON OrdersXML_Elem(x)
USING
XML INDEX
OrdersXML_Elem_PRIMARY FOR VALUE
GO
Queries we
used in our experiment:
(*) attribute-style query select
id, x.value('(/Orders/OrderDetails/@code)[1]','int') as o
from
OrdersXML_Attr where x is not null and
x.exist('/Orders/OrderDetails [@code="0123"]')=1
(**) element-style query
select id, x.value('(/Orders/OrderDetails/code)[1]','int') as o
from OrdersXML_Elem
where x is not null and
x.exist('/Orders/OrderDetails [code="0123"]')=1
(***) normal relational query
select
distinct o.id, d.code,
o.doc_number
from Orders o, OrderDetails d
where o.id = d.id and d.code = '0123'
In this table you can see results
of our investigation. We provide 2 results in each cell, one for 'hot' execution
(2nd or 3rd execution of the same query) and the second value for 'cold'
execution (a very first executuon with void buffer cache, after execution of DBCC
DROPCLEANBUFFERS)
|
|
Attribute
style (*) |
Element
style(**) |
Relational
query (***) |
|
index
for VALUE |
17.8 / 2266
ms |
37054/
59373 ms |
1 /
250 ms |
|
both
indexes |
30.6 / 3190
ms |
37037/ 59936
ms |
|
As you can see, results in attribute-style are 10-15 times worse the for the traditional, relational storage
of data. Not so bad, at least, much better, then FULL TABLE SCAN.
In any case we recommend to perform tests on your modelk data before you
went too far. Bad surprise is degradation of performance when we use element-type
style of storage. Absolute values are too big and grows proportionally of the
table size so you must thing properly about using this type of storage. We will
try to examine this situation in our future research.
It might be also
interesting to compare the disk storage required to store 2 relational tables
and data, occupied by XML:
|
|
Attribute
style
OrdersXML_Attr |
Element
style
OrdersXML_Elem |
Relational
data
Orders/OrderDetails |
|
Rows |
94,300 |
94,300 |
94,300/1,224,974 |
|
Data
space |
394+3
Mb
(data+indexes)
135Mb in
XML-text* |
404+3
Mb
(data+indexes)
193 Mb in XML-text* |
8+3
/ 76+127 Mb |
(*) we convert xml field 'x' to text using convert(varchar(max), x)
and then count the length of all rows: sum(len(convert(varchar(max), x)))
Conclusion
As usual, XML is nbiot a silver bullet. It is not a good idea
to replace everything with XML. But you can use it rwhere eally fits, where data
structured are not stable, are badly defined, changes every day, come from
external data sources etc.
|