Contact  |  Site Map  |  Privacy      Call for more info
 
Latest articles

When SQL Server Query Optimizer Is Wrong

In most cases, SQL Server Optimizer generates optimal plans. It is impossible to compete with its internal knowledge of average disk access cost, record length or page fill ratio. But, there is one area where human expertise is always superior.

To follow my example, execute the following scripts on an empty database to create 2 tables (you can skip this step and look directly at the results):
Each Sale (master table) is associated with the details table - SalesItems.

create table Sales (
Id int primary key,
Amount money not null,
Date datetime not null,
Comment varchar(128)
)
GO
create table SaleItems (
SaleId int,
ItemId int not null,
Quantity int
)
GO

The Sales table (master table) will have 10'000 random records. The SalesItems table (details table) will have 1 to 10 items for each sale record in the Sales table.

Execute the following scripts to populate both tables and create indexes:

set nocount on
GO
declare @n int, @m int
set @n=10000
while @n>0 begin
insert into Sales select @n,$10.0*(@n%100)+$100.,dateadd(hh,-@n,'20070507'),
'This is sale N '+convert(varchar,@n)
set @m=@n%10
while @m>0 begin
insert into SaleItems select @n,@m,@m+@n
set @m=@m-1
end
set @n=@n-1
end
GO
create index SaleItems_Id on SaleItems (SaleId)
GO
create index Sales_Dates on Sales (Date)
GO

Note that Sales.Date varies from 2006-03-16 08:00 to 2007-05-06 23:00.

Now, we make a first attempt to write a stored procedure that retrieves the maximum quantity of sales for a specified period of time.

create procedure GetMaxQuantity
@p1 datetime, @p2 datetime
as
select max(Quantity) from SaleItems where SaleId in (
select Id from Sales where Date between @p1 and @p2)
GO

Before running the procedure, let's enable IO Statistics by executing:

set statistics io on

It is also important that we always clear the buffer cache before executing the procedure. Otherwise, the number of physical reads will not be accurate:

dbcc dropcleanbuffers

Now, let's execute the stored procedure:

exec GetMaxQuantity '20070301','20070302'

In my test environment, I get the following IO statistics:

Table 'SaleItems'. Scan count 25, logical reads 191, physical reads 1, read-ahead reads 4.
Table 'Sales'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0.

Let's again execute the procedure with a different date range:

dbcc dropcleanbuffers

exec GetMaxQuantity '20060301','20070302 

This time, the following IO statistics are returned:

Table 'SaleItems'. Scan count 8417, logical reads 55564, physical reads 90, 
read-ahead reads 173. Table 'Sales'. Scan count 1, logical reads 17, physical reads 1, read-ahead reads 16.

It is important that you execute the first query before the second. There is no surprise that the second execution, for the whole year, requires much more logical and physical reads. However, 55564 logical reads is too high. So what happened?

Let's examine the execution plan:

SQL server provides also many counters, like:

While for most of these numbers we can just trust SQL Server, there is one value we can verify - the estimated row count:

SQL Server thinks that when we select data from Sales, we get around 23 records, and then, after joining it to SaleItems we'll have:

Fortunately, it can be verified:

select count(*) from Sales where Date between '20060301' and '20070302'

In fact, there are 8417 records instead of 23, which was expected by SQL Server's query optimizer, and

select count(*) from SaleItems where SaleId in (
select Id from Sales where Date between '20060301' and '20070302')

37884 records instead of the expected 118! What a mistake. If only SQL Server knew it, it would have generated a totally different execution plan:

select max(Quantity) from SaleItems where SaleId in (
select Id from Sales where Date between '20060301' and '20070302')

As you can see, when constants are supplied, SQL Server can obtain information from the statistics tables. When it realizes there are too many records, it changes the execution plan and, instead of costly Lookups, uses a Hash Join.

The SQL Server query optimizer is a black box. However, we can sometimes make some interesting experiments.

Execute the following queries:

select * from Sales where Comment like '%A%'
select * from Sales where Comment like '%AA%'
select * from Sales where Comment like '%AAA%'
select * from Sales where Comment like '%AAAA%'
select * from Sales where Comment like '%AAAAA%'
select * from Sales where Comment like '%AAAAAA%'
select * from Sales where Comment like '%AAAAAAA%'
select * from Sales where Comment like '%AAAAAAAA%'

All of them do not return any rows, and therefore all estimations are absolutely wrong. But SQL Server's query optimizes gives an estimation of the number of returned rows:

As you can see, it is nothing more than pure guess and a heuristic formula.

This is the most common scenario: SQL Server underestimates the number of rows, and queries tables thousands or even millions of times. On the other hand, it can overestimate the number of rows, and uses a Hash Join and reads the whole table when it actually needs only a few rows.

SQL Server knows the physical layout of data much better than us. It even knows some primitive statistics based on columns. SQL Server 2005 can even trace the correlation of data in different tables. However, don't expect too much from it. It would never understand, that Name LIKE '%Smith%' returns more rows than expected, or joining data from 2 sales tables of 2 different departments would hardly give us any matches; and we did it just to confirm that fact.

Disclaimer:

In many articles you will find the execution times of different queries. Please do not interpret these results scientifically or as official benchmarks. You will not even find the configuration of the computer used for these experiments because it is rather unimportant.

What is important is that query A is running 2 times slower then query B. On your computer, it could be different: 1.5 or even 3 times slower. Read the full disclaimer