Contact  |  Site Map  |  Privacy      Call for more info
 
Latest articles

Sudden Catastrophic Performance Degradation

    

Imagine that one day you find that your server is running 5 or even 10 times slower then before. What could cause that degradation? New version of the software? New server configuration? Some problems with hardware? It is very logical to check all this.

    

But is it possible that server is running slower just without any real cause, but just because of the small and regular data growth? Interestingly enough, yes, it is possible.

    

Let's make a pure experiment. I have a big trace, created using SQL Profiler. It is not important, what is inside, I am gust going to read it. It must contain at least 100K rows. This trace is saved in a table called my MyTrace.

    

I limit the memory of my SQL server to 90M. It is a very small number, but let's say it is a model experiment in a ratio of 1:100. So in reality you have 9Gb of memory on your production server.

    

I create a table where I log the results of my experiment:

    

create table log (n int, t int)

    

Then I run the following script:

    

declare @n int, @j int, @t1 datetime

set @n=100

while @n<100000 begin

    select @j=count(*) from MyTrace

          where substring(TextData,1,30)='nonsense' and RowNumber<@n

    set @t1=getdate()

    select @j=count(*) from MyTrace

          where substring(TextData,1,30)='nonsense' and RowNumber<@n

    insert into log select @n,datediff(ms,@t1,getdate())

    set @n=@n+1000

    end

GO

    

Look what I do inside. I search some nonsense in my trace: at first, in first 100 rows, then in 1100 rows, then in 2K rows, etc. I do it twice to measure the time of the second, 'hot' execution. The increase of @n emulates in our environment the daily database growth. There is no surprise that the more records we have to scan, to longer it takes:

    

    

But the form of this chart is may be not expected. Yes, it takes longer and longer, but then something happens at about 57K rows. Well, what happens is pretty obvious: data no longer fit in memory, so we need to read data physically. Let's build the graph where on Y we put not an absolute time, but a relative time per row:

    

    

Initially it is about 0.007 ms per row, but then on a short interval from 57K to 73K in raises to 0.2 ms per row. The first time is a time of accessing a row from memory, while the second one is a time of a physical disk read per row.

    

So, how it might look on your production? For example, multiply the time by 10 (it is in milliseconds) and let's assume that one line is per day. So, on Friday your system is healthy, it takes 3.8 seconds to run the query:

    

    

Next week on Monday it takes 9 seconds, next day 11 seconds. On Wednesday it slows down to 20 seconds. Thursday it degrades to 26, users start to receive 'timeout' error messages if timeout is set to 30 seconds (like by default in .Net). Finally, on Friday it takes 60 seconds to run the query. Everything stops.

    

You should monitor memory cache hits ratio using SQL Ultimate Performance Monitor before it is too late. Use Memory Advisor (a part of SQL Ultimate Performance Monitor) to check, if your server needs more memory.

Disclaimer:

In many articles you will find the execution times of different queries. Please do not interpret these results scientifically or as official benchmarks. You will not even find the configuration of the computer used for these experiments because it is rather unimportant.

What is important is that query A is running 2 times slower then query B. On your computer, it could be different: 1.5 or even 3 times slower. Read the full disclaimer