Sudden Catastrophic Performance Degradation
Imagine that one day you find that your
server is running 5 or even 10 times slower then before. What could cause that
degradation? New version of the software? New server configuration? Some problems
with hardware? It is very logical to check all this.
But is it possible that server is running
slower just without any real cause, but just because of the small and regular data
growth? Interestingly enough, yes, it is possible.
Let's make a pure experiment. I have a big
trace, created using SQL Profiler. It is not important, what is inside, I am
gust going to read it. It must contain at least 100K rows. This trace is saved
in a table called my MyTrace.
I limit the memory of my SQL server to 90M.
It is a very small number, but let's say it is a model
experiment in a ratio of 1:100. So in reality you have 9Gb
of memory on your production server.
I create a table where I log the results of
my experiment:
create table
log (n int, t int)
Then I run the following script:
declare @n int, @j int, @t1 datetime
set @n=100
while @n<100000
begin
select @j=count(*) from MyTrace
where substring(TextData,1,30)='nonsense' and RowNumber<@n
set @t1=getdate()
select @j=count(*) from MyTrace
where substring(TextData,1,30)='nonsense' and RowNumber<@n
insert into log select @n,datediff(ms,@t1,getdate())
set @n=@n+1000
end
GO
Look what I do inside. I search some
nonsense in my trace: at first, in first 100 rows, then in 1100 rows, then in
2K rows, etc. I do it twice to measure the time of the second, 'hot' execution.
The increase of @n emulates in our environment the daily database growth. There
is no surprise that the more records we have to scan, to longer it takes:

But the form of this chart is may be not expected.
Yes, it takes longer and longer, but then something happens at about 57K rows.
Well, what happens is pretty obvious: data no longer fit in memory, so we need
to read data physically. Let's build the graph where on Y we put not an
absolute time, but a relative time per row:

Initially it is about 0.007 ms per row, but
then on a short interval from 57K to 73K in raises to 0.2 ms per row. The first
time is a time of accessing a row from memory, while the second one is a time
of a physical disk read per row.
So, how it might look on your production?
For example, multiply the time by 10 (it is in milliseconds) and let's assume
that one line is per day. So, on Friday your system is healthy, it takes 3.8
seconds to run the query:

Next week on Monday it takes 9 seconds,
next day 11 seconds. On Wednesday it slows down to 20 seconds. Thursday it
degrades to 26, users start to receive 'timeout' error messages if timeout is
set to 30 seconds (like by default in .Net). Finally, on Friday it takes 60
seconds to run the query. Everything stops.
You should monitor memory cache hits ratio
using
SQL Ultimate Performance Monitor before it is too late. Use Memory Advisor (a part of
SQL Ultimate Performance Monitor) to
check, if your server needs more memory.