|
Let’s say you have 2 different applications, A and B, and only have one physical server. What is a best way to share that server between these 2 applications? You have several options:
As you may have learned from one of my previous articles, VMware and SQL Server do not mix; so it shouldn’t even be considered. So you’re left with 2 choices: 2 separate databases or 2 different instances. But which is better? In some cases, you may not have a choice but to install 2 different instances. The following are some reasons why you might want to follow this approach:
Fortunately, in many cases, an ordinary application just follows the connection string you provide: login, server name, databases. So you can create any number of databases on your server and share it between such applications. It’s true that in 90% of cases, simple applications do not create logins, do not reboot the server, do not drop or create databases. So, using separate databases provides them safe sandboxes, right? Let’s verify. First I create 2 databases, A and B, on one instance of SQL server (2000). Please set recovery model to Simple, and pre-allocate 100Mb for data and 50Mb for log. Then, set the instance’s memory fixed at 70Mb:
Of course, it is a ridiculously small amount of memory for a production server; but we are making a small-scale experiment. Instead of 70MB, think about 8GB, and instead of 90MB tables, we’re going to create them using the following script: create table
Docs (n int identity, k int not null, v varchar(128)) create table
Big (n int identity, k int not null, v varchar(128)) GO insert into
Docs (k,v) select 1, 'this is a test' GO declare @t datetime, @n int set @t=getdate() set @n=18 while @n>0
begin insert into Docs (k,v) select k+n,v from Docs set @n=@n-1 end GO create clustered
index PP on Big
(n) GO declare @n int
set @n=8 while @n>0
begin insert into Big (k,v) select k,v from Docs set @n=@n-1 end GO checkpoint Think about huge tables. What is real here is that a table size (90MB, >2M rows) is bigger, than the server memory (70MB). That script must be executed on both databases: A and B. Now, we can perform a full table scan from both databases. Execute the following scripts several times: -- Scan A declare @t datetime
set @t=getdate() select max(v) from A..Big select datediff(ms,@t,getdate()) -- Scan B declare @t datetime
set @t=getdate() select max(v) from B..Big select datediff(ms,@t,getdate()) Ignore the duration of the first executions; we are interested in last:
For some reason, database B works much faster than database A. I don’t have a good explanation; I guess it’s because of the fragmentation of the disk or maybe it’s created on different sectors of the disk. Anyway, both scans represent some heavy read activity, usual for the OLAP/Reporting system. Now for an OLTP test, we randomly read 1000 records from different places: -- Random A declare @t datetime
set @t=getdate() declare @k bigint, @n int, @v varchar(128) set @n=1000 set @k=1 while @n>0
begin set @k=@k+77777 select @v=v from A..Big where n=@k%2000000 set @n=@n-1 end select datediff(ms,@t,getdate()) GO Make sure you clear the SQL Server cache before execution, using: DBCC DROPCLEANBUFFERS The first execution time was 9193ms, while the second execution time was only 20ms! It reminds us about two facts, well known before but are easily overseen today, because their effects are more and more masked by RAIDs and powerful disk arrays, making them less evident. However, these facts remain to be true: Fact 1: A hard drive is not a random access device (for databases). As you see, it took the same time (9 seconds for the ‘cold’ server) to read 2’000’000 records using a sequential read (table scan) and to read 1000 records in random order. Sequential read was 2000 times faster! * This
is one of the reasons, why object-oriented databases (even when they are more
logical and closer to the commonly-accepted OO ideology) could not win over relational
databases. OO databases processed data object by object, generating random read
patterns, and could not compete with relational ones Fact 2: A ‘cold’ execution (SQL Server has been just started, or is accessing data for the very first time) can be more than 450 times (9193/20) slower, than subsequent executions, when data is cached. *
That reminds me something that I experienced. In one company, timeout for the
IIS .Net applications was 30 seconds (default). Because of the effect mentioned
above, the execution time of some complicated queries, which normally took 1-2
seconds to execute, took 40-50 seconds (on a ‘cold’ server) which caused a
timeout error. Of
course, testers did not know it. So, every time QA server had been rebooted, I
had 5-10 reopened issues in bug tracker about timeouts. Everything was ok the
second time. Interestingly, many people could not believe that execution time
can even vary by as much as two times, so I had to make a small demo to prove
it. Is it applicable for the modern, powerful disk arrays and RAID controllers with a big cache? Yes, but in such configurations DBCC DROPCLEANBUFFERS is not enough – it flushes the cache of SQL Server, but not the controllers’. On a powerful production system, I had the following results:
But let’s return to our main subject. Our query on database A is ‘hot’. But then as time passes, there is activity in the other database, and the data in cache is gone: So we run script Random A (RA) several times, and get the same execution time – 20ms. Then we run Scan on database B (SB): RA=20, RA=20, SB=2983, RA=20 – no effect! But when we do it twice or more times: RA=20, RA=20, SB=
2863, SB=3086, RA=7963 (!!!) SQL Server 2005 has another caching strategy, and the same effect is much more difficult to reproduce. On the contrary, nothing like this happens when you use 2 instances. With 2 instances (each having a half of a memory, 35Mb) RA execution time is always 20ms, no matter how many scans are executed on another instance. However, with less memory (less caching and worse cache hits ratio), the Scan time degrades. For Scan A, it degraded from about 4200 ms to 7600 ms. So, when multiple databases share the same server, high activity in one of them rebalances the SQL Server cache so the most active database benefits from it, while the less active can suffer from lack of resources – similar to a “DOS” attack. So, should we install 2 instances instead? Is it the solution? There are many cons for having 2 instances:
So, unless you have some fatal issues that
do not allow you to put everything on the same server, the best thing to do is
to use the rule of “Occam’s razor” - entities should not be
multiplied beyond necessity. |
|
Disclaimer:
In many articles you will find the execution times of different queries. Please do not interpret these results scientifically or as official benchmarks. You will not even find the configuration of the computer used for these experiments because it is rather unimportant. What is important is that query A is running 2 times slower then query B. On your computer, it could be different: 1.5 or even 3 times slower. Read the full disclaimer » | |
|
|