How to Become a SQL Server Performance Expert
Is it possible to answer that question in few seconds? May be you tried to look at the performance counters in PerfMon. There are about 1000 different counters with funny names. You may ask: "Skipped Ghosted Records/sec" = 10, is it good or bad? The right answer is that it probably does not matter. However, there are counters that really do matter. This article will help you identify such counters and will teach you how to troubleshoot most common performance issues.
To start, install SQL Heartbeat tool from:
Connect to your server, and you will see two categories in a server tree:
Clicking on the "Online Activity/SQL Heartbeat" node will give you five different charts immediately. But it is better to click on "Historical Data" and to wait 1-2 days until more accurate metrics is accumulated. So, what can you expect to see on the diagrams? At first, you should take look at the Waits diagram:
By the way, did you notice a periodic pattern on this chart? For example, there is less activity during the weekends and there is additional activity during the night. To get a more complete picture of database health during normal business hours, you can exclude these hours from your analysis (especially since these are probably times when full backups are running). We can exclude such data to make our charts smoother:
The next step is to check which category is the worst. If the dominant color is blue or yellow (Reads or Writes), then your server is disk IO-bound. Here is an example:
Why it is I/O bound? Click on Physical R/W diagram. The problem can be in the workload, e.g. too many I/O operations:
Very high I/O activity can be caused by a poor Cache Hits ratio. You can click on the Cache Hits button to confirm:
Note: do not use "Cache Hits Ratio" reported by PerfMon. What is reported by PerfMon is nothing more than an "average temperature in the hospital" - it averages weekends and busy days, nightly index defrags and daily operations with different usage patterns.
Sometimes "Cache Hits" is good enough but the latency is poor. Click on the Seek Time button to confirm:
50 milliseconds (ms) means that server is only able to execute 20 I/O operations per second. That's about as fast as a good old floppy disk!
Some servers are CPU-bound, here is how a typical CPU-bound server my look like:
Generally, it is not bad if the server is CPU-bound as long as the absolute numbers are not very high (just keep in mind that Y scale indicates CPU milliseconds par second of server time on all processors, so on 4 processors you can have up to 4000ms CPU per second). But if the absolute value for CPU utilization is high, then click on the Query Stats button. You will see the TOP 10 problematic queries sorted by CPU, Reads, Writes. These queries will be your candidates for optimization.
By the way, you can get all these reports daily on your email. Just configure it in SQL Heartbeat (click on the Reports button in the toolbar).
Now there is really only one minor item left. Do you see the red spike above? These are locks. Any types of locks are dangerous. One hour of CPU time per day is OK but 1 minute of lock time can result in timeouts and unhappy customer. Deadlocks are even worse.
To troubleshoot locks and deadlocks, install SQL Deadlock Detector from:
Connect to the same server and check the list of events after a couple of hours of data collection.
Sometimes locks include several dozens of processes. From within SQL Deadlock Detector, you can get all the details about processes participating in locks:
The information provided includes names of the locked tables, exact T-SQL statements and a parent stored procedures.
You can quickly determine if your SQL Server is healthy or not. In fact, you CAN become a SQL Server performance expert when it comes to most common performance issues if you just follow the steps above.
To get started: