Troubleshooting General SQL Server Performance Issues

This article is for DBA's who have just started with SQL Server, it is geared towards an easy to follow steps to troubleshoot general SQL Server performance issues.

Part 1 :- Capture Data


Download Pssdiag from http://support.microsoft.com/kb/830232 and follow the below steps to capture date

How to capture the diagnostic info I requested using PSSDIAG:

  1. Create a folder named PSSDIAG on your SQL server machine.  This folder should be on a drive with plenty of space as diagnostic file collections can be quite large.
  2. Save the file attached to this email to your PSSDIAG folder as pssd.exe.
  3. Open a command prompt.  Change the current directory to your PSSDIAG folder and run pssd.exe to extract its contents.
  4. Run PSSDIAG.EXE at the command prompt to start the collection process.
  5. Once PSSDIAG displays the message, “PSSDIAG Started,” attempt to reproduce your issue.
  6. Stop PSSDIAG by pressing CTRL+C.
     
Part 2 :- Analyse Data

The following steps should be followed either to do first iteration baseline performance analysis to diagnose a problem or to do monthly performance maintenance.

Step 1: Run PSSDIAG when the problem is occurring (or during normal busy time periods and/or critical job processing for performance maintenance.)  Get 30 minutes worth of data (look at the “Output” folder - located in the same folder where PSSDIAG was installed on the machine to see how fast the files are growing.)   

Step 2: After PSSDIAG has been collected, there are some files that need to be reviewed immediately when doing first-iteration-baseline diagnostics.  Those files are listed below:

Note: The naming convention for the files are “ServerName_”

ServerName_run_sp_blocker_pss80.out  - Blocker script output.

ServerName_sp_trace.trc – This is the naming convention for the first .trc file generated (the SQL Profiler output file.)  If subsequent .trc files are generated because the MB value for the file has been reached, additional files will be created with the naming conventions below:

ServerName_sp_trace_1.trc
ServerName_sp_trace_2.trc

Pssdiag.blg – This is the output generated by System Monitor.

ServerName_SQL_Best_Practices_Boot.INI_startup.txt – Contains boot.ini settings.

ServerName_sp_sqldiag_shutdown.out – SQLDIAG.exe output generated.

ServerName_syslog_shutdown.txt – System Event Log Information.

ServerName_applog_shutdown.txt – Application Event Log Information

Step 3: The first of these files that needs to be reviewed is the output generated by the “ServerName_sp_sqldiag_shutdown.out” report.  The following items need to be extracted from the report related to the machine that SQL Server is installed on in order to perform further diagnostics:

ServerName_sp_sqldiag_shutdown.out” – What to look for:

There are some things that can be determined right away from looking at this output.  From the SQL ERRORLOGS be on the lookout for SQL Server versions that are not up to the minimum level.

In the “sp_configure” section look for changes to default settings (full list is in SQL Books Online.) Typically the only configuration that may require changing (vanilla out of the box) would be “awe” if the server has sufficient physical memory to take advantage of extended memory.  In that case the “awe” setting would need to be changed from the default of 0 to 1.  Additionally (in a case where awe is enabled), change the Max Server Memory setting to a specific value (based on memory available.)  When extended memory is used, if Max Server Memory is left at the default, upon instantiation, SQL Server will acquire as much memory as is available.

The table below shows information that should be obtained multiple sections of SQLDIAG.  Find the data and record (to be used later during analysis.)

General Baseline Data for Server: ServerName
Where to find information in SQLDIAG.txt
What to look for in SQLDIAG.txt
Record Findings Here:
Top portion of ERRORLOG
ServerName SQLDIAG run on machine (this of course should be the same name in the “ServerName_” portion of all the PSSDIAG reports): 

      “
Microsoft SQL Server 2000 Edition/Version:

      “
Server is listening on:

      “
Check for error messages and dump output.

Search on “virtual”
Cluster: Yes/No       If yes, number of nodes:                    Active/Passive?  Active/Active?

Search on “Fullinstallver”
MDAC Version:

Search on “sp_configure”

awe

     “
max degree of parallelism

     “
light weight pooling

     “
affinity mask

     “
priority boost

     “
set working size

     “
max worker threads

     “
max server memory (MB)

     “
min server memory (MB)

     “
network packet size

Search on “xp_msver”
ProcessorCount (Logical if hyper- threading turned on in BIOS) – How many?                                                                                                                     

     “
PhysicalMemory MB – How much?

Search on “sp_helpdb”
Are the “IsAutoCreateStatistics” and “IsAutoUpdateStatistics” settings turned on?  If yes, for which databases?

Search on “sysdevices”
What drives are indicated for all databases in “sysdevices?”

Search on “sysdatabases”
What drives are indicated for all databases in “sysdatabases?”




Step 4:  In keeping with the overall idea of just getting information about the server, now would be a good time to look at “ServerName_MSINFO32.TXT”, “ServerName_applog_Shutdown.txt”, “ServerName_syslog_Shutdown.txt”, and “ServerName_SQL_Best_Practices_BOOT.INI_Startup.TXT.”

Where to find information in MSINFO32.TXT
What to look for in MSINFO32.TXT
Record Findings Here:
Top portion of System Information Report
The version of the operating system that SQL Server is running on.


Where to find information in applog_Shutdown and syslog_Shutdown
What to look for in applog_Shutdown and syslog_Shutdown
Record Findings Here:
Throughout the report
Look for general “need to investigate” items (error messages related to SQL Server or that could affect SQL Server.)  Note: If the default settings for the ERROR logs generated by sqldiag.exe have been altered, or if the ERROR logs that contain data that is needed for a specific set of diagnostics has been overwritten – the applog and syslog may be the only record of a possible problem.


Where to find information in BOOT.INI
What to look for in BOOT.INI
Record Findings Here:

Look to see if /3GB or /PAE (or both) are configured.



Step 5:  The output generated by the SQL Blocker script should probably be reviewed next - ServerName_run_sp_blocker_pss80.out.  There are specific pieces of information to look for but how one goes about getting that data is based on personal preference and the size of the file. Recall that the blocker script output records a collection of data in time snaps for sysprocesses, input buffer, statistical data etc. “FINDSTR” commands can be used to find trends (certain waittypes, SPIDs at the head of the blocking chain etc.)  The following methodology can be used to review blocking script output (but is not the only way.)

1. Scroll to the bottom of the blocker script output report.  Do a find on “waittime” going “up” and do a cursory check of what is detected that is over 100 milliseconds (along with the associated waittypes.)  By doing this it is possible to see if there are queries taking a long time to execute, what the waittypes are found in the report (regardless of whether or not the associated duration is longer than 100 milliseconds.)   There are several types of waittypes which are explained in: KB 244455 - Definition of Sysprocesses Waittype and Lastwaittype Fields http://support.microsoft.com/default.aspx?scid=kb;en-us;244455

2. From the top of the report do a find on “spids at the head.”  Doing this will bring display SPIDs that are at the head of the blocking chain.  If you see the same SPID number repeated over multiple time snaps and the input buffer is the same – this query or stored procedure should be investigated (get details from the input buffer and then use the SQL Profiler trace output to get the full text of the query so that an execution plan can be generated.)

3. Other information can be obtained from the blocker script output related to resources that a particular query or stored procedure is waiting on, RECOMPILES, tempdb and much more.  Keep in mind that there can be blocking due to locks as well as system resources or both. 

Step 6:  SQL Profiler output “ServerName_sp_trace.trc” can be analyzed in several different ways.  The first (and easiest) is simply to use a tool called “ReadTrace” http://support.microsoft.com/kb/944837.)  By following the simple instructions several trends can be found related to queries with the longest duration, highest number of reads, writes, CPU etc.  Based on the results of Read80Trace, queries that should be reviewed for tuning can be identified.  The threshold number for long duration queries are those taking greater than 100 milliseconds to run (the data is reported for “duration” in milliseconds.)  High reads is 250, 000.  Note: These numbers are used as guidelines (for example a query that takes 2 minutes may be acceptable if it runs once a quarter while a query that takes 30 seconds to run might be unacceptable if that same query is executed hundreds of times in a short time span.) 

The second way to use SQL Profiler data is in conjunction with information obtained from the blocker script output (when the complete text of a query of stored procedure identified as a blocker needs to be obtained so that the execution plan can be generated.

Third, the SQL Profiler data can be used to check for problems related to transactions (for example to see if a BEGIN TRAN is missing a COMMIT etc.)

Step 7:  The last set of data to be reviewed is PSSDIAG.BLG which is collected by the System Monitor for several object counters by default.  Which counters and what threshold to monitor depends upon the type of analysis being performed (baseline-first-iteration or the third or fourth iteration with more specific analysis being performed on a problem area detected in earlier rounds.)  This is an area where there are several camps in terms of what needs to be reviewed as a baseline. Since customer environments vary greatly what is a “bad” number for a counter at one site may not be at another.  It is best to keep in mind that these counter values are guidelines only and can vary significantly from site to site depending upon the type of counter.  There are approximately three different lists that detail the minimum items to review for a baseline.  The following list has been sufficient for baseline-first-iteration analysis.

System Monitor
Processor Object:
% Processor Time – sustained periods of 75% or higher (not good.)
Memory Object
Available Mbytes: <= 20 MB (not good.)
Free System Page Table Entries < 7000 (If less then 7000 and the /3GB switch is configured, consider removing /3GB.)
Pages/Sec > 50 This counter is an indicator of the kinds of faults that cause system-wide delays.
Pages Input/sec> 10 (not good.)
SQL Server: MemoryManager Object
Memory Grants Pending> 1 (not good.)
Target Server Memory (KB):  See below.
Total Server Memory (KB): See below.

If Target Server Memory is higher than Total Server Memory – SQL Server may need more memory than is available.
SQL Server: Buffer Manager Object
Buffer Cache Hit Ratio: <90% (bad)
Free Pages: >= 200 – 1000 (good.)
Lazy Writes/sec: Between 0 – 1 (good) >20 (bad) If the value is high, this is an indicator of the need for more memory.
Page Life Expectancy: <300 (bad.)
Page reads/sec: >90 (bad)
Page writes/sec: >90 (bad)
Physical Disk
General guideline:
Avg Disk sec/transfer or read or write = < 20ms excellent, 20-30ms good, 30-40ms avg, 40+ms poor looking at the average times. If Max values are high then need to look at graph to see if sustained.
Current Disk Queue Length = < 2 is good

Step 8:  Get execution plans for the queries and stored procedures that have met the following criteria: 1) Long duration times 2) Blocking 3) Have short duration times but are executed repeatedly hundreds of times.  The execution plans can be obtained by running SET commands in Query Analyzer or by running the “Detailed” version of PSSDIAG (the latter may have to be used to get the execution plan for queries or stored procedures that update.)  The SET commands are below:

set statistics profile on
set statistics io on
set statistics time on
Query
set statistics profile off
set statistics io off
set statistics time off

After reviewing the execution plan, update statistics if indicators are present that the statistics are not accurate. Then either tune the query manually or via the Index Tuning Wizard / Database Engine Tuning Advisor.  Note: Although the Index Tuning Wizard /Database Engine Tuning Advisor can be useful, it cannot be a complete substitute for a manual review of the execution plan.

If the queries and stored procedures have been tuned (and tested one at a time) and there is still a performance problem – start looking at system resources.

Step 9:  Repeat steps 1, 4 (applog/syslog portions only), 5, 6, 7, and 8 until the system shows performance stability.  Note: Follow steps 1 -8 if PSSDIAG was run for regular performance tuning maintenance



Thanks
Huzeifa Bhai

No comments:

Post a Comment