Query Performance Issue in SQL Server 2008 R2

We migrated from SQL Server 2005 X86 to SQL Server 2008 R2 X64 a couple of weeks ago. The new hardware has twice as much RAM, much better I/O and CPU power to spare. I have discussed about the first issues we encountered with Hyper-Threading enabled here.

We expected the new hardware to outperform the old one by at least 8 times based on tests, but unfortunately performance was on par with the old machine and even gave signs of degradation over time. I/O numbers were perfect but execution times were dismal. The same exact code running under SQL Server 2005 was doing fine.

We went ahead and performed:

  1. Re-indexed the main tables
  2. Recomputed statistics across the board
  3. Tuned the log files and configured proper growth sizes
  4. Cleaned up some application tables to reduce the amount of queries
  5. Introduced the clause OPTION (OPTIMIZE FOR UNKNOWN)
    into the main dynamic stored procedure

We gained just a bit of performance after every attempt, but nothing drastically. Some queries that should have returned results in a few seconds were taking 5 to 10 minutes. Every time the database was busy, CPU usage was over 80% and the wait was LATCH_EX [NESTING_TRANSACTION_FULL] which is for internal use only.

I decided to perform a deep dive and dissect the main query used for our application which performs a bunch of anti-semi joins. Due to company policy I am not able to show the whole query, but will highlight the areas that showed signs of trouble.

When viewing the query plan, it showed that the system will perform a nested loop against the table suppressionlistDomain which accounted for 1% of the cost of the whole query. The query is executed over 1,500 times a day.

This piece of code is not optimized, granted, but was working fine in SQL Server 2005. The table suppressionlistDomain was partitioned by date (3rd column) and had a clustered primary key on suptype_id, domainname and datecreated. Rowcount was 700,000 rows:

1
2
3
LEFT OUTER JOIN [oen_suppression].dbo.suppressionListDomain ld
ON (ld.suptype_id = @v_suptype_id AND ld.domainname = RIGHT(LOWER(RTRIM(LTRIM(tn.email))), CHARINDEX('@', REVERSE(tn.email))-1))
WHERE ld.suptype_id IS NULL

I was perplexed when I saw the amount of reads generated:

Table ‘suppressionlistDomain’. Scan count 16155936, logical reads 19945600, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Steps I followed:

  • Recreate the table without partitioning. Result:

Table ‘suppressionlistDomain_new’. Scan count 0, logical reads 598368, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So, by just taking partitioning out of the equation I was able to reduce the amount of reads to just 2.5% of the original value. Amazing right? But still way too many reads.

  • Move the join operation at the bottom of the original query, but join it against the original partitioned table. I no longer required to use functions to dissect the domain as I could join it with another table that was part of the query that had the domain value:
1
2
3
LEFT OUTER JOIN [oen_suppression].dbo.suppressionListDomain ld
ON (ld.suptype_id = @v_suptype_id AND ld.domainname = l.domainName)
WHERE ld.suptype_id IS NULL

Table ‘suppressionlistDomain’. Scan count 405000, logical reads 500000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Over 400K scans? And 500K logical reads? So definitely partitioning is making the optimizer nuts.

  • Move the join operation at the bottom of the original query, but join it against the new de-partitioned table. I no longer required to use functions to dissect the domain as I could join it with another table that was part of the query that had the domain value:
1
2
3
LEFT OUTER JOIN [oen_suppression].dbo.suppressionListDomain_new ld
ON (ld.suptype_id = @v_suptype_id AND ld.domainname = l.domainName)
WHERE ld.suptype_id IS NULL

Table ‘suppressionlistDomain_new’. Scan count 0, logical reads 15000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Over

15,000 logical reads!

In a nutshell, this was performed:

  1. Recreated the suppressionlistDomain table without partitioning
  2. Moved the joined condition to the bottom of the query
  3. Deprecated the function to extract domain

It is interesting to see that different versions of SQL Server can make things worse. In this particular case, our server was pegged at 85% CPU utilization due to the insane amount of logical reads performed. 20 million reads multiplied by 1,500 queries was definitely not a number we shall see in this environment.

After the change, queries now run in just a few seconds and the CPU utilization stayed below 20%.