Migration to 2008 R2. Not as easy as Copy & Paste [Performance]

We have gone through several migrations already, mixing and matching sources (SQL 2000, 2005, X86, X64, Enterprise to Standard, etc.). Most of our databases now run under SQL Server 2008 R2.

In every instance we have found performance issues related to HyperThreading, bad plans, missing registry keys, or outdated iSCSI drivers. We have reached the point where we can analyze and solve these challenges soon after a migration has been performed.

We recently migrated a core DB from 2005 to 2008 R2, from X86 to X64, and from Enterprise Edition to Standard Edition. The system was working, but the application was sporadically timing out. After digging through the logs, collecting info from server-side traces, and using Adam Machanic’s sp_whoisactive tool, we found the culprit. Eight services pulling data through a single stored procedure were bringing the box to its knees. With updated stats, better hardware, and more memory available, we never expected this to happen. But it did.

We added an index and narrowed the scope of a where clause inside the stored procedure. The procedure went from ~30 mins and ~100 million reads to 10 seconds and ~15,000 reads.
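The general effect of that kind of fix can be sketched outside SQL Server. The example below is only an illustration: it uses SQLite (through Python's standard sqlite3 module) as a stand-in engine, and the table, column, and index names are hypothetical, not the ones from our procedure. It compares the query plan for a broad predicate with no supporting index against the plan for a narrowed predicate once an index exists.

```python
import sqlite3

def plan_details(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

# Hypothetical table standing in for the one read by the stored procedure.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work_queue ("
    "id INTEGER PRIMARY KEY, status TEXT, created_at TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO work_queue (status, created_at, payload) VALUES (?, ?, ?)",
    [
        ("pending" if i % 100 == 0 else "done", "2011-01-%02d" % (i % 28 + 1), "x")
        for i in range(10000)
    ],
)

# Before: broad predicate, no supporting index, so the whole table is scanned.
broad_sql = "SELECT id, payload FROM work_queue WHERE status = ?"
plan_before = plan_details(conn, broad_sql, ("pending",))

# After: add an index and narrow the WHERE clause to a date range.
conn.execute(
    "CREATE INDEX ix_work_queue_status_created ON work_queue (status, created_at)"
)
narrow_sql = (
    "SELECT id, payload FROM work_queue WHERE status = ? AND created_at >= ?"
)
plan_after = plan_details(conn, narrow_sql, ("pending", "2011-01-15"))

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a scan of the table, while the second reports a search through the new index. SQL Server's optimizer and plan output differ, so treat this as a picture of the idea rather than a reproduction of what we ran.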

Small changes, huge differences. The key is to spot the issue, understand it, and work around it. Then schedule a production push and QA the process.

Reads out of hand

Things look very good now.

Summarizing Performance issues and workarounds after migrating from 2005 to 2008 R2 [SQL Server]

Based on our experience, having Hyper-Threading (HT) enabled on one particular node led to I/O operations taking a very long time. This was on a Windows 2008 R2 cluster running SQL Server 2008 R2. Interestingly, the problem was reflected neither in the wait stats nor in the pssdiag we ran for Microsoft support.

We noticed the slow I/O just by watching the OS counters for physical disk. I wrote about it here and here.
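As a rough illustration of that kind of check, the sketch below flags counter samples whose per-transfer latency exceeds a threshold. Everything specific here is an assumption: the post does not say which counter we watched, so the example imagines samples of the PhysicalDisk "Avg. Disk sec/Transfer" counter, the 20 ms threshold is arbitrary, and the sample values are made up.

```python
def slow_samples(samples, threshold_seconds=0.020):
    """Return the (timestamp, latency) pairs whose latency exceeds the threshold.

    Each sample is assumed to be one reading of a per-transfer disk latency
    counter, expressed in seconds. The default threshold is an arbitrary
    assumption, not a documented limit.
    """
    return [(ts, latency) for ts, latency in samples if latency > threshold_seconds]

# Made-up readings, one every 15 seconds.
samples = [
    ("10:00:00", 0.004),
    ("10:00:15", 0.006),
    ("10:00:30", 0.180),
    ("10:00:45", 0.250),
    ("10:01:00", 0.005),
]

flagged = slow_samples(samples)
for ts, latency in flagged:
    print("%s  %.0f ms per transfer" % (ts, latency * 1000))
```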

After fixing the issue by disabling HT, we started experiencing very high CPU utilization due to an excessive number of logical reads (20 million per query). This was due to a really bad plan. Our processes were performing anti-semi joins against partitioned tables, and that code performed extremely badly in 2008 R2 while doing just fine in 2005. I wrote about it here.

We pinpointed it by running Adam Machanic’s sp_whoisactive (which can be downloaded from here) while under high load.

We also ran server-side traces to find the most expensive operations, sorting by the highest I/O and CPU utilization.
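The sorting step can be sketched as follows, assuming the trace has already been exported to rows carrying the usual TextData, Reads, and CPU columns. The rank_statements helper, the procedure names, and the numbers are hypothetical and only serve the illustration.

```python
from collections import defaultdict

def rank_statements(events, metric, top=5):
    """Total Reads and CPU per statement text, then sort by the chosen metric."""
    totals = defaultdict(lambda: {"Reads": 0, "CPU": 0, "Count": 0})
    for event in events:
        entry = totals[event["TextData"]]
        entry["Reads"] += event["Reads"]
        entry["CPU"] += event["CPU"]
        entry["Count"] += 1
    ranked = sorted(totals.items(), key=lambda item: item[1][metric], reverse=True)
    return ranked[:top]

# Made-up trace rows; CPU is in milliseconds, Reads in logical page reads.
sample_events = [
    {"TextData": "EXEC dbo.usp_GetQueue", "Reads": 20000000, "CPU": 45000},
    {"TextData": "EXEC dbo.usp_GetQueue", "Reads": 19500000, "CPU": 43000},
    {"TextData": "EXEC dbo.usp_SaveOrder", "Reads": 1200, "CPU": 15},
    {"TextData": "EXEC dbo.usp_Report", "Reads": 800000, "CPU": 90000},
]

for text, totals in rank_statements(sample_events, "Reads"):
    print(text, totals)
```

Ranking by total rather than per-call cost surfaces cheap statements that run very often as well as single expensive ones. Passing "CPU" as the metric gives the CPU view of the same data.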

With the steps above we were able to tune the offending processes and go from 85% sustained CPU utilization to almost nil.

High CPU Utilization