Migration to 2008 R2. Not as easy as Copy & Paste [Performance]

We have gone through several migrations already, mixing and matching sources (SQL 2000, 2005, X86, X64, Enterprise to Standard, etc.). We now have most of our databases running under SQL Server 2008 R2.

In every instance we have found performance issues related to Hyper-Threading, bad plans, missing registry keys, or outdated iSCSI drivers. We have reached a point where we are ready to analyze and solve these challenges soon after each migration is performed.

We recently migrated a core DB from 2005 to 2008 R2, from X86 to X64, and from Enterprise Edition to Standard Edition. The system was working, but the application was sporadically timing out. After digging through the logs, collecting information from server-side traces, and using Adam Machanic’s sp_whoisactive tool, we found the culprit: eight services pulling data through one stored procedure were bringing the box to its knees. With updated stats, better hardware, and more memory available, we never expected this to happen. But it did.
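
For anyone trying to reproduce the troubleshooting step, the capture itself is a one-liner; a minimal example of calling Adam Machanic’s procedure while the load is happening:

-- Show currently running requests; with @get_plans = 1 each row
-- also carries its query plan, so the heaviest readers stand out quickly.
EXEC dbo.sp_WhoIsActive @get_plans = 1;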

We added an index and narrowed the scope of a WHERE clause inside the stored procedure. The procedure went from ~30 minutes and ~100 million reads to 10 seconds and ~15,000 reads.
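
The actual fix is schema-specific, but the shape of it was roughly the following (table, column, and index names here are invented for illustration; only the pattern matches what we did):

-- A narrow index that matches the procedure's filter columns.
CREATE NONCLUSTERED INDEX IX_Orders_StatusId_ModifiedDate
    ON dbo.Orders (StatusId, ModifiedDate)
    INCLUDE (CustomerId);

-- Inside the procedure, the WHERE clause was tightened so the optimizer
-- can seek on the new index instead of scanning the whole table.
DECLARE @StatusId INT = 3, @StartDate DATETIME = '2011-01-01';  -- stand-ins for the procedure's parameters

SELECT CustomerId
FROM dbo.Orders
WHERE StatusId = @StatusId
  AND ModifiedDate >= @StartDate;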

Small changes, huge differences. The key is to spot the issue, understand it, and work around it. Then schedule a production push and QA the process.

Reads out of hand

Things look very good now.

Table Structure, Trading Maintenance with Performance [SQL Server]

We recently migrated a database from SQL Server 2005 X86 to 2008 R2 X64. Running an Enterprise Edition license, we were able to take advantage of partitioning. The database had several tables with similar structures, with row counts north of 2 billion rows.

Old Structure:

  1. 3 fields
    1. TypeId INT
    2. ContentData VARCHAR(100)
    3. insertionDate DATETIME
  2. Unique constraint: TypeId, ContentData. insertionDate was added to comply with partitioning requirements.
  3. Table had a partition per month (a DDL sketch of this layout follows the list).
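
In DDL terms the old layout looked roughly like the sketch below. The partition function/scheme names and boundary dates are hypothetical; only the shape matches the original table.

-- Monthly partitioning on insertionDate (boundaries abbreviated).
CREATE PARTITION FUNCTION pfMonthly (DATETIME)
    AS RANGE RIGHT FOR VALUES ('2011-01-01', '2011-02-01', '2011-03-01' /* ... one boundary per month */);

-- In production each month mapped to its own filegroup; [PRIMARY] keeps the sketch short.
CREATE PARTITION SCHEME psMonthly
    AS PARTITION pfMonthly ALL TO ([PRIMARY]);

CREATE TABLE dbo.sampletable
(
    TypeId        INT          NOT NULL,
    ContentData   VARCHAR(100) NOT NULL,
    insertionDate DATETIME     NOT NULL,
    -- insertionDate is in the constraint only to satisfy the partitioning requirement
    CONSTRAINT UQ_sampletable UNIQUE (TypeId, ContentData, insertionDate)
) ON psMonthly (insertionDate);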

Pros

  1. During the first week of each new month we marked the previous month’s partition (its filegroup) as read-only.
  2. A one-time file backup was performed for the previous month’s partition (both steps are sketched after this list).
  3. Deltas (new data) would always be located in the last partition.
  4. The current month’s partition would be the only piece left in read-write mode.
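
The month-end close-out amounted to a couple of statements; a sketch assuming hypothetical database, filegroup, and backup path names:

-- Freeze the filegroup that backs the previous month's partition...
ALTER DATABASE SampleDB
    MODIFY FILEGROUP [FG_2011_03] READ_ONLY;

-- ...and take its one-time filegroup backup.
BACKUP DATABASE SampleDB
    FILEGROUP = 'FG_2011_03'
    TO DISK = 'X:\Backups\SampleDB_FG_2011_03.bak';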

Cons

  1. Searching ContentData for a particular TypeId scanned once per partition, i.e. as many scans as there were partitions (see the query and STATISTICS IO output below).
  2. Bulk data comparison between new and old TypeId + ContentData performed an index seek on each partition.
SELECT ContentData
INTO #temp
FROM sampletable
WHERE TypeId = 41;

Table 'sampletable'. Scan count 92, logical reads 7613, physical reads 223, read-ahead reads 7808, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Original Plan

New Structure:

  1. 3 fields
    1. TypeId INT
    2. ContentData VARCHAR(100)
    3. insertionDate DATETIME
  2. Unique constraint: TypeId, ContentData.
  3. Table has a partition per TypeId (range of 50 items each, e.g. 1,2, … ,50).
  4. Index on insertionDate with TypeId as an included column.
  5. Page compression enabled (a DDL sketch of this new layout follows the list).
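
A DDL sketch of the new layout, again with hypothetical partition function/scheme names and abbreviated range boundaries:

-- One partition per block of 50 TypeId values (boundaries abbreviated).
CREATE PARTITION FUNCTION pfTypeIdRange (INT)
    AS RANGE LEFT FOR VALUES (50, 100, 150, 200 /* ... */);

CREATE PARTITION SCHEME psTypeIdRange
    AS PARTITION pfTypeIdRange ALL TO ([PRIMARY]);

CREATE TABLE dbo.sampletable
(
    TypeId        INT          NOT NULL,
    ContentData   VARCHAR(100) NOT NULL,
    insertionDate DATETIME     NOT NULL,
    -- TypeId is the partitioning column, so the natural key alone satisfies partitioning
    CONSTRAINT UQ_sampletable UNIQUE (TypeId, ContentData) WITH (DATA_COMPRESSION = PAGE)
) ON psTypeIdRange (TypeId);

-- Supporting index for date-based retrieval; TypeId is carried as an included column.
CREATE NONCLUSTERED INDEX IX_sampletable_insertionDate
    ON dbo.sampletable (insertionDate)
    INCLUDE (TypeId)
    WITH (DATA_COMPRESSION = PAGE)
    ON psTypeIdRange (TypeId);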

Pros

  1. Searching for a particular TypeId + ContentData generates a unique index seek plan with a scan count of 1, i.e. a single partition (a quick way to verify the partition mapping is sketched after this list).
  2. Scanning ContentData for a particular TypeId generates an index scan for only 1 partition.
  3. Bulk data comparison between new and old TypeId + ContentData performs an index seek in one partition only.
  4. Page compression reduced I/O reads by 50%.
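
To double-check that a lookup really touches a single partition, we can map a TypeId to its partition and look at the row distribution; a sketch reusing the hypothetical partition function name from the DDL sketch above:

-- Which partition does TypeId 41 live in under the new scheme?
SELECT $PARTITION.pfTypeIdRange(41) AS partition_number;

-- Row distribution per partition for the table.
SELECT p.partition_number, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.sampletable')
  AND p.index_id IN (0, 1)
ORDER BY p.partition_number;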

Cons

  1. All data files are in read-write mode, so every file needs to be considered in the backup maintenance plan.
  2. Any partition can receive updates, even if it holds data from 4 years ago, so none can be tagged as historical (read-only).
  3. Retrieving deltas performs an index seek on every partition (a sketch of such a delta query follows the plan below).
SELECT ContentData
INTO #temp
FROM sampletable
WHERE TypeId = 41;

Table 'sampletable'. Scan count 1, logical reads 5449, physical reads 66, read-ahead reads 5445, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
New Plan
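
For completeness, the delta retrieval from con #3 looks roughly like this (the @lastExtract watermark and its value are made up); since the table is now partitioned by TypeId, the seek on insertionDate has to be repeated inside every partition:

-- Pull everything inserted since the last extraction run.
DECLARE @lastExtract DATETIME = '2011-06-01';

SELECT TypeId, ContentData, insertionDate
FROM dbo.sampletable
WHERE insertionDate >= @lastExtract;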

Conclusion:

The table is mostly used for retrieving and comparing data, and the retrieval call will always pass TypeId as a parameter. The system now performs an index scan on a single partition (scan count 1). For data comparison, an index seek is performed for a unique value within 1 partition only. Data compression reduced I/O reads by 50%.

In the past, data extraction and comparison scanned through all the partitions generating excessive I/O calls, but we did not need to worry about continuously backing up older partitions as they were already flagged as read-only and backed up.

With this exercise we have effectively increased performance at the expense of maintenance. Well worth the change.

Incorrect BIOS Settings Can Reduce Database Performance [SQL Server]

Sam Saffron commented on my blog post asking if I could share my thoughts on an issue he was experiencing. It involved a newly migrated SQL Server 2008 R2 database solution running on Nehalem-based CPUs. I went ahead and shared what I had experienced in my environment.

It turned out that the issue was not Hyper-Threading related but a BIOS misconfiguration instead. This is an excerpt from his blog post:

In the process of asking the question and researching it ourselves, we discovered the answer. These Dell servers were inexplicably shipped with BIOS settings that …

  • did not allow the host operating system to control the CPU power settings
  • did not set the machine to high performance mode
  • did not scale CPU speed under load properly

… kind of the worst of all worlds. But Kyle quickly flipped a few BIOS settings so that the machine was set to “hyperspeed mode”, and performance suddenly got a lot better. How much better?

One conclusion from this experience: check your BIOS configuration. Dell needs to tell its customers which settings can be beneficial for certain applications, like SQL Server.

Summarizing Performance Issues and Workarounds After Migrating from 2005 to 2008 R2 [SQL Server]

Based on our experience, having Hyper-Threading (HT) enabled on one particular node led to I/O operations taking a very long time. This was on a Windows 2008 R2 cluster running SQL Server 2008 R2. An interesting fact was that this was reflected neither in the wait stats nor in the pssdiag we ran for Microsoft support.
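
For reference, the kind of wait-stats check that came back clean in our case is a plain query against sys.dm_os_wait_stats, along these lines:

-- Top waits by total wait time; I/O pressure usually shows up as
-- PAGEIOLATCH_* or WRITELOG near the top, which was not the case here.
SELECT TOP (10)
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP')  -- a few benign waits filtered out
ORDER BY wait_time_ms DESC;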

The way we noticed the slow I/O was simply by watching the OS physical disk counters. I wrote about it here and here.

After fixing that issue by disabling HT, we started experiencing very high CPU utilization due to an excessive amount of logical reads (20 million per query). This was due to a really bad plan: our processes were performing anti-semi joins against partitioned tables, and the code performed extremely badly in 2008 R2 while doing just fine in 2005. I wrote about it here.
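
The offending pattern was the classic "rows in one table with no match in the other"; a generic sketch with made-up table names (not our actual code), which the optimizer resolves as a left anti-semi join:

-- Rows present in the staging table that do not yet exist in the
-- partitioned destination table.
SELECT s.TypeId, s.ContentData
FROM dbo.staging AS s
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.sampletable AS t
    WHERE t.TypeId = s.TypeId
      AND t.ContentData = s.ContentData
);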

We pinpointed it by running Adam Machanic’s sp_whoisactive while under high load (it can be downloaded from here).

We also ran server-side traces to find the most expensive operations by sorting on the highest I/O and CPU utilization metrics.
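
Once the trace files are written, ranking the heavy hitters takes a single query against the trace data; a sketch with a made-up file path:

-- Load the trace file(s) and rank completed statements by logical reads.
SELECT TOP (25)
    TextData,
    Reads,
    CPU,
    Duration,
    StartTime
FROM sys.fn_trace_gettable(N'X:\Traces\expensive_queries.trc', DEFAULT)
WHERE EventClass IN (10, 12)   -- RPC:Completed and SQL:BatchCompleted
ORDER BY Reads DESC;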

With the steps above we were able to tune the offending processes and go from 85% sustained CPU utilization to almost nil.

High CPU Utilization

Microsoft Says there is nothing wrong with SQL Server 2008 R2

Over 3 weeks ago we solved the issue we were having with one particular cluster node running SQL Server 2008 R2. We have been going back and forth with Microsoft to obtain an explanation for the erratic behavior of the RDBMS engine.
