We recently migrated a database from SQL Server 2005 X86 to 2008 R2 X64. Running an Enterprise Edition License, we were able to take advantage of partitioning. It had several tables with similar structures; row count north of 2 Billion rows.
Unique constraint: TypeId, ContentData. Added InsertionDate to comply with partitioning requirements.
Table had a partition per month.
On the first week of each new month we marked the previous month partition as read-only.
A one-time File backup was performed to the previous month partition.
Deltas (new data) would always be located on the last partition
The current months’ partition would be the only piece left in Read-Write mode.
Searching for ContentData for a particular TypeId scanned as many times as partitions used
Bulk Data comparison between new and old TypeId + ContentData performed an index seek on each partition
1 2 3 4
SELECT ContentData INTO #temp FROM sampletable WHERE TypeId =41;
The table is mostly used for retrieving and comparing data. The retrieval call will always pass in TypeId as parameter. The system now performs an index scan for 1 partition (1 scan count). For data comparison, an index seek is performed for a unique value under 1 partition only. Data compression reduced I/O reads by 50%.
In the past, data extraction and comparison scanned through all the partitions generating excessive I/O calls, but we did not need to worry about continuously backing up older partitions as they were already flagged as read-only and backed up.
With this exercise we have effectively increased performance at the expense of maintenance. Well worth the change.
Not in every scenario. But as a rule of thumb, test before you perform a split operation in a production environment.
Every year around this time we plan, test and deploy partitions for tables that will hold historical data from the next year. We typically create one partition per month and in one case a partition per day (which has its own sliding window schedule every month).
This has been a very busy year, and between the migrations and consolidations a historical table did not have partitions added for October and November. I took the decision to keep the data as-is and just add a new partitions starting on 11/12; one for each day. This table holds around 30 million rows per day.
Now, we usually partition tables using RIGHT functions in order to have the last boundary empty and able to split it beforehand without any locking. The table I mention in the above paragraph is using a LEFT partition function and all the current data is sitting in the rightmost partition.
Issuing a Split partition of the rightmost data will let the engine scan and move the data for the current partition. This is something that can be fine in systems with a few million rows, but not with a partition holding 500+ million rows. We tried on a testing environment and as expected it locked the table for a very long time. DML operations were not able to be performed against it until the split succeeded.
1 2 3
-- Using new filegroup ALTERPARTITIONSCHEME[ps_Left_day_sliding]NEXTUSED[OEN_Lists_2010_11]; ALTERPARTITIONFUNCTION[pf_Left_day_sliding]()SPLITRANGE('2010-11-12 00:00:00.000');
This is the table which has a partition per day.
We are not going to purse that route. We will create a brand new table with proper RIGHT partitions and swap it with the current one. We have that liberty as the data is purely historical, and DBAs retrieve from it when required. They will know where to look at after reading at the documentation they write after performing such a change.
We went ahead and also tested with RIGHT just to double make sure. It worked just fine.