We have a process that collects and archives anywhere from 30 to 100 million rows a day. I asked one of my team members to research and come up with a suggestion for us to load the data and keep it partitioned by day. Then apply a sliding window to the oldest partitions.
As per BOL, we went ahead and tested by inserting into a compressed heap and then creating a clustered index on that table:
1 2 3 4 5 6 7 8 9 | -- Insert into Partitioned Heap BEGIN TRAN INSERT INTO [History_201010].[dbo].[Buffer_History_Stage] ([column1], [column2], [column3], [column4]) SELECT [column1], [column2], [column3], [column4] FROM [History].[dbo].[Buffer_History_Archive] IF @@ERROR=0 COMMIT TRAN ELSE ROLLBACK TRAN |
–Results
(28034332 row(s) affected)
— Time to Complete: 00:09:38
— Create compressed, partitioned clustered index (sort in tempdb)
BEGIN TRAN
ALTER TABLE [History_201010].[dbo].[Buffer_History_Stage] ADD CONSTRAINT [PK_Buffer_History_Stage] PRIMARY KEY CLUSTERED
(
[column1] ASC,
[column2] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION=PAGE) ON [ps_history] ([datecreated])
IF @@ERROR=0
COMMIT TRAN
ELSE
ROLLBACK TRAN
— Time to Complete: 00:07:19
–Switch Partition to Permanent Table (should be instantaneous)
BEGIN TRAN
ALTER TABLE [History_201010].[dbo].[Buffer_History_Stage] SWITCH PARTITION 1 TO [History_201010].dbo.[Buffer_History] PARTITION 1;
IF @@ERROR=0
COMMIT TRAN
ELSE
ROLLBACK TRAN
— Time to Complete: 00:00:00
— Total Time to Complete: 00:16:57
In comparison, we attempted to just directly insert into a permanent table:
1 2 3 4 5 6 7 8 9 | --Insert into Compressed, Partitioned Clustered Index directly BEGIN TRAN INSERT INTO [History_201010].[dbo].[Buffer_History] ([column1], [column2], [column3], [column4]) SELECT [column1], [column2], [column3], [column4] FROM [History].[dbo].[Buffer_History_Archive] IF @@ERROR=0 COMMIT TRAN ELSE ROLLBACK TRAN |
–Results
— (28034332 row(s) affected)
— Total Time to Complete: 00:12:43
We tested this several times on an idle Test box, and were looking just for time; not really looking at the IO Statstics.
Based on what we have experienced, it appears that it is more efficient to just immediately insert into a compressed table, with a partitioned clustered index.
One difference between our test and what Microsoft was suggesting was that they performed a bulk insert. We used data already in another database for the same instance. Our tests were ran with 28 Million rows.