Data Compression before or after loading operations? [SQL Server]

We have a process that collects and archives anywhere from 30 to 100 million rows a day. I asked one of my team members to research and come up with a suggestion for us to load the data and keep it partitioned by day. Then apply a sliding window to the oldest partitions.

As per BOL, we went ahead and tested by inserting into a compressed heap and then creating a clustered index on that table:

1
2
3
4
5
6
7
8
9

-- Insert into Partitioned Heap
BEGIN TRAN
INSERT INTO [History_201010].[dbo].[Buffer_History_Stage] ([column1], [column2], [column3], [column4])
SELECT [column1], [column2], [column3], [column4]
FROM [History].[dbo].[Buffer_History_Archive]
IF @@ERROR=0
COMMIT TRAN
ELSE
ROLLBACK TRAN

–Results
(28034332 row(s) affected)
— Time to Complete: 00:09:38

— Create compressed, partitioned clustered index (sort in tempdb)
BEGIN TRAN
ALTER TABLE [History_201010].[dbo].[Buffer_History_Stage] ADD CONSTRAINT [PK_Buffer_History_Stage] PRIMARY KEY CLUSTERED
(
[column1] ASC,
[column2] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION=PAGE) ON [ps_history] ([datecreated])
IF @@ERROR=0
COMMIT TRAN
ELSE
ROLLBACK TRAN

— Time to Complete: 00:07:19

–Switch Partition to Permanent Table (should be instantaneous)
BEGIN TRAN
ALTER TABLE [History_201010].[dbo].[Buffer_History_Stage] SWITCH PARTITION 1 TO [History_201010].dbo.[Buffer_History] PARTITION 1;
IF @@ERROR=0
COMMIT TRAN
ELSE
ROLLBACK TRAN

— Time to Complete: 00:00:00
— Total Time to Complete: 00:16:57

In comparison, we attempted to just directly insert into a permanent table:

1
2
3
4
5
6
7
8
9

--Insert into Compressed, Partitioned Clustered Index directly
BEGIN TRAN
INSERT INTO [History_201010].[dbo].[Buffer_History] ([column1], [column2], [column3], [column4])
SELECT [column1], [column2], [column3], [column4]
FROM [History].[dbo].[Buffer_History_Archive]
IF @@ERROR=0
COMMIT TRAN
ELSE
ROLLBACK TRAN

–Results
— (28034332 row(s) affected)
— Total Time to Complete: 00:12:43

We tested this several times on an idle Test box, and were looking just for time; not really looking at the IO Statstics.

Based on what we have experienced, it appears that it is more efficient to just immediately insert into a compressed table, with a partitioned clustered index.

One difference between our test and what Microsoft was suggesting was that they performed a bulk insert. We used data already in another database for the same instance. Our tests were ran with 28 Million rows.

OZ

Tech enthusiast blogs about technology, databases, media, gadgets and communities

Data Compression before or after loading operations? [SQL Server]

Related

Share this:

Related