Data Compression before or after loading operations? [SQL Server]

We have a process that collects and archives anywhere from 30 to 100 million rows a day. I asked one of my team members to research and come up with a suggestion for us to load the data and keep it partitioned by day. Then apply a sliding window to the oldest partitions.

As per BOL, we went ahead and tested by inserting into a compressed heap and then creating a clustered index on that table:

1
2
3
4
5
6
7
8
9
-- Insert into Partitioned Heap
BEGIN TRAN
INSERT INTO [History_201010].[dbo].[Buffer_History_Stage] ([column1], [column2], [column3], [column4])
SELECT [column1], [column2], [column3], [column4]
FROM [History].[dbo].[Buffer_History_Archive]
IF @@ERROR=0
COMMIT TRAN
ELSE
ROLLBACK TRAN

–Results
(28034332 row(s) affected)
— Time to Complete: 00:09:38

— Create compressed, partitioned clustered index (sort in tempdb)
BEGIN TRAN
ALTER TABLE [History_201010].[dbo].[Buffer_History_Stage] ADD CONSTRAINT [PK_Buffer_History_Stage] PRIMARY KEY CLUSTERED
(
[column1] ASC,
[column2] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION=PAGE) ON [ps_history] ([datecreated])
IF @@ERROR=0
COMMIT TRAN
ELSE
ROLLBACK TRAN

— Time to Complete: 00:07:19

–Switch Partition to Permanent Table (should be instantaneous)
BEGIN TRAN
ALTER TABLE [History_201010].[dbo].[Buffer_History_Stage] SWITCH PARTITION 1 TO [History_201010].dbo.[Buffer_History] PARTITION 1;
IF @@ERROR=0
COMMIT TRAN
ELSE
ROLLBACK TRAN

— Time to Complete: 00:00:00
— Total Time to Complete: 00:16:57

In comparison, we attempted to just directly insert into a permanent table:

1
2
3
4
5
6
7
8
9
--Insert into Compressed, Partitioned Clustered Index directly
BEGIN TRAN
INSERT INTO [History_201010].[dbo].[Buffer_History] ([column1], [column2], [column3], [column4])
SELECT [column1], [column2], [column3], [column4]
FROM [History].[dbo].[Buffer_History_Archive]
IF @@ERROR=0
COMMIT TRAN
ELSE
ROLLBACK TRAN

–Results
— (28034332 row(s) affected)
— Total Time to Complete: 00:12:43

We tested this several times on an idle Test box, and were looking just for time; not really looking at the IO Statstics.

Based on what we have experienced, it appears that it is more efficient to just immediately insert into a compressed table, with a partitioned clustered index.

One difference between our test and what Microsoft was suggesting was that they performed a bulk insert. We used data already in another database for the same instance.  Our tests were ran with 28 Million rows.