Row and Page Compression are features added to SQL Server 2008 and there are plenty of articles/blogs that discuss about it. I had the opportunity to play with a particular table I needed to archive and gathered some statistics while in the process.The particular table I was working with was relatively small (2.9 Million rows), but enough for me to understand the impact.
Let’s start with the basics:
Add a primary key to the table
ALTER TABLE deliverypackage ADD PRIMARY KEY (deliverypackage_id); — Size: 328.891 MB
Use page level compression
ALTER TABLE deliverypackage REBUILD PARTITION = ALL with (DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON) –- Size: 177.102 MB
Just saved 47% in storage space, nice.
Now, create a compressed index with an include that will help me with queries:
CREATE INDEX IX_datecreated ON deliverypackage (datecreated) INCLUDE (quantity)
WITH (DROP_EXISTING = OFF, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON [OEN_ARCHIVE_DATA1];
— 77.148 MB on index
Crunch numbers for a day:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT DATEPART(hour, datecreated) as hour, SUM(quantity) as quantity
FROM deliverypackage where datecreated >= ‘9/13/2010’ AND datecreated < ‘9/14/2010’
GROUP BY DATEPART(hour, datecreated)
ORDER BY 1;
/*
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 3 ms.
(15 row(s) affected)
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘deliverypackage’. Scan count 1, logical reads 24, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 7 ms.
*/
Plan showed an index seek operation; looks pretty good to me.
Now include dateremoved:
SELECT DATEPART(hour, dateremoved) as hour, SUM(quantity) as quantity
FROM deliverypackage where datecreated >= ‘9/13/2010’ AND datecreated < ‘9/14/2010’
GROUP BY DATEPART(hour, dateremoved)
ORDER BY 1
/*
SQL Server parse and compile time:
CPU time = 6 ms, elapsed time = 6 ms.
(17 row(s) affected)
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘deliverypackage’. Scan count 5, logical reads 18557, physical reads 0, read-ahead reads 7, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 46 ms, elapsed time = 65 ms.
*/
Wow, this one is not good at all. Let’s take a look at the plan:
Way too many reads, so let’s recreate the index and include dateremoved:
CREATE INDEX IX_datecreated ON deliverypackage (datecreated) INCLUDE (quantity, dateremoved)
WITH (DROP_EXISTING = ON, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON [OEN_ARCHIVE_DATA1]
— 90.266 MB MB on index
Now query again:
/*
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(17 row(s) affected)
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘deliverypackage’. Scan count 1, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 7 ms.
*/
Much better.
So what happens if I recreate the index with no compression instead?
CREATE INDEX IX_datecreated ON deliverypackage (datecreated) INCLUDE (quantity, dateremoved)
WITH (DROP_EXISTING = ON, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = NONE) ON [OEN_ARCHIVE_DATA2]
— 125.820 MB MB MB on index
SELECT DATEPART(hour, dateremoved) as hour, SUM(quantity) as quantity
FROM deliverypackage where datecreated >= ‘9/13/2010’ AND datecreated < ‘9/14/2010’
GROUP BY DATEPART(hour, dateremoved)
ORDER BY 1
/*
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 3 ms.
(17 row(s) affected)
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘deliverypackage’. Scan count 1, logical reads 37, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 4 ms.
*/
37 reads instead of 24 but the elapsed time is 4 ms. instead of 7 ms.
What I can infer from this small test is that compression allows us to reduce the amount of I/O at the expense of CPU cycles. Also, there is a huge difference when an index is tuned to satisfy a query without needing to look back at the row.