Playing with Page Level Compression and Indexes

Row and Page Compression are features added to SQL Server 2008 and there are plenty of articles/blogs that discuss about it. I had the opportunity to play with a particular table I needed to archive and gathered some statistics while in the process.The particular table I was working with was relatively small (2.9 Million rows), but enough for me to understand the impact.

Let’s start with the basics:

Add a primary key to the table

ALTER TABLE deliverypackage ADD PRIMARY KEY (deliverypackage_id); — Size: 328.891 MB

Use page level compression

ALTER TABLE deliverypackage REBUILD PARTITION = ALL with (DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON) –- Size: 177.102 MB

Just saved 47% in storage space, nice.

Now, create a compressed index with an include that will help me with queries:

CREATE INDEX IX_datecreated ON deliverypackage (datecreated) INCLUDE (quantity)

WITH (DROP_EXISTING = OFF, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON [OEN_ARCHIVE_DATA1];

— 77.148 MB on index

Crunch numbers for a day:

SET STATISTICS IO ON;

SET STATISTICS TIME ON;

SELECT DATEPART(hour, datecreated) as hour, SUM(quantity) as quantity

FROM deliverypackage where datecreated >= ‘9/13/2010’ AND datecreated < ‘9/14/2010’

GROUP BY DATEPART(hour, datecreated)

ORDER BY 1;

/*

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 3 ms.

(15 row(s) affected)

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘deliverypackage’. Scan count 1, logical reads 24, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 7 ms.

*/

Plan showed an index seek operation; looks pretty good to me.

Now include dateremoved:

SELECT DATEPART(hour, dateremoved) as hour, SUM(quantity) as quantity

FROM deliverypackage where datecreated >= ‘9/13/2010’ AND datecreated < ‘9/14/2010’

GROUP BY DATEPART(hour, dateremoved)

ORDER BY 1

/*

SQL Server parse and compile time:

CPU time = 6 ms, elapsed time = 6 ms.

(17 row(s) affected)

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘deliverypackage’. Scan count 5, logical reads 18557, physical reads 0, read-ahead reads 7, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 46 ms, elapsed time = 65 ms.

*/

Wow, this one is not good at all. Let’s take a look at the plan:

Way too many reads, so let’s recreate the index and include dateremoved:

CREATE INDEX IX_datecreated ON deliverypackage (datecreated) INCLUDE (quantity, dateremoved)

WITH (DROP_EXISTING = ON, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON [OEN_ARCHIVE_DATA1]

— 90.266 MB MB on index

Now query again:

/*

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

(17 row(s) affected)

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘deliverypackage’. Scan count 1, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 7 ms.

*/

Much better.

So what happens if I recreate the index with no compression instead?

CREATE INDEX IX_datecreated ON deliverypackage (datecreated) INCLUDE (quantity, dateremoved)

WITH (DROP_EXISTING = ON, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = NONE) ON [OEN_ARCHIVE_DATA2]

— 125.820 MB MB MB on index

SELECT DATEPART(hour, dateremoved) as hour, SUM(quantity) as quantity

FROM deliverypackage where datecreated >= ‘9/13/2010’ AND datecreated < ‘9/14/2010’

GROUP BY DATEPART(hour, dateremoved)

ORDER BY 1

/*

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 3 ms.

(17 row(s) affected)

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘deliverypackage’. Scan count 1, logical reads 37, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 4 ms.

*/

37 reads instead of 24 but the elapsed time is 4 ms. instead of 7 ms.

What I can infer from this small test is that compression allows us to reduce the amount of I/O at the expense of CPU cycles. Also, there is a huge difference when an index is tuned to satisfy a query without needing to look back at the row.