Playing with Page Level Compression and Indexes

      No Comments on Playing with Page Level Compression and Indexes

Row and Page Compression are features added to SQL Server 2008 and there are plenty of articles/blogs that discuss about it. I had the opportunity to play with a particular table I needed to archive and gathered some statistics while in the process.The particular table I was working with was relatively small (2.9 Million rows), but enough for me to understand the impact.

Let’s start with the basics:

Add a primary key to the table

ALTER TABLE deliverypackage ADD PRIMARY KEY (deliverypackage_id); — Size: 328.891 MB

Use page level compression

ALTER TABLE deliverypackage REBUILD PARTITION = ALL with (DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON) –- Size: 177.102 MB

Just saved 47% in storage space, nice.

Now, create a compressed index with an include that will help me with queries:

CREATE INDEX IX_datecreated ON deliverypackage (datecreated) INCLUDE (quantity)

WITH (DROP_EXISTING = OFF, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON [OEN_ARCHIVE_DATA1];

— 77.148 MB on index

Crunch numbers for a day:

SET STATISTICS IO ON;

SET STATISTICS TIME ON;

SELECT DATEPART(hour, datecreated) as hour, SUM(quantity) as quantity

FROM deliverypackage where datecreated >= ‘9/13/2010’ AND datecreated < ‘9/14/2010’

GROUP BY DATEPART(hour, datecreated)

ORDER BY 1;

/*

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 3 ms.

(15 row(s) affected)

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘deliverypackage’. Scan count 1, logical reads 24, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 7 ms.

*/

Plan showed an index seek operation; looks pretty good to me.

Now include dateremoved:

SELECT DATEPART(hour, dateremoved) as hour, SUM(quantity) as quantity

FROM deliverypackage where datecreated >= ‘9/13/2010’ AND datecreated < ‘9/14/2010’

GROUP BY DATEPART(hour, dateremoved)

ORDER BY 1

/*

SQL Server parse and compile time:

CPU time = 6 ms, elapsed time = 6 ms.

(17 row(s) affected)

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘deliverypackage’. Scan count 5, logical reads 18557, physical reads 0, read-ahead reads 7, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 46 ms, elapsed time = 65 ms.

*/

Wow, this one is not good at all. Let’s take a look at the plan:

Way too many reads, so let’s recreate the index and include dateremoved:

CREATE INDEX IX_datecreated ON deliverypackage (datecreated) INCLUDE (quantity, dateremoved)

WITH (DROP_EXISTING = ON, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON [OEN_ARCHIVE_DATA1]

— 90.266 MB MB on index

Now query again:

/*

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

(17 row(s) affected)

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘deliverypackage’. Scan count 1, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 7 ms.

*/

Much better.

So what happens if I recreate the index with no compression instead?

CREATE INDEX IX_datecreated ON deliverypackage (datecreated) INCLUDE (quantity, dateremoved)

WITH (DROP_EXISTING = ON, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = NONE) ON [OEN_ARCHIVE_DATA2]

— 125.820 MB MB MB on index

SELECT DATEPART(hour, dateremoved) as hour, SUM(quantity) as quantity

FROM deliverypackage where datecreated >= ‘9/13/2010’ AND datecreated < ‘9/14/2010’

GROUP BY DATEPART(hour, dateremoved)

ORDER BY 1

/*

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 3 ms.

(17 row(s) affected)

Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘deliverypackage’. Scan count 1, logical reads 37, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 4 ms.

*/

37 reads instead of 24 but the elapsed time is 4 ms. instead of 7 ms.

What I can infer from this small test is that compression allows us to reduce the amount of I/O at the expense of CPU cycles. Also, there is a huge difference when an index is tuned to satisfy a query without needing to look back at the row.

Share