Wednesday, January 29, 2014

Data compression sql server

How data compression works

SQL Server stores its databases in large files on logical disks/volumes, which in turn represent a physical storage medium such as a RAID set, SAN drives or just a simple local disk such as a harddrive or SSD. These database files are internally subdivided into pages, where each page holds up to 8 kB of data (including a small 96-byte header). Different types of pages are used for data rows, indexes, blobs as well as a few other special applications that are beyond the scope of this article. Whenever SQL Server accesses data, it reads/writes entire pages at a time.

With this in mind, data compression i SQL Server can be performed on rows or pages that contain indexes or data.

Row compression compresses individual column values in each row. The gain for column values of different datatypes can vary a lot, because different column values can be compressed differently (see this article on MSDN for details).

Page compression compresses entire leaf-level pages, i.e. the pages that contain the leaf levels of clustered indexes or heaps. Generally speaking, using page level compression is often a better choice than row-level compression, but your milage may vary. The specifics of page compression can be read in this MSDN article.

The T-SQL

Adding compression to a table or index hardly requires any code at all. Here, compression is set on a table/heap:

CREATE TABLE dbo.tbl (
    a    int NOT NULL,
    b    varchar(100) NULL
) WITH (DATA_COMPRESSION=PAGE);
There are three valid, mutually exclusive arguments for the DATA_COMPRESSION switch:

ROW, to enable row-level compression,
PAGE, to enable page-level compression, and
NONE, to disable compression
The same switch can be used when creating indexes or primary key constraints.

CREATE TABLE dbo.tbl (
    a    int NOT NULL,
    b    varchar(100) NULL,
    CONSTRAINT PK_tbl
        PRIMARY KEY CLUSTERED (a)
        WITH (DATA_COMPRESSION=PAGE)
);
CREATE UNIQUE INDEX tbl_ix1
    ON dbo.tbl (a)
    WITH (DATA_COMPRESSION=PAGE);
For tables without clustered indexes (heaps) the data compression setting on the table defines if and how data is compressed. For tables with clustered indexes, the data compression setting on the table or clustered index defines how data is compressed, because it is the clustered index that actually stores and organizes the data.

Individual non-clustered indexes can also be compressed, and you’ll have to add the DATA_COMPRESSION switch to each of them. This means that you can mix non-compressed tables with compressed (non-clustered) indexes or vice versa if you should need to.

To read up on heaps and indexes, see the article Indexing basics.

Benefits of data compression

Depending on how your server is set up, a significant performance bottleneck may be in I/O speed, i.e. how fast SQL Server can read and write the physical storage medium. One way to speed up I/O is to use data compression, because when you compress data that is stored on the physical disks, it results in smaller amounts of physical bits and bites to read/write, which is faster.

However, there are no free lunches, and in this case the trade-off is that when you compress data you need a certain amount of processing power to do the actual compression/decompression work, which means that the choice of whether or not to use compression is a balance between how fast your I/O subsystem is vs. how much processing power you can spare.

When not to use compression

First, not all tables/indexes are compressible. If the (uncompressed) potential maximum size of your row would exceed the size of a page (8 kB minus the compression overhead), you could still create the table without compression, although you will get a warning message, but you won’t be able to create a compressed table. This applies for instance to variable-size datatypes such as varchar, nvarchar, vardecimal, etc. This, however, is a minor problem if you stick to good design practices.

Second, if you’re building a database that is going to be deployed on a non-Enterprise Edition server, you may not want to design optimizations around the fact that you have data compression built in.

Third, and most important, maybe your disks are faster than your processor cores. Depending on your server setup, uncompressed may just be faster.

Backup compression

Backup compression compresses backup files to save disk space and improve backup and restore performance, but does not really affect day-to-day database performance. Read more on backup compression on MSDN.

No comments:

Post a Comment