1. Overview
ZLIB is a lossless data compression method and software library written by Jean-loup Gailly and Mark Adler. Initially released in 1995, it has become the de-facto standard for lossless data compression. ZLIB is an inherent part of almost all operating systems based on Linux*, including Android, as well as OS X* and versions for embedded and mobile platforms. Many applications, including software packages such as HTTP servers, use ZLIB as one (and sometimes the only) data compression method.
The Intel® Integrated Performance Primitives (Intel® IPP) library has provided functionality supporting and optimizing the ZLIB library since Intel® IPP version 5.2. Unlike other ZLIB implementations, the Intel® IPP functions for ZLIB optimize not only the data compression part, but decompression operations as well.
This article describes how Intel® IPP supports ZLIB, the Intel® IPP for ZLIB distribution model, and recent changes in the Intel® IPP ZLIB functionality in version 2017, and provides performance data obtained on different Intel® platforms.
2. ZLIB and Intel® IPP Implementation
The distribution model of Intel® IPP for ZLIB is as follows:
- The Intel® IPP library package provides patch files for the source code of all major ZLIB versions, from 1.2.5.3 to 1.2.8. These patches should be applied to the ZLIB source code files downloaded from the ZLIB repository at zlib.net (latest ZLIB version), or from zlib.net/fossils for previous versions of ZLIB;
- After the patch file is applied, the source code contains a set of conditional compilation constructs guarded by the WITH_IPP definition. For example (from file deflate.c):
send_bits(s, (STATIC_TREES<<1)+last, 3);
#if !defined(WITH_IPP)
    compress_block(s, (const ct_data *)static_ltree,
                      (const ct_data *)static_dtree);
#else
    {
        IppStatus status;
        status = ippsDeflateHuff_8u(
            (const Ipp8u*)s->l_buf, (const Ipp16u*)s->d_buf,
            (Ipp32u)s->last_lit,
            (Ipp16u*)&s->bi_buf, (Ipp32u*)&s->bi_valid,
            (IppDeflateHuffCode*)static_ltree,
            (IppDeflateHuffCode*)static_dtree,
            (Ipp8u*)s->pending_buf, (Ipp32u*)&s->pending );
        Assert( ippStsNoErr == status, "ippsDeflateHuff_8u returned a bad status" );
    }
    send_code(s, END_BLOCK, static_ltree);
#endif
So, when the source code is compiled without the WITH_IPP definition, the original ZLIB library is built. If the “-DWITH_IPP” compiler option is used, the Intel® IPP-enabled ZLIB library is produced. Of course, several other compiler/linker options are required to build ZLIB with Intel® IPP (see below).
Intel® IPP library has the following functions to support ZLIB functionality:
Common functions:
- ippsAdler32_8u,
- ippsCRC32_8u
For compression (deflate):
- ippsDeflateLZ77Fast_8u,
- ippsDeflateLZ77Fastest_8u,
- ippsDeflateLZ77Slow_8u,
- ippsDeflateHuff_8u,
- ippsDeflateDictionarySet_8u,
- ippsDeflateUpdateHash_8u
For decompression (inflate):
- ippsInflateBuildHuffTable,
- ippsInflate_8u.
Six source code files in the ZLIB source code tree are patched to call the optimized Intel® IPP functions:
- adler32.c,
- crc32.c,
- deflate.c,
- inflate.c,
- inftrees.h,
- trees.c.
In general, the most compute-intensive parts of the ZLIB code are substituted with Intel® IPP function calls; all common/service parts of ZLIB remain intact.
3. What’s New in Intel® IPP 2017 Implementation of ZLIB
Intel® IPP 2017 adds several significant enhancements to the ZLIB optimization code, including faster CPU-specific optimizations, a new “fastest” compression level with the best compression performance, deflate parameter tuning support, and support for additional compression levels:
3.1 CPU-Specific Optimizations
Intel® IPP 2017 functions provide additional optimizations for new Intel® platforms. For ZLIB needs in particular, the Intel® IPP 2017 library contains the following optimizations:
- Checksum computing using modern Intel® CPU instructions;
- Hash table operation using modern Intel® CPU instructions;
- Huffman tables generation functionality;
- Huffman tables decomposition during inflating;
- Additional optimization of pattern matching algorithms (new in Intel® IPP 2017)
3.2 New Fastest Compression Level
The Intel® IPP 2017 implementation for ZLIB introduces a brand-new compression level with the best compression performance. This is achieved by simplifying pattern matching, at the cost of a slightly decreased compression ratio.
The new compression level, called “fastest”, has the numeric code “-2” to distinguish it from the ZLIB “default” compression level (Z_DEFAULT_COMPRESSION = -1).
The decrease in compression ratio can be seen in the following table:
Data Compression Corpus | Ratio / Performance* (MBytes/s), level “fast” (1) | Ratio / Performance* (MBytes/s), level “fastest” (-2) |
Large Calgary | 2.80 / 86 | 2.10 (-0.70) / 197 (+111) |
Canterbury | 3.09 / 107 | 2.26 (-0.83) / 294 (+187) |
Large (3 files) | 3.10 / 97 | 2.01 (-1.09) / 209 (+112) |
Silesia | 2.80 / 89 | 2.16 (-0.64) / 194 (+105) |
Note: “compression ratio” in the table above is the geometric mean of the ratios of uncompressed file size to compressed file size; “performance” is the number of input megabytes compressed per second, measured on an Intel® Xeon® processor E5-2680 v3, 2.5 GHz, single thread.
3.3 Deflate Parameters Tuning
To give additional freedom in tuning data compression parameters, Intel® IPP 2017 for ZLIB activates the original deflateTune function:
ZEXTERN int ZEXPORT deflateTune OF((z_streamp strm, int good_length, int max_lazy,
int nice_length, int max_chain));
The purpose and usage of the function parameters are the same as in the original ZLIB deflate algorithm. The modified deflate function itself loads the pattern matching parameters from the configuration_table array in deflate.c, which contains a pre-defined set for each compression level.
3.4 Additional Compression Levels
The deflateTune function parameters give the freedom to modify the compression search algorithm to obtain the best compression-ratio/compression-performance trade-off for particular customer needs. Nevertheless, finding an optimal parameter set is not straightforward, because the actual behavior of the compression functionality highly depends on the specifics of the input data.
The Intel® IPP team has performed several experiments with different data and fixed some parameter sets as additional compression levels. The level values and input data characteristics are in the table below.
Additional compression levels | Input data |
11-19 | General data (text documents, binary files) of large size (greater than 1 MB) |
21-29 | Highly-compressible data (database tables, text documents with repeating phrases, large uncompressed pictures like BMPs, PPMs) |
These sets are stored in the configuration_table array in the file deflate.c. The effect on compression ratio within levels 11 to 19 is the same as within the original levels 1 to 9; that is, a higher level provides better compression. You may use these sets, or discover your own.
4. Getting Started With Intel® IPP 2017 ZLIB
The process of preparing the Intel® IPP-boosted ZLIB library is described in the readme.html file provided with the Intel® IPP “components” package. It explains how to download the ZLIB source code files from the ZLIB site, how to un-archive and patch the source code files, and how to build Intel® IPP-enabled ZLIB for different needs (static or dynamic ZLIB libraries, statically or dynamically linked to Intel® IPP).
5. Usage Notes for Intel® IPP ZLIB Functions
5.1 Using the "Fastest" Compression Level
To obtain better compression performance while keeping ZLIB (deflate) compatibility, the new “fastest” compression method is implemented. It is a lightweight compression method, which:
- Doesn’t look back in the dictionary to find a better match;
- Doesn’t collect input stream statistics for better Huffman-based coding.
This method corresponds to compression level “-2” and can be used as follows:
z_stream str_deflate;
str_deflate.zalloc = NULL;
str_deflate.zfree  = NULL;
str_deflate.opaque = NULL;
deflateInit(&str_deflate, -2);
The output (compressed) stream generated with “fastest” compression is fully compatible with the “deflate” standard and can be decompressed using regular ZLIB.
5.2 Tuning Compression Level
In Intel® IPP 2017, the ZLIB-related functions use a table of substring matching parameters to control compression ratio and performance. This table, defined as configuration_table in the file deflate.c, contains sets of four values: max_chain, good_length, nice_length and max_lazy. These values are described in the table below:
Value | Description |
max_chain | Maximum number of searches in the dictionary for a better (longer) substring match. Reasonable value range is 1-8192. |
good_length | If a substring of this length or longer is matched in the dictionary, the maximum number of searches for this particular input string is reduced fourfold. Reasonable value range is 4-258. |
nice_length | If a substring of this length or longer is matched in the dictionary, the search is stopped. Reasonable value range is 4-258. |
max_lazy | If a match of this length or longer is found in the dictionary, the lazy (deferred) match search is not performed. |
Note: the final results of compression ratio and performance highly depends on input data specifics.
The actual parameter values are shown in the table below:
Compression level | Deflate function | max_chain | good_length | nice_length | max_lazy |
1 | Fast | 4 | 8 | 8 | 8 |
2 | Fast | 4 | 16 | 16 | 9 |
3 | Fast | 4 | 16 | 16 | 12 |
4 | Fast | 48 | 32 | 32 | 16 |
5 | Slow | 32 | 8 | 32 | 16 |
6 | Slow | 128 | 8 | 256 | 16 |
7 | Slow | 144 | 8 | 256 | 16 |
8 | Slow | 192 | 32 | 258 | 128 |
9 | Slow | 256 | 32 | 258 | 258 |
These values were chosen to give compression ratios similar to the original open-source ZLIB on standard data compression collections. You can try your own combinations of matching values using the deflateTune ZLIB function. For example, to change the max_chain value from 128 to 64, and thus to speed up compression at the cost of some compression ratio degradation, you need to do the following:
z_stream str_deflate;
str_deflate.zalloc = NULL;
str_deflate.zfree  = NULL;
str_deflate.opaque = NULL;
deflateInit(&str_deflate, Z_DEFAULT_COMPRESSION);
deflateTune(&str_deflate, 8, 16, 256, 64);  /* good_length, max_lazy, nice_length, max_chain */
…
deflateEnd(&str_deflate);
Note that the string matching parameters remain changed for all subsequent compression operations (ZLIB deflate calls) with the str_deflate object, until it is destroyed or re-initialized with a deflateReset function call.
5.3 Using Additional Compression Levels
Some input data sets for compression have particular characteristics: for example, the input data can be long, or highly compressible.
For such data we introduced additional compression levels, which are in fact calls of the same “fast” or “slow” compression functions, but with different sets of string matching values. The new compression levels are the following:
- From 11 to 19 – compression levels for big input data buffers (1 Mbyte and longer);
- From 21 to 29 – compression levels for highly-compressible data (compression ratio of 30x and more).
For example, for levels 6 and 16 on the “Large” data compression corpus on an Intel® Xeon® processor E5-2680 v3, the geometric-mean results are:
Level | Ratio | Compression Performance (in Mbyte/sec) |
6 | 3.47 | 17.7 |
16 | 3.46 | 19.9 |
For levels 6 and 26 on some synthetic highly-compressible data on an Intel® Xeon® processor E5-2680 v3, the geometric-mean results are:
Level | Ratio | Compression Performance (in Mbyte/sec) |
6 | 218 | 768 |
26 | 218 | 782 |
Note: These levels are “experimental” and don’t guarantee improvements on all input data.