Introduction
Compression algorithms traditionally use either a dynamic or a static compression table. Algorithms that aim for the best compression ratio use a dynamic table at the cost of more processing time, while algorithms focused on throughput use static tables. The Intel® Intelligent Storage Acceleration Library (Intel® ISA-L) semi-dynamic compression comes close to getting the best of both worlds. Testing shows that semi-dynamic compression and decompression are only slightly slower than using a static table and almost as space-efficient as algorithms that use dynamic tables. This article's goal is to help you incorporate Intel ISA-L’s semi-dynamic compression and decompression algorithms into your storage application. It describes the prerequisites for using Intel ISA-L and includes a downloadable code sample with full build instructions. The code sample is a compression tool that can be used to compare the compression ratio and performance of Intel ISA-L’s semi-dynamic compression algorithm on a public data set against the output of its open source equivalent, zlib*.
Hardware and Software Configuration
| CPU and Chipset | Intel® Xeon® processor E5-2699 v4, 2.2 GHz |
| Platform | Intel® Server System R2000WT product family (code-named Wildcat Pass) |
| Memory | 256 GB (16 x 16 GB) DDR4 2133P, Micron MTA36ASF2G72PZ2GATESIG |
| Storage | 1 TB Western Digital (WD1002FAEX), plus Intel® SSD Data Center P3700 Series (SSDPEDMD400G4) |
| Operating System | Ubuntu* 16.04 LTS (Xenial Xerus), Linux* kernel 4.4.0-21-generic |
Note: Depending on the platform capability, Intel ISA-L can run on various Intel® processor families. Improvements are obtained by speeding up the computations through the use of the following instruction sets:
- Intel® Advanced Encryption Standard New Instruction (Intel® AES-NI)
- Intel® Streaming SIMD Extensions (Intel® SSE)
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Advanced Vector Extensions 2 (Intel® AVX2)
Why Use the Intel® Intelligent Storage Acceleration Library (Intel® ISA-L)?
Intel ISA-L can compress and decompress faster than zlib* with only a small sacrifice in compression ratio, which makes it well suited to high-throughput storage applications. This article includes a sample application that simulates a compression and decompression scenario and reports the efficiency of each library. Click the button at the top of this article to download it.
Prerequisites
Intel ISA-L supports Linux and Microsoft Windows*. A full list of prerequisite packages can be found here.
Building the sample application (for Linux):
- Install the dependencies:
- a C++14-compliant C++ compiler
- cmake >= 3.1
- git
- autogen
- autoconf
- automake
- yasm and/or nasm
- libtool
- boost's "Filesystem" library and headers
- boost's "Program Options" library and headers
boost's "String Algo" headers
sudo apt-get update
sudo apt-get install gcc g++ make cmake git zlib1g-dev autogen autoconf automake yasm nasm libtool libboost-all-dev
You also need the latest versions of isa-l and zlib. The `get_libs.bash` script can be used to get them. The script downloads the two libraries from their official GitHub* repositories, builds them, and then installs them in the `./libs/usr` directory:

bash ./libs/get_libs.bash

- Build from the `ex1` directory:
- `mkdir <build-dir>`
- `cd <build-dir>`
- `cmake -DCMAKE_BUILD_TYPE=Release $OLDPWD`
- `make`
Getting Started with the Sample Application
The sample application contains the following files:
This example walks through the following high-level workflow, focusing on the “main.cpp” and “bm_isal.cpp” files:
Setup
1. In the “main.cpp” file, the program parses the command line and displays the operations that will be performed.
int main(int argc, char* argv[])
{
    options options = options::parse(argc, argv);
Parsing the command-line options
2. In the options.cpp file, the program parses the command line arguments using `options::parse()`.
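The sample’s options.cpp builds on Boost Program Options (listed in the prerequisites). As a rough illustration of the same idea, here is a simplified, stdlib-only sketch; the struct layout and parsing logic are assumptions for illustration, only the option flags come from the usage text:

```cpp
#include <iostream>
#include <string>
#include <vector>

// Simplified stand-in for the sample's options struct. The real
// implementation in options.cpp uses Boost Program Options.
struct options {
    std::vector<std::string> files;
    std::vector<int> zlib_levels{6};  // the default --zlib-level is 6

    static options parse(int argc, char* argv[]) {
        options opts;
        for (int i = 1; i < argc; ++i) {
            std::string arg = argv[i];
            if (arg == "--file" && i + 1 < argc) {
                opts.files.emplace_back(argv[++i]);
            } else if (arg == "--zlib-levels" && i + 1 < argc) {
                opts.zlib_levels.clear();
                std::string list = argv[++i];  // e.g. "1,6,9"
                std::size_t pos = 0;
                while (pos < list.size()) {
                    std::size_t comma = list.find(',', pos);
                    if (comma == std::string::npos) comma = list.size();
                    opts.zlib_levels.push_back(std::stoi(list.substr(pos, comma - pos)));
                    pos = comma + 1;
                }
            }
        }
        return opts;
    }
};
```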
Create the benchmarks object
3. In the “main.cpp” file, the program registers a benchmark for each raw file and library/level combination via the benchmarks::add_benchmark() function. Since the benchmarks do not run concurrently, only one file “pointer” is created.
benchmarks benchmarks;
// add a benchmark for each file and library/level combination
for (const auto& path : options.files) {
    auto compression = benchmark_info::Method::Compression;
    auto decompression = benchmark_info::Method::Decompression;
    auto isal = benchmark_info::Library::ISAL;
    auto zlib = benchmark_info::Library::ZLIB;
    benchmarks.add_benchmark({compression, isal, 0, path});
    benchmarks.add_benchmark({decompression, isal, 0, path});
    for (auto level : options.zlib_levels) {
        if (level >= 1 && level <= 9) {
            benchmarks.add_benchmark({compression, zlib, level, path});
            benchmarks.add_benchmark({decompression, zlib, level, path});
        } else {
            std::cout << "[Warning] zlib compression level " << level << " will be ignored\n";
        }
    }
}
Intel® ISA-L compression and decompression
4. In the “bm_isal.cpp” file, the program performs the compression and decompression on the raw file using a single thread. The key functions to note are isal_deflate and isal_inflate. Both functions accept a stream as an argument; this data structure holds the input buffer, the number of bytes available in it, the output buffer, and the space available in the output buffer. end_of_stream indicates whether this is the last iteration.
std::string bm_isal::version()
{
    return std::to_string(ISAL_MAJOR_VERSION) + "." + std::to_string(ISAL_MINOR_VERSION) + "." +
           std::to_string(ISAL_PATCH_VERSION);
}

bm::raw_duration bm_isal::iter_deflate(file_wrapper* in_file, file_wrapper* out_file, int /*level*/)
{
    raw_duration duration{};
    struct isal_zstream stream;
    uint8_t input_buffer[BUF_SIZE];
    uint8_t output_buffer[BUF_SIZE];
    isal_deflate_init(&stream);
    stream.end_of_stream = 0;
    stream.flush = NO_FLUSH;
    do {
        stream.avail_in = static_cast<uint32_t>(in_file->read(input_buffer, BUF_SIZE));
        stream.end_of_stream = static_cast<uint32_t>(in_file->eof());
        stream.next_in = input_buffer;
        do {
            stream.avail_out = BUF_SIZE;
            stream.next_out = output_buffer;
            auto begin = std::chrono::steady_clock::now();
            isal_deflate(&stream);
            auto end = std::chrono::steady_clock::now();
            duration += (end - begin);
            out_file->write(output_buffer, BUF_SIZE - stream.avail_out);
        } while (stream.avail_out == 0);
    } while (stream.internal_state.state != ZSTATE_END);
    return duration;
}

bm::raw_duration bm_isal::iter_inflate(file_wrapper* in_file, file_wrapper* out_file)
{
    raw_duration duration{};
    int ret;
    int eof;
    struct inflate_state stream;
    uint8_t input_buffer[BUF_SIZE];
    uint8_t output_buffer[BUF_SIZE];
    isal_inflate_init(&stream);
    stream.avail_in = 0;
    stream.next_in = nullptr;
    do {
        stream.avail_in = static_cast<uint32_t>(in_file->read(input_buffer, BUF_SIZE));
        eof = in_file->eof();
        stream.next_in = input_buffer;
        do {
            stream.avail_out = BUF_SIZE;
            stream.next_out = output_buffer;
            auto begin = std::chrono::steady_clock::now();
            ret = isal_inflate(&stream);
            auto end = std::chrono::steady_clock::now();
            duration += (end - begin);
            out_file->write(output_buffer, BUF_SIZE - stream.avail_out);
        } while (stream.avail_out == 0);
    } while (ret != ISAL_END_INPUT && eof == 0);
    return duration;
}
5. When all the compression and decompression tasks started by benchmarks.run() are complete, the program displays the results on the screen and deletes all temporary files.
Execute the sample application
In this example, the program runs as a single thread through the compression and decompression functions of Intel ISA-L and zlib.
Run
From the ex1 directory:
cd <build-dir>/ex1
./ex1 --help
Usage
Usage: ./ex1 [--help] [--folder <path>]... [--file <path>]...

  --help              display this message
  --file path         use the file at 'path'
  --folder path       use all the files in 'path'
  --zlib-levels n,... comma-separated list of compression levels [1-9]

- --file and --folder can be used multiple times to add more files to the benchmark
- --folder will look for files recursively
- the default --zlib-level is 6
Test corpora are public data files designed to exercise compression and decompression algorithms, and they are available online (for example, the Calgary and Silesia corpora). The --folder option can be used to benchmark them easily: ./ex1 --folder /path/to/corpus/folder.
Running the example
As Intel CPUs have integrated PCI-e* onto the package, it is possible to optimize access to solid-state drives (SSDs) and avoid a potential performance degradation for accesses over an Intel® QuickPath Interconnect (Intel® QPI)/Intel® Ultra Path Interconnect (Intel® UPI) link. For example, if you have a two-socket (two CPU) system with a PCI-e SSD, this SSD will be attached to one of the sockets. If the SSD is attached to socket 1 and the program accessing the SSD runs on socket 2, the requests and the data have to cross the Intel QPI/Intel UPI link that connects the sockets. To avoid this potential problem, you can find out which socket the PCI-e SSD is attached to and then set thread affinity so that the program runs on the same socket as the SSD. The following commands show the list of PCI-e devices attached to the system; look for ‘ssd’ in the output. For example:
lspci -vvv | grep -i ssd
cd /sys/class/pci_bus
In this example, 05:00.0 is the PCI* identifier and can be used to get more details from within Linux.
cd /sys/class/pci_bus/0000:05/device
This directory includes a number of files that give additional information about the PCIe device, such as make, model, power settings, and so on. To determine which socket this PCIe device is connected to, use:
cat local_cpulist
The output lists the cores local to that socket; on this system, cores 0 to 21.
Now we can use this information to set thread affinity, using taskset:
taskset -c 10 ./ex1
For the `-c 10` option, the core number can be anything from 0 to 21, as those are the core IDs for the socket this PCI-e SSD is attached to. Running the application under taskset pins it to core number 10, which should give the output below. If the system does not have a PCI-e SSD, the application can simply run without the taskset command.
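As an alternative to launching the benchmark under taskset, a program can pin itself to a core with the Linux-specific sched_setaffinity() call. This is a minimal sketch, not part of the sample application; the core ID passed in is an assumption about your system’s topology, matching the taskset example above:

```cpp
#include <sched.h>   // sched_setaffinity, cpu_set_t (Linux-specific)

// Pin the calling thread to a single core, e.g. pin_to_core(10) to
// mirror `taskset -c 10`. Returns true on success.
bool pin_to_core(int core_id) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);
    // pid 0 means "the calling thread"
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Calling this early in main(), before any I/O to the SSD, keeps all subsequent accesses on the SSD’s local socket without requiring the user to remember the taskset invocation.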
Program output displays a column for the compression library, either ‘isa-l’ or ‘zlib’. The table shows the compression ratio (compressed file/raw file), and the system and processor time that it takes to perform the operation. For decompression, it just measures the elapsed time for the decompression operation. All the data was produced on the same system.
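The ratio and timing columns boil down to simple arithmetic over the byte counts and the accumulated std::chrono durations; a small sketch of the calculations (function names are hypothetical, not the sample’s API):

```cpp
#include <chrono>
#include <cstdint>

// Compression ratio as reported in the table: compressed size / raw size.
// Smaller is better: 0.4 means the output is 40% of the input size.
double compression_ratio(std::uint64_t compressed_bytes, std::uint64_t raw_bytes) {
    return static_cast<double>(compressed_bytes) / static_cast<double>(raw_bytes);
}

// Throughput in MB/s from the bytes processed and the duration
// accumulated around the (de)compression calls.
double throughput_mbps(std::uint64_t bytes, std::chrono::steady_clock::duration d) {
    double seconds = std::chrono::duration<double>(d).count();
    return (static_cast<double>(bytes) / 1e6) / seconds;
}
```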
Notes: 2x Intel® Xeon® processor E5-2699 v4 (22 cores per socket, HT off), Intel® SpeedStep® enabled, Intel® Turbo Boost Technology disabled, 16x16 GB DDR4 2133 MT/s, 1 DIMM per channel, Ubuntu* 16.04 LTS, Linux kernel 4.4.0-21-generic, 1 TB Western Digital* (WD1002FAEX), 1 Intel® SSD P3700 Series (SSDPEDMD400G4). Performance measured by the sample application described in this article.
Conclusion
This tutorial and its sample application demonstrate one method through which you can incorporate the Intel ISA-L compression and decompression features into your storage application. The sample application’s output data shows that there is a balancing act between processing time (CPU time) and disk space. It can assist you in determining which compression and decompression algorithm best suits your requirements, and then help you quickly adapt your application to take advantage of Intel® architecture with Intel ISA-L.
Other Useful Links
- Accelerating your Storage Algorithms using Intelligent Storage Acceleration Library (ISA-L) video
- Accelerating Data Deduplication with ISA-L blog post
Authors
Thai Le is a software engineer who focuses on cloud computing and performance computing analysis at Intel.
Steven Briscoe is an application engineer focusing on cloud computing within the Software Services Group at Intel Corporation (UK).
Notices
System configurations, SSD configurations and performance tests conducted are discussed in detail within the body of this paper. For more information go to http://www.intel.com/content/www/us/en/benchmarks/intel-product-performance.html.
This sample source code is released under the Intel Sample Source Code License Agreement.