Improving Performance of Math Functions with Intel® Math Kernel Library

Introduction

Intel® Math Kernel Library¹ (Intel® MKL) is a product that accelerates math processing routines to increase the performance of an application when running on systems equipped with Intel® processors. Intel MKL includes linear algebra, fast Fourier transforms (FFT), vector math, and statistics functions.

To illustrate the performance improvement available with Intel MKL, this paper uses matrix multiplication as an example. Matrix multiplication is a fundamental mathematical operation with applications across most scientific fields.

Performance Test Procedure

To demonstrate how Intel MKL can help improve the performance of matrix operation, we used a code sample downloaded from GitHub.

The tests were done on two systems: one equipped with the Intel® Xeon® processor E5-2699 v4 and the other equipped with the Intel® Xeon® Platinum 8180 processor.

Performance was measured as the time, in seconds, taken to compute each matrix multiplication; shorter times indicate better performance.
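As a rough illustration of this kind of measurement, the sketch below times a single multiplication with a monotonic wall-clock timer. It is not taken from the GitHub sample: the helper names now_seconds, matmul_naive, and time_matmul are hypothetical, the baseline is a plain triple loop, and the use of POSIX clock_gettime is an assumption about the platform.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime on strict compilers */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: current monotonic time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Baseline: naive product c = a * b for n x n row-major matrices. */
static void matmul_naive(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Returns the wall-clock seconds taken by one multiplication. */
static double time_matmul(const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c && n > 0);
    double start = now_seconds();
    matmul_naive(a, b, c, n);
    return now_seconds() - start;
}
```

Wall-clock time is used rather than CPU time because a multithreaded library accumulates CPU time across all of its threads, which would overstate the elapsed time.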

The tests were done using the following steps:

1. Measure the time, in seconds, taken to complete 2000 x 2000, 4000 x 4000, and 10000 x 10000 matrix multiplications using different optimization methods. Figure 1 shows how to specify the matrix sizes and the optimization method options. More information about these methods can be found in the GitHub repository for the code sample. All measurements in this step were collected on the system equipped with the Intel Xeon processor E5-2699 v4.

   Figure 1. Matrix size specifications and optimization method options.

   Figure 1 shows the different optimization methods. Option 2 optimizes the matrix multiplication using a vectorized sdot with Intel® Streaming SIMD Extensions (Intel SSE), and option 7 combines option 2 with loop tiling.

2. Compare the results and select the best ones for use in later steps.

3. Repeat steps 1 and 2 on the system equipped with the Intel Xeon Platinum 8180 processor.

4. Create a new matrix multiplication function using Intel MKL in the file matmul.c. This involves two steps:

   a) Add the Intel MKL include file:

      #include <mkl.h>

   b) Call the Intel MKL function cblas_sgemm:

      cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_a_rows, n_b_cols, n_a_cols, 1.0f, a[0], n_a_cols, b[0], n_b_cols, 0.0f, m[0], n_b_cols);

   With row-major storage and non-transposed operands, the leading dimension of each matrix is its row length, that is, its number of columns. For the square matrices tested here, all of these values are equal.

5. Run the test again with the Intel MKL function in place, measuring the time taken to multiply 2000 x 2000, 4000 x 4000, and 10000 x 10000 matrices.

6. Compare the results of step 5 with the best results from steps 2 and 3.
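The Intel MKL step can be sketched as a small wrapper function. This is a minimal sketch rather than the code added to matmul.c: the name matmul_sgemm and the USE_MKL macro are hypothetical, and the matrices are assumed to be stored contiguously in row-major order. When USE_MKL is not defined, a plain triple loop computes the same product so the sketch still builds on a machine without Intel MKL. The cblas_sgemm call passes each matrix's row length as its leading dimension, which is what row-major storage expects for non-transposed operands.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_MKL
#include <mkl.h>  /* declares cblas_sgemm */
#endif

/*
 * Hypothetical wrapper: m = a * b in single precision, row-major storage.
 * a is n_a_rows x n_a_cols, b is n_a_cols x n_b_cols, m is n_a_rows x n_b_cols.
 */
static void matmul_sgemm(const float *a, const float *b, float *m,
                         int n_a_rows, int n_a_cols, int n_b_cols)
{
    assert(a && b && m);
    assert(n_a_rows > 0 && n_a_cols > 0 && n_b_cols > 0);
#ifdef USE_MKL
    /* m = 1.0 * a * b + 0.0 * m */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n_a_rows, n_b_cols, n_a_cols,
                1.0f, a, n_a_cols,
                b, n_b_cols,
                0.0f, m, n_b_cols);
#else
    /* Portable fallback computing the same product without Intel MKL. */
    for (int i = 0; i < n_a_rows; i++) {
        for (int j = 0; j < n_b_cols; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n_a_cols; k++)
                sum += a[(size_t)i * n_a_cols + k] * b[(size_t)k * n_b_cols + j];
            m[(size_t)i * n_b_cols + j] = sum;
        }
    }
#endif
}
```

Building with USE_MKL defined also requires linking against Intel MKL; the exact link options depend on the installation and are not shown here.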

Test Configurations

Hardware

System #1

System: Preproduction
Processor: Intel Xeon processor E5-2699 v4 @ 2.2 GHz
Cores: 22
Memory: 256 GB DDR4

System #2

System: Preproduction
Processor: Intel Xeon Platinum 8180 processor @ 2.5 GHz
Cores: 28
Memory: 256 GB DDR4

Software

Ubuntu* 16.04 LTS
GCC* version 5.4.0
Intel MKL 2017

Test Results

Figure 2. Results of different optimized methods on different-sized matrices.

Figure 2 shows that the optimized method using explicit vectorized sdot with loop tiling performed the best on all sizes of matrices. The results of this method will be compared against those of the Intel MKL method.
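For readers unfamiliar with loop tiling, the sketch below shows the general idea: the multiplication is split into blocks small enough to stay in cache while they are reused. It is not the code from the sample. The function names, the tile size of 64, and the requirement that the second operand be supplied already transposed are all assumptions made for illustration, and a scalar dot product stands in for the Intel SSE sdot.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 64  /* assumed tile size; tune for the target cache */

/* Dot product of two contiguous vectors (scalar stand-in for sdot). */
static float sdot_scalar(const float *x, const float *y, size_t len)
{
    float sum = 0.0f;
    for (size_t k = 0; k < len; k++)
        sum += x[k] * y[k];
    return sum;
}

/*
 * c = a * b for n x n row-major matrices. bt must hold the transpose
 * of b, so both operands of every dot product are contiguous in memory.
 */
static void matmul_tiled(const float *a, const float *bt, float *c, size_t n)
{
    assert(a && bt && c && n > 0);
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0f;

    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t i_end = ii + TILE < n ? ii + TILE : n;
        for (size_t jj = 0; jj < n; jj += TILE) {
            size_t j_end = jj + TILE < n ? jj + TILE : n;
            for (size_t kk = 0; kk < n; kk += TILE) {
                size_t k_end = kk + TILE < n ? kk + TILE : n;
                /* Accumulate this tile's partial dot products into c. */
                for (size_t i = ii; i < i_end; i++)
                    for (size_t j = jj; j < j_end; j++)
                        c[i * n + j] += sdot_scalar(&a[i * n + kk],
                                                    &bt[j * n + kk],
                                                    k_end - kk);
            }
        }
    }
}
```

Each element of c is built up from partial dot products, one per tile along the shared dimension, so the order of floating-point additions differs from an untiled loop and results can differ in the last bits.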

Figure 3. Results of Intel® MKL on systems equipped with Intel® Xeon® processor E5-2699 v4 and Intel® Xeon® Platinum 8180 processor.

Figure 3 shows the results of the matrix multiplications using the Intel MKL method on systems equipped with the Intel Xeon processor E5-2699 v4 and the Intel Xeon Platinum 8180 processor.

Figure 4. Results with and without Intel® MKL on system equipped with the Intel® Xeon® Platinum 8180 processor.

Figure 4 compares the results of the matrix multiplications with and without Intel MKL on the system equipped with the Intel Xeon Platinum 8180 processor. The results without Intel MKL are those of the best-performing method from Figure 2.

Conclusion

Intel MKL improves the performance of Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) functions because it takes advantage of features in the new generation of Intel processors, such as Intel® Advanced Vector Extensions 512 (Intel® AVX-512), that speed up matrix operations. With Intel MKL, you don't need to modify your source code to benefit from new processor features. Linking the code against the latest version of Intel MKL is enough: the library automatically detects and uses the new features available in Intel Xeon processors.

References

1. Intel® Math Kernel Library
