Intel MKL 11.3 Beta has introduced Intel TBB support.
Intel MKL 11.3 can increase the performance of applications threaded with Intel TBB. Such applications can benefit from the following Intel MKL functions:
- BLAS dot, gemm, gemv
- LAPACK getrf, getrs, syev, gels, gelsy, gesv, pstrf, potrs
- Sparse BLAS csrmm, bsrmm
- Intel MKL Poisson Solver
- Intel MKL PARDISO
If such applications call functions not listed above, Intel MKL 11.3 executes sequential code. Depending on feedback from customers, future versions of Intel MKL may support Intel TBB in more functions.
Linking applications to Intel TBB and Intel MKL
The simplest way to link applications to Intel TBB and Intel MKL is to use the Intel C/C++ Compiler. While Intel MKL supports both static and dynamic linking, Intel TBB is available only as a dynamic library.
Under Linux, use the following commands to compile your application app.c and link it to Intel TBB and Intel MKL.
Dynamic Intel TBB, dynamic Intel MKL: icc app.c -mkl -tbb
Dynamic Intel TBB, static Intel MKL: icc app.c -static -mkl -tbb
Under Windows, use the following commands to compile your application app.c and link it to dynamic Intel TBB and Intel MKL.
Dynamic Intel TBB, dynamic Intel MKL: icl.exe app.c -mkl -tbb
Improving Intel MKL performance with Intel TBB
To improve the performance of Intel MKL, tell Intel TBB to maintain thread affinity to processor cores. The tbb::affinity_partitioner class serves this purpose.
To improve performance of Intel MKL for small input data, you may limit the number of threads allocated by Intel TBB for Intel MKL. Use the tbb::task_scheduler_init class to do so.
For more information on controlling behavior of Intel TBB, see the Intel TBB documentation at https://www.threadingbuildingblocks.org/documentation.
LAPACK performance in applications using Intel TBB and Intel MKL 11.3
* Each call is a single run for a single problem size in the range from 1000 to 10000 with a step of 1000. Performance (GFlops) is computed as the cumulative number of floating-point operations for all 10 calls divided by the wall-clock time from the start of the first call to the end of the last call.
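The cumulative metric described in the note can be sketched as follows. The text does not say which operation count is used for each routine, so this sketch assumes the leading-term count for an LU factorization of an n-by-n matrix, (2/3)n^3. The function names lu_flops, benchmark_sizes, and cumulative_gflops are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Assumed operation count: leading term for LU factorization of an
// n-by-n matrix. Other routines need their own counts.
double lu_flops(double n) {
    return 2.0 * n * n * n / 3.0;
}

// Problem sizes from the note: 1000 to 10000 with a step of 1000.
std::vector<int> benchmark_sizes() {
    std::vector<int> sizes;
    for (int n = 1000; n <= 10000; n += 1000) {
        sizes.push_back(n);
    }
    return sizes;
}

// Cumulative GFlops: total operations over all calls divided by the
// wall-clock time from the start of the first call to the end of the last.
double cumulative_gflops(const std::vector<int>& sizes, double wall_seconds) {
    assert(wall_seconds > 0.0);
    double total_flops = 0.0;
    for (int n : sizes) {
        total_flops += lu_flops(static_cast<double>(n));
    }
    return total_flops / wall_seconds / 1e9;
}
```

In a real measurement, wall_seconds would come from a monotonic timer such as std::chrono::steady_clock, read once before the first call and once after the last. Because all calls share one time span, the largest problem sizes dominate the result.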