This article describes a new inspector-executor interface for Intel® Math Kernel Library (Intel® MKL) Sparse BLAS functionality, featured in the Intel Math Kernel Library Sparse Matrix Vector Multiply Format Prototype Package (Intel MKL SpMV Format Prototype Package). The package demonstrates an inspector-executor approach for sparse matrix-vector and sparse matrix-matrix multiplication, sparse matrix addition and triangular sparse system solvers.
Introduction
The implementation of Sparse BLAS functionality provided in Intel MKL 11.2 [1] is based on the NIST Sparse BLAS C implementation. This API uses a single function call for each computational operation and does not allow optimization information to be passed between function calls. This rules out aggressive optimizations such as load balancing based on the matrix sparsity pattern, matrix reordering, and even matrix format changes. These optimizations take time comparable to one sparse matrix-vector multiplication, so they become beneficial only when multiple operations are performed with a single matrix, as in iterative solvers.
The inspector-executor API uses a two-step approach to operations. An analysis stage is used to inspect the matrix sparsity pattern and apply matrix structure changes. Then this information is reused on each subsequent call to improve performance.
Package contents
The Intel MKL SpMV Format Prototype Package features key Sparse BLAS operations used as building blocks for iterative sparse solvers and covers all the functionality available in Intel MKL:
- Sparse matrix-vector multiplication
- Sparse matrix-matrix multiplication with sparse or dense result
- Triangular system solution
- Sparse matrix addition
The API offers consistent support for C and Fortran style data layouts (row- and column-major) and indexing (zero-based and one-based), and combinations of these.
The package supports three key sparse matrix formats:
- Compressed sparse row (CSR), the most popular sparse matrix format
- Block sparse row (BSR), a version of CSR for matrices with a regular block structure
- Ellpack sparse block (ESB), for matrices with an irregular block structure, which provides a performance speedup on Intel® Xeon Phi™ coprocessors [2]
Two other widely used formats, coordinate (COO) and compressed sparse column (CSC), are converted to one of the internal representations during execution.
The implementation has prototype status. The API and user experience are subject to change in future updates. The current implementation has the following limitations:
- Only double precision data is supported
- ESB format is only supported for matrix-vector and matrix-matrix multiplication
Each operation supported by the inspector-executor model consists of four basic stages. The example below shows a typical call sequence for implementing an iterative method.
- Initialization: a matrix handle is created for storing the matrix type description, indices, and values. User-provided arrays are reused if possible.
// create a handle that stores user data in CSR format
MKL_INT indexing = SPARSE_INDEX_BASE_ZERO;
error = mkl_sparse_create_d_csr(&handle, indexing, rowIndx, colIndx, values);
- Inspection (optional): the matrix structure is analyzed and information is stored in the handle. Optimizations are applied to the matrix structure if feasible.
// run matrix structure analysis and store optimization data in handle
MKL_INT op = SPARSE_OPERATION_NON_TRANSPOSE;
error = mkl_sparse_set_mv_hint(handle, op, matrix_descr, n_iter);
// optimize matrix described by the handle. It uses optimization hints that should be set up before this call
error = mkl_sparse_optimize(handle);
- Execution: operation uses the best kernel and balancing strategy identified at the inspection stage. If inspection data is not available, generic kernels and balancing strategy are used.
for (int i = 0; i < n_iter; i++) {
    error = mkl_sparse_d_mv(op, 1.0, handle, matrix_descr, x_i, 0.0, y_i);
    …
}
- Cleanup: Free any memory used by the handle.
// free internal data associated with the handle
error = mkl_sparse_destroy(handle);
API overview
The Intel MKL SpMV Format Prototype Package uses a handle to store data in an internal representation along with additional internal arrays. The API supports several operations on this handle, as described in the following sections.
Converters from general sparse matrix representations to the internal one:
mkl_sparse_create_d_{csr|coo|csc|bsr}(…) – Create an internal matrix representation (handle);
mkl_sparse_copy(…) – Duplicate a handle;
mkl_sparse_destroy(…) – Remove a handle;
mkl_sparse_convert_{csr|bsr}(…) – Convert a handle from {csr|coo|csc|bsr} format to {csr|bsr};
mkl_sparse_d_set_value(…) – Modify a matrix value in a handle;
Optimization routines
mkl_sparse_set_{mv|mm|spmm|trsv|trsm}_hint(…) – Set the matrix operations to be executed during the execution phase: matrix-vector multiplication, matrix-matrix multiplication, or triangular solves;
mkl_sparse_set_memory_hint(…) – Describe the ability to work with additional memory;
mkl_sparse_optimize(…) – Optimize internal data based on the matrix operations and the amount of memory set by the hint routines;
Execution routines
mkl_sparse_d_{mv|mm|spmm|trsv|trsm}(…) – Perform matrix-vector multiplication, matrix-matrix multiplication, or a triangular solve;
mkl_sparse_d_add(…) – Compute the sum of two sparse matrices.
Performance results
The following charts show performance results for the Intel MKL SpMV Format Prototype Package.
Figure 1: Performance comparison of Intel MKL SpMV Format Prototype Package triangular solver and Intel MKL on Intel® Xeon® processors with matrices from the Florida collection [3]
Figure 2: Performance comparison of the Intel MKL SpMV Format Prototype Package triangular solver and Intel MKL on Intel Xeon Phi coprocessors with matrices from the Florida collection [3]
We are seeking interested parties to evaluate this prototype implementation and provide feedback. If you are interested, please send a request to intel.mkl@intel.com to download the Intel MKL SpMV Format Prototype Package.
Bibliography
[1] Intel Math Kernel Library: https://software.intel.com/en-us/intel-mkl
[2] X. Liu, M. Smelyanskiy, E. Chow, and P. Dubey. Efficient Sparse Matrix-Vector Multiplication on x86-based Many-core Processors. In Proceedings of the 2013 International Conference on Supercomputing, June 2013.
[3] T. A. Davis and Y. Hu. The University of Florida Sparse Matrix Collection. ACM Transactions on Mathematical Software, Vol. 38, Issue 1, 2011, pp. 1:1-1:25.
Copyright © 2015, Intel Corporation. All rights reserved.
Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.