Issue Description: In Intel® MKL 11.1, MIC SGEMM and DGEMM crash with a segmentation fault if TRANSB is 'N' and the right border of matrix B is aligned to a page boundary. The workaround is to allocate extra memory for B.
The bug can be triggered regardless of the page size used and regardless of whether transparent huge pages are enabled or disabled. A larger page size makes the bug harder to trigger.
Example:
#include <stdio.h>
#include "mkl.h"

/*#define WORKAROUND_ENABLED*/

int main(int argc, char **argv)
{
    double alpha = -1, beta = 1;
    int m = 3840, n = 3840, k = 512;
    int lda = m, ldb = k, ldc = m;
#ifdef WORKAROUND_ENABLED
    int n1 = n + 1;   /* allocate one extra column for B */
#else
    int n1 = n;       /* B ends exactly at the end of its last column */
#endif
    double *a, *b, *c;

    /* All matrices are column-major and aligned to 4096 bytes. */
    a = _mm_malloc(lda * k * sizeof(*a), 4096);
    b = _mm_malloc(ldb * n1 * sizeof(*b), 4096);
    c = _mm_malloc(ldc * n * sizeof(*c), 4096);

    /* C = alpha * A * B + beta * C with TRANSA = TRANSB = 'N' */
    dgemm("N", "N", &m, &n, &k, &alpha, a, &m, b, &k, &beta, c, &m);

    _mm_free(a);
    _mm_free(b);
    _mm_free(c);
    return 0;
}
Explanation:
If you run the example as is, it is unlikely to fail, but only because it is small. Uncommenting the definition of the WORKAROUND_ENABLED macro makes the issue go away completely.
The root cause is that, supposing B has N columns, GEMM in some cases reads elements from column N+1. This does not destroy any data, but if the address of B is such that the (N+1)th column would be located (not necessarily entirely) on a different page than the Nth column, and that page is not allocated, there will be a segfault. The workaround above increases the amount of memory allocated for B so that there is always a page allocated for the (N+1)th column of B, and the bug is not triggered.
The next two pictures show the idea:
This does not happen for all input sizes, and it happens only if TRANSB is ‘N’. The issue is fixed in MKL 11.1.1.
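The page-boundary condition described above can be checked with simple address arithmetic. The following is a minimal sketch, not part of MKL: the helper names b_ends_on_page_boundary and b_bytes_with_workaround are hypothetical, B is assumed to be a column-major matrix of doubles, and the check covers only the simplest case, where B ends exactly on a page boundary so that column N+1 would start on a page B does not occupy. Whether that next page is actually unmapped depends on the allocator and cannot be decided from the address alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the first byte past the end of B
   (ldb x n doubles, column-major) is the first byte of a new page.
   In that case column N+1 would lie on a page that B does not occupy. */
int b_ends_on_page_boundary(uintptr_t b_addr, size_t ldb, size_t n,
                            size_t page_size)
{
    uintptr_t end = b_addr + (uintptr_t)(ldb * n * sizeof(double));
    return end % page_size == 0;
}

/* Hypothetical helper: number of bytes to allocate for B when applying
   the workaround, i.e. room for one extra column. */
size_t b_bytes_with_workaround(size_t ldb, size_t n)
{
    return ldb * (n + 1) * sizeof(double);
}
```

For the example above (ldb = 512, n = 3840, B aligned to 4096 bytes), each column of B occupies exactly 4096 bytes, so the check returns 1 for 4096-byte pages. With 2 MB pages and a 2 MB-aligned B of the same size, B ends in the middle of a page and the check returns 0, which is consistent with larger pages making the bug harder to trigger.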