Channel: Intel Developer Zone Articles

Intel® MKL [S,D]GEMM crashes on the Intel® Many Integrated Core Architecture (Intel® MIC Architecture)


Issue Description: In Intel® MKL 11.1, SGEMM and DGEMM on Intel MIC Architecture crash with a segmentation fault if TRANSB is 'N' and the right border of matrix B is aligned to a page boundary. The workaround is to allocate extra memory for B.

The bug can be triggered regardless of the page size used and regardless of whether transparent huge pages are enabled or disabled. A larger page size makes the bug harder to trigger.
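The trigger condition can be checked with a little address arithmetic. The sketch below uses a hypothetical helper, `b_right_border_on_page_boundary`, which is not part of Intel MKL. It assumes B is a column-major double-precision matrix with leading dimension `ldb` and tests whether the first byte past its Nth column starts a new page. With the dimensions used in the example (ldb = 512, n = 3840), B spans 512 * 3840 * 8 = 15,728,640 bytes, which is exactly 3840 pages of 4 KiB. A 4096-aligned B therefore ends on a page boundary. With 2 MiB pages, a 2 MiB-aligned B of the same size ends halfway through a page. This is consistent with the observation that larger pages make the bug harder to hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an MKL API): returns 1 if the first byte past
   column n of a column-major double matrix starting at b_addr, with leading
   dimension ldb, lies exactly on a page boundary, i.e. a (n+1)th column
   would begin on a new page. */
static int b_right_border_on_page_boundary(uintptr_t b_addr, size_t ldb,
                                           size_t n, size_t page_size)
{
    /* Page sizes are powers of two, so a mask test is enough. */
    assert(page_size > 0 && (page_size & (page_size - 1)) == 0);
    uintptr_t end = b_addr + (uintptr_t)(ldb * n * sizeof(double));
    return (end & (page_size - 1)) == 0;
}
```

In a real program `b_addr` would be `(uintptr_t)b`, and `page_size` could come from `sysconf(_SC_PAGESIZE)`.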

Example:

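#include <stdio.h>
#include <immintrin.h>  /* _mm_malloc, _mm_free */
#include "mkl.h"

/*#define WORKAROUND_ENABLED*/

int main(int argc, char **argv)
{
    double alpha = -1, beta = 1;
    int m = 3840, n = 3840, k = 512;
    int lda = m, ldb = k, ldc = m;

#ifdef WORKAROUND_ENABLED
    int n1 = n + 1;  /* one extra column so reads past column n stay in B's allocation */
#else
    int n1 = n;
#endif

    double *a, *b, *c;

    /* Page-aligned buffers. B is k x n, column-major with ldb = k, so with
       these sizes its last column ends exactly on a 4 KiB page boundary.
       The matrix contents are left uninitialized; only the layout matters. */
    a = _mm_malloc((size_t)lda * k * sizeof(*a), 4096);
    b = _mm_malloc((size_t)ldb * n1 * sizeof(*b), 4096);
    c = _mm_malloc((size_t)ldc * n * sizeof(*c), 4096);
    if (a == NULL || b == NULL || c == NULL) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    /* TRANSB = "N" is the affected case. */
    dgemm("N", "N", &m, &n, &k, &alpha, a, &lda, b, &ldb, &beta, c, &ldc);

    _mm_free(a); _mm_free(b); _mm_free(c);
    return 0;
}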

Explanation:

If you run the example as is, it is unlikely to fail, but only because it is small. Uncommenting the definition of the WORKAROUND_ENABLED macro makes the issue go away completely.

The root cause is that, when B has N columns, GEMM in some cases reads elements from column N+1. This does not corrupt any data. However, suppose B's address is such that the (N+1)th column lies, not necessarily entirely, on a different page than the Nth column. If that page is not allocated, the read causes a segfault. The workaround above increases the amount of memory allocated for B so that there is always allocated memory for the (N+1)th column of B, and the bug is not triggered.
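The fault only appears when nothing is mapped right after B, so the reproducer above fails only by chance. A common way to make such an over-read fail deterministically is a guard page. The sketch below assumes a POSIX system with `mmap` and `mprotect`. The names `guarded_buf`, `guarded_alloc` and `guarded_free` are hypothetical and are not part of Intel MKL. The sketch places the buffer so that its last element ends exactly where an inaccessible page begins, so any read of an (N+1)th column faults immediately.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical test helper: a buffer of doubles whose last element ends
   exactly where a PROT_NONE guard page begins. */
typedef struct {
    void   *base;  /* start of the whole mapping */
    size_t  len;   /* mapping length, guard page included */
    double *data;  /* first element of the buffer */
} guarded_buf;

static int guarded_alloc(guarded_buf *g, size_t nelems)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t bytes = nelems * sizeof(double);
    size_t data_pages = (bytes + page - 1) / page;

    assert(nelems > 0);
    g->len = (data_pages + 1) * page;
    g->base = mmap(NULL, g->len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (g->base == MAP_FAILED)
        return -1;

    /* Make the page after the data inaccessible. */
    char *guard = (char *)g->base + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(g->base, g->len);
        return -1;
    }
    /* Shift the start so the buffer's last byte sits right before the guard;
       bytes is a multiple of sizeof(double), so alignment is preserved. */
    g->data = (double *)(guard - bytes);
    return 0;
}

static void guarded_free(guarded_buf *g)
{
    munmap(g->base, g->len);
}
```

To try this with the example, one could allocate B with `guarded_alloc(&g, (size_t)ldb * n1)`, pass `g.data` as `b`, and release it with `guarded_free` instead of `_mm_free`. With the example's sizes B occupies a whole number of pages, so `g.data` is page-aligned just as in the original. Following the explanation above, the expected outcome on an affected MKL version is a crash without WORKAROUND_ENABLED. With the macro defined, the run should be clean, because the extra column then lies inside the mapping.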

The next two pictures show the idea:

The problem does not occur for every input size, and it occurs only when TRANSB is 'N'. The issue is fixed in Intel® MKL 11.1.1.
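When a program may be linked against different MKL releases, the padding could be applied only where it is needed. The sketch below is a hypothetical policy function, `gemm_b_padding_needed`, which is not an MKL API. The article only states that the bug is present in 11.1 and fixed in 11.1.1, so the function conservatively pads for every version older than 11.1.1 rather than assuming that earlier releases are unaffected.

```c
#include <assert.h>

/* Hypothetical policy: return 1 if B should get one extra column of
   padding, i.e. for any MKL version older than 11.1.1 (11.1 Update 1). */
static int gemm_b_padding_needed(int major, int minor, int update)
{
    assert(major >= 0 && minor >= 0 && update >= 0);
    if (major != 11)
        return major < 11;
    if (minor != 1)
        return minor < 1;
    return update < 1;
}
```

At run time the version numbers could be read with MKL's `mkl_get_version`, which fills an `MKLVersion` structure that includes `MajorVersion`, `MinorVersion` and `UpdateVersion` fields. The column count to allocate would then be `n + gemm_b_padding_needed(...)`. Since the padding costs only one column of memory, always padding is an equally reasonable choice.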
