Cause:
The compiler always targets the inner loop for vectorization and the outer loop for parallelization, so the outer loop of a loop nest is reported as not vectorized. The example below illustrates this scenario.
Example:
#include<iostream>
#define N 25
int main(){
    int a[N][N], b[N], i;
    for(int j = 0; j < N; j++)
    {
        for(int i = 0; i < N; i++)
            a[j][i] = 0;
        b[j] = 1;
    }
    int sum = __sec_reduce_add(a[:][:]) + __sec_reduce_add(b[:]);
    return 0;
}
$ icpc example7.cc -vec-report2
example7.cc(7): (col. 2) remark: loop was not vectorized: loop was transformed to memset or memcpy
example7.cc(5): (col. 1) remark: loop was not vectorized: not inner loop
Resolution Status:
Add the pragma "#pragma omp simd collapse(2)" immediately before the outer for loop and compile with the -openmp compiler option. The collapse(2) clause explicitly instructs the compiler to collapse the two nested loops into one loop for vectorization. With this change, the compiler produces the following vectorization report:
remark: OpenMP SIMD LOOP WAS VECTORIZED
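The resolution can be sketched as follows. This is a minimal illustration, not the exact code behind the report above, and it makes two assumptions that are not in the original example. First, the b[j] = 1 assignment is moved into its own loop, because the OpenMP collapse clause expects the collapsed loops to be perfectly nested. Second, the Cilk Plus __sec_reduce_add calls are replaced with plain summation loops so that the sketch also builds with compilers that lack array notation. The function name fill_and_sum is hypothetical.

```cpp
#include <iostream>

#define N 25

// Hypothetical helper (not part of the original example):
// initializes both arrays and returns the sum of all their elements.
int fill_and_sum() {
    int a[N][N];
    int b[N];

    // collapse(2) asks the compiler to treat the j and i loops as a single
    // iteration space, so the whole nest becomes a vectorization candidate.
    // Assumption: the nest is kept perfectly nested (no statements between
    // the two for headers or after the inner loop).
    #pragma omp simd collapse(2)
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[j][i] = 0;

    // Assumption: b is initialized separately to keep the nest above perfect.
    for (int j = 0; j < N; j++)
        b[j] = 1;

    // Plain reductions standing in for __sec_reduce_add(a[:][:]) and
    // __sec_reduce_add(b[:]) from the original example.
    int sum = 0;
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < N; i++)
            sum += a[j][i];
        sum += b[j];
    }
    return sum;
}
```

Every element of a is 0 and every element of b is 1, so fill_and_sum() returns N, which is 25 here. When the file is built without OpenMP support, the pragma is simply ignored and the result is unchanged.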