One of the major features introduced in the OpenMP 4.0 specification is a set of pragmas that explicitly enable vectorization (SIMD) in a program. Each explicit vectorization construct offered by OpenMP 4.0 is demonstrated below:
1. #pragma omp simd
Vectorization using #pragma omp simd instructs the compiler to enforce vectorization of loops. It is designed to minimize the source code changes needed to obtain vectorized code. The "omp simd" pragma can be used to vectorize loops that the compiler does not normally auto-vectorize, even with vectorization hints such as "pragma vector always" or "pragma ivdep". In the example below, the loop is vectorized only when the code is compiled with SIMD defined:
char foo(char *A, int n){
  int i;
  char x = 0;
#ifdef SIMD
#pragma omp simd reduction(+:x)
#endif
#ifdef IVDEP
#pragma ivdep
#endif
  for (i=0; i<n; i++){
    x = x + A[i];
  }
  return x;
}
>icl /c /Qvec-report2 simd.cpp -openmp
simd.cpp
simd.cpp(12) (col. 3): remark: loop was not vectorized: existence of vector dependence.
>icl /c /Qvec-report2 simd.cpp /DIVDEP -openmp
simd.cpp
simd.cpp(12) (col. 3): remark: loop was not vectorized: existence of vector dependence.
>icl /c /Qvec-report2 simd.cpp /DSIMD -openmp
simd.cpp
simd.cpp(12) (col. 3): remark: OpenMP SIMD LOOP WAS VECTORIZED.
The pragma accepts a number of clauses, and it is advisable to use the clauses that best describe the behavior of the loop body. Please refer to section 2.8.1 at http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf for more information on the relevant clauses for the "omp simd" pragma.
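As a small illustration of how a clause describes the loop body, the sketch below applies "omp simd" to an element-wise loop, which needs no clause, and to a dot product, which needs a reduction clause because every iteration updates the same variable. The function names scale_add and dot are hypothetical and do not come from the example above. A compiler with OpenMP support disabled ignores the pragmas and runs the loops serially, with the same results for the inputs shown.

```c
/* Hypothetical helper: y[i] = a * x[i] + y[i].
   Iterations are independent, so a plain "omp simd" is enough.
   Assumption: x and y do not overlap. */
void scale_add(float a, const float *x, float *y, int n)
{
  int i;
#pragma omp simd
  for (i = 0; i < n; i++) {
    y[i] = a * x[i] + y[i];
  }
}

/* Hypothetical helper: dot product of x and y.
   Every iteration accumulates into sum, so the reduction clause tells the
   compiler to keep per-lane partial sums and combine them after the loop. */
float dot(const float *x, const float *y, int n)
{
  int i;
  float sum = 0.0f;
#pragma omp simd reduction(+:sum)
  for (i = 0; i < n; i++) {
    sum += x[i] * y[i];
  }
  return sum;
}
```

Note that a floating-point reduction may add the terms in a different order than the serial loop does, so results can differ in the last bits for general inputs.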
2. #pragma omp declare simd
Traditionally, functions in C/C++ accept scalar arguments and return scalar values. This can be a bottleneck when vectorizing a loop that contains function calls, because the vectorized loop body operates on vector operands rather than on individual scalar operands. A call to a function that neither accepts vector operands as arguments nor returns vector results therefore stands in the way of vectorizing the loop body. In such cases, this new feature of OpenMP 4.0 instructs the compiler to generate vector variants of the scalar function. The example below demonstrates how to use the pragma.
#pragma omp declare simd
int vfun_add_one(int x)
{
return x+1;
}
>icl /c /Qvec-report2 elementalfunc.cpp -openmp
elementalfunc.cpp
elementalfunc.cpp(3) (col. 1): remark: FUNCTION WAS VECTORIZED.
The pragma accepts a number of clauses, and it is advisable to use the clauses that best describe the behavior of the function body. Please refer to section 2.8.2 at http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf for more information on the relevant clauses for the "omp declare simd" pragma.
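To show where such a function pays off, the sketch below calls a "declare simd" function from inside an "omp simd" loop and uses two of the clauses from section 2.8.2. The names add_offset and add_offset_all are hypothetical, and whether the call is actually vectorized depends on the compiler and its options.

```c
/* Hypothetical elemental function.
   uniform(offset): offset has the same value in every SIMD lane.
   notinbranch: the function is never called from inside a conditional
   in the SIMD loop, so no masked variant is needed. */
#pragma omp declare simd uniform(offset) notinbranch
int add_offset(int x, int offset)
{
  return x + offset;
}

/* Hypothetical caller: the loop can use the vector variant of add_offset
   instead of making one scalar call per element.
   Assumption: in and out do not overlap. */
void add_offset_all(const int *in, int *out, int n, int offset)
{
  int i;
#pragma omp simd
  for (i = 0; i < n; i++) {
    out[i] = add_offset(in[i], offset);
  }
}
```

Declaring offset as uniform lets the compiler pass it once as a scalar rather than broadcasting it into a vector argument.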
3. Targeting the same “for” loop for Threading and SIMD:
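OpenMP 4.0 also lets you instruct the compiler to distribute a given “for” loop across multiple OpenMP threads, with each thread executing its share of the iterations in SIMD mode. Because every iteration updates x, the pragma needs a reduction clause; without it the threads would race on x. Example:
char foo(char *A, int n){
int i;
char x = 0;
#pragma omp parallel for simd reduction(+:x)
for(i = 0; i < n; i++)
x = x + A[i];
return x;
}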
$ icc test.c -c -vec-report2 -c -openmp-report2 -openmp
test.c(4): (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED
test.c(5): (col. 1) remark: OpenMP SIMD LOOP WAS VECTORIZED
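The same pattern can be sketched with a wider accumulator, which avoids the overflow that a char sum runs into on longer arrays. The function name sum_ints is hypothetical. When OpenMP is not enabled at compile time the pragma is ignored and the loop runs serially with the same result.

```c
/* Hypothetical helper: sums n ints.
   "parallel for simd" splits the iterations across threads and lets each
   thread process its chunk with SIMD instructions. The reduction clause
   gives every thread (and SIMD lane) a private partial sum and combines
   the partial sums once the loop finishes. */
long sum_ints(const int *a, int n)
{
  int i;
  long sum = 0;
#pragma omp parallel for simd reduction(+:sum)
  for (i = 0; i < n; i++) {
    sum += a[i];
  }
  return sum;
}
```

Integer addition is associative, so the result does not depend on how the iterations are divided among threads and lanes.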