Enabling SIMD in a program using OpenMP 4.0

One of the major features introduced in the OpenMP 4.0 specification is a set of pragmas that explicitly enable vectorization (SIMD) in a program. Each explicit vectorization construct offered by OpenMP 4.0 is demonstrated below:

1. #pragma omp simd

The #pragma omp simd directive instructs the compiler to vectorize the loop that follows it. It is designed to minimize the source code changes needed to obtain vectorized code. The "omp simd" pragma can be used to vectorize loops that the compiler does not normally auto-vectorize, even with vectorization hints such as "pragma vector always" or "pragma ivdep". In the example below, the accumulation into x is reported as a vector dependence until the reduction clause describes it to the compiler.

char foo(char *A, int n){
    int i;
    char x = 0;
#ifdef SIMD
    #pragma omp simd reduction(+:x)
#endif
#ifdef IVDEP
    #pragma ivdep
#endif
    for (i=0; i<n; i++){
        x = x + A[i];
    }
    return x;
}

>icl /c /Qvec-report2 simd.cpp -openmp
simd.cpp
simd.cpp(12) (col. 3): remark: loop was not vectorized: existence of vector dependence.
>icl /c /Qvec-report2 simd.cpp /DIVDEP -openmp
simd.cpp
simd.cpp(12) (col. 3): remark: loop was not vectorized: existence of vector dependence.
>icl /c /Qvec-report2 simd.cpp /DSIMD -openmp
simd.cpp
simd.cpp(12) (col. 3): remark: OpenMP SIMD LOOP WAS VECTORIZED.

The pragma accepts several clauses, and it is advisable to use the ones that best describe the behavior of the loop body. Refer to section 2.8.1 of http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf for more information on the clauses for the "omp simd" pragma.
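As a minimal sketch of how a clause changes what the compiler may assume, the loop below carries a dependence of distance 4: each element reads the one four positions before it. The function name shift_add and the distance are illustrative assumptions, not taken from the specification. The safelen(4) clause states that up to four consecutive iterations can safely run together in one SIMD chunk, while wider chunks cannot.

```c
#include <stddef.h>

/* Hypothetical helper: a[i] depends on a[i - 4], so only 4
   consecutive iterations may be combined into one SIMD chunk. */
void shift_add(int *a, const int *b, size_t n)
{
    size_t i;
    /* safelen(4) limits each SIMD chunk to at most 4 consecutive iterations */
    #pragma omp simd safelen(4)
    for (i = 4; i < n; i++) {
        a[i] = a[i - 4] + b[i];
    }
}
```

Without safelen, a plain "omp simd" here would assert that any vector length is safe, which is not true for this loop on hardware with more than four int lanes.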

2. #pragma omp declare simd

Traditionally, functions in C/C++ accept scalar arguments and return scalar values. This can be a bottleneck when vectorizing a loop that contains function calls, because the vectorized loop body operates on vector operands rather than on individual scalars. A call to a function that neither accepts vector arguments nor returns vector results therefore gets in the way of vectorizing the loop body. In such cases, the "omp declare simd" pragma introduced in OpenMP 4.0 instructs the compiler to generate vector variants of the scalar function. Below is an example which demonstrates how to use the pragma.


#pragma omp declare simd
int vfun_add_one(int x)
{
    return x+1;
}

>icl /c /Qvec-report2 elementalfunc.cpp -openmp
elementalfunc.cpp
elementalfunc.cpp(3) (col. 1): remark: FUNCTION WAS VECTORIZED.

The pragma accepts several clauses, and it is advisable to use the ones that best describe the behavior of the function body. Refer to section 2.8.2 of http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf for more information on the clauses for the "omp declare simd" pragma.
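The sketch below shows one way a vector variant might be used from a loop, together with two of the clauses. The names scale_add_one and apply are hypothetical and chosen only for illustration. The uniform(scale) clause states that scale has the same value in every SIMD lane, and notinbranch states that the function is never called from inside a conditional in a SIMD loop, so only an unmasked variant is needed.

```c
/* Hypothetical elemental function: 'x' varies per SIMD lane,
   'scale' is the same for all lanes (uniform). */
#pragma omp declare simd uniform(scale) notinbranch
int scale_add_one(int x, int scale)
{
    return x * scale + 1;
}

/* The calling loop is marked with "omp simd" so the compiler may
   replace the scalar call with a call to the vector variant. */
void apply(int *out, const int *in, int n, int scale)
{
    int i;
    #pragma omp simd
    for (i = 0; i < n; i++) {
        out[i] = scale_add_one(in[i], scale);
    }
}
```

The scalar version of the function remains available, so calls from non-vectorized code behave as before. The caller is assumed to pass non-overlapping in and out arrays.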

3. Targeting the same “for” loop for threading and SIMD

OpenMP 4.0 also allows a single “for” loop to be distributed across multiple OpenMP threads, with each thread executing its share of the iterations in SIMD mode. Example:


char foo(char *A, int n){
    int i;
    char x = 0;
    #pragma omp parallel for simd reduction(+:x)
    for(i = 0; i < n; i++)
        x = x + A[i];   /* x is a reduction variable, so threads do not race on it */
    return x;
}

$ icc test.c -c -vec-report2 -c -openmp-report2 -openmp
test.c(4): (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED
test.c(5): (col. 1) remark: OpenMP SIMD LOOP WAS VECTORIZED
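A sketch of the combined construct on a slightly larger kernel, assuming a hypothetical dot function that is not part of the original example. The iterations are first divided among the threads, and each thread's share is then executed with SIMD instructions. The reduction(+:sum) clause gives every thread and SIMD lane a private partial sum that is combined at the end, so the shared accumulator is never updated concurrently.

```c
/* Hypothetical example: dot product using threads and SIMD together. */
long dot(const int *a, const int *b, int n)
{
    long sum = 0;
    int i;
    /* Threads split the iteration space; each thread vectorizes its part. */
    #pragma omp parallel for simd reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += (long)a[i] * b[i];
    }
    return sum;
}
```

Integer arithmetic is used here so that the result does not depend on the order in which partial sums are combined. With floating-point data the combined result can differ slightly from a sequential sum.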
