Parallelism delivers the performance that High Performance Computing (HPC) requires. It operates at several levels: superscalar execution, vector instructions, threading, and distributed memory with message passing. OpenMP* is a commonly used threading abstraction, especially in HPC. Many HPC applications are moving to a hybrid shared-memory/distributed-memory programming model in which both OpenMP* and MPI* are used.
This article focuses on the OpenMP parallel model, and particularly on profiling the performance of OpenMP-based applications. Intel® VTune™ Amplifier XE is a performance profiling tool well suited to finding performance bottlenecks in OpenMP codes. The article lists the steps to profile OpenMP applications and describes the common performance issues that Intel® VTune™ Amplifier XE can reveal.
Open the attached PDF document to read the full text.