miniGhost on Intel® Xeon® processors and Intel® Xeon Phi™ Coprocessor

Purpose

This article provides code access, build, and run directions for the miniGhost code on Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors.

Introduction

miniGhost is a finite difference mini-application that implements a difference stencil across a homogeneous three-dimensional domain.

The kernels that it contains are:
- computation of stencil operations (a sketch in Fortran follows this list),
- inter-process boundary (halo, ghost) exchange,
- global summation of grid values.
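
As an illustration of the stencil kernel, here is a minimal sketch in Fortran (the language of the miniGhost sources) of a sweep with a simple 7-point stencil over a grid with a one-cell ghost layer. The stencil choice, array names, and sizes are assumptions for illustration and are not taken from miniGhost, which selects its stencil at run time (e.g., --stencil 24).

    ! Minimal sketch of a difference-stencil sweep (7-point average) over a
    ! 3D grid with a one-cell ghost layer. Illustrative only; not miniGhost code.
    program stencil_sketch
       implicit none
       integer, parameter :: nx = 64, ny = 64, nz = 64
       real(kind=8), allocatable :: grid_in(:,:,:), grid_out(:,:,:)
       integer :: i, j, k

       ! Indices 0 and n+1 hold the ghost (halo) cells filled by the boundary exchange.
       allocate(grid_in(0:nx+1, 0:ny+1, 0:nz+1), grid_out(0:nx+1, 0:ny+1, 0:nz+1))
       grid_in  = 1.0d0
       grid_out = 0.0d0

       ! Apply the stencil to every interior point; ghost cells are read, not written.
       do k = 1, nz
          do j = 1, ny
             do i = 1, nx
                grid_out(i,j,k) = ( grid_in(i,j,k)                      &
                                  + grid_in(i-1,j,k) + grid_in(i+1,j,k) &
                                  + grid_in(i,j-1,k) + grid_in(i,j+1,k) &
                                  + grid_in(i,j,k-1) + grid_in(i,j,k+1) ) / 7.0d0
             end do
          end do
       end do

       print *, 'Interior sum after one sweep:', sum(grid_out(1:nx,1:ny,1:nz))
    end program stencil_sketch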

miniGhost was mainly designed to study the performance characteristics of the BSPMA configuration within the context of computations widely used across a variety of scientific algorithms.

In the BSPMA (bulk synchronous parallel with message aggregation) model, the face data for each variable is accumulated into user-managed buffers. The buffers are then transmitted to (up to) six neighbor processes, and the selected stencil is applied to each variable. The other model is SVAF (single variable, aggregated face data), but this article focuses on the BSPMA model only; a minimal sketch of the BSPMA exchange follows.
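
As a rough illustration of BSPMA-style message aggregation, the following sketch assumes just two MPI ranks, one neighbor, and two variables whose faces are packed into a single user-managed buffer and exchanged in one message, followed by the global-summation kernel. All names and sizes are illustrative; this is not code from the miniGhost sources.

    ! BSPMA sketch: aggregate one face of several variables into a single
    ! buffer, exchange it with a neighbor, then reduce a global sum.
    ! Assumes exactly two MPI ranks (mpirun -n 2); illustrative only.
    program bspma_sketch
       use mpi
       implicit none
       integer, parameter :: n = 32, num_vars = 2
       real(kind=8) :: var(n, n, num_vars)
       real(kind=8) :: send_buf(n*num_vars), recv_buf(n*num_vars)
       real(kind=8) :: local_sum, global_sum
       integer :: rank, nprocs, neighbor, ierr, v

       call MPI_Init(ierr)
       call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
       call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
       if (nprocs /= 2) call MPI_Abort(MPI_COMM_WORLD, 1, ierr)

       var = real(rank + 1, kind=8)
       neighbor = 1 - rank        ! the single neighbor in this two-rank sketch

       ! Message aggregation: copy one face of every variable into one buffer
       ! so the neighbor receives a single message instead of one per variable.
       do v = 1, num_vars
          send_buf((v-1)*n+1 : v*n) = var(n, :, v)
       end do

       call MPI_Sendrecv(send_buf, n*num_vars, MPI_DOUBLE_PRECISION, neighbor, 0, &
                         recv_buf, n*num_vars, MPI_DOUBLE_PRECISION, neighbor, 0, &
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)

       ! Global summation of grid values (the third miniGhost kernel).
       local_sum = sum(var)
       call MPI_Allreduce(local_sum, global_sum, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                          MPI_COMM_WORLD, ierr)
       if (rank == 0) print *, 'Global sum of all variables:', global_sum

       call MPI_Finalize(ierr)
    end program bspma_sketch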

miniGhost serves as a proxy (or mini-app) for the CTH shock physics code from Sandia National Laboratories.

Code Access

miniGhost is part of the Mantevo project. To access the code, refer to https://mantevo.org/packages/

Note that v0.9 of the code was used for the performance and optimization work described in this article.

Build Directions

The reference version of miniGhost can be built for serial execution or parallel (MPI + OpenMP) execution:

Build

The following examples illustrate different commands for different Intel Xeon processors. Choose the command for the appropriate processor for your build.

NOTE: In the following code examples, extra spacing marks where a very long command line ends; single-spaced code examples are each one command line.

  1. Uncompress the source files.
    tar -xf MiniGhost<ver>.tar
    cd source
  2. Source the latest Intel® compilers (C/C++ and Fortran) and the Intel® MPI Library. The following example is for the Intel® Compiler (15.0.1 20141023) and the Intel® MPI Library (5.0.2.044); choose the mpivars.sh line that matches the processor or coprocessor support you need. (The parenthetical comments are not part of the commands.)
    source /opt/intel/composer_xe_2015.1.133/bin/compilervars.sh intel64

    (for Intel® Xeon® processor):

    source /opt/intel/impi/5.0.2.044/intel64/bin/mpivars.sh

    (for Intel® Xeon Phi™ Coprocessor):

    source /opt/intel/impi/5.0.2.044/mic/bin/mpivars.sh
  3. To build for the Intel® Xeon® processor v2 family (formerly codenamed Ivy Bridge):

    Make the following changes in makefile.mpi

        FC=mpiifort
        CC=mpiicc
        CFLAGS += -Df2c_ -O3 -openmp -g -xCORE-AVX-I
        FFLAGS += -D_MG_INT4 -D_MG_REAL8 -O3 -openmp -g -xCORE-AVX-I

        Build Command

        make -f makefile.mpi
    			
  4. To build for the Intel® Xeon® processor v3 family (formerly codenamed Haswell):

    Make the following changes in makefile.mpi

        FC=mpiifort
        CC=mpiicc
        CFLAGS += -Df2c_ -O3 -openmp -g -xCORE-AVX2
        FFLAGS += -D_MG_INT4 -D_MG_REAL8 -O3 -openmp -g -xCORE-AVX2

        Build Command

        make -f makefile.mpi
    			
  5. To build for the Intel® Xeon Phi™ Coprocessor (codenamed Knights Corner):

    Make the following changes in makefile.mpi

        FC=mpiifort
        CC=mpiicc
        CFLAGS += -Df2c_ -O3 -openmp -g -mmic
        FFLAGS += -D_MG_INT4 -D_MG_REAL8 -O3 -openmp -g -mmic

        Build Command

        make -f makefile.mpi
    			

Compiler Flags Used

Compiler Flag    Effect
-O3              Optimize for maximum speed and enable more aggressive optimizations that may not improve performance on some programs.
-xCORE-AVX-I     May generate Intel® Advanced Vector Extensions (Intel® AVX), including instructions in Intel® Core 2™ processors in process technologies smaller than 32nm, Intel® SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel® processors.
-xCORE-AVX2      May generate Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel processors.
-mmic            Build an application that runs natively on Intel® Many Integrated Core Architecture (Intel® MIC Architecture).

Run Directions

Run-time input parameters can be listed with the --help option:

./miniGhost.x --help

Check MG_OPTIONS.F for a list of all parameterized options.

In the following instructions, select the executing command appropriate for the processor you are using.

  1. Move to the “run” directory:
    		cd run

    Be sure to run the appropriate binary for each architecture.

  2. Source the Intel Compilers and Intel MPI Library as appropriate for the architecture.
  3. For the Intel Xeon processor v2 family (e.g. Intel® Xeon® processor E5-2697 v2), execute the following:
    		export OMP_NUM_THREADS=4
    		export I_MPI_PIN_DOMAIN=omp
    		export KMP_AFFINITY=compact,verbose
    
    		mpirun -n 12 ../source/miniGhost.ivb --scaling 1 --nx 445 --ny 445 --nz 445 --num_vars 40 --num_spikes 1 --debug_grid 1 --report_diffusion 21 --percent_sum 100 --num_tsteps 20 --stencil 24 --comm_method 10 --report_perf 1 --npx 3 --npy 2 --npz 2 --error_tol 8
    		
  4. For Intel Xeon processor v3 family (e.g. Intel® Xeon® processor E5-2697 v3), execute the following:
    		export OMP_NUM_THREADS=4
    		export I_MPI_PIN_DOMAIN=omp
    		export KMP_AFFINITY=compact,verbose
    
    		mpirun -n 14 ../source/miniGhost.hsw --scaling 1 --nx 445 --ny 445 --nz 445 --num_vars 40 --num_spikes 1 --debug_grid 1 --report_diffusion 21 --percent_sum 100 --num_tsteps 20 --stencil 24 --comm_method 10 --report_perf 1 --npx 7 --npy 2 --npz 1 --error_tol 8
    		
  5. For both Intel Xeon processor v2 family (e.g. Intel® Xeon® processor E5-2697 v2) and Intel Xeon Phi Coprocessor 7120A (Symmetric Mode), first source the Intel Compilers and Intel MPI Library, then execute the following:

    Create two executable script files, run.ivb and run.knc, with the following contents:

    run.ivb:

    			../source/miniGhost.ivb --scaling 1 --nx 445 --ny 445 --nz 445 --num_vars 40 --num_spikes 1 --debug_grid 1 --report_diffusion 21 --percent_sum 100 --num_tsteps 20 --stencil 24 --comm_method 10 --report_perf 1 --npx 2 --npy 1 --npz 1 --error_tol 8

    run.knc:

    			../source/miniGhost.knc --scaling 1 --nx 445 --ny 445 --nz 445 --num_vars 40 --num_spikes 1 --debug_grid 1 --report_diffusion 21 --percent_sum 100 --num_tsteps 20 --stencil 24 --comm_method 10 --report_perf 1 --npx 2 --npy 1 --npz 1 --error_tol 8
    
    Enable MPI on the coprocessor:

    			export I_MPI_MIC=enable

    Select the appropriate MPI fabric and launch the job in symmetric mode:

    			export I_MPI_FABRICS=shm:dapl
    
    			export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u
    
    			mpiexec.hydra -host `hostname` -n 1 -env OMP_NUM_THREADS 48 -env KMP_AFFINITY compact,verbose -env I_MPI_PIN_DOMAIN omp ./run.ivb : -host `hostname`-mic0 -wdir `pwd` -n 1 -env OMP_NUM_THREADS 240 -env KMP_AFFINITY compact,verbose -env I_MPI_PIN_DOMAIN omp ./run.knc
    			
  6. For Intel Xeon processor v3 family (e.g. Intel® Xeon® processor E5-2697 v3) and the Intel Xeon Phi Coprocessor 7120A (Symmetric Mode), execute the following:

    Create two executable script files, run.hsw and run.knc, with the following contents:

    run.hsw:

    			../source/miniGhost.hsw --scaling 1 --nx 445 --ny 445 --nz 445 --num_vars 40 --num_spikes 1 --debug_grid 1 --report_diffusion 21 --percent_sum 100 --num_tsteps 20 --stencil 24 --comm_method 10 --report_perf 1 --npx 2 --npy 1 --npz 1 --error_tol 8

    run.knc:

    			../source/miniGhost.knc --scaling 1 --nx 445 --ny 445 --nz 445 --num_vars 40 --num_spikes 1 --debug_grid 1 --report_diffusion 21 --percent_sum 100 --num_tsteps 20 --stencil 24 --comm_method 10 --report_perf 1 --npx 2 --npy 1 --npz 1 --error_tol 8
    
    Enable MPI on the coprocessor:

    			export I_MPI_MIC=enable

    Select the appropriate MPI fabric and launch the job in symmetric mode:

    			export I_MPI_FABRICS=shm:dapl
    
    			export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u
    
    			mpiexec.hydra -host `hostname` -n 1 -env OMP_NUM_THREADS 56 -env KMP_AFFINITY compact,verbose -env I_MPI_PIN_DOMAIN omp ./run.hsw : -host `hostname`-mic0 -wdir `pwd` -n 1 -env OMP_NUM_THREADS 240 -env KMP_AFFINITY compact,verbose -env I_MPI_PIN_DOMAIN omp ./run.knc
    			

Performance and Optimizations

In the MG_FLUX_ACCUMULATE subroutine (MG_FLUX_ACCUMULATE.F), six loops were parallelized with !$OMP PARALLEL DO / !$OMP END PARALLEL DO directives to eliminate some serial sections; a sketch of this kind of change follows.
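
Below is a minimal sketch of that kind of change: an independent loop nest wrapped in !$OMP PARALLEL DO so its iterations are shared among threads. The loop body and array names are assumptions for illustration and are not taken from MG_FLUX_ACCUMULATE.F.

    ! Sketch of the OpenMP change described above: a previously serial loop
    ! nest distributed across threads with !$OMP PARALLEL DO. Illustrative
    ! only; compile with the same -openmp (or -qopenmp) flag used above.
    program flux_accumulate_sketch
       use omp_lib
       implicit none
       integer, parameter :: nx = 128, ny = 128, nz = 128
       real(kind=8), allocatable :: grid(:,:,:), flux_out(:,:)
       integer :: j, k

       allocate(grid(nx, ny, nz), flux_out(ny, nz))
       grid = 1.0d0

       ! Each (j,k) column is independent, so the outer loop can be shared
       ! among the OpenMP threads; j must be private to each thread.
    !$OMP PARALLEL DO PRIVATE(j)
       do k = 1, nz
          do j = 1, ny
             flux_out(j, k) = sum(grid(:, j, k))
          end do
       end do
    !$OMP END PARALLEL DO

       print *, 'Threads available:', omp_get_max_threads(), &
                '  accumulated total:', sum(flux_out)
    end program flux_accumulate_sketch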

The performance speedup of miniGhost (v0.9) was measured on Intel® Xeon® processors and the Intel® Xeon Phi™ coprocessor, with the Intel® Xeon® processor E5-2697 v2 used as the baseline.

References and Resources

[1] Richard F. Barrett, Courtenay T. Vaughan, and Michael A. Heroux. MiniGhost: A Miniapp for Exploring Boundary Exchange Strategies Using Stencil Computations in Scientific Parallel Computing.
http://prod.sandia.gov/techlib/access-control.cgi/2012/122437.pdf

[2] MiniGhost details as part of the NERSC-8/Trinity Benchmarks
https://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/minighost/

