Purpose
This application note helps developers use Intel® Software Development Tools with the 3D-FFT MPI-3 based code sample from the Scalable Parallel Computing Lab (SPCL), ETH Zurich.
Introduction
The original 3D-FFT code, based on the prototype library libNBC, was developed to help optimize parallel high-performance applications by overlapping computation and communication [1]. The updated version of the code, based on MPI-3 Non-Blocking Collectives (NBC), is now posted on the SPCL, ETH Zurich web site. This new version relies on the MPI-3 API and can therefore be used with any modern MPI library that implements it. One such implementation is the Intel® MPI Library, which fully supports the MPI-3 Standard [2].
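The pattern at the core of the NBC version is to start a collective exchange, compute on data that does not depend on it, and complete the exchange afterwards. The following is a minimal sketch of that pattern in C, using MPI_Ialltoall (the non-blocking collective underlying the distributed FFT transpose); the buffer sizes and the placeholder compute step are illustrative assumptions, not code taken from the 3D-FFT sample.

#include <mpi.h>
#include <stdlib.h>

/* Placeholder for work that does not depend on the exchanged data,
 * e.g., 1D FFTs along dimensions that are local to the rank. */
static void compute_local(double *data, int n)
{
    for (int i = 0; i < n; i++)
        data[i] *= 2.0;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = 1024;  /* doubles exchanged with each rank (illustrative) */
    double *sendbuf = calloc((size_t)size * chunk, sizeof(double));
    double *recvbuf = calloc((size_t)size * chunk, sizeof(double));
    double *local   = calloc(chunk, sizeof(double));

    MPI_Request req;
    /* MPI-3 non-blocking collective: returns immediately with a request. */
    MPI_Ialltoall(sendbuf, chunk, MPI_DOUBLE,
                  recvbuf, chunk, MPI_DOUBLE, MPI_COMM_WORLD, &req);

    compute_local(local, chunk);        /* overlapped computation */

    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* complete the exchange */

    free(sendbuf); free(recvbuf); free(local);
    MPI_Finalize();
    return 0;
}

Whether the exchange actually proceeds while compute_local runs depends on the MPI library's progress support; see the Running section below.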
Obtaining the Latest Version of Intel® Parallel Studio XE 2015 Cluster Edition
The Intel® Parallel Studio XE 2015 Cluster Edition software product includes the following components used to build the 3D-FFT code:
- Intel® C++ Compiler XE
- Intel® MPI Library (version 5.0 or above) which supports the MPI-3 Standard
- Intel® Math Kernel Library (Intel® MKL), which contains an optimized FFT (Fast Fourier Transform) solver and wrappers for FFTW (Fastest Fourier Transform in the West)
The latest version of Intel® Parallel Studio XE 2015 Cluster Edition may be purchased, or an evaluation copy requested, at https://software.intel.com/en-us/intel-parallel-studio-xe/try-buy. Existing customers with current support for Intel® Parallel Studio XE 2015 Cluster Edition can download the latest software updates directly from https://registrationcenter.intel.com/
Code Access
To download the 3D-FFT MPI-3 NBC code, please go to the URL http://spcl.inf.ethz.ch/Research/Parallel_Programming/NB_Collectives/Kernels/3d-fft_nbc_mpi_intel.tgz
Building the 3D-FFT NBC Binary
To build the 3D-FFT NBC code:
1. Set up the build environment, e.g.,
source /opt/intel/composer_xe_2015.2.164/bin/compilervars.sh intel64
source /opt/intel/impi/5.0.3.048/bin64/mpivars.sh
The versions shown above (composer_xe_2015.2.164 and impi/5.0.3.048) are examples; source the scripts that correspond to the versions installed on your system.
2. Untar the 3D-FFT package downloaded from the link in the Code Access section above, and build the 3D-FFT NBC binary:
mpiicc -o 3d-fft_nbc 3d-fft_nbc.cpp -I$MKLROOT/include/fftw/ -mkl
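The -I$MKLROOT/include/fftw/ option points the compiler at Intel MKL's FFTW3 wrapper headers, and -mkl links against Intel MKL, so standard FFTW calls are executed by MKL's optimized FFT. As a hedged illustration of what those wrappers accept (this file is not part of the 3D-FFT sample), a standalone program like the following builds with the same options:

/* Build (illustrative):
 *   mpiicc fftw_demo.c -I$MKLROOT/include/fftw/ -mkl -o fftw_demo */
#include <fftw3.h>

int main(void)
{
    int n = 8;
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * n);

    for (int i = 0; i < n; i++) {
        in[i][0] = (double)i;  /* real part */
        in[i][1] = 0.0;        /* imaginary part */
    }

    /* Plan and run a 1D forward complex-to-complex transform; the call
     * is dispatched to Intel MKL when built against the wrappers. */
    fftw_plan p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p);

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}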
Running the 3D-FFT NBC Application
The Intel® MPI Library supports asynchronous message progression, which makes it possible to overlap computation and communication in NBC operations [2]. To enable asynchronous progress in the Intel® MPI Library, set the environment variable MPICH_ASYNC_PROGRESS to 1:
export MPICH_ASYNC_PROGRESS=1
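With MPICH_ASYNC_PROGRESS=1, a helper thread advances outstanding non-blocking operations while the application computes. Without it, an NBC typically progresses only inside MPI calls, which is consistent with the sample reporting a "Test:" time for variants that poll the request during computation. Below is a minimal sketch of that manual-progression pattern in C; the loop structure and sizes are illustrative assumptions, not the sample's code.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = 1024;  /* illustrative message size per rank */
    double *sendbuf = calloc((size_t)size * chunk, sizeof(double));
    double *recvbuf = calloc((size_t)size * chunk, sizeof(double));

    MPI_Request req;
    MPI_Ialltoall(sendbuf, chunk, MPI_DOUBLE,
                  recvbuf, chunk, MPI_DOUBLE, MPI_COMM_WORLD, &req);

    int done = 0;
    for (int block = 0; block < 100 && !done; block++) {
        /* ... one block of independent computation ... */
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* drive progress */
    }
    if (!done)
        MPI_Wait(&req, MPI_STATUS_IGNORE);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}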
Run the application with the mpirun command as usual. For example, the command below starts the application with 32 ranks on two nodes (node1 and node2), with 16 processes per node:
mpirun -n 32 -ppn 16 -hosts node1,node2 ./3d-fft_nbc
and produces output similar to the following (absolute timings vary by system and run):
1 repetitions of N=320, testsize: 0, testint 0, tests: 0, max_n: 10
approx. size: 62.500000 MB
normal (MPI): 0.192095 (NBC_A2A: 0.037659/0.000000) (Test: 0.000000) (2x1d-fft: 0.069162) - 1x512000 byte
normal (NBC): 0.203643 (NBC_A2A: 0.047140/0.046932) (Test: 0.000000) (2x1d-fft: 0.069410) - 1x512000 byte
pipe (NBC): 0.173483 (NBC_A2A: 0.042651/0.031492) (Test: 0.000000) (2x1d-fft: 0.069383) - 1x512000 byte
tile (NBC): 0.155921 (NBC_A2A: 0.018214/0.010794) (Test: 0.000000) (2x1d-fft: 0.069577) - 1x512000 byte
win (NBC): 0.173479 (NBC_A2A: 0.042485/0.026085) (Pack: 0.000000) (2x1d-fft: 0.069385) - 1x512000 byte
wintile (NBC): 0.169248 (NBC_A2A: 0.028918/0.021769) (Pack: 0.000000) (2x1d-fft: 0.069290) - 1x512000 byte
Acknowledgments
Thanks go to Torsten Hoefler for hosting the 3D-FFT distribution for Intel tools. Mikhail Brinskiy assisted in porting the libNBC version of the 3D-FFT code to the MPI-3 Standard. James Tullos and Steve Healey suggested corrections and improvements to the draft.
References
1. Torsten Hoefler, Peter Gottschling, Andrew Lumsdaine. Brief Announcement: Leveraging Non-Blocking Collective Communication in High-Performance Applications. SPAA'08, pp. 113-115, June 14-16, 2008, Munich, Germany.
2. Mikhail Brinskiy, Alexander Supalov, Michael Chuvelev, Evgeny Leksikov. Mastering Performance Challenges with the New MPI-3 Standard. The Parallel Universe Magazine, issue 18: http://goparallel.sourceforge.net/wp-content/uploads/2014/07/PUM18_Mastering_Performance_with_MPI3.pdf