Purpose
This application note helps developers use Intel® Software Development Tools with the 3D-FFT MPI-3 based code sample from the Scalable Parallel Computing Lab (SPCL), ETH Zurich.
Introduction
The original 3D-FFT code, based on the prototype library libNBC, was developed to help optimize parallel high-performance applications by overlapping computation and communication [1]. The updated version of the code, based on MPI-3 Non-Blocking Collectives (NBC), is now posted on the SPCL, ETH Zurich web site. This new version relies on the MPI-3 API and can therefore be used with any modern MPI library that implements it. One such implementation is the Intel® MPI Library, which fully supports the MPI-3 Standard [2].
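The pattern at the core of the NBC version is to start a collective exchange, compute on data that does not depend on it, and complete the exchange afterwards. The following is a minimal sketch of that pattern in C, using MPI_Ialltoall (the non-blocking collective underlying the distributed FFT transpose); the buffer sizes and the placeholder compute step are illustrative assumptions, not code taken from the 3D-FFT sample.

#include <mpi.h>
#include <stdlib.h>

/* Placeholder for work that does not depend on the exchanged data,
 * e.g., 1D FFTs along dimensions that are local to the rank. */
static void compute_local(double *data, int n)
{
    for (int i = 0; i < n; i++)
        data[i] *= 2.0;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = 1024;  /* doubles exchanged with each rank (illustrative) */
    double *sendbuf = calloc((size_t)size * chunk, sizeof(double));
    double *recvbuf = calloc((size_t)size * chunk, sizeof(double));
    double *local   = calloc(chunk, sizeof(double));

    MPI_Request req;
    /* MPI-3 non-blocking collective: returns immediately with a request. */
    MPI_Ialltoall(sendbuf, chunk, MPI_DOUBLE,
                  recvbuf, chunk, MPI_DOUBLE, MPI_COMM_WORLD, &req);

    compute_local(local, chunk);        /* overlapped computation */

    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* complete the exchange */

    free(sendbuf); free(recvbuf); free(local);
    MPI_Finalize();
    return 0;
}

Whether the exchange actually proceeds while compute_local runs depends on the MPI library's progress support; see the Running section below.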
Obtaining the Latest Version of Intel® Parallel Studio XE 2015 Cluster Edition
The Intel® Parallel Studio XE 2015 Cluster Edition software product includes the following components used to build the 3D-FFT code:
- Intel® C++ Compiler XE
- Intel® MPI Library (version 5.0 or above) which supports the MPI-3 Standard
- Intel® Math Kernel Library (Intel® MKL), which contains an optimized FFT (Fast Fourier Transform) solver and wrappers for FFTW (Fastest Fourier Transform in the West)
The latest version of Intel® Parallel Studio XE 2015 Cluster Edition may be purchased, or an evaluation copy requested, at https://software.intel.com/en-us/intel-parallel-studio-xe/try-buy. Existing customers with current support for Intel® Parallel Studio XE 2015 Cluster Edition can download the latest software updates directly from https://registrationcenter.intel.com/
Code Access
To download the 3D-FFT MPI-3 NBC code, please go to the URL http://spcl.inf.ethz.ch/Research/Parallel_Programming/NB_Collectives/Kernels/3d-fft_nbc_mpi_intel.tgz
Building the 3D-FFT NBC Binary
To build the 3D-FFT NBC code:
1. Set up the build environment, e.g.,
source /opt/intel/composer_xe_2015.2.164/bin/compilervars.sh intel64
source /opt/intel/impi/5.0.3.048/bin64/mpivars.sh
The versions shown above (composer_xe_2015.2.164 and impi/5.0.3.048) are examples; source the scripts that correspond to the versions installed on your system.
2. Untar the 3D-FFT package downloaded from the link in the Code Access section above, and build the 3D-FFT NBC binary:
mpiicc -o 3d-fft_nbc 3d-fft_nbc.cpp -I$MKLROOT/include/fftw/ -mkl
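The -I$MKLROOT/include/fftw/ option points the compiler at Intel MKL's FFTW3 wrapper headers, and -mkl links against Intel MKL, so standard FFTW calls are executed by MKL's optimized FFT. As a hedged illustration of what those wrappers accept (this file is not part of the 3D-FFT sample), a standalone program like the following builds with the same options:

/* Build (illustrative):
 *   mpiicc fftw_demo.c -I$MKLROOT/include/fftw/ -mkl -o fftw_demo */
#include <fftw3.h>

int main(void)
{
    int n = 8;
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * n);

    for (int i = 0; i < n; i++) {
        in[i][0] = (double)i;  /* real part */
        in[i][1] = 0.0;        /* imaginary part */
    }

    /* Plan and run a 1D forward complex-to-complex transform; the call
     * is dispatched to Intel MKL when built against the wrappers. */
    fftw_plan p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p);

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}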
Running the 3D-FFT NBC Application
The Intel® MPI Library supports asynchronous message progression, which makes it possible to overlap computation and communication in NBC operations [2]. To enable asynchronous progress in the Intel® MPI Library, set the environment variable MPICH_ASYNC_PROGRESS to 1:
export MPICH_ASYNC_PROGRESS=1
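With MPICH_ASYNC_PROGRESS=1, a helper thread advances outstanding non-blocking operations while the application computes. Without it, an NBC typically progresses only inside MPI calls, which is consistent with the sample reporting a "Test:" time for variants that poll the request during computation. Below is a minimal sketch of that manual-progression pattern in C; the loop structure and sizes are illustrative assumptions, not the sample's code.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = 1024;  /* illustrative message size per rank */
    double *sendbuf = calloc((size_t)size * chunk, sizeof(double));
    double *recvbuf = calloc((size_t)size * chunk, sizeof(double));

    MPI_Request req;
    MPI_Ialltoall(sendbuf, chunk, MPI_DOUBLE,
                  recvbuf, chunk, MPI_DOUBLE, MPI_COMM_WORLD, &req);

    int done = 0;
    for (int block = 0; block < 100 && !done; block++) {
        /* ... one block of independent computation ... */
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* drive progress */
    }
    if (!done)
        MPI_Wait(&req, MPI_STATUS_IGNORE);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}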
Run the application with the mpirun command as usual. For example, the command below starts the application with 32 ranks on two nodes (node1 and node2), with 16 processes per node:
mpirun -n 32 -ppn 16 -hosts node1,node2 ./3d-fft_nbc
and produces output similar to the following (absolute timings vary by system and run):
1 repetitions of N=320, testsize: 0, testint 0, tests: 0, max_n: 10
approx. size: 62.500000 MB
normal (MPI): 0.192095 (NBC_A2A: 0.037659/0.000000) (Test: 0.000000) (2x1d-fft: 0.069162) - 1x512000 byte
normal (NBC): 0.203643 (NBC_A2A: 0.047140/0.046932) (Test: 0.000000) (2x1d-fft: 0.069410) - 1x512000 byte
pipe (NBC): 0.173483 (NBC_A2A: 0.042651/0.031492) (Test: 0.000000) (2x1d-fft: 0.069383) - 1x512000 byte
tile (NBC): 0.155921 (NBC_A2A: 0.018214/0.010794) (Test: 0.000000) (2x1d-fft: 0.069577) - 1x512000 byte
win (NBC): 0.173479 (NBC_A2A: 0.042485/0.026085) (Pack: 0.000000) (2x1d-fft: 0.069385) - 1x512000 byte
wintile (NBC): 0.169248 (NBC_A2A: 0.028918/0.021769) (Pack: 0.000000) (2x1d-fft: 0.069290) - 1x512000 byte
Acknowledgments
Thanks go to Torsten Hoefler for hosting the 3D-FFT distribution for Intel tools. Mikhail Brinskiy assisted in porting the libNBC version of the 3D-FFT code to the MPI-3 Standard. James Tullos and Steve Healey suggested corrections and improvements to the draft.
References
1. Torsten Hoefler, Peter Gottschling, Andrew Lumsdaine. Brief Announcement: Leveraging Non-Blocking Collective Communication in High-Performance Applications. SPAA'08, pp. 113-115, June 14-16, 2008, Munich, Germany.
2. Mikhail Brinskiy, Alexander Supalov, Michael Chuvelev, Evgeny Leksikov. Mastering Performance Challenges with the New MPI-3 Standard. The Parallel Universe Magazine, issue 18: http://goparallel.sourceforge.net/wp-content/uploads/2014/07/PUM18_Mastering_Performance_with_MPI3.pdf