Authors: Rezaur Rahman (Intel Corporation, OR), Bei Wang (Princeton University, NJ)
Code Access
The GTC-P code is maintained by the Princeton Plasma Physics Laboratory (PPPL) and is available on request under the PPPL Theory Code Licensing agreement. The code supports symmetric-mode operation of the Intel® Xeon® processor (referred to as ‘host’ in this document) with the Intel® Xeon Phi™ coprocessor (referred to as ‘coprocessor’ in this document), both on a single node and in a cluster environment.
To get access to the code:
- Submit a request on this website, indicating that you want the GTC-P code: http://theorycodes.pppl.wikispaces.net/Theory+Department+Codes.
- Request access to the Intel® Xeon Phi™ coprocessor version of the code. You need version 2.1 or later.
Build Directions
- You will need the Intel® Composer XE 2013 (or newer) C/C++ and Fortran compilers and the Intel® MPI Library 4.1.1 or newer.
- Get the 2.1 or newer version of GTC-P from the PPPL team.
- Set the environment variables for Intel Composer XE and Intel MPI (see the example after these build steps).
- To build the host version of MPI/OpenMP* executable, do
- $ make ARCH=icc.xeon clean
- $ make ARCH=icc.xeon
- This will create the binary bench_gtc.
- To build the coprocessor version of MPI/OpenMP executable, do
- $ make ARCH=icc.mic clean
- $ make ARCH=icc.mic
- This will create the binary bench_gtc.mic. You can execute this binary natively on the coprocessor.
- To build the MIC-symmetric version of MPI/OpenMP executable, do
- $ make ARCH=icc.symmetric clean
- $ make ARCH=icc.symmetric
- This will create the binary bench_gtc.symmetric. You can execute this binary in symmetric mode where MPI processes are run on the host and the coprocessor simultaneously.
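For the environment setup step above, a minimal example, assuming the default Intel tool installation paths used by the run scripts later in this document (your installation paths may differ):
$ source /opt/intel/compiler/latest/bin/compilervars.sh intel64
$ source /opt/intel/impi/latest/bin64/mpivars.sh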
Run Directions
Symmetric Mode Execution on a Cluster
GTC-P currently supports symmetric-mode execution on a cluster of Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors, meaning MPI ranks run on both the processor and the coprocessor. You need to build both bench_gtc and bench_gtc.symmetric to execute them in parallel on the processor and the coprocessor. You can find instructions for setting up a cluster with Intel® Xeon Phi™ coprocessor cards here: http://software.intel.com/en-us/articles/configuring-intel-xeon-phi-coprocessors-inside-a-cluster.
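Before launching GTC-P in symmetric mode, it can be useful to confirm that the MPI runtime can start ranks on both the host and the coprocessor nodes listed in your hostfile (described below). One minimal check, not part of the GTC-P distribution, is to launch a trivial command such as hostname across all nodes (adjust -n to the number of hostfile entries):
$ export I_MPI_MIC=enable
$ mpiexec.hydra -r ssh -perhost 1 -f hostfile -n 4 hostname
Each host and coprocessor node listed in the hostfile should print its name once.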
To run on a cluster with one coprocessor card per node, do the following:
- Set up the workload with npe_radiald = 2.
- Set up the hostfile to contain the host nodes and corresponding coprocessor nodes. For example, to use two nodes in a cluster, your setup may look like this:
- Node1
- Node1-mic0
- Node2
- Node2-mic0
- Set export I_MPI_MIC=enable. This will allow MPI ranks to run on the coprocessor and communicate with host MPI ranks.
- Set export I_MPI_MIC_POSTFIX=.mic. This will automatically append the postfix (.mic) to the executable name when the mpirun script launches the MPI job on the Intel Xeon Phi coprocessor cards.
- Set the environment variables to invoke the MPI runtime on the host.
- Start the application run as follows:
- Enter mpiexec.hydra -r ssh -genv I_MPI_FABRICS shm:dapl -genv I_MPI_DAPL_PROVIDER ofa-v2-mlx4_0-1 -prepend-rank -perhost 1 -f hostfile -n 4 ~/gtc/run
- Where the “-genv I_MPI_FABRICS shm:dapl -genv I_MPI_DAPL_PROVIDER ofa-v2-mlx4_0-1” options select the InfiniBand fabric for reduced communication overhead.
- The hostfile contains the list of processor and coprocessor nodes to execute on.
- ~/gtc/run tells the MPI runtime to execute the “run” script on the host processors and the “run.mic” script on the coprocessors, from the ~/gtc folder, which must be accessible from both.
The script files are given below for your reference:
runsymmetric.sh: invoke this script with the number of nodes to run on, counting the MIC nodes; for example, “./runsymmetric.sh 4”.
export I_MPI_MIC=enable          # allow MPI ranks to run on the coprocessor
export I_MPI_MIC_POSTFIX=.mic    # run "run.mic" instead of "run" on the coprocessor nodes
export KMP_AFFINITY=scatter      # spread OpenMP threads across cores
source /opt/intel/impi/latest/bin64/mpivars.sh   # set up the Intel MPI environment
mpiexec.hydra -r ssh -genv I_MPI_FABRICS shm:dapl -genv I_MPI_DAPL_PROVIDER ofa-v2-mlx4_0-1 -prepend-rank -perhost 1 -f hostfile -n $1 ~/gtc/run
run: script to start the run on the host nodes. Its job is to set up the path and environment variables specific to the host run. ./bench_gtc is the executable run on the host.
export OMP_NUM_THREADS=24        # OpenMP threads per MPI rank on the host
export KMP_AFFINITY=scatter      # spread OpenMP threads across cores
source /opt/intel/impi/latest/bin64/mpivars.sh                   # Intel MPI environment
source /opt/intel/compiler/latest/bin/compilervars.sh intel64    # Intel compiler runtime environment
./bench_gtc A.txt 200 1          # launch the host executable with its input file and arguments
run.mic: script to start the run on the MIC coprocessors. Its job is to set up the path and environment variables specific to the MIC run. bench_gtc.symmetric is the executable run on the MIC.
export PATH=/opt/intel/impi/latest/mic/bin:$PATH   # MIC-side Intel MPI binaries
export LD_LIBRARY_PATH=/opt/intel/itac/latest/mic/slib:/opt/intel/impi/latest/mic/lib:/opt/intel/compiler/2013_sp1.1.106/composerxe/lib/mic:~/:$LD_LIBRARY_PATH   # MIC-side runtime libraries
export KMP_AFFINITY=compact      # pack OpenMP threads onto adjacent hardware threads
export OMP_NUM_THREADS=240       # OpenMP threads per MPI rank on the coprocessor
~/gtc/bench_gtc.symmetric A.txt 200 1   # launch the coprocessor executable with its input file and arguments
GTC-P Parallelism
GTC-P includes three levels of decomposition: domain decomposition in the toroidal dimension, domain decomposition in the radial dimension, and particle decomposition within each subdomain. The number of toroidal domains is given as a command-line argument, ntoroidal. The number of particle copies in each subdomain is given by the input-file parameter npe_radiald. The number of radial domains is calculated dynamically as total_pe/(ntoroidal * npe_radiald), where total_pe is the total number of MPI processes in the simulation.
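As an illustration with made-up values: a run with total_pe = 16 MPI processes, ntoroidal = 4, and npe_radiald = 2 would have 16/(4*2) = 2 radial domains.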
When running the code on the host only or on the coprocessor only, we usually set npe_radiald=1 (particle decomposition turned off). However, when running in symmetric mode with one MIC per node, it is important to set npe_radiald=2. This ensures that the host and the MIC share the same subdomain, with each carrying half of the particles in that subdomain. Sharing the same subdomain between the host and the MIC avoids running some grid-based subroutines repeatedly on the MIC, where those subroutines are usually less efficient than on the host. In addition, when running in symmetric mode, we set TOROIDAL_FIRST=0 in bench_gtc_opt.h. With TOROIDAL_FIRST=0, the MPI ranks are placed first along the particle decomposition dimension, which guarantees that the two MPI processes with the same toroidal and radial domain rank numbers are placed on the host and the MIC, respectively. For example, with two nodes, four MPI processes, ntoroidal=2, and npe_radiald=2, the hostfile shown earlier and -perhost 1 place rank 0 on Node1, rank 1 on Node1-mic0, rank 2 on Node2, and rank 3 on Node2-mic0, so ranks 0 and 1 share the first toroidal domain and ranks 2 and 3 share the second.
GTC-P Performance
The following runs were done on the Endeavor cluster at Intel.
Platform Configurations
Intel, the Intel logo, Ultrabook, and Core are trademarks of Intel Corporation in the US and/or other countries.
Copyright © 2014 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.