For single-node runs, refer to the recipe: Building NAMD on Intel® Xeon® and Intel® Xeon Phi™ processors
Purpose
This recipe describes a step-by-step process for getting, building, and running NAMD (a scalable molecular dynamics code) on the Intel® Xeon Phi™ processor and Intel® Xeon® processor family to achieve better performance.
Introduction
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecule systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.
NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms. Below are the details for how to build NAMD on Intel Xeon Phi processor and Intel Xeon processor E5 family. You can learn more about NAMD at http://www.ks.uiuc.edu/Research/namd/.
Building and Running NAMD on a Cluster of Intel® Xeon® processor E5-2697 v4 (formerly Broadwell, BDW), Intel® Xeon Phi™ processor 7250 (formerly Knights Landing, KNL), and Intel® Xeon® Gold 6148 processor (formerly Skylake, SKX) systems
Download the code
- Download the latest NAMD source code from this site: http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
- Download the OpenFabrics Interfaces (OFI) library. NAMD uses Charm++/OFI for multi-node runs.
- You can use the installed OFI library, which comes with the IFS package, or download and build it manually.
- To check the version of the installed OFI, use the "fi_info --version" command (OFI 1.4.2 was used here).
- The OFI library can be downloaded from https://github.com/ofiwg/libfabric/releases.
- Download Charm++ with OFI support:
From here: http://charmplusplus.org/download/
or
git clone: http://charm.cs.illinois.edu/gerrit/charm.git
- Download fftw3: http://www.fftw.org/download.html
Version 3.3.4 is used in this run.
- Download the apoa1 and stmv workloads: http://www.ks.uiuc.edu/Research/namd/utilities/
A sketch of the download and extraction steps follows this list.
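For reference, a minimal download-and-unpack sketch is shown below. The exact archive names and versions (libfabric 1.4.2, fftw 3.3.4) are assumptions based on the versions used in this recipe; adjust them to whatever releases you actually download.

# Assumed archive names/versions; adjust to the releases you actually use.
wget https://github.com/ofiwg/libfabric/releases/download/v1.4.2/libfabric-1.4.2.tar.bz2
tar xjf libfabric-1.4.2.tar.bz2          # becomes <libfabric_root_path>

git clone http://charm.cs.illinois.edu/gerrit/charm.git   # becomes <charm_root_path>

wget http://www.fftw.org/fftw-3.3.4.tar.gz
tar xzf fftw-3.3.4.tar.gz                # becomes <fftw_root_path>

# ApoA1 and STMV benchmark inputs
wget http://www.ks.uiuc.edu/Research/namd/utilities/apoa1.tar.gz
wget http://www.ks.uiuc.edu/Research/namd/utilities/stmv.tar.gz
tar xzf apoa1.tar.gz && tar xzf stmv.tar.gz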
Build the Binaries
- Set the environment for compilation:
CC=icc; CXX=icpc; F90=ifort; F77=ifort
export CC CXX F90 F77
source /opt/intel/compiler/<version>/compilervars.sh intel64
- Build the OFI library (you can skip this step if you want to use the installed OFI library):
- cd <libfabric_root_path>
- ./autogen.sh
- ./configure --prefix=<libfabric_install_path> --enable-psm2
- make clean && make -j12 all && make install
- The custom OFI build can then be used via LD_PRELOAD or LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=<libfabric_install_path>/lib:${LD_LIBRARY_PATH}
mpiexec.hydra …
or
LD_PRELOAD=<libfabric_install_path>/lib/libfabric.so mpiexec.hydra …
- Build fftw3:
- cd <fftw_root_path>
- ./configure --prefix=<fftw_install_path> --enable-single --disable-fortran CC=icc
Use -xCORE-AVX512 for SKX, -xMIC-AVX512 for KNL, and -xCORE-AVX2 for BDW.
- make CFLAGS="-O3 -xMIC-AVX512 -fp-model fast=2 -no-prec-div -qoverride-limits" clean install
- Build multi-node version of Charm++:
- cd <charm_root_path>
- ./build charm++ ofi-linux-x86_64 icc smp --basedir <libfabric_root_path> --with-production "-O3 -ip" -DCMK_OPTIMIZE
- Build NAMD:
- Modify the arch/Linux-x86_64-icc to look like the following (select one of the FLOATOPTS options depending on the CPU type):
NAMD_ARCH = Linux-x86_64
CHARMARCH = multicore-linux64-iccstatic

# For KNL
FLOATOPTS = -ip -xMIC-AVX512 -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
# For SKX
FLOATOPTS = -ip -xCORE-AVX512 -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
# For BDW
FLOATOPTS = -ip -xCORE-AVX2 -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE

CXX = icpc -std=c++11 -DNAMD_KNL
CXXOPTS = -static-intel -O2 $(FLOATOPTS)
CXXNOALIASOPTS = -O3 -fno-alias $(FLOATOPTS) -qopt-report-phase=loop,vec -qopt-report=4
CXXCOLVAROPTS = -O2 -ip
CC = icc
COPTS = -static-intel -O2 $(FLOATOPTS)
- Compile NAMD
- ./config Linux-x86_64-icc --charm-base <charm_root_path> --charm-arch ofi-linux-x86_64-smp-icc --with-fftw3 --fftw-prefix <fftw_install_path> --without-tcl --charm-opts -verbose
- cd Linux-x86_64-icc
- make clean && gmake -j
- Build memopt NAMD binaries:
Same as the BDW/KNL build above, but add the extra option --with-memopt to the config command (see the example below).
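For example, the memopt configure line differs from the regular one only by the extra flag. This is a sketch using the same placeholders as above; note that config creates the same Linux-x86_64-icc build directory, so you may want to run it in a separate copy of the source tree to keep both binaries.

# Memory-optimized build: identical to the regular config line, plus --with-memopt.
./config Linux-x86_64-icc --charm-base <charm_root_path> --charm-arch ofi-linux-x86_64-smp-icc --with-fftw3 --fftw-prefix <fftw_install_path> --without-tcl --with-memopt --charm-opts -verbose
cd Linux-x86_64-icc
make clean && gmake -j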
Other Setup
Change the following lines in the *.namd file for both the stmv and apoa1 workloads:
numsteps: 1000
outputtiming: 20
outputenergies: 600
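A minimal sketch of that edit with GNU sed is shown below. It assumes the three parameters already appear in each input file (NAMD config keywords are case-insensitive, hence the /I flag); if a keyword is missing from your copy of the file, add the line by hand instead. The paths are placeholders for the extracted workload directories.

# Hypothetical paths; point these at the extracted workload directories.
for f in apoa1/apoa1.namd stmv/stmv.namd; do
    sed -i -e 's/^numsteps.*/numsteps 1000/I' \
           -e 's/^outputtiming.*/outputtiming 20/I' \
           -e 's/^outputenergies.*/outputenergies 600/I' "$f"
done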
Run the Binaries
- Set the environment for launching:
- source /opt/intel/compiler/<version>/compilervars.sh intel64
- source /opt/intel/impi/<version>/intel64/bin/mpivars.sh
- Specify the host names to run on in a "hosts" file (one host name per line)
- export MPPEXEC="mpiexec.hydra -hostfile ./hosts"
- export PSM2_SHAREDCONTEXTS=0 (if you use PSM2 < 10.2.85)
- Launch the task (for example with N nodes, with 1 process per node and PPN cores):
$MPPEXEC -n N -ppn 1 ./namd2 +ppn (PPN-1) <workload_path> +pemap 1-(PPN-1) +commap 0
For example, for BDW (PPN=72):
$MPPEXEC -n 8 -ppn 1 ./namd2 +ppn 71 <workload_path> +pemap 1-71 +commap 0
For example, for KNL (PPN=68, without hyper-threads):
$MPPEXEC -n 8 -ppn 1 ./namd2 +ppn 67 <workload_path> +pemap 1-67 +commap 0
For example, for KNL (with 2 hyper-threads per core):
$MPPEXEC -n 8 -ppn 1 ./namd2 +ppn 134 <workload_path> +pemap 0-66+68 +commap 67
- For KNL with MCDRAM in flat mode:
$MPPEXEC -n N -ppn 1 numactl -p 1 ./namd2 +ppn (PPN-1) <workload_path> +pemap 1-(PPN-1) +commap 0
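The launch line can also be derived from the per-node core count. The sketch below is illustrative only: the node names and the PPN value are assumptions, and it simply reproduces the +ppn/+pemap/+commap arithmetic described above for the one-process-per-node case.

# Hypothetical node names; one host per line, as read by "mpiexec.hydra -hostfile ./hosts".
printf '%s\n' node01 node02 node03 node04 node05 node06 node07 node08 > hosts

N=8        # number of nodes (must match the hosts file)
PPN=68     # logical processors used per node, as in the examples above (72 for BDW, 68 for KNL)
export MPPEXEC="mpiexec.hydra -hostfile ./hosts"   # as defined above

# One process per node: core 0 handles communication, cores 1..PPN-1 run worker threads.
$MPPEXEC -n $N -ppn 1 ./namd2 +ppn $((PPN-1)) <workload_path> \
    +pemap 1-$((PPN-1)) +commap 0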
Remarks
To achieve better scaling across multiple nodes, increase the number of communication threads by running more processes per node (for example 1, 2, 4, 8, 13, or 17 processes per node, each contributing one communication thread). For example, the following command runs on N KNL nodes with 17 processes per node and 8 threads per process (7 worker threads and 1 communication thread):
$MPPEXEC -n $(($N*17)) -ppn 17 numactl -p 1 ./namd2 +ppn 7 <workload_path> +pemap 0-67,68-135:4.3 +commap 71-135:4
Basic Charm++/OFI knobs (passed as additional namd2 command-line arguments):
- +ofi_eager_maxsize: (default: 65536) Threshold between the buffered and RMA paths.
- +ofi_cq_entries_count: (default: 8) Maximum number of entries to read from the completion queue with each call to fi_cq_read().
- +ofi_use_inject: (default: 1) Whether to use buffered send.
- +ofi_num_recvs: (default: 8) Number of pre-posted receive buffers.
- +ofi_runtime_tcp: (default: off) During the initialization phase, the OFI EP names need to be exchanged among all nodes.
By default, the exchange is done with both PMI and OFI. If this flag is set, the exchange is done with PMI only.
For example:
$MPPEXEC -n 2 -ppn 1 ./namd2 +ppn 1 <workload_path> +ofi_eager_maxsize 32768 +ofi_num_recvs 16
Best performance results reported on a cluster of up to 128 Intel® Xeon Phi™ processor nodes (ns/day; higher is better)
Workload / Number of nodes (2 HT per core) | 1 | 2 | 4 | 8 | 16
stmv (ns/day) | 0.55 | 1.05 | 1.86 | 3.31 | 5.31

Workload / Number of nodes (2 HT per core) | 8 | 16 | 32 | 64 | 128
stmv.28M (ns/day) | 0.152 | 0.310 | 0.596 | 1.03 | 1.91