Purpose
This recipe describes, step by step, how to get, build, and run the NAMD (Scalable Molecular Dynamics) code on the Intel® Xeon Phi™ processor and Intel® Xeon® E5 processors for better performance.
Introduction
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.
NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms. The sections below describe how to build NAMD on the Intel® Xeon Phi™ processor and Intel® Xeon® E5 processors. Learn more about NAMD at http://www.ks.uiuc.edu/Research/namd/
Building NAMD on Intel® Xeon® Processor E5-2697 v4 (BDW) and Intel® Xeon Phi™ Processor 7250 (KNL)
- Download the latest NAMD source code (Nightly Build) from this site: http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
- Download fftw3 from this site: http://www.fftw.org/download.html
- Version 3.3.4 is recommended
- Build fftw3:
- cd <path>/fftw3.3.4
- ./configure --prefix=$base/fftw3 --enable-single --disable-fortran CC=icc
- Use -xMIC-AVX512 for KNL or -xCORE-AVX2 for BDW in the following command:
- make CFLAGS="-O3 -xMIC-AVX512 -fp-model fast=2 -no-prec-div -qoverride-limits" clean install
- Download charm++* version 6.7.1
- You can get charm++ from the NAMD Version Nightly Build source code
- Or download it separately from here: http://charmplusplus.org/download/
- Build multicore version of charm++:
- cd <path>/charm-6.7.1
- ./build charm++ multicore-linux64 iccstatic --with-production "-O3 -ip"
- Build BDW:
- Modify the Linux-x86_64-icc.arch to look like the following:
NAMD_ARCH = Linux-x86_64
CHARMARCH = multicore-linux64-iccstatic
FLOATOPTS = -ip -xCORE-AVX2 -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
CXX = icpc -std=c++11 -DNAMD_KNL
CXXOPTS = -static-intel -O2 $(FLOATOPTS)
CXXNOALIASOPTS = -O3 -fno-alias $(FLOATOPTS) -qopt-report-phase=loop,vec -qopt-report=4
CXXCOLVAROPTS = -O2 -ip
CC = icc
COPTS = -static-intel -O2 $(FLOATOPTS)
- ./config Linux-x86_64-icc --charm-base <charm_path> --charm-arch multicore-linux64-iccstatic --with-fftw3 --fftw-prefix <fftw_path> --without-tcl --charm-opts -verbose
- gmake -j
- Build KNL:
- Modify the arch/Linux-KNL-icc.arch to look like the following:
NAMD_ARCH = Linux-KNL
CHARMARCH = multicore-linux64-iccstatic
FLOATOPTS = -ip -xMIC-AVX512 -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
CXX = icpc -std=c++11 -DNAMD_KNL
CXXOPTS = -static-intel -O2 $(FLOATOPTS)
CXXNOALIASOPTS = -O3 -fno-alias $(FLOATOPTS) -qopt-report-phase=loop,vec -qopt-report=4
CXXCOLVAROPTS = -O2 -ip
CC = icc
COPTS = -static-intel -O2 $(FLOATOPTS)
- ./config Linux-KNL-icc --charm-base <charm_path> --charm-arch multicore-linux64-iccstatic --with-fftw3 --fftw-prefix <fftw_path> --without-tcl --charm-opts -verbose
- gmake -j
- Change the kernel settings for KNL by adding the following kernel boot parameters:
nmi_watchdog=0 rcu_nocbs=2-271 nohz_full=2-271
- Download the apoa1 and stmv workloads from here: http://www.ks.uiuc.edu/Research/namd/utilities/
- Change the following lines in the *.namd file for both workloads:
numsteps 1000
outputtiming 20
outputenergies 600
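The edits to the workload files can also be scripted. The following is a minimal sketch, assuming GNU sed and a workload file in which each keyword starts its own line. The function names `set_namd_param` and `apply_recipe_settings` are hypothetical helpers introduced here for illustration and are not part of NAMD. Keywords are matched case-insensitively, because NAMD configuration keywords are case-insensitive.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NAMD): set "keyword value" in a NAMD config file.
# Assumes GNU sed and that the keyword, if present, starts a line.
set_namd_param() {
    local file="$1" key="$2" value="$3"
    if grep -qiE "^[[:space:]]*${key}[[:space:]]" "$file"; then
        # Keyword found: replace the whole line (case-insensitive match)
        sed -i -E "s/^[[:space:]]*${key}[[:space:]].*/${key} ${value}/I" "$file"
    else
        # Keyword absent: append the setting at the end of the file
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Apply the three settings listed in this recipe to one workload file
apply_recipe_settings() {
    local file="$1"
    set_namd_param "$file" numsteps 1000
    set_namd_param "$file" outputtiming 20
    set_namd_param "$file" outputenergies 600
}
```

For example, `apply_recipe_settings apoa1/apoa1.namd` would update the apoa1 workload in place; keep a backup copy of the original file if you need to restore it.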
Run NAMD workloads on Intel® Xeon® Processor E5-2697 v4 and Intel® Xeon Phi™ Processor 7250
Run BDW (ppn = 72):
$BIN +p $ppn apoa1/apoa1.namd +pemap 0-$(($ppn-1))
Run KNL (ppn = 136, MCDRAM in flat mode; performance is similar in cache mode):
numactl -m 1 $BIN +p $ppn apoa1/apoa1.namd +pemap 0-$(($ppn-1))
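The two run commands differ only in the thread count and the `numactl` prefix, so they can be generated by one helper. This is a sketch under stated assumptions: `namd_cmd` is a hypothetical function name, `BIN` is assumed to hold the path to the NAMD binary built earlier (falling back to `./namd2` here purely for illustration), and the target names `bdw` and `knl` are labels chosen for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the NAMD launch command used in this recipe.
# Arguments: target (bdw|knl), threads (ppn), path to the .namd config file.
namd_cmd() {
    local target="$1" ppn="$2" config="$3"
    local last=$((ppn - 1))   # +pemap takes an inclusive range 0..ppn-1
    local prefix=""
    # On KNL, bind memory allocations to NUMA node 1, as the recipe does for MCDRAM
    if [ "$target" = "knl" ]; then
        prefix="numactl -m 1 "
    fi
    # BIN is an assumption: path to the NAMD binary; defaults to ./namd2
    printf '%s%s +p %d %s +pemap 0-%d\n' \
        "$prefix" "${BIN:-./namd2}" "$ppn" "$config" "$last"
}
```

The function only prints the command, so it can be inspected before running it, for example with `eval "$(namd_cmd knl 136 apoa1/apoa1.namd)"`.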
Performance results reported in the Intel® Salesforce repository (ns/day; higher is better):
Workload | Intel® Xeon® Processor E5-2697 v4 (ns/day) | Intel® Xeon Phi™ Processor 7250 (ns/day) | KNL vs. 2S BDW (speedup) |
---|---|---|---|
stmv | 0.45 | 0.55 | 1.22x |
apoa1 | 5.5 | 6.18 | 1.12x |
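The speedup column is the KNL result divided by the 2S BDW result. The short sketch below reproduces that arithmetic from the ns/day figures in the table; the function name `speedup` is a hypothetical helper, and `awk` is used only because the shell has no floating-point division.

```shell
#!/usr/bin/env bash
# Speedup of KNL over 2S BDW = (KNL ns/day) / (BDW ns/day), rounded to two decimals.
speedup() {
    LC_ALL=C awk -v bdw="$1" -v knl="$2" 'BEGIN { printf "%.2fx\n", knl / bdw }'
}

speedup 0.45 0.55   # stmv:  prints 1.22x
speedup 5.5 6.18    # apoa1: prints 1.12x
```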
Systems configuration:
Processor | Intel® Xeon® Processor E5-2697 v4 (BDW) | Intel® Xeon Phi™ Processor 7250 (KNL) |
---|---|---|
Stepping | 1 (B0) | 1 (B0) Bin1 |
Sockets / TDP | 2S / 290W | 1S / 215W |
Frequency / Cores / Threads | 2.3 GHz / 36 / 72 | 1.4 GHz / 68 / 272 |
DDR4 | 8x16 GB 2400 MHz (128 GB) | 7210: 6x16 GB 2400 MHz |
MCDRAM | N/A | 16 GB Flat |
Cluster/Snoop Mode/Mem Mode | Home | Quadrant/flat |
Turbo | On | On |
BIOS | GRRFSDP1.86B0271.R00.1510301446 | GVPRCRB1.86B.0010.R02.1608040407 |
Compiler | ICC-2017.0.098 | ICC-2017.0.098 |
Operating System | Red Hat* Enterprise Linux* 7.2 (3.10.0-327.el7.x86_64) | Red Hat* Enterprise Linux* 7.2 (3.10.0-327.22.2.el7.xppsl_1.4.1.3272.x86_64) |