For cluster runs, refer to the recipe: Building NAMD on Intel® Xeon® and Intel® Xeon Phi™ Processors on cluster
Purpose
This recipe describes a step-by-step process for getting, building, and running NAMD (scalable molecular dynamics code) on the Intel® Xeon Phi™ processor and Intel® Xeon® processor E5 family to achieve better performance.
Introduction
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.
NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms. Below are the details for how to build NAMD on the Intel Xeon Phi processor and Intel Xeon processor E5 family. You can learn more about NAMD at http://www.ks.uiuc.edu/Research/namd/.
Building and Running NAMD on the Intel® Xeon® Processor E5-2697 v4 (formerly Broadwell (BDW)), Intel® Xeon Phi™ Processor 7250 (formerly Knights Landing (KNL)), and Intel® Xeon® Gold 6148 Processor (formerly Skylake (SKX))
Download the code
- Download the latest NAMD source code from this site: http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
- Download Charm++ version 6.7.1.
a. Charm++ is included in the NAMD source code of the Nightly Build version.
b. Or download it separately: http://charmplusplus.org/download/
- Download fftw3: http://www.fftw.org/download.html
Version 3.3.4 is used in this run.
- Download the apoa1 and stmv workloads: http://www.ks.uiuc.edu/Research/namd/utilities/
Build the binaries
- Set environment for compilation:
CC=icc; CXX=icpc; F90=ifort; F77=ifort
export CC CXX F90 F77
source /opt/intel/compiler/<version>/compilervars.sh intel64
- Build fftw3:
a. cd <fftw_root_path>
b. ./configure --prefix=<fftw_install_path> --enable-single --disable-fortran CC=icc
c. make CFLAGS="-O3 -xMIC-AVX512 -fp-model fast=2 -no-prec-div -qoverride-limits" clean install
In CFLAGS, use -xCORE-AVX512 for SKX, -xMIC-AVX512 for KNL, and -xCORE-AVX2 for BDW.
- Build a multicore version of Charm++:
a. cd <charm_root_path>
b. ./build charm++ multicore-linux64 iccstatic --with-production "-O3 -ip"
- Build NAMD:
a. Modify the arch/Linux-x86_64-icc to look like the following (select one of the FLOATOPTS options depending on the CPU type):
NAMD_ARCH = Linux-x86_64
CHARMARCH = multicore-linux64-iccstatic
# For KNL
FLOATOPTS = -ip -xMIC-AVX512 -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
# For SKX
FLOATOPTS = -ip -xCORE-AVX512 -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
# For BDW
FLOATOPTS = -ip -xCORE-AVX2 -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
CXX = icpc -std=c++11 -DNAMD_KNL
CXXOPTS = -static-intel -O2 $(FLOATOPTS)
CXXNOALIASOPTS = -O3 -fno-alias $(FLOATOPTS) -qopt-report-phase=loop,vec -qopt-report=4
CXXCOLVAROPTS = -O2 -ip
CC = icc
COPTS = -static-intel -O2 $(FLOATOPTS)
b. Compile NAMD:
i. ./config Linux-x86_64-icc --charm-base <charm_root_path> --charm-arch multicore-linux64-iccstatic --with-fftw3 --fftw-prefix <fftw_install_path> --without-tcl --charm-opts -verbose
ii. gmake -j
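The per-CPU flag choice used in the fftw3 and NAMD steps can be captured in a small shell helper. The following is a minimal sketch and not part of the original recipe: the function name arch_flag and the CPU_TYPE variable are hypothetical, while the flag values are the ones listed above for SKX, KNL, and BDW.

```shell
#!/bin/sh
# Hypothetical helper: map the CPU short names used in this recipe to the
# Intel compiler vectorization flag that the recipe lists for each of them.
arch_flag() {
    case "$1" in
        SKX) echo "-xCORE-AVX512" ;;
        KNL) echo "-xMIC-AVX512" ;;
        BDW) echo "-xCORE-AVX2" ;;
        *)   echo "unknown CPU type: $1" >&2; return 1 ;;
    esac
}

# CPU_TYPE is an assumed variable name; it defaults to KNL here.
CPU_TYPE="${CPU_TYPE:-KNL}"

# Compose the CFLAGS string from the fftw3 make step for the chosen CPU.
FFTW_CFLAGS="-O3 $(arch_flag "$CPU_TYPE") -fp-model fast=2 -no-prec-div -qoverride-limits"
echo "$FFTW_CFLAGS"
```

The resulting string could then be passed as make CFLAGS="$FFTW_CFLAGS" clean install in the fftw3 step; the same flag also belongs in the FLOATOPTS line of the NAMD arch file.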
Other system setup
- Change the kernel settings for KNL: "nmi_watchdog=0 rcu_nocbs=2-271 nohz_full=2-271". Here is one way to change the settings (this could be different for every system):
a. To be safe, first save your original grub.cfg: cp /boot/grub2/grub.cfg /boot/grub2/grub.cfg.ORIG
b. In "/etc/default/grub", append the following to "GRUB_CMDLINE_LINUX": nmi_watchdog=0 rcu_nocbs=2-271 nohz_full=2-271
c. Save your new configuration: grub2-mkconfig -o /boot/grub2/grub.cfg
d. Reboot the system. After logging in, verify the settings with "cat /proc/cmdline".
- Change the following lines in the *.namd file for both workloads:
numsteps 1000
outputtiming 20
outputenergies 600
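Editing the three settings by hand works, but the change can also be scripted. The sketch below is one possible approach and not part of the original recipe: the function name set_namd_option is hypothetical, the path stmv/stmv.namd is an assumed location for the second workload, and keyword matching is done case-insensitively on the assumption that NAMD configuration keywords are not case-sensitive.

```shell
#!/bin/sh
# Hypothetical helper: set KEY to VALUE in a NAMD configuration file.
# An existing "KEY ..." line is replaced; otherwise the setting is appended.
set_namd_option() {
    file="$1"; key="$2"; value="$3"
    tmp="$(mktemp)"
    awk -v k="$key" -v v="$value" '
        tolower($1) == tolower(k) { print k, v; found = 1; next }
        { print }
        END { if (!found) print k, v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Apply the three settings from the recipe to each workload file that exists.
# stmv/stmv.namd is an assumed path; adjust it to where the workload was unpacked.
for f in apoa1/apoa1.namd stmv/stmv.namd; do
    [ -f "$f" ] || continue
    set_namd_option "$f" numsteps 1000
    set_namd_option "$f" outputtiming 20
    set_namd_option "$f" outputenergies 600
done
```

Commented-out lines are left alone because their first field is the comment marker rather than the keyword.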
Run NAMD
- On SKX/BDW (ppn = 40 / ppn = 72, respectively):
./namd2 +p $ppn apoa1/apoa1.namd +pemap 0-($ppn-1)
- On KNL (ppn = 136, that is, 2 hyper-threads per core; MCDRAM in flat mode, with similar performance in cache mode):
numactl -p 1 ./namd2 +p $ppn apoa1/apoa1.namd +pemap 0-($ppn-1)
KNL example:
numactl -p 1 <namd_root_path>/Linux-KNL-icc/namd2 +p 136 apoa1/apoa1.namd +pemap 0-135
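In the commands above, 0-($ppn-1) is shorthand for the range of logical CPUs to pin to, not literal shell syntax. A minimal sketch of how the range could be computed with shell arithmetic follows; the function name pemap_range and the variable names are hypothetical, and the script only prints the command line rather than running NAMD.

```shell
#!/bin/sh
# Hypothetical helper: print the +pemap range "0-(ppn-1)" for a given ppn.
pemap_range() {
    echo "0-$(( $1 - 1 ))"
}

# KNL case from the recipe: 68 cores with 2 hyper-threads used per core.
cores=68
threads_per_core=2
ppn=$(( cores * threads_per_core ))

# Print the resulting command line instead of executing it.
echo "numactl -p 1 ./namd2 +p $ppn apoa1/apoa1.namd +pemap $(pemap_range "$ppn")"
```

For ppn = 136 the printed range is 0-135, which matches the KNL example.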
Performance results reported in the Intel Salesforce repository (ns/day; higher is better):
Workload | 2S Intel® Xeon® Processor E5-2697 v4 18c 2.3 GHz (ns/day) | Intel® Xeon Phi™ Processor 7250 bin1 (ns/day) | Intel® Xeon Phi™ Processor 7250 versus 2S Intel® Xeon® Processor E5-2697 v4 (speedup) |
stmv | 0.45 | 0.55 | 1.22x |
apoa1 | 5.5 | 6.18 | 1.12x |
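The speedup column in this table is the ratio of the two ns/day figures in each row. As a quick check, the sketch below recomputes it with awk; the function name speedup is hypothetical and the inputs are the values from the table.

```shell
#!/bin/sh
# Hypothetical helper: ratio of two ns/day figures, formatted like the table.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2fx\n", a / b }'
}

speedup 0.55 0.45   # stmv:  KNL versus 2S BDW, prints 1.22x
speedup 6.18 5.5    # apoa1: KNL versus 2S BDW, prints 1.12x
```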
Workload | 2S Intel® Xeon® Gold 6148 Processor 20c 2.4 GHz (ns/day) | Intel® Xeon Phi™ Processor 7250 versus 2S Intel® Xeon® Processor E5-2697 v4 (speedup) |
stmv | 0.73 | 1.44x |
apoa1 original | 7.68 | 1.43x |
apoa1 | 8.70 | 1.44x |
Systems configuration
Processor | Intel® Xeon® Processor E5-2697 v4 | Intel® Xeon® Gold 6148 Processor | Intel® Xeon Phi™ Processor 7250 |
Stepping | 1 (B0) | 1 (B0) | 1 (B0) Bin1 |
Sockets / TDP | 2S / 290W | 2S / 300W | 1S / 215W |
Frequency / Cores / Threads | 2.3 GHz / 36 / 72 | 2.4 GHz / 40 / 80 | 1.4 GHz / 68 / 272 |
DDR4 | 8x16 GB 2400 MHz (128 GB) | 12x16 GB 2666 MHz (192 GB) | 6x16 GB 2400 MHz |
MCDRAM | N/A | N/A | 16 GB Flat |
Cluster/Snoop Mode/Mem Mode | Home | Home | Quadrant/flat |
Turbo | On | On | On |
BIOS | GRRFSDP1.86B0271.R00.1510301446 | | GVPRCRB1.86B.0010.R02.1608040407 |
Compiler | ICC-2017.0.098 | ICC-2016.4.298 | ICC-2017.0.098 |
Operating System | Red Hat Enterprise Linux* 7.2 (3.10.0-327.el7.x86_64) | Red Hat Enterprise Linux 7.3 (3.10.0-514.6.2.0.1.el7.x86_64.knl1) | Red Hat Enterprise Linux 7.2 (3.10.0-327.22.2.el7.xppsl_1.4.1.3272._86_64) |