Purpose
This code recipe describes how to get, build, and use the NAMD* Scalable Molecular Dynamics code for the Intel® Xeon Phi™ Coprocessor.
Introduction
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++* parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 200,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER*, CHARMM*, and X-PLOR*.
NAMD is distributed free of charge with source code. Users can build NAMD or download binaries for a wide variety of platforms. Tutorials show how to use NAMD and VMD* for biomolecular modeling. Find out more about NAMD at http://www.ks.uiuc.edu/Research/namd/.
Code Support for Intel® Xeon Phi™ Coprocessor
NAMD 2.10 with Intel® Xeon Phi™ Coprocessor support is expected to be released in early-to-mid 2014. With support for the Intel® Many Integrated Core (MIC) Architecture, Intel expects to push NAMD performance and scalability to higher limits on Intel® architecture. The code is still in development, but it can be compiled from the nightly source builds. Pre-built binaries are not available at this time.
NAMD code for the Intel Xeon Phi Coprocessor continues to evolve. Intel developers are working on known issues in order to achieve the project's performance and scalability goals on the Intel Xeon Phi Coprocessor.
Code Access
To get access to the NAMD for Intel Xeon Phi Coprocessor code:
- Download the source code from http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD and, under Version "Nightly Build", select "Source Code".
Build Directions
To build NAMD, you also need the following libraries:
- Tcl (http://www.tcl.tk/)
- FFTW (http://www.fftw.org/): use the FFTW 2 version (you can also try FFTW 3). Build it as follows:
./configure --enable-float --enable-type-prefix --enable-static --prefix=<fftwBaseDirHere> --disable-fortran CC=icc
make CFLAGS=" -O2 " clean install
- Charm++ (http://charm.cs.uiuc.edu/software/) can be built in two ways:
- InfiniBand* (verbs-linux-x86_64-smp-iccstatic) version:
./build charm++ verbs-linux-x86_64 smp iccstatic --with-production
Note: check where your ibverbs library is installed. If it is not in /opt/ofed/lib64 or /usr/local/ofed/lib64, you need to edit the [charmDir]/src/arch/verbs-linux-x86_64/conv-mach.sh file to point to it.
- MPI (mpi-linux-x86_64-smp-mpicxx) version:
./build charm++ mpi-linux-x86_64 smp mpicxx --with-production -DCMK_OPTIMIZE -DMPICH_IGNORE_CXX_SEEK
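Before building NAMD, it can help to confirm that the prerequisites landed where the build expects them. The sketch below is illustrative only: check_fftw2_install and find_ibverbs_dir are hypothetical helper names, and /usr/lib64 is an assumed extra search location. It relies on the fact that FFTW 2 configured with --enable-float --enable-type-prefix installs its single-precision libraries as libsfftw and libsrfftw, which, as far as we know, are the names NAMD's FFTW 2 settings link against (-lsrfftw -lsfftw).

```shell
#!/bin/sh
# Hypothetical prerequisite checks for the libraries listed above.

# FFTW 2 built with --enable-float --enable-type-prefix installs its
# single-precision libraries as libsfftw (complex) and libsrfftw (real).
check_fftw2_install() {
    prefix=$1
    status=0
    for lib in sfftw srfftw; do
        found=no
        # --enable-static yields .a archives; shared objects are accepted too.
        for f in "$prefix/lib/lib$lib.a" "$prefix/lib/lib$lib".so*; do
            if [ -e "$f" ]; then found=yes; fi
        done
        if [ "$found" = no ]; then
            echo "missing lib$lib under $prefix/lib" >&2
            status=1
        fi
    done
    return $status
}

# Print the first directory that contains libibverbs. The first two defaults
# are the locations named in the ibverbs note; /usr/lib64 is an assumed extra.
find_ibverbs_dir() {
    if [ "$#" -eq 0 ]; then
        set -- /opt/ofed/lib64 /usr/local/ofed/lib64 /usr/lib64
    fi
    for dir in "$@"; do
        for f in "$dir"/libibverbs.so*; do
            if [ -e "$f" ]; then
                echo "$dir"
                return 0
            fi
        done
    done
    return 1
}

# Report whether conv-mach.sh may need editing for the verbs build.
if ibdir=$(find_ibverbs_dir); then
    case $ibdir in
        /opt/ofed/lib64|/usr/local/ofed/lib64)
            echo "libibverbs in $ibdir (a location Charm++ already expects)" ;;
        *)
            echo "libibverbs in $ibdir: you may need to edit" \
                 "[charmDir]/src/arch/verbs-linux-x86_64/conv-mach.sh" ;;
    esac
else
    echo "libibverbs not found in the searched directories"
fi
```

For example, check_fftw2_install <fftwBaseDirHere> should succeed after the FFTW make ... install step above completes.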
NAMD build instructions for the Intel Xeon Phi Coprocessor version are essentially the same as compiling standard NAMD, with the following changes:
Note: You can obtain Intel® Composer XE Version 13 from https://registrationcenter.intel.com/regcenter/register.aspx, or register at https://software.intel.com/en-us/ to get a free 30-day evaluation copy.
Note: using make's "-j" option to run build jobs in parallel speeds up compilation significantly.
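To pick a reasonable value for "-j", the job count can be matched to the number of available CPUs. This is a minimal sketch assuming a Linux host; build_jobs is a hypothetical helper name, nproc comes from GNU coreutils, and getconf _NPROCESSORS_ONLN is used as a fallback.

```shell
#!/bin/sh
# Hypothetical helper: pick a job count for "make -j".
# nproc (GNU coreutils) reports available CPUs; getconf is a fallback.
build_jobs() {
    n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null)
    # Use a single job if neither command produced a plain number.
    case $n in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

echo "suggested command: make -j$(build_jobs)"
```

In the NAMD build directory (for standard NAMD builds, the directory that ./config creates, such as Linux-x86_64-icc), you would then run make -j"$(build_jobs)".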
Running NAMD Workloads on Intel Xeon Phi Coprocessor
Running NAMD on Intel Xeon Phi Coprocessor is much like running the standard NAMD code, with the following exceptions:
- Source the Intel® compiler environment script so that the compiler's runtime libraries can be found.
- Set up the following additional environment variables:
export KMP_AFFINITY=granularity=fine,compact
export MIC_ENV_PREFIX=MIC
export MIC_OMP_NUM_THREADS=240
export MIC_KMP_AFFINITY=granularity=fine,balanced
- To execute NAMD, add +devices xxx to the namd2 command line, where xxx is a comma-separated list of coprocessor device IDs (e.g., "0,1" for the first two devices on a node). If the "+devices xxx" option is omitted, NAMD attempts to use all available coprocessors on the node.
- The number of PEs (Charm++ processing elements) per node must be greater than the number of coprocessors (MICs) in the node, and there must be at least one patch per PE.
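The environment settings listed above can be collected into one setup script. This is a sketch under assumptions: the compiler script path /opt/intel/bin/compilervars.sh and its intel64 argument are assumed defaults (adjust for your installation), and mic_devices is a hypothetical helper that builds the comma-separated list for +devices. As we understand the Intel offload runtime, MIC_ENV_PREFIX=MIC forwards variables named MIC_* to the coprocessor with the prefix removed, so MIC_OMP_NUM_THREADS becomes OMP_NUM_THREADS on the card. The value 240 corresponds to 60 cores x 4 hardware threads on the 61-core 7120P.

```shell
#!/bin/sh
# Assumed location of the Intel compiler environment script; adjust as needed.
INTEL_VARS=${INTEL_VARS:-/opt/intel/bin/compilervars.sh}
if [ -f "$INTEL_VARS" ]; then
    . "$INTEL_VARS" intel64
fi

# Host-side OpenMP thread affinity.
export KMP_AFFINITY=granularity=fine,compact
# Variables prefixed with MIC_ are forwarded to the coprocessor without the prefix.
export MIC_ENV_PREFIX=MIC
# 240 = 60 cores x 4 hardware threads on a 61-core 7120P (one core left over).
export MIC_OMP_NUM_THREADS=240
export MIC_KMP_AFFINITY=granularity=fine,balanced

# Hypothetical helper: print "0,1,...,N-1" for N coprocessors.
mic_devices() {
    seq -s, 0 $(($1 - 1))
}

echo "+devices $(mic_devices 2)"
```

Save it under a name of your choice (for example mic_env.sh, a hypothetical name) and source it with ". ./mic_env.sh" rather than executing it, so the exports persist in your shell.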
Host threads and PEs are set with the same command-line options traditionally used for NAMD, such as +p, ++ppn, and +pemap.
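The PE rule for coprocessors can be checked before submitting a job. The sketch below is a hypothetical pre-flight check (check_pes_per_node is not part of NAMD or Charm++); it only enforces "PEs per node greater than coprocessors per node". The one-patch-per-PE rule depends on the simulated system and is not checked here.

```shell
#!/bin/sh
# Hypothetical pre-flight check: PEs per node must exceed the number of
# coprocessors (MICs) on that node.
check_pes_per_node() {
    pes=$1
    mics=$2
    if [ "$pes" -gt "$mics" ]; then
        echo "ok: $pes PEs for $mics coprocessor(s)"
        return 0
    fi
    echo "error: need more than $mics PEs per node, got $pes" >&2
    return 1
}

# Example: 23 PEs per node with two 7120P cards satisfies the rule.
check_pes_per_node 23 2
```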
Some examples of running NAMD workloads:
- Ibverbs:
$BIN_DIR/charmrun ++nodelist $NODEFILE +p $NUM_PROCS \
    ++ppn $PPN $BIN_DIR/wrapper.sh $BIN_DIR/$BIN $WORKLOAD_DIR/$CONFIG_FILE +pemap 1-$PPN +commap 0 "+devices 0,1"
PPN: for best results, use one less than the number of available cores, for example PPN=23 if you have 24 cores per node (or PPN=47 if you use Intel® Hyper-Threading Technology⁵).
NUM_PROCS = $PPN * $NODECOUNT
- MPI:
mpiexec.hydra -perhost 1 -n $NODECOUNT \
    $BIN_DIR/$BIN +ppn $PPN $WORKLOAD_DIR/$CONFIG_FILE +pemap 1-$PPN +commap 0 +devices 0,1
Note: "+pemap 1-$PPN +commap 0", which pins the worker PEs to cores 1 through $PPN and the communication thread to core 0, is more effective than "+setcpuaffinity".
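The launch parameters above can be derived from the core and node counts. This sketch prints the two command lines (a dry run) instead of executing them. CORES_PER_NODE is a hypothetical input, and all default paths and file names are made-up placeholders standing in for BIN_DIR, BIN, NODEFILE, WORKLOAD_DIR, and CONFIG_FILE from the examples. With Hyper-Threading enabled, set CORES_PER_NODE to the logical CPU count (for example 48, giving PPN=47).

```shell
#!/bin/sh
# Sketch: derive PPN and NUM_PROCS and print (not run) both launch commands.
# All defaults below are made-up placeholders; override them via the environment.
CORES_PER_NODE=${CORES_PER_NODE:-24}
NODECOUNT=${NODECOUNT:-4}
BIN_DIR=${BIN_DIR:-/path/to/namd/bin}
BIN=${BIN:-namd2}
NODEFILE=${NODEFILE:-nodelist}
WORKLOAD_DIR=${WORKLOAD_DIR:-/path/to/workload}
CONFIG_FILE=${CONFIG_FILE:-workload.namd}

# One core per node is left for the communication thread (+commap 0);
# worker PEs are pinned to cores 1..PPN (+pemap 1-$PPN).
PPN=$((CORES_PER_NODE - 1))
NUM_PROCS=$((PPN * NODECOUNT))

ibverbs_cmd() {
    echo "$BIN_DIR/charmrun ++nodelist $NODEFILE +p $NUM_PROCS ++ppn $PPN" \
         "$BIN_DIR/wrapper.sh $BIN_DIR/$BIN $WORKLOAD_DIR/$CONFIG_FILE" \
         "+pemap 1-$PPN +commap 0 +devices 0,1"
}

mpi_cmd() {
    echo "mpiexec.hydra -perhost 1 -n $NODECOUNT" \
         "$BIN_DIR/$BIN +ppn $PPN $WORKLOAD_DIR/$CONFIG_FILE" \
         "+pemap 1-$PPN +commap 0 +devices 0,1"
}

ibverbs_cmd
mpi_cmd
```

Printing the commands first makes it easy to check the derived values before handing the line to your job script.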
Performance Testing²,³
The following results show performance on a single node and cluster.
Single-node Performance Testing
Note: Single-node performance uses the multi-core build of NAMD (no network layers are used).
Single-node Platform Configurations⁴
The following hardware and software were used for the above recipe and performance testing.
Server Configuration (Intel® Xeon® processor E5 V2 family):
- 2-socket/24 cores:
- Processor: Intel® Xeon® processor E5-2697 V2 @ 2.70GHz (12 cores) with Intel® Hyper-Threading⁵
- Operating System: Red Hat Enterprise Linux* 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
- Memory: 64GB
- Coprocessor: 2X Intel® Xeon Phi™ Coprocessor 7120P: 61 cores @ 1.238 GHz, 4-way Intel Hyper-Threading⁵, Memory: 15872 MB
- Intel® Many-core Platform Software Stack Version 2.1.6720-15
- Intel® C++ Compiler Version 13.1.3 20130607 (2013.5.192)
Server Configuration (Intel® Xeon® processor E5 family):
- 2-socket/16 cores:
- Processor: Intel® Xeon® processor E5 @ 2.60GHz (8 cores) with Intel® Hyper-Threading⁵
- Operating System: Red Hat Enterprise Linux* 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
- Memory: 64GB
- Coprocessor: 2X Intel® Xeon Phi™ Coprocessor 7120P: 61 cores @ 1.238 GHz, 4-way Intel Hyper-Threading⁵, Memory: 15872 MB
- Intel® Many-core Platform Software Stack Version 2.1.6720-13
- Intel® C++ Compiler Version 13.1.3 20130607 (2013.5.192)
NAMD
- NAMD: Linux-x86_64-icc
- Charm++: multicore-linux64-icc
- Configuration parameters were modified to achieve optimal performance⁴
Cluster Performance Testing²,³
Note: Cluster results use InfiniBand*.
Cluster Platform Configuration⁴
The following hardware and software were used for the above recipe and performance testing.
Endeavor Cluster Configuration:
- 2-socket/24 cores:
- Processor: Intel® Xeon® processor E5-2697 V2 @ 2.70GHz (12 cores) with Intel® Hyper-Threading⁵
- Operating System: Red Hat Enterprise Linux* 2.6.32-358.6.2.el6.x86_64.crt1 #4 SMP Fri May 17 15:33:33 MDT 2013 x86_64 x86_64 x86_64 GNU/Linux
- Memory: 64GB
- Coprocessor: 2X Intel® Xeon Phi™ Coprocessor 7120P: 61 cores @ 1.238 GHz, 4-way Intel Hyper-Threading⁵, Memory: 15872 MB
- Intel® Many-core Platform Software Stack Version 2.1.6720-16
- Intel® C++ Compiler Version 13.1.3 20130607 (2013.5.192)
NAMD
- NAMD: Linux-x86_64-icc
- Charm++: verbs-linux-x86_64-smp-iccstatic
- Configuration parameters were modified to achieve optimal performance⁴
DISCLAIMERS:
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm
2. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
3. Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
4. For more information go to http://www.intel.com/performance
5. Available on select Intel® processors. Requires an Intel® HT Technology-enabled system. Consult your PC manufacturer. Performance will vary depending on the specific hardware and software used. For more information including details on which processors support HT Technology, visit http://www.intel.com/info/hyperthreading.
Intel, the Intel logo, Xeon and Xeon Phi are trademarks of Intel Corporation in the US and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2014 Intel Corporation. All rights reserved.