Purpose
This code recipe describes how to get, build, and use the LAMMPS* code for the Intel® Xeon Phi™ coprocessor.
Introduction
Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS*) is a classical molecular dynamics code. LAMMPS has potentials for solid-state materials (metals, semiconductors), soft matter (biomolecules, polymers), and coarse-grained or mesoscopic systems. LAMMPS can be used to model atoms, or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale.
LAMMPS runs on single processors or in parallel using message-passing techniques with a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.
LAMMPS is distributed as open source code under the terms of the GNU General Public License. The current version can be downloaded at http://lammps.sandia.gov/download.html. Links are also included to older F90/F77 versions. Periodic releases are also available on SourceForge*.
LAMMPS is distributed by Sandia National Laboratories, a U.S. Department of Energy laboratory. The main authors of LAMMPS are listed on the LAMMPS site along with contact info and other contributors. Find out more about LAMMPS at http://lammps.sandia.gov.
Code Support for Intel® Xeon Phi™ coprocessor
LAMMPS* with Intel Xeon Phi coprocessor support is expected to be released as an Intel-optimized package between July and September of 2014. The release will include support for potentials that allow simulation of soft matter, biomolecules, and materials. Contact your Intel representative about access prior to September 2014.
Build Directions
Building LAMMPS for Intel Xeon Phi coprocessor is similar to a normal LAMMPS build. A makefile supporting offload and vectorization for CPU routines will be included. An example build will include the following code:
> source /opt/intel/compiler/2013_sp1.1.106/bin/iccvars.sh intel64
> source /opt/intel/impi/4.1.2.040/bin64/mpivars.sh
> cd src
> make yes-user-intel
> make intel_offload
> echo "LAMMPS executable is src/lmp_intel_offload"
Run Directions
To run LAMMPS on Intel Xeon Phi coprocessor:
- Edit your run script as you would for other accelerator packages (OPT, GPU, USER-OMP). See the figure below.
- Run LAMMPS as you would normally. The modified code handles the offloading to the coprocessor. See the figure below.
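As a sketch of the first step, accelerator packages in LAMMPS are typically enabled with a `package` command in the input script (or equivalent command-line switches). The exact syntax of the pre-release Intel package may differ; the lines below are an illustration modeled on the other accelerator packages, and the keyword names (`mode`, `balance`) are assumptions to confirm against the released documentation:

```
# Hypothetical input-script lines (keywords assumed, not confirmed)
package intel 1 mode mixed balance -1   # 1 coprocessor, mixed precision, dynamic load balance
suffix intel                            # select the Intel-accelerated pair/bond styles
```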
LAMMPS will simulate the time evolution of the input system of atoms or other particles, as specified in the input script, writing output that includes atom positions, thermodynamic quantities, and other statistics.
Expected output and performance can be checked against the log files provided in the examples directory.
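A typical run and sanity check might look like the following. The executable name matches the build above; the input file and MPI rank count are placeholders to adapt to your system:

```
> mpirun -np 24 ./lmp_intel_offload -in in.rhodo
> grep "Loop time" log.lammps    # compare timings and thermo output against the example logs
```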
Optimizations and Usage Model
A load balancer offloads part of the neighbor-list and non-bonded force calculations to the Intel Xeon Phi coprocessor so that they run concurrently with calculations on the CPU. This is achieved by using the offload API to run routines well suited to many-core chips on both the CPU and the coprocessor: the same C++ routine is run twice, once with an offload flag, to support concurrent calculation.
The dynamic load balancing allows for concurrent 1) data transfer between host and coprocessor, 2) calculation of neighbor-list, non-bonded, bond, and long-range terms, and 3) some MPI* communications. It continuously updates the fraction of offloaded work to minimize idle time. A standard LAMMPS "fix" object manages concurrency and synchronization.
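For example, instead of letting the balancer tune the split at runtime, the offloaded fraction could be pinned to a fixed value. The `balance` keyword and its values shown here are assumptions based on the description above, not confirmed syntax:

```
# Hypothetical: offload a fixed 50% of the eligible work to the coprocessor
package intel 1 balance 0.5
# Hypothetical: a negative value lets the balancer adjust the fraction dynamically
package intel 1 balance -1
```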
The Intel package adds support for single-, mixed-, and double-precision calculations on both the CPU and the coprocessor, as well as vectorization (AVX on the CPU and 512-bit vector instructions on the coprocessor). This can provide significant speedups for the routines on the CPU, too.
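The precision mode would then be selected per run; for instance (the `mode` keyword and value names are assumed for illustration):

```
# Hypothetical precision selection
package intel 1 mode single   # fastest: single precision throughout
package intel 1 mode mixed    # single-precision forces with double-precision accumulation
package intel 1 mode double   # full double precision
```

Mixed precision is often a reasonable default, trading little accuracy for most of the single-precision speedup.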
Performance Testing
The advantages of using the Intel package are illustrated below by comparing the baseline MPI/OpenMP* routines in LAMMPS with the optimized routines running on the CPU only or on the CPU with offload to the coprocessor. Results are provided for the Rhodopsin* benchmark distributed with LAMMPS, scaled to 256,000 atoms.
The Rhodopsin benchmark simulates the movement of a protein in the retina that plays an important role in the perception of light. The protein is simulated in a solvated lipid bilayer using the CHARMM* force field with Particle-Particle Particle-Mesh long-range electrostatics and SHAKE* constraints. The simulation is performed at a temperature of 300 K and a pressure of 1 atm. The results on a single node and 32 nodes of the Endeavor cluster (configuration below) are shown, demonstrating a speedup of up to 2.15X when using the Intel Xeon Phi coprocessor.
Figure Right: Rhodopsin protein benchmark with atoms in initial configuration.
Testing Platform Configurations
The following hardware was used for the above recipe and performance testing.
Endeavor Cluster Configuration:
- 2-socket/24 cores:
- Processor: Intel® Xeon® processor E5-2697 V2 @ 2.70GHz (12 cores) with Intel® Hyper-Threading Technology
- Network: InfiniBand* Architecture Fourteen Data Rate (FDR)
- Operating System: Red Hat Enterprise Linux* 2.6.32-358.el6.x86_64.crt1 #4 SMP Fri May 17 15:33:33 MDT 2013 x86_64 x86_64 x86_64 GNU/Linux
- Memory: 64GB
- Coprocessor: 2X Intel Xeon Phi coprocessor 7120P: 61 cores @ 1.238 GHz, 4-way Intel Hyper-Threading Technology, Memory: 15872 MB
- Intel® Many-core Platform Software Stack Version 2.1.6720-19
- Intel® Compiler 2013 SP1.1.106 (icc version 14.0.1)
- Compile flags:
-O3 -xHost -fno-alias -fno-omit-frame-pointer -unroll-aggressive -opt-prefetch
-mP2OPT_hpo_fast_reduction=F
-offload-option,mic,compiler,"-fimf-domain-exclusion=15 -mGLOB_default_function_attrs=\"gather_scatter_loop_unroll=5\""