Recipe: ROME1.0/SML for the Intel® Xeon Phi™ Processor 7250

Overview

This article provides a recipe for obtaining, compiling, and running ROME1.0 SML on Intel® Xeon® processors and Intel® Xeon Phi™ processors. SML uses the output of the MAP processing phase, so MAP must be run first; this document therefore describes how to run both MAP and SML. Follow the instructions below to run the MAP and SML workloads.

The source and test workloads for this version of ROME can be downloaded from: http://ipccsb.dfci.harvard.edu/rome/download.html.

Introduction

ROME (Refinement and Optimization via Machine lEarning for cryo-EM) is one of the major research software packages from the Dana-Farber Cancer Institute. ROME is a parallel computing software system dedicated to high-resolution cryo-EM structure determination and data analysis, implementing advanced machine learning approaches optimized for HPC clusters. ROME 1.0 introduces SML (statistical manifold learning)-based deep classification, following MAP-based (maximum a posteriori) image alignment. More information about ROME can be found at http://ipccsb.dfci.harvard.edu/rome/index.html.

The ROME system has been optimized for both Intel® Xeon® processors and Intel® Xeon Phi™ processors. Detailed information about the underlying algorithms and optimizations can be found at http://arxiv.org/abs/1604.04539.

In this document, we used three workloads: Inflammasome, RP-a and RP-b. The workload descriptions are as follows:

Inflammasome data: 16306 images of NLRC4/NAIP2 inflammasome with a size of 250² pixels
RP-a: 57001 images of proteasome regulatory particles (RP) with a size of 160² pixels
RP-b: 35407 images of proteasome regulatory particles (RP) with a size of 160² pixels

In this document, we use “ring11_all” to refer to the Inflammasome workload, “data6” to refer to the RP-a workload, and “data8” to refer to the RP-b workload.

Preliminaries

To match these results, the Intel Xeon Phi processor machine needs to be booted with BIOS settings for quad cluster mode and MCDRAM cache mode. Please review this document for further information. The Intel Xeon processor system does not need to be started in any special manner.
To build this package, install the Intel® MPI Library for Linux* 5.1 (Update 3) and Intel® Parallel Studio XE Composer Edition for C++ Linux* Version 2016 (Update 3), or later versions, on your systems.
Download the source ROME1.0a.tar.gz from http://ipccsb.dfci.harvard.edu/rome/download.html
Unpack the source code to /home/users.

> cp ROME1.0a.tar.gz /home/users
> cd /home/users
> tar -xzvf ROME1.0a.tar.gz

The workloads are provided by the Intel® Parallel Computing Center for Structural Biology (http://ipccsb.dfci.harvard.edu/). As noted above, the workloads can be downloaded from http://ipccsb.dfci.harvard.edu/rome/download.html. Following the EMPIAR-10069 link, download Inf_data1.* (Set 1) and rename them ring11_all.*. Download RP_data2.* (Set 2) and rename them data8.*. Download RP_data4.* (Set 4) and rename them data6.*. The scripts referred to below can be obtained by pulling the file KNL_LAUNCH.tgz from http://ipccsb.dfci.harvard.edu/rome/download.html
Copy the workloads and run scripts to your home directory. You should have the following files:

> cp ring11_all.star /home/users
> cp ring11_all.mrcs /home/users
> cp data6.star /home/users
> cp data6.mrcs /home/users
> cp data8.star /home/users
> cp data8.mrcs /home/users
> cp run_ring11_all_map_XEON.sh /home/users
> cp run_ring11_all_sml_XEON.sh /home/users
> cp run_ring11_all_map_XEONPHI.sh /home/users
> cp run_ring11_all_sml_XEONPHI.sh /home/users
> cp run_data6_map_XEON.sh /home/users
> cp run_data6_sml_XEON.sh /home/users
> cp run_data6_map_XEONPHI.sh /home/users
> cp run_data6_sml_XEONPHI.sh /home/users
> cp run_data8_map_XEON.sh /home/users
> cp run_data8_sml_XEON.sh /home/users
> cp run_data8_map_XEONPHI.sh /home/users
> cp run_data8_sml_XEONPHI.sh /home/users
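
The renaming and copying steps above can be sketched as two small shell helpers. This is only an illustrative sketch: the function names `rename_set` and `check_inputs` and their directory arguments are hypothetical and are not part of the ROME distribution. The file and script names are the ones listed in this section.

```shell
#!/bin/sh
# Hypothetical helpers; the function names and the directory argument are
# illustrative assumptions, not part of the ROME distribution.

# Rename every file <old>.* in directory $1 to <new>.*, keeping the extension.
rename_set() {
    dir=$1; old=$2; new=$3
    for f in "$dir/$old".*; do
        [ -e "$f" ] || continue        # skip if the glob matched nothing
        ext=${f#"$dir/$old"}           # e.g. ".star" or ".mrcs"
        mv "$f" "$dir/$new$ext"
    done
}

# Print each expected workload file or run script that is missing from $1.
# Returns 0 when everything is present, 1 otherwise.
check_inputs() {
    dir=$1; missing=0
    for w in ring11_all data6 data8; do
        for f in "$w.star" "$w.mrcs" \
                 "run_${w}_map_XEON.sh" "run_${w}_sml_XEON.sh" \
                 "run_${w}_map_XEONPHI.sh" "run_${w}_sml_XEONPHI.sh"; do
            if [ ! -f "$dir/$f" ]; then
                echo "missing: $f"
                missing=1
            fi
        done
    done
    return "$missing"
}
```

For example, in the download directory one might call `rename_set . Inf_data1 ring11_all`, `rename_set . RP_data2 data8`, and `rename_set . RP_data4 data6`, and then run `check_inputs /home/users` after copying to confirm that nothing is missing before building.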

Prepare the binaries for the Intel Xeon processor and the Xeon Phi processor

Set up the Intel® MPI Library and Intel® C++ Compiler environments:

> source /opt/intel/impi/<version>/bin64/mpivars.sh
> source /opt/intel/composer_xe_<version>/bin/compilervars.sh intel64
> source /opt/intel/mkl/<version>/bin/mklvars.sh intel64

Set environment variables for compilation of ROME:

>export ROME_CC=mpiicpc
Build the binaries for the Intel Xeon processor.

> cd /home/users/ROME1.0a
> make
> mkdir bin
> mv rome_map bin/rome_map
> mv rome_sml bin/rome_sml

Build the binaries for the Intel Xeon Phi processor.

> cd /home/users/ROME1.0a
> vi makefile
Modify FLAGS as follows:
FLAGS := -mkl -fopenmp -O3 -xMIC-AVX512 -DNDEBUG -std=c++11
> make
> mkdir bin_knl
> mv rome_map bin_knl/rome_map
> mv rome_sml bin_knl/rome_sml
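
If you would rather script the makefile change than edit it by hand, the following is a minimal sketch. It assumes the makefile contains a line that begins with `FLAGS :=`; that assumption has not been checked against the ROME sources. The function name `set_knl_flags` is hypothetical. The original file is kept as `makefile.orig` so that the Intel Xeon processor settings can be restored.

```shell
#!/bin/sh
# Hypothetical helper: replace the FLAGS line of a makefile with the
# Intel Xeon Phi processor flags given in this recipe.
# Assumption: the makefile has a line starting with "FLAGS :=".
set_knl_flags() {
    mk=$1
    knl_flags='FLAGS := -mkl -fopenmp -O3 -xMIC-AVX512 -DNDEBUG -std=c++11'
    cp "$mk" "$mk.orig"                 # keep a backup of the original
    # Write the edited copy back over the makefile (avoids non-portable sed -i).
    sed "s|^FLAGS[[:space:]]*:=.*|$knl_flags|" "$mk.orig" > "$mk"
}
```

A call such as `set_knl_flags /home/users/ROME1.0a/makefile` would then take the place of the `vi makefile` step; it is worth inspecting the resulting FLAGS line before running `make`.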

Run the test workloads on the Intel Xeon processor (an Intel® Xeon® processor E5-2697 v4 is assumed by the scripts)

Running the ROME MAP phase for these workloads:

Running workload1: ring11_all
> cd /home/users/
> sh run_ring11_all_map_XEON.sh

Running workload2: data6
> cd /home/users/
> sh run_data6_map_XEON.sh

Running workload3: data8
> cd /home/users/
> sh run_data8_map_XEON.sh

Running the ROME SML phase for these workloads:

Running workload1: ring11_all
> cd /home/users/
> sh run_ring11_all_sml_XEON.sh

Running workload2: data6
> cd /home/users/
> sh run_data6_sml_XEON.sh

Running workload3: data8
> cd /home/users/
> sh run_data8_sml_XEON.sh

Run the test workloads on the Intel Xeon Phi processor

Running the ROME MAP phase for these workloads:

Running workload1: ring11_all
> cd /home/users/
> sh run_ring11_all_map_XEONPHI.sh

Running workload2: data6
> cd /home/users/
> sh run_data6_map_XEONPHI.sh

Running workload3: data8
> cd /home/users/
> sh run_data8_map_XEONPHI.sh

Running the ROME SML phase for these workloads:

Running workload1: ring11_all
> cd /home/users/
> sh run_ring11_all_sml_XEONPHI.sh

Running workload2: data6
> cd /home/users/
> sh run_data6_sml_XEONPHI.sh

Running workload3: data8
> cd /home/users/
> sh run_data8_sml_XEONPHI.sh
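
Because SML consumes the output of MAP, the individual commands above can also be driven by a small loop that runs MAP and then SML for each workload and stops at the first failure. The sketch below is illustrative only: the function name `run_workloads` and its arguments are hypothetical, while the script names follow the `run_<workload>_<phase>_<platform>.sh` pattern used in this recipe. Unlike the listings above, it runs MAP and SML back to back per workload rather than all MAP runs first; either order satisfies the dependency.

```shell
#!/bin/sh
# Hypothetical driver: run MAP, then SML, for each workload.
# $1 = directory holding the run scripts, $2 = XEON or XEONPHI.
run_workloads() {
    dir=$1; platform=$2
    for w in ring11_all data6 data8; do
        for phase in map sml; do       # MAP must finish before SML starts
            script="run_${w}_${phase}_${platform}.sh"
            echo "Running $script"
            # Run in a subshell so the caller's working directory is unchanged.
            (cd "$dir" && sh "$script") || {
                echo "$script failed; stopping" >&2
                return 1
            }
        done
    done
}
```

For example, `run_workloads /home/users XEONPHI` would run all six Intel Xeon Phi processor scripts in dependency order, and `run_workloads /home/users XEON` would do the same for the Intel Xeon processor scripts.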

Performance gain seen with ROME SML

For the workloads we described above, the following graph shows the speedups achieved from running this code on the Intel Xeon Phi processor. As you can see, up to a 2.37x speedup for the ring11_all workload can be achieved when running this code on one Intel® Xeon Phi™ processor 7250 versus one two-socket Intel Xeon processor E5-2697 v4. The data used below were stored on a Lustre* file system.

Speedups achieved from running this code on the Intel Xeon Phi processor

Testing platform configuration:

Intel Xeon processor E5-2697 v4: BDW-EP node with dual sockets, 18 cores/socket HT enabled @2.3 GHz 145W (Intel Xeon processor E5-2697 v4 w/128 GB RAM), Red Hat Enterprise Linux Server release 6.7 (Santiago)

Intel Xeon Phi processor 7250 (68 cores): Intel Xeon Phi processor 7250 68 core, 272 threads, 1400 MHz core freq. MCDRAM 16 GB 7.2 GT/s, DDR4 96 GB 2400 MHz, Red Hat Enterprise Linux Server release 6.7 (Santiago), quad cluster mode, MCDRAM cache mode.
