Recipe: Building and running NEMO* on Intel® Xeon Phi™ Processors


About NEMO*

The NEMO* (Nucleus for European Modelling of the Ocean) numerical solutions framework encompasses models of ocean, sea ice, tracers, and biochemistry equations and their related physics. It also incorporates the pre- and post-processing tools and the interface to other components of the Earth System. NEMO allows several ocean-related components of the Earth System to work together or separately, and also allows for two-way nesting via AGRIF software. It is interfaced with the remaining components of the Earth System package (atmosphere, land surfaces, and so on) via the OASIS coupler.

This recipe shows the performance advantages of using the Intel® Xeon Phi™ processor 7250.

NEMO 3.6 is the current stable version.

Downloading the Code

  1. Download the NEMO source code from the official NEMO repository (you should register at www.nemo-ocean.eu):

    svn co -r 6939  http://forge.ipsl.jussieu.fr/nemo/svn/branches/2015/nemo_v3_6_STABLE/NEMOGCM nemo

  2. Download the XIOS IO server from the official XIOS repository:

    svn co -r 703 http://forge.ipsl.jussieu.fr/ioserver/svn/XIOS/branchs/xios-1.0 xios

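  3. If your system already has NetCDF libraries with Fortran bindings installed and they link with the NEMO and XIOS binaries, go to the section “Building XIOS for the Intel Xeon Processor”.
  4. Otherwise, download the sources of the libraries built in the next section (szip 2.1, zlib 1.2.8, HDF5 1.8.12, curl 7.42.1, and NetCDF 4.3.3), along with NetCDF-Fortran from https://github.com/Unidata/netcdf-fortran/archive/netcdf-fortran-4.2.tar.gz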
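Whether step 3 applies can be checked from the shell before downloading anything. The sketch below assumes the NetCDF installation ships the nc-config and nf-config helper scripts (NetCDF-C and NetCDF-Fortran normally install them, but some site packages omit them). The function name have_tool is made up for illustration.

```shell
#!/bin/sh
# have_tool is a hypothetical helper name, not part of NEMO or NetCDF.
# It succeeds when the named program can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report which NetCDF helper scripts are visible in the current environment.
for tool in nc-config nf-config; do
    if have_tool "$tool"; then
        echo "$tool found: $(command -v "$tool")"
    else
        echo "$tool not found; the NetCDF libraries probably need to be built"
    fi
done
```

Finding nf-config only shows that some NetCDF-Fortran build is present; whether it links with the Intel compilers used here still has to be tried.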

Building Additional Libraries for the Intel® Xeon® Processor

  1. First, choose a directory for your experiments, such as “~/NEMO-BDW”:
    export base="$HOME/NEMO-BDW"
  2. Create a directory for the libraries and copy the tarballs of all required libraries into $base/libraries/src:
    mkdir -p $base/libraries/src
  3. Unpack the tarball files in $base/libraries/src.
  4. To build an Intel® Advanced Vector Extensions 2 (Intel® AVX2) version of libraries, set:
    export arch="-xCORE-AVX2"
  5. Set the following environment variables:
    export PREFIX=$base/libraries
    export CFLAGS="-I$PREFIX/include -L$PREFIX/lib -O3 -g -traceback -openmp ${arch} -fPIC"
    export FFLAGS=$CFLAGS
    export FCFLAGS=$CFLAGS
    export LDFLAGS="-L$PREFIX/lib -openmp ${arch} -fPIC"
    export FC=mpiifort
    export CXX=mpiicc
    export CC=mpiicc
    export CPP="icc -E"
  6. Build szip:
    cd $base/libraries/src/szip-2.1
    ./configure --prefix=$PREFIX
    make -j 4
    make install
  7. Build zlib:
    cd $base/libraries/src/zlib-1.2.8
    ./configure --prefix=$PREFIX
    make -j 4
    make install
  8. Build HDF5:
    cd $base/libraries/src/hdf5-1.8.12
    ./configure --with-zlib=$PREFIX --prefix=$PREFIX --enable-fortran --with-szlib=$PREFIX --enable-hl
    make install
  9. Build CURL:
    cd $base/libraries/src/curl-7.42.1
    ./configure --prefix=$PREFIX
    make -j 4
    make install
  10. Build NetCDF:
    cd $base/libraries/src/netcdf-4.3.3
    export LIBS=" -lhdf5_hl -lhdf5 -lz -lsz -lmpi"
    export LDFLAGS+=" -L$PREFIX/lib"
    ./configure --prefix=$PREFIX
    make install
  11. Build NetCDF Fortran wrapper:
    cd $base/libraries/src/netcdf-fortran-4.2/
    export LIBS=""
    export CFLAGS="$CFLAGS -lnetcdf"
    export FFLAGS=$CFLAGS
    export FCFLAGS=$CFLAGS
    export FC=ifort
    export CXX=mpiicc
    export CC=mpiicc
    export LDFLAGS+=" -L$I_MPI_ROOT/lib64/"
    ./configure --prefix=$PREFIX
    make install
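Before moving on to XIOS, it can help to confirm that every library named in the later link lines actually landed in $PREFIX/lib. The following is a minimal sketch, not part of the original recipe; list_missing_libs is a hypothetical helper, and the library names are taken from the -l flags used elsewhere in this recipe.

```shell
#!/bin/sh
# list_missing_libs is a hypothetical helper: print every library from the
# recipe that has neither lib<name>.so nor lib<name>.a under <prefix>/lib.
list_missing_libs() {
    prefix=$1
    for name in sz z hdf5 hdf5_hl curl netcdf netcdff; do
        if [ ! -e "$prefix/lib/lib$name.so" ] && [ ! -e "$prefix/lib/lib$name.a" ]; then
            echo "lib$name"
        fi
    done
}

# Example: list_missing_libs "$PREFIX"
```

An empty result means all seven libraries were found; any name printed points at the build step to repeat.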

Building XIOS for the Intel Xeon Processor

  1. Copy XIOS source code to $base/xios
  2. Create the files arch-ifort_linux.env, arch-ifort_linux.fcm, and arch-ifort_linux.path in $base/xios/arch.
  3. Add the following lines to the $base/xios/arch/arch-ifort_linux.env file:
    export NETCDF_INC_DIR=$base/libraries/include
    export NETCDF_LIB_DIR=$base/libraries/lib
    export HDF5_INC_DIR=$base/libraries/include
    export HDF5_LIB_DIR=$base/libraries/lib
  4. Add the following lines to the $base/xios/arch/arch-ifort_linux.fcm file:
    %NCDF_INC            -I$base/libraries/include
    %NCDF_LIB            -L$base/libraries/lib -lnetcdff -lnetcdf -lhdf5 -lcurl -lz -lsz
    %FC                  mpiifort
    %FCFLAGS             -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
    %FFLAGS              -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
    %LD                  mpiifort
    %FPPFLAGS            -P -C -traditional
    %LDFLAGS             -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
    %AR                  ar
    %ARFLAGS             -r
    %MK                  gmake
    %USER_INC            %NCDF_INC_DIR
    %USER_LIB            %NCDF_LIB_DIR
    %MAKE                gmake
    %BASE_LD        -lstdc++ -lifcore -lintlc
    %LINKER         mpiifort -nofor-main
    %BASE_INC       -D__NONE__
    %CCOMPILER      mpiicc
    %FCOMPILER      mpiifort
    %CPP            cpp
    %FPP            cpp -P
    %BASE_CFLAGS    -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
    %PROD_CFLAGS    -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
    %DEV_CFLAGS    -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
    %DEBUG_CFLAGS  -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
    %BASE_FFLAGS   -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
    %PROD_FFLAGS    -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
    %DEV_FFLAGS    -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
    %DEBUG_FFLAGS   -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
  5. Add the following lines to the $base/xios/arch/arch-ifort_linux.path file:
    NETCDF_LIB="-lnetcdff -lnetcdf -lcurl"
    HDF5_LIB="-lhdf5_hl -lhdf5 -lz -lcurl"
  6. Change directory to $base/xios and execute the following command:
    ./make_xios --full --prod --arch ifort_linux

Building NEMO for the Intel Xeon Processor and Preparing Workloads

  1. Copy NEMO source code to $base/nemo
  2. Apply the following patch to file $base/nemo/NEMOGCM/NEMO/OPA_SRC/nemogcm.F90:
    @@ -116,6 +116,7 @@
           !!              Madec, 2008, internal report, IPSL.
           INTEGER ::   istp       ! time step index
    +DOUBLE PRECISION :: mpi_wtime, sstart, send
     #if defined key_agrif
    @@ -163,18 +164,19 @@
     #if defined key_agrif
               CALL Agrif_Regrid()
     #endif
              DO WHILE ( istp <= nitend .AND. nstop == 0 )
    +sstart = mpi_wtime()
     #if defined key_agrif
                 CALL stp                         ! AGRIF: time stepping
     #else
                 CALL stp( istp )                 ! standard time stepping
     #endif
    +send = mpi_wtime()
    +print *, "Step ", istp, " - " , send-sstart , "s."
                 istp = istp + 1
                 IF( lk_mpp )   CALL mpp_max( nstop )
              END DO
           IF( lk_diaobs   )   CALL dia_obs_wri
           IF( ln_icebergs )   CALL icb_end( nitend )
  3. Create the file $base/nemo/ARCH/arch-mpiifort_linux.fcm and add the following lines:
    %NCDF_INC            -I$base/libraries/include
    %NCDF_LIB            -L$base/libraries/lib -lnetcdff -lnetcdf -lz -lcurl -lhdf5_hl -lhdf5 -lz -lcurl
    %CPP                 icc -E
    %FC                  mpiifort
    %FCFLAGS          -r8 -g -traceback -qopenmp -O3 -xCORE-AVX2 -g -traceback
    %FFLAGS             -r8 -g -traceback -qopenmp -O3 -xCORE-AVX2 -g -traceback
    %LD                  mpiifort
    %FPPFLAGS            -P -C -traditional
    %LDFLAGS             -lstdc++ -lifcore -O3 -xCORE-AVX2 -g -traceback
    %AR                  ar
    %ARFLAGS             -r
    %MK                  gmake
    %XIOS_INC            -I$base/xios/inc
    %XIOS_LIB            -L$base/xios/lib -lxios
    %USER_INC            %NCDF_INC %XIOS_INC
    %USER_LIB            %NCDF_LIB %XIOS_LIB
  4. Build the binary for the GYRE workload:
    cd $base/nemo/NEMOGCM/CONFIG
    ./makenemo -n GYRE -m mpiifort_linux -j 4
  5. Create a sandbox directory for the GYRE runs:
    1. Create the directory and copy the binary and the experiment files into it:
       mkdir -p $base/nemo/gyre-exp
       cp -r $base/nemo/NEMOGCM/CONFIG/GYRE/BLD/bin/nemo.exe $base/nemo/gyre-exp
       cp -r $base/nemo/NEMOGCM/CONFIG/GYRE/EXP00/* $base/nemo/gyre-exp
    2. Switch off creating mesh files by changing “nn_msh” to 0 in the namelist_ref file.
    3. Enable benchmark mode by changing “nn_bench” to 1 in the namelist_ref file.
    4. Set the following parameters in the “&namcfg” section:
      jp_cfg = 70
      jpidta = 2102
      jpjdta = 1402
      jpkdta = 31
      jpiglo = 2102
      jpjglo = 1402
    5. Switch off using the IO server in the iodef.xml file (“using_server = false”)
  6. Build a binary for the ORCA025 workload:
    1. Change the content of “$base/nemo/NEMOGCM/CONFIG/ORCA2_LIM3/cpp_ORCA2_LIM3.fcm” to “bld::tool::fppkeys key_trabbl key_vvl key_dynspg_ts key_ldfslp key_traldf_c2d key_traldf_eiv key_dynldf_c3d key_zdfddm key_zdftmx key_mpp_mpi key_zdftke key_lim3 key_iomput”
    2. Change the line “ORCA2_LIM3 OPA_SRC LIM_SRC_3 NST_SRC” to “ORCA2_LIM3 OPA_SRC LIM_SRC_3” in file $base/nemo/NEMOGCM/CONFIG/cfg.txt
    3. ./makenemo -n ORCA2_LIM3 -m mpiifort_linux -j 4
  7. Go to the Barcelona Supercomputing Center website (in Spanish), and in section 9 locate the paragraph “PREGUNTAS Y RESPUESTAS:”, which gives the path to the FTP server and the credentials to log in.
  8. Download the BenchORCA025L75.tar.gz file from directory Benchmarks_aceptacion/NEMO/
  9. Extract the contents of the tarball file to $base/nemo/orca-exp
  10. Copy the NEMO binary to the sandbox directory:
    cp $base/nemo/NEMOGCM/CONFIG/ORCA2_LIM3/BLD/bin/nemo.exe $base/nemo/orca-exp
  11. Edit the file $base/nemo/orca-exp/iodef.xml and add the following lines into the “<context id="xios">    <variable_definition>” section:
    <variable id="min_buffer_size" type="int">994473778</variable><variable id="buffer_size" type="int">994473778</variable> 
  12. In the file namelist_ref in section “&namrun” set the following variables:
    nn_itend     =   10
    nn_stock    =    10
    nn_write    =    10
  13. Copy the $base/nemo/NEMOGCM/CONFIG/SHARED/namelist_ref file to $base/nemo/orca-exp
  14. Switch off using the IO server in the iodef.xml file (“using_server = false”)
  15. To build the KNL binaries change “-xCORE-AVX2” to “-xMIC-AVX512”, change $base to another directory, and do all of the steps again.
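The flag change in the last step can be scripted instead of edited by hand. This is a minimal sketch under the assumption that the only difference wanted between the two arch files is the instruction-set flag; to_knl_flags is a hypothetical name, and paths inside the files still have to point at the new $base.

```shell
#!/bin/sh
# to_knl_flags is a hypothetical filter: it reads text on stdin and writes it
# to stdout with the Xeon AVX2 flag swapped for the Xeon Phi AVX-512 flag.
to_knl_flags() {
    sed 's/-xCORE-AVX2/-xMIC-AVX512/g'
}

# Example (directory names are assumptions based on this recipe):
#   to_knl_flags < ~/NEMO-BDW/nemo/ARCH/arch-mpiifort_linux.fcm \
#       > ~/NEMO-KNL/nemo/ARCH/arch-mpiifort_linux.fcm
```

Writing to a new file rather than editing in place keeps the Xeon arch file intact for later comparison runs.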

Running the GYRE Workload with the Intel Xeon Processor

  1. Go to $base/nemo/gyre-exp
  2. Source the environment variables for the compiler and the Intel® MPI Library:
    source /opt/intel/compiler/latest/bin/compilervars.sh intel64
    source /opt/intel/impi/latest/bin/compilervars.sh intel64
  3. Add libraries to LD_LIBRARY_PATH:
    export LD_LIBRARY_PATH=$base/libraries/lib/:$LD_LIBRARY_PATH
  4. Set additional variables for the Intel MPI Library:
    export I_MPI_FABRICS=shm:tmi
    export I_MPI_PIN_CELL=core
  5. Run NEMO:
    mpiexec.hydra -genvall -f <hostfile> -n <number of ranks> -perhost <ppn> ./nemo.exe
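The number of ranks in the launch line is the number of hosts multiplied by the ranks per host. The sketch below only prints the resulting command so it can be inspected first; nemo_launch_cmd and the host file name hosts.txt are hypothetical, and 36 ranks per host is just one possible choice for the 36-core Xeon nodes described later.

```shell
#!/bin/sh
# nemo_launch_cmd is a hypothetical helper: print (not run) the launch line
# for <nodes> hosts with <ppn> MPI ranks per host.
nemo_launch_cmd() {
    hostfile=$1; nodes=$2; ppn=$3
    ranks=$((nodes * ppn))   # total ranks = hosts x ranks per host
    echo "mpiexec.hydra -genvall -f $hostfile -n $ranks -perhost $ppn ./nemo.exe"
}

# Example: two hosts with 36 ranks each
nemo_launch_cmd hosts.txt 2 36
```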

Running the ORCA025 Workload with the Intel Xeon Processor

  1. Go to $base/nemo/orca-exp
  2. Source the environment variables for the compiler and the Intel MPI Library:
    source /opt/intel/compiler/latest/bin/compilervars.sh intel64
    source /opt/intel/impi/latest/bin/compilervars.sh intel64
  3. Add libraries to LD_LIBRARY_PATH:
    export LD_LIBRARY_PATH=$base/libraries/lib/:$LD_LIBRARY_PATH
  4. Set additional variables for Intel MPI Library:
    export I_MPI_FABRICS=shm:tmi
    export I_MPI_PIN_CELL=core
  5. Run NEMO:
    mpiexec.hydra -genvall -f <hostfile> -n <number of ranks> -perhost <ppn> ./nemo.exe
  6. If the application hangs while running, you can run NEMO with the XIOS server in detached mode:
    1. Copy xios_server.exe from $base/xios/bin to $base/nemo/orca-exp
    2. Edit the iodef.xml file and set “using_server = true”
    3. mpiexec.hydra -genvall -f <hostfile> -n <number of ranks> -perhost <ppn> ./nemo.exe : -n 2 ./xios_server.exe
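Switching between attached and detached XIOS modes means flipping one value in iodef.xml, which can be done from the shell. This sketch assumes the setting is stored on a single line as a variable element with id "using_server" and that GNU sed is available (for -i and -E); set_using_server is a hypothetical helper name.

```shell
#!/bin/sh
# set_using_server is a hypothetical helper. It assumes iodef.xml holds the
# setting as <variable id="using_server" ...>VALUE</variable> on one line
# and rewrites only VALUE, leaving the other attributes untouched.
set_using_server() {
    value=$1; file=$2
    sed -i -E "s|(<variable id=\"using_server\"[^>]*>)[^<]*(</variable>)|\1${value}\2|" "$file"
}

# Example: set_using_server true iodef.xml
```

If the element is laid out differently in a given iodef.xml, the file is left unchanged, so it is worth checking the result with grep after running it.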

Building Additional Libraries for the Intel® Xeon Phi™ Processor

  1. First, choose a directory for your experiments, such as “~/NEMO-KNL”:
    export base="$HOME/NEMO-KNL"
  2. Create a directory for the libraries and copy the tarballs of all required libraries into $base/libraries/src:
    mkdir -p $base/libraries/src
  3. Unpack the tarball files in $base/libraries/src
  4. To build an Intel® Advanced Vector Extensions 512 (Intel® AVX-512) version of the libraries, set:
    export arch="-xMIC-AVX512"
  5. Set the following environment variables:
    export PREFIX=$base/libraries
    export CFLAGS="-I$PREFIX/include -L$PREFIX/lib -O3 -g -traceback -openmp ${arch} -fPIC"
    export CPPFLAGS=$CFLAGS
    export CXXFLAGS=$CFLAGS
    export FFLAGS=$CFLAGS
    export FCFLAGS=$CFLAGS
    export LDFLAGS="-L$PREFIX/lib -openmp ${arch} -fPIC"
    export FC=mpiifort
    export CXX=mpiicc
    export CC=mpiicc
    export CPP="icc -E"
  6. Build szip:
    cd $base/libraries/src/szip-2.1
    ./configure --prefix=$PREFIX
    make -j 4
    make install
  7. Build zlib:
    cd $base/libraries/src/zlib-1.2.8
    ./configure --prefix=$PREFIX
    make -j 4
    make install
  8. Build HDF5:
    cd $base/libraries/src/hdf5-1.8.12
    ./configure --with-zlib=$PREFIX --prefix=$PREFIX --enable-fortran --with-szlib=$PREFIX --enable-hl
    make install
  9. Build CURL:
    cd $base/libraries/src/curl-7.42.1
    ./configure --prefix=$PREFIX
    make -j 4
    make install
  10. Build NetCDF:
    cd $base/libraries/src/netcdf-4.3.3
    export LIBS=" -lhdf5_hl -lhdf5 -lz -lsz -lmpi"
    export LDFLAGS+=" -L$PREFIX/lib"
    ./configure --prefix=$PREFIX
    make install
  11. Build the NetCDF Fortran wrapper:
    cd $base/libraries/src/netcdf-fortran-4.2/
    export LIBS=""
    export CFLAGS="$CFLAGS -lnetcdf"
    export FFLAGS=$CFLAGS
    export FCFLAGS=$CFLAGS
    export FC=ifort
    export CXX=mpiicc
    export CC=mpiicc
    export LDFLAGS+=" -L$I_MPI_ROOT/lib64/"
    ./configure --prefix=$PREFIX
    make install
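Code built with -xMIC-AVX512 needs AVX-512 hardware to run, and configure scripts typically execute small test programs, so these builds are best done on a Xeon Phi node. One way to confirm that on Linux is to look for the avx512f flag in /proc/cpuinfo. The sketch below is an illustration only; cpu_has_flag is a hypothetical helper, and the optional second argument exists so the function can be pointed at a saved copy of cpuinfo.

```shell
#!/bin/sh
# cpu_has_flag is a hypothetical helper: succeed when <flag> appears as a whole
# word on a "flags" line of a cpuinfo-style file (default: /proc/cpuinfo).
cpu_has_flag() {
    flag=$1; cpuinfo=${2:-/proc/cpuinfo}
    grep -E '^flags' "$cpuinfo" | grep -qw -- "$flag"
}

# Example: Linux reports AVX-512 Foundation support as "avx512f".
#   cpu_has_flag avx512f && echo "AVX-512 available on this node"
```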

Building XIOS for the Intel Xeon Phi Processor

  1. Copy XIOS source code to $base/xios
  2. Create the files arch-ifort_linux.env, arch-ifort_linux.fcm, and arch-ifort_linux.path in $base/xios/arch.
  3. Add the following lines to the $base/xios/arch/arch-ifort_linux.env file:
    export NETCDF_INC_DIR=$base/libraries/include
    export NETCDF_LIB_DIR=$base/libraries/lib
    export HDF5_INC_DIR=$base/libraries/include
    export HDF5_LIB_DIR=$base/libraries/lib
  4. Add the following lines to the $base/xios/arch/arch-ifort_linux.fcm file:
    %NCDF_INC            -I$base/libraries/include
    %NCDF_LIB            -L$base/libraries/lib -lnetcdff -lnetcdf -lhdf5 -lcurl -lz -lsz
    %FC                  mpiifort
    %FCFLAGS             -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
    %FFLAGS              -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
    %LD                  mpiifort
    %FPPFLAGS            -P -C -traditional
    %LDFLAGS             -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
    %AR                  ar
    %ARFLAGS             -r
    %MK                  gmake
    %USER_INC            %NCDF_INC_DIR
    %USER_LIB            %NCDF_LIB_DIR
    %MAKE                gmake
    %BASE_LD        -lstdc++ -lifcore -lintlc
    %LINKER         mpiifort -nofor-main
    %BASE_INC       -D__NONE__
    %CCOMPILER      mpiicc
    %FCOMPILER      mpiifort
    %CPP            cpp
    %FPP            cpp -P
    %BASE_CFLAGS    -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
    %PROD_CFLAGS    -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
    %DEV_CFLAGS     -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
    %DEBUG_CFLAGS   -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
    %BASE_FFLAGS    -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
    %PROD_FFLAGS    -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
    %DEV_FFLAGS     -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
    %DEBUG_FFLAGS   -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
  5. Add the following lines to the $base/xios/arch/arch-ifort_linux.path file:
    NETCDF_LIB="-lnetcdff -lnetcdf -lcurl"
    HDF5_LIB="-lhdf5_hl -lhdf5 -lz -lcurl"
  6. Change the directory to $base/xios and execute the following command:
    ./make_xios --full --prod --arch ifort_linux

Building NEMO for the Intel Xeon Phi Processor and Preparing Workloads

  1. Copy the NEMO source code to $base/nemo
  2. Apply the following patch to file $base/nemo/NEMOGCM/NEMO/OPA_SRC/nemogcm.F90:
    @@ -116,6 +116,7 @@
           !!              Madec, 2008, internal report, IPSL.
           INTEGER ::   istp       ! time step index
    +DOUBLE PRECISION :: mpi_wtime, sstart, send
     #if defined key_agrif
    @@ -163,18 +164,19 @@
     #if defined key_agrif
               CALL Agrif_Regrid()
     #endif
              DO WHILE ( istp <= nitend .AND. nstop == 0 )
    +sstart = mpi_wtime()
     #if defined key_agrif
                 CALL stp                         ! AGRIF: time stepping
     #else
                 CALL stp( istp )                 ! standard time stepping
     #endif
    +send = mpi_wtime()
    +print *, "Step ", istp, " - " , send-sstart , "s."
                 istp = istp + 1
                 IF( lk_mpp )   CALL mpp_max( nstop )
              END DO
           IF( lk_diaobs   )   CALL dia_obs_wri
           IF( ln_icebergs )   CALL icb_end( nitend )
  3. Create the file $base/nemo/ARCH/arch-mpiifort_linux.fcm and add the following lines:
    %NCDF_INC            -I$base/libraries/include
    %NCDF_LIB            -L$base/libraries/lib -lnetcdff -lnetcdf -lz -lcurl -lhdf5_hl -lhdf5 -lz -lcurl
    %CPP                 icc -E
    %FC                  mpiifort
    %FCFLAGS             -r8 -g -traceback -qopenmp -O3 -xMIC-AVX512 -g -traceback
    %FFLAGS              -r8 -g -traceback -qopenmp -O3 -xMIC-AVX512 -g -traceback
    %LD                  mpiifort
    %FPPFLAGS            -P -C -traditional
    %LDFLAGS             -lstdc++ -lifcore -O3 -xMIC-AVX512 -g -traceback
    %AR                  ar
    %ARFLAGS             -r
    %MK                  gmake
    %XIOS_INC            -I$base/xios/inc
    %XIOS_LIB            -L$base/xios/lib -lxios
    %USER_INC            %NCDF_INC %XIOS_INC
    %USER_LIB            %NCDF_LIB %XIOS_LIB
  4. Build the binary for the GYRE workload:
    cd $base/nemo/NEMOGCM/CONFIG
    ./makenemo -n GYRE -m mpiifort_linux -j 4
  5. Create a sandbox directory for the GYRE runs:
    1. Create the directory and copy the binary and the experiment files into it:
       mkdir -p $base/nemo/gyre-exp
       cp -r $base/nemo/NEMOGCM/CONFIG/GYRE/BLD/bin/nemo.exe $base/nemo/gyre-exp
       cp -r $base/nemo/NEMOGCM/CONFIG/GYRE/EXP00/* $base/nemo/gyre-exp
    2. Switch off creating mesh files by changing “nn_msh” to 0 in the namelist_ref file.
    3. Enable benchmark mode by changing “nn_bench” to 1 in the namelist_ref file.
    4. Set the following parameters in the “&namcfg” section:
      jp_cfg = 70
      jpidta = 2102
      jpjdta = 1402
      jpkdta = 31
      jpiglo = 2102
      jpjglo = 1402
    5. Switch off using the IO server in the iodef.xml file (“using_server = false”)
  6. Build the binary for ORCA025 workload:
    1. Change the content of $base/nemo/NEMOGCM/CONFIG/ORCA2_LIM3/cpp_ORCA2_LIM3.fcm to “bld::tool::fppkeys key_trabbl key_vvl key_dynspg_ts key_ldfslp key_traldf_c2d key_traldf_eiv key_dynldf_c3d key_zdfddm key_zdftmx key_mpp_mpi key_zdftke key_lim3 key_iomput”
    2. Change the line “ORCA2_LIM3 OPA_SRC LIM_SRC_3 NST_SRC” to “ORCA2_LIM3 OPA_SRC LIM_SRC_3” in the file $base/nemo/NEMOGCM/CONFIG/cfg.txt
    3. ./makenemo -n ORCA2_LIM3 -m mpiifort_linux -j 4
  7. Go to the Barcelona Supercomputing Center website (in Spanish), and in section 9 locate the paragraph “PREGUNTAS Y RESPUESTAS:”, which gives the path to the FTP server and the credentials to log in.
  8. Download the BenchORCA025L75.tar.gz file from the Benchmarks_aceptacion/NEMO/ directory
  9. Extract the contents of the tarball file to $base/nemo/orca-exp
  10. Copy the NEMO binary to the sandbox directory:
    cp $base/nemo/NEMOGCM/CONFIG/ORCA2_LIM3/BLD/bin/nemo.exe $base/nemo/orca-exp
  11. Edit the file $base/nemo/orca-exp/iodef.xml and add the following lines into the “<context id="xios">    <variable_definition>” section:
    <variable id="min_buffer_size" type="int">994473778</variable><variable id="buffer_size" type="int">994473778</variable>
  12. In the file namelist_ref in section “&namrun” set the following variables:
    nn_itend    =    10
    nn_stock    =    10
    nn_write    =    10
  13. Copy the $base/nemo/NEMOGCM/CONFIG/SHARED/namelist_ref file to the $base/nemo/orca-exp directory
  14. Switch off using the IO server in the iodef.xml file (“using_server = false”)
  15. To build the KNL binaries, change “-xCORE-AVX2” to “-xMIC-AVX512”, change $base to another directory, and do all of the steps again.
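The namelist changes in the steps above (nn_msh, nn_bench, and the &namrun values) are all single "key = value" edits, so they can be applied with a small helper instead of a text editor. This is a sketch only: set_namelist_value is a hypothetical function, it assumes each key sits on its own line, and it relies on GNU sed for the -i and -E options.

```shell
#!/bin/sh
# set_namelist_value is a hypothetical helper: replace the value of KEY in a
# Fortran namelist file, keeping any trailing "!" comment on that line.
# VALUE must not contain "/" or "&", which are special to sed here.
set_namelist_value() {
    key=$1; value=$2; file=$3
    sed -i -E "s/^([[:space:]]*${key}[[:space:]]*=[[:space:]]*)[^!]*/\1${value}   /" "$file"
}

# Example, mirroring the GYRE settings listed in step 5:
#   set_namelist_value nn_msh 0 namelist_ref
#   set_namelist_value nn_bench 1 namelist_ref
```

Because the pattern requires whitespace or "=" directly after the key, a key that is merely a prefix of another name is not touched.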

Running the GYRE Workload with the Intel Xeon Phi Processor

  1. Go to $base/nemo/gyre-exp
  2. Source the environment variables for the compiler and Intel MPI Library:
    source /opt/intel/compiler/latest/bin/compilervars.sh intel64
    source /opt/intel/impi/latest/bin/compilervars.sh intel64
  3. Add the libraries to LD_LIBRARY_PATH:
    export LD_LIBRARY_PATH=$base/libraries/lib/:$LD_LIBRARY_PATH
  4. Set additional variables for Intel MPI Library:
    export I_MPI_FABRICS=shm:tmi
    export I_MPI_PIN_CELL=core
  5. Run NEMO:
    mpiexec.hydra -genvall -f <hostfile> -n <number of ranks> -perhost <ppn> ./nemo.exe

Running the ORCA025 Workload with the Intel Xeon Phi Processor

  1. Go to $base/nemo/orca-exp
  2. Source environment variables for the compiler and Intel MPI Library:
    source /opt/intel/compiler/latest/bin/compilervars.sh intel64
    source /opt/intel/impi/latest/bin/compilervars.sh intel64
  3. Add libraries to LD_LIBRARY_PATH:
    export LD_LIBRARY_PATH=$base/libraries/lib/:$LD_LIBRARY_PATH
  4. Set additional variables for the Intel MPI Library:
    export I_MPI_FABRICS=shm:tmi
    export I_MPI_PIN_CELL=core
  5. Run NEMO:
    mpiexec.hydra -genvall -f <hostfile> -n <number of ranks> -perhost <ppn> ./nemo.exe
  6. If the application hangs while running, you can run NEMO with the XIOS server in detached mode:
    1. Copy xios_server.exe from $base/xios/bin to $base/nemo/orca-exp
    2. Edit the iodef.xml file and set “using_server = true”
    3. mpiexec.hydra -genvall -f <hostfile> -n <number of ranks> -perhost <ppn> ./nemo.exe : -n 2 ./xios_server.exe

Configuring Test Systems


Intel Xeon processor system:

  • Processor: Dual-socket Intel® Xeon® processor E5-2697 v4, 2.3 GHz (turbo OFF), 18 cores/socket, 36 cores, 72 threads (HT on)
  • Memory: 128 GB (8 x 16 GB) DDR4 2400 DIMMs
  • Cluster file system: Intel® Enterprise Edition for Lustre* software (Intel® EE for Lustre* software), SSD (136 TB storage)
  • Interconnect: Intel® Omni-Path Architecture (Intel® OPA) Si 100 series
  • OS / Kernel / IB stack: Oracle Linux* server release 7.2; Kernel: 3.10.0-229.20.1.el6.x86_64.knl2; OFED version: not listed

Intel Xeon Phi processor system:

  • Processor: Intel® Xeon Phi™ processor 7250, 68 cores, 136 threads, 1400 MHz core freq. (turbo OFF), 1700 MHz uncore freq.
  • Memory: 96 GB (6 x 16 GB) DDR4 2400 MHz RDIMMs
  • Cluster file system: Intel® Enterprise Edition for Lustre* software (Intel® EE for Lustre* software), SSD (136 TB storage)
  • Interconnect: Intel® Omni-Path Architecture (Intel® OPA) Si 100 series
  • OS / Kernel / IB stack: Oracle Linux server release 7.2; Kernel: 3.10.0-229.20.1.el6.x86_64.knl2; OFED version: not listed

Software and MPI configuration for both systems:

  • NEMO configuration: V3.6 r6939 with XIOS 1.0 r703, Intel® Parallel Studio XE, Intel MPI Library 2017 for Linux*
  • MPI configuration:
    • I_MPI_FABRICS=shm:tmi
    • I_MPI_PIN_CELL=core

Performance Results for the Intel Xeon Processor and Intel Xeon Phi Processor

    1. Time of second step for GYRE workload:

# nodes | Intel® Xeon® processor | Intel® Xeon Phi™ processor

    2. Time of second step for ORCA workload:

# nodes | Intel® Xeon® processor | Intel® Xeon Phi™ processor
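The per-step times for a run can be pulled from the "Step ... s." lines that the patched nemogcm.F90 prints. The sketch below assumes the program output was saved to a file (nemo.log is a made-up name) and that the seconds value is the fourth whitespace-separated field, as in the print statement of the patch. step_time is a hypothetical helper, and taking the maximum over ranks is one reasonable convention rather than the one necessarily used for the results reported here.

```shell
#!/bin/sh
# step_time is a hypothetical helper: print the largest time reported for
# time step <n> in a log whose lines look like "Step <n> - <seconds> s."
# (every MPI rank prints its own line, so the slowest rank is kept).
step_time() {
    n=$1; log=$2
    awk -v n="$n" '$1 == "Step" && $2 == n && ($4 + 0 > max || !seen) { max = $4 + 0; seen = 1 }
                   END { if (seen) print max }' "$log"
}

# Example: step_time 2 nemo.log
```

The function prints nothing when the requested step does not appear in the log.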

