About NEMO*
The NEMO* (Nucleus for European Modelling of the Ocean) numerical solutions framework encompasses models of ocean, sea ice, tracers, and biochemistry equations and their related physics. It also incorporates the pre- and post-processing tools and the interface to other components of the Earth System. NEMO allows several ocean-related components of the Earth System to work together or separately, and also allows for two-way nesting via AGRIF software. It is interfaced with the remaining components of the Earth System package (atmosphere, land surfaces, and so on) via the OASIS coupler.
This recipe shows the performance advantages of using the Intel® Xeon Phi™ processor 7250.
NEMO 3.6 is the current stable version.
Downloading the Code
- Download the NEMO source code from the official NEMO repository (registration at www.nemo-ocean.eu is required):
svn co -r 6939 http://forge.ipsl.jussieu.fr/nemo/svn/branches/2015/nemo_v3_6_STABLE/NEMOGCM nemo
- Download the XIOS IO server from the official XIOS repository:
svn co -r 703 http://forge.ipsl.jussieu.fr/ioserver/svn/XIOS/branchs/xios-1.0 xios
- If your system already has NetCDF libraries with Fortran bindings installed and they link with the NEMO and XIOS binaries, go to the section “Building XIOS for the Intel Xeon Processor”. Otherwise, download the following libraries:
- szip 2.1 from https://support.hdfgroup.org/ftp/lib-external/szip/2.1/src/szip-2.1.tar.gz
- zlib 1.2.8 from http://www.zlib.net/fossils/zlib-1.2.8.tar.gz
- HDF5 1.8.12 from https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8.12/src/hdf5-1.8.12.tar.gz
- CURL 7.42.1 from https://curl.haxx.se/download/curl-7.42.1.tar.gz
- NetCDF-C 4.3.3 from https://github.com/Unidata/netcdf-c/archive/v4.3.3.tar.gz
- NetCDF-Fortran 4.2 from https://github.com/Unidata/netcdf-fortran/archive/netcdf-fortran-4.2.tar.gz
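The downloads listed above can be scripted. The following is a minimal sketch, not part of the original recipe: the function names `tarball_name` and `fetch_libraries` and the variable `LIB_URLS` are hypothetical, and it assumes `curl` is available on the build host.

```shell
# Sketch: fetch the library tarballs listed above into a source directory.
# LIB_URLS, tarball_name and fetch_libraries are illustrative names only.

LIB_URLS="
https://support.hdfgroup.org/ftp/lib-external/szip/2.1/src/szip-2.1.tar.gz
http://www.zlib.net/fossils/zlib-1.2.8.tar.gz
https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8.12/src/hdf5-1.8.12.tar.gz
https://curl.haxx.se/download/curl-7.42.1.tar.gz
https://github.com/Unidata/netcdf-c/archive/v4.3.3.tar.gz
https://github.com/Unidata/netcdf-fortran/archive/netcdf-fortran-4.2.tar.gz
"

# Print the file name component of a URL (everything after the last slash).
tarball_name() {
    printf '%s\n' "${1##*/}"
}

# Download every tarball into the directory given as $1, skipping files
# that already exist. Stops at the first failed download.
fetch_libraries() {
    dest=$1
    mkdir -p "$dest" || return 1
    for url in $LIB_URLS; do
        file="$dest/$(tarball_name "$url")"
        [ -f "$file" ] || curl -fL -o "$file" "$url" || return 1
    done
}
```

After sourcing the block, a call such as `fetch_libraries "$base/libraries/src"` would place the tarballs where the build steps expect them, assuming `$base` has already been exported.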
Building Additional Libraries for the Intel® Xeon® Processor
- First, choose a directory for your experiments, such as “~/NEMO-BDW”:
export base="$HOME/NEMO-BDW"
- Create a directory for the libraries and copy all downloaded tarballs into $base/libraries/src:
mkdir -p $base/libraries/src
- Unpack the tarball files in $base/libraries/src.
- To build an Intel® Advanced Vector Extensions 2 (Intel® AVX2) version of libraries, set:
export arch="-xCORE-AVX2"
- Set the following environment variables:
export PREFIX=$base/libraries
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${PREFIX}/lib
export CFLAGS="-I$PREFIX/include -L$PREFIX/lib -O3 -g -traceback -openmp ${arch} -fPIC"
export CPPFLAGS=$CFLAGS
export CXXFLAGS=$CFLAGS
export FFFLAGS=$CFLAGS
export FCFLAGS=$CFLAGS
export LDFLAGS="-L$PREFIX/lib -openmp ${arch} -fPIC"
export FC=mpiifort
export CXX=mpiicc
export CC=mpiicc
export CPP="icc -E"
- Build szip:
cd $base/libraries/src/szip-2.1
./configure --prefix=$PREFIX
make -j 4
make install
- Build zlib:
cd $base/libraries/src/zlib-1.2.8
./configure --prefix=$PREFIX
make -j 4
make install
- Build HDF5:
cd $base/libraries/src/hdf5-1.8.12
./configure --with-zlib=$PREFIX --prefix=$PREFIX --enable-fortran --with-szlib=$PREFIX --enable-hl
make
make install
- Build CURL:
cd $base/libraries/src/curl-7.42.1
./configure --prefix=$PREFIX
make -j 4
make install
- Build NetCDF:
cd $base/libraries/src/netcdf-4.3.3
export LIBS=" -lhdf5_hl -lhdf5 -lz -lsz -lmpi"
export LD_FLAGS+=" -L$PREFIX/lib"
./configure --prefix=$PREFIX
make
make install
- Build NetCDF Fortran wrapper:
cd $base/libraries/src/netcdf-fortran-4.2/
export LIBS=""
export CFLAGS="$CFLAGS -lnetcdf"
export CPPFLAGS=$CFLAGS
export CXXFLAGS=$CFLAGS
export FFFLAGS=$CFLAGS
export FCFLAGS=$CFLAGS
export FC=ifort
export CXX=mpiicc
export CC=mpiicc
export LDFLAGS+=" -L$I_MPI_ROOT/lib64/"
./configure --prefix=$PREFIX
make
make install
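Before moving on to XIOS, it can help to confirm that every library ended up under the install prefix. The sketch below is an assumption-laden helper rather than part of the recipe: `missing_libs` is a hypothetical name, and the library names are taken from the `-l` flags used in this recipe's link lines (`-lsz -lz -lhdf5 -lhdf5_hl -lcurl -lnetcdf -lnetcdff`).

```shell
# Sketch: list the libraries from this recipe that are NOT present under
# <prefix>/lib. Prints one short name per missing library, nothing if all
# are found. missing_libs is an illustrative name, not an existing tool.
missing_libs() {
    prefix=$1
    for name in sz z hdf5 hdf5_hl curl netcdf netcdff; do
        found=no
        # Match static or shared variants, e.g. libz.a, libz.so, libz.so.1
        for f in "$prefix"/lib/lib"$name".*; do
            if [ -e "$f" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || printf '%s\n' "$name"
    done
}
```

Running `missing_libs "$PREFIX"` after the last `make install` should print nothing if all builds succeeded.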
Building XIOS for the Intel Xeon Processor
- Copy XIOS source code to $base/xios
- Create files:
$base/xios/arch/arch-ifort_linux.env
$base/xios/arch/arch-ifort_linux.fcm
$base/xios/arch/arch-ifort_linux.path
- Add the following lines to the $base/xios/arch/arch-ifort_linux.env file:
export NETCDF_INC_DIR=$base/libraries/include
export NETCDF_LIB_DIR=$base/libraries/lib
export HDF5_INC_DIR=$base/libraries/include
export HDF5_LIB_DIR=$base/libraries/lib
- Add the following lines to the $base/xios/arch/arch-ifort_linux.fcm file:
%NCDF_INC -I$base/libraries/include
%NCDF_LIB -L$base/libraries/lib -lnetcdff -lnetcdf -lhdf5 -lcurl -lz -lsz
%FC mpiifort
%FCFLAGS -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
%FFLAGS -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
%LD mpiifort
%FPPFLAGS -P -C -traditional
%LDFLAGS -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
%AR ar
%ARFLAGS -r
%MK gmake
%USER_INC %NCDF_INC_DIR
%USER_LIB %NCDF_LIB_DIR
%MAKE gmake
%BASE_LD -lstdc++ -lifcore -lintlc
%LINKER mpiifort -nofor-main
%BASE_INC -D__NONE__
%CCOMPILER mpiicc
%FCOMPILER mpiifort
%CPP cpp
%FPP cpp -P
%BASE_CFLAGS -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
%PROD_CFLAGS -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
%DEV_CFLAGS -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
%DEBUG_CFLAGS -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
%BASE_FFLAGS -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
%PROD_FFLAGS -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
%DEV_FFLAGS -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
%DEBUG_FFLAGS -O3 -g -traceback -xCORE-AVX2 -I$base/libraries/include -L$base/libraries/lib
- Add the following lines to the $base/xios/arch/arch-ifort_linux.path file:
NETCDF_INCDIR="-I $NETCDF_INC_DIR"
NETCDF_LIBDIR="-L $NETCDF_LIB_DIR"
NETCDF_LIB="-lnetcdff -lnetcdf -lcurl"
MPI_INCDIR=""
MPI_LIBDIR=""
MPI_LIB=""
HDF5_INCDIR="-I $HDF5_INC_DIR"
HDF5_LIBDIR="-L $HDF5_LIB_DIR"
HDF5_LIB="-lhdf5_hl -lhdf5 -lz -lcurl"
- Change directory to $base/xios and execute the following command:
./make_xios --full --prod --arch ifort_linux
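The XIOS environment file can also be generated rather than typed by hand. The following sketch writes the four `export` lines from this section; `write_xios_env` is a hypothetical helper name. One deliberate difference from the hand-written file is that the base directory is expanded when the file is generated, so the result does not depend on `$base` being exported later.

```shell
# Sketch: generate arch-ifort_linux.env under <base_dir>/xios/arch.
# write_xios_env is an illustrative name; the paths follow this recipe.
write_xios_env() {
    base_dir=$1
    mkdir -p "$base_dir/xios/arch" || return 1
    # Unquoted heredoc delimiter: $base_dir is expanded at generation time.
    cat > "$base_dir/xios/arch/arch-ifort_linux.env" <<EOF
export NETCDF_INC_DIR=$base_dir/libraries/include
export NETCDF_LIB_DIR=$base_dir/libraries/lib
export HDF5_INC_DIR=$base_dir/libraries/include
export HDF5_LIB_DIR=$base_dir/libraries/lib
EOF
}
```

A typical call would be `write_xios_env "$base"`, run before `./make_xios`.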
Building NEMO for the Intel Xeon Processor and Preparing Workloads
- Copy NEMO source code to $base/nemo
- Apply the following patch to file $base/nemo/NEMOGCM/NEMO/OPA_SRC/nemogcm.F90:
@@ -116,6 +116,7 @@
 !! Madec, 2008, internal report, IPSL.
 !!----------------------------------------------------------------------
 INTEGER :: istp ! time step index
+DOUBLE PRECISION :: mpi_wtime, sstart, send
 !!----------------------------------------------------------------------
 !
 #if defined key_agrif
@@ -163,18 +164,19 @@
 #if defined key_agrif
 CALL Agrif_Regrid()
 #endif
-
 DO WHILE ( istp <= nitend .AND. nstop == 0 )
+sstart = mpi_wtime()
 #if defined key_agrif
 CALL stp ! AGRIF: time stepping
 #else
 CALL stp( istp ) ! standard time stepping
 #endif
+send=mpi_wtime()
+print *, "Step ", istp, " - " , send-sstart , "s."
 istp = istp + 1
 IF( lk_mpp ) CALL mpp_max( nstop )
 END DO
 #endif
-
 IF( lk_diaobs ) CALL dia_obs_wri
 !
 IF( ln_icebergs ) CALL icb_end( nitend )
- Create the file $base/nemo/ARCH/arch-mpiifort_linux.fcm and add the following lines:
%NCDF_INC -I$base/libraries/include
%NCDF_LIB -L$base/libraries/lib -lnetcdff -lnetcdf -lz -lcurl -lhdf5_hl -lhdf5 -lz -lcurl
%CPP icc -E
%FC mpiifort
%FCFLAGS -r8 -g -traceback -qopenmp -O3 -xCORE-AVX2 -g -traceback
%FFLAGS -r8 -g -traceback -qopenmp -O3 -xCORE-AVX2 -g -traceback
%LD mpiifort
%FPPFLAGS -P -C -traditional
%LDFLAGS -lstdc++ -lifcore -O3 -xCORE-AVX2 -g -traceback
%AR ar
%ARFLAGS -r
%MK gmake
%XIOS_INC -I$base/xios/inc
%XIOS_LIB -L$base/xios/lib -lxios
%USER_INC %NCDF_INC %XIOS_INC
%USER_LIB %NCDF_LIB %XIOS_LIB
- Build the binary for the GYRE workload:
cd $base/nemo/NEMOGCM/CONFIG
./makenemo -n GYRE -m mpiifort_linux -j 4
- Create a sandbox directory for the GYRE runs:
mkdir -p $base/nemo/gyre-exp
cp -r $base/nemo/NEMOGCM/CONFIG/GYRE/BLD/bin/nemo.exe $base/nemo/gyre-exp
cp -r $base/nemo/NEMOGCM/CONFIG/GYRE/EXP00/* $base/nemo/gyre-exp
- Switch off the creation of mesh files by changing “nn_msh” to 0 in the namelist_ref file.
- Enable benchmark mode by changing “nn_bench” to 1 in the namelist_ref file.
- Set the following parameters in the “&namcfg” section:
jp_cfg = 70
jpidta = 2102
jpjdta = 1402
jpkdta = 31
jpiglo = 2102
jpjglo = 1402
- Switch off the IO server in the iodef.xml file (“using_server = false”).
- Build a binary for the ORCA025 workload:
- Change “$base/nemo/NEMOGCM/CONFIG/ORCA2_LIM3/cpp_ORCA2_LIM3.fcm” content to “bld::tool::fppkeys key_trabbl key_vvl key_dynspg_ts key_ldfslp key_traldf_c2d key_traldf_eiv key_dynldf_c3d key_zdfddm key_zdftmx key_mpp_mpi key_zdftke key_lim3 key_iomput”
- Change the line “ORCA2_LIM3 OPA_SRC LIM_SRC_3 NST_SRC” to “ORCA2_LIM3 OPA_SRC LIM_SRC_3” in file $base/nemo/NEMOGCM/CONFIG/cfg.txt
- ./makenemo -n ORCA2_LIM3 -m mpiifort_linux -j 4
- Go to the Barcelona Supercomputing Center website (in Spanish), and in section 9 locate the paragraph “PREGUNTAS Y RESPUESTAS:”, which gives the path to the FTP server and the credentials to log in.
- Download the BenchORCA025L75.tar.gz file from directory Benchmarks_aceptacion/NEMO/
- Extract the contents of the tarball file to $base/nemo/orca-exp
- Copy the NEMO binary to the sandbox directory:
cp $base/nemo/NEMOGCM/CONFIG/ORCA2_LIM3/BLD/bin/nemo.exe $base/nemo/orca-exp
- Edit the file $base/nemo/orca-exp/iodef.xml and add the following lines into the “<context id="xios"> <variable_definition>” section:
<variable id="min_buffer_size" type="int">994473778</variable>
<variable id="buffer_size" type="int">994473778</variable>
- In the file namelist_ref in section “&namrun” set the following variables:
nn_itend = 10
nn_stock = 10
nn_write = 10
- Copy the $base/nemo/NEMOGCM/CONFIG/SHARED/namelist_ref file to $base/nemo/orca-exp
- Switch off the IO server in the iodef.xml file (“using_server = false”)
- To build the binaries for the Intel Xeon Phi processor (KNL), change “-xCORE-AVX2” to “-xMIC-AVX512”, change $base to another directory, and repeat all of the steps.
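The flag swap between the two targets can be applied mechanically to an existing arch file. This is a minimal sketch, assuming the Xeon arch file already exists; `to_knl_arch` is a hypothetical helper name, and the only substitution it performs is the `-xCORE-AVX2` to `-xMIC-AVX512` change described in this recipe.

```shell
# Sketch: derive a Xeon Phi arch file from a Xeon one by swapping the
# vectorization flag. to_knl_arch is an illustrative name.
#   $1 = existing arch file (contains -xCORE-AVX2)
#   $2 = new arch file to write (must differ from $1)
to_knl_arch() {
    # Writing to the input file would truncate it before sed reads it.
    [ "$1" != "$2" ] || return 1
    sed 's/-xCORE-AVX2/-xMIC-AVX512/g' "$1" > "$2"
}
```

Note that any `$base` paths in the copied file still need to point at the new build directory, so this covers only the flag change.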
Running the GYRE Workload with the Intel Xeon Processor
- Go to $base/nemo/gyre-exp
- Source the environment variables for the compiler and the Intel® MPI Library:
source /opt/intel/compiler/latest/bin/compilervars.sh intel64
source /opt/intel/impi/latest/bin/compilervars.sh intel64
- Add libraries to LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$base/libraries/lib/:$LD_LIBRARY_PATH
- Set additional variables for the Intel MPI Library:
export I_MPI_FABRICS=shm:tmi
export I_MPI_PIN_CELL=core
- Run NEMO:
mpiexec.hydra -genvall -f <hostfile> -n <number of ranks> -perhost <ppn> ./nemo.exe
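The patched nemogcm.F90 prints one `Step <n> - <seconds> s.` line per time step to standard output. The sketch below extracts the time for a given step from a saved log. It rests on assumptions that are not stated in the recipe: that the output of `mpiexec.hydra` was redirected to a file, that every rank prints its own line, and that the slowest rank is the value of interest. `step_time` is a hypothetical name.

```shell
# Sketch: print the largest time reported for one step in a NEMO log.
#   $1 = log file holding the stdout of the run (assumed to be saved)
#   $2 = time step index, e.g. 2
# Expects lines shaped like:  Step  2  -  0.640632  s.
step_time() {
    awk -v step="$2" '
        $1 == "Step" && $2 == step && $3 == "-" {
            t = $4 + 0
            if (!seen || t > max) {
                max = t
                seen = 1
            }
        }
        END {
            if (!seen) {
                exit 1      # no matching line found
            }
            printf "%.6f\n", max
        }
    ' "$1"
}
```

For example, after `mpiexec.hydra ... ./nemo.exe > run.log`, the call `step_time run.log 2` would report the second step.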
Running the ORCA025 Workload with the Intel Xeon Processor
- Go to $base/nemo/orca-exp
- Source the environment variables for the compiler and the Intel MPI Library:
source /opt/intel/compiler/latest/bin/compilervars.sh intel64
source /opt/intel/impi/latest/bin/compilervars.sh intel64
- Add libraries to LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$base/libraries/lib/:$LD_LIBRARY_PATH
- Set additional variables for the Intel MPI Library:
export I_MPI_FABRICS=shm:tmi
export I_MPI_PIN_CELL=core
- Run NEMO:
mpiexec.hydra -genvall -f <hostfile> -n <number of ranks> -perhost <ppn> ./nemo.exe
- If the application hangs while running, you can run NEMO with the XIOS server in detached mode:
- Copy xios_server.exe from $base/xios/bin to $base/nemo/orca-exp
- Edit the iodef.xml file and set “using_server = true”
- mpiexec.hydra -genvall -f <hostfile> -n <number of ranks> -perhost <ppn> ./nemo.exe : -n 2 ./xios_server.exe
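The two launch lines above differ only in the optional XIOS server part, so they can be assembled by one helper. This is a sketch under an assumption the recipe does not state: that the rank count is the node count multiplied by the ranks per node. `nemo_cmd` is a hypothetical name, and it only prints the command so that it can be inspected before use.

```shell
# Sketch: print the mpiexec.hydra command line for a NEMO run.
#   $1 = hostfile, $2 = number of nodes, $3 = ranks per node (ppn)
#   $4 = number of detached XIOS server ranks (optional, default 0)
# Assumes total ranks = nodes * ppn; adjust if nodes are not filled evenly.
nemo_cmd() {
    hostfile=$1
    nodes=$2
    ppn=$3
    xios_ranks=${4:-0}
    ranks=$((nodes * ppn))
    cmd="mpiexec.hydra -genvall -f $hostfile -n $ranks -perhost $ppn ./nemo.exe"
    if [ "$xios_ranks" -gt 0 ]; then
        cmd="$cmd : -n $xios_ranks ./xios_server.exe"
    fi
    printf '%s\n' "$cmd"
}
```

Calling `nemo_cmd hosts 4 36 2` prints the detached-mode form with two XIOS server ranks; omitting the last argument prints the plain form.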
Building Additional Libraries for the Intel® Xeon Phi™ Processor
- First, choose a directory for your experiments, such as “~/NEMO-KNL”
export base="$HOME/NEMO-KNL"
- Create a directory for the libraries and copy all downloaded tarballs into $base/libraries/src:
mkdir -p $base/libraries/src
- Unpack the tarball files in $base/libraries/src
- To build an Intel® Advanced Vector Extensions 512 (Intel® AVX-512) version of libraries, set:
export arch="-xMIC-AVX512"
- Set the following environment variables:
export PREFIX=$base/libraries
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${PREFIX}/lib
export CFLAGS="-I$PREFIX/include -L$PREFIX/lib -O3 -g -traceback -openmp ${arch} -fPIC"
export CPPFLAGS=$CFLAGS
export CXXFLAGS=$CFLAGS
export FFFLAGS=$CFLAGS
export FCFLAGS=$CFLAGS
export LDFLAGS="-L$PREFIX/lib -openmp ${arch} -fPIC"
export FC=mpiifort
export CXX=mpiicc
export CC=mpiicc
export CPP="icc -E"
- Build szip:
cd $base/libraries/src/szip-2.1
./configure --prefix=$PREFIX
make -j 4
make install
- Build zlib:
cd $base/libraries/src/zlib-1.2.8
./configure --prefix=$PREFIX
make -j 4
make install
- Build HDF5:
cd $base/libraries/src/hdf5-1.8.12
./configure --with-zlib=$PREFIX --prefix=$PREFIX --enable-fortran --with-szlib=$PREFIX --enable-hl
make
make install
- Build CURL:
cd $base/libraries/src/curl-7.42.1
./configure --prefix=$PREFIX
make -j 4
make install
- Build NetCDF:
cd $base/libraries/src/netcdf-4.3.3
export LIBS=" -lhdf5_hl -lhdf5 -lz -lsz -lmpi"
export LD_FLAGS+=" -L$PREFIX/lib"
./configure --prefix=$PREFIX
make
make install
- Build the NetCDF Fortran wrapper:
cd $base/libraries/src/netcdf-fortran-4.2/
export LIBS=""
export CFLAGS="$CFLAGS -lnetcdf"
export CPPFLAGS=$CFLAGS
export CXXFLAGS=$CFLAGS
export FFFLAGS=$CFLAGS
export FCFLAGS=$CFLAGS
export FC=ifort
export CXX=mpiicc
export CC=mpiicc
export LDFLAGS+=" -L$I_MPI_ROOT/lib64/"
./configure --prefix=$PREFIX
make
make install
Building XIOS for the Intel Xeon Phi Processor
- Copy XIOS source code to $base/xios
- Create files:
$base/xios/arch/arch-ifort_linux.env
$base/xios/arch/arch-ifort_linux.fcm
$base/xios/arch/arch-ifort_linux.path
- Add the following lines to the $base/xios/arch/arch-ifort_linux.env file:
export NETCDF_INC_DIR=$base/libraries/include
export NETCDF_LIB_DIR=$base/libraries/lib
export HDF5_INC_DIR=$base/libraries/include
export HDF5_LIB_DIR=$base/libraries/lib
- Add the following lines to the $base/xios/arch/arch-ifort_linux.fcm file:
%NCDF_INC -I$base/libraries/include
%NCDF_LIB -L$base/libraries/lib -lnetcdff -lnetcdf -lhdf5 -lcurl -lz -lsz
%FC mpiifort
%FCFLAGS -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
%FFLAGS -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
%LD mpiifort
%FPPFLAGS -P -C -traditional
%LDFLAGS -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
%AR ar
%ARFLAGS -r
%MK gmake
%USER_INC %NCDF_INC_DIR
%USER_LIB %NCDF_LIB_DIR
%MAKE gmake
%BASE_LD -lstdc++ -lifcore -lintlc
%LINKER mpiifort -nofor-main
%BASE_INC -D__NONE__
%CCOMPILER mpiicc
%FCOMPILER mpiifort
%CPP cpp
%FPP cpp -P
%BASE_CFLAGS -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
%PROD_CFLAGS -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
%DEV_CFLAGS -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
%DEBUG_CFLAGS -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
%BASE_FFLAGS -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
%PROD_FFLAGS -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
%DEV_FFLAGS -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
%DEBUG_FFLAGS -O3 -g -traceback -xMIC-AVX512 -I$base/libraries/include -L$base/libraries/lib
- Add the following lines to the $base/xios/arch/arch-ifort_linux.path file:
NETCDF_INCDIR="-I $NETCDF_INC_DIR"
NETCDF_LIBDIR="-L $NETCDF_LIB_DIR"
NETCDF_LIB="-lnetcdff -lnetcdf -lcurl"
MPI_INCDIR=""
MPI_LIBDIR=""
MPI_LIB=""
HDF5_INCDIR="-I $HDF5_INC_DIR"
HDF5_LIBDIR="-L $HDF5_LIB_DIR"
HDF5_LIB="-lhdf5_hl -lhdf5 -lz -lcurl"
- Change the directory to $base/xios and execute the following command:
./make_xios --full --prod --arch ifort_linux
Building NEMO for the Intel Xeon Phi Processor and Preparing Workloads
- Copy the NEMO source code to $base/nemo
- Apply the following patch to file $base/nemo/NEMOGCM/NEMO/OPA_SRC/nemogcm.F90:
@@ -116,6 +116,7 @@
 !! Madec, 2008, internal report, IPSL.
 !!----------------------------------------------------------------------
 INTEGER :: istp ! time step index
+DOUBLE PRECISION :: mpi_wtime, sstart, send
 !!----------------------------------------------------------------------
 !
 #if defined key_agrif
@@ -163,18 +164,19 @@
 #if defined key_agrif
 CALL Agrif_Regrid()
 #endif
-
 DO WHILE ( istp <= nitend .AND. nstop == 0 )
+sstart = mpi_wtime()
 #if defined key_agrif
 CALL stp ! AGRIF: time stepping
 #else
 CALL stp( istp ) ! standard time stepping
 #endif
+send=mpi_wtime()
+print *, "Step ", istp, " - " , send-sstart , "s."
 istp = istp + 1
 IF( lk_mpp ) CALL mpp_max( nstop )
 END DO
 #endif
-
 IF( lk_diaobs ) CALL dia_obs_wri
 !
 IF( ln_icebergs ) CALL icb_end( nitend )
- Create the file $base/nemo/ARCH/arch-mpiifort_linux.fcm and add the following lines:
%NCDF_INC -I$base/libraries/include
%NCDF_LIB -L$base/libraries/lib -lnetcdff -lnetcdf -lz -lcurl -lhdf5_hl -lhdf5 -lz -lcurl
%CPP icc -E
%FC mpiifort
%FCFLAGS -r8 -g -traceback -qopenmp -O3 -xMIC-AVX512 -g -traceback
%FFLAGS -r8 -g -traceback -qopenmp -O3 -xMIC-AVX512 -g -traceback
%LD mpiifort
%FPPFLAGS -P -C -traditional
%LDFLAGS -lstdc++ -lifcore -O3 -xMIC-AVX512 -g -traceback
%AR ar
%ARFLAGS -r
%MK gmake
%XIOS_INC -I$base/xios/inc
%XIOS_LIB -L$base/xios/lib -lxios
%USER_INC %NCDF_INC %XIOS_INC
%USER_LIB %NCDF_LIB %XIOS_LIB
- Build the binary for the GYRE workload:
cd $base/nemo/NEMOGCM/CONFIG
./makenemo -n GYRE -m mpiifort_linux -j 4
- Create a sandbox directory for the GYRE runs:
mkdir -p $base/nemo/gyre-exp
cp -r $base/nemo/NEMOGCM/CONFIG/GYRE/BLD/bin/nemo.exe $base/nemo/gyre-exp
cp -r $base/nemo/NEMOGCM/CONFIG/GYRE/EXP00/* $base/nemo/gyre-exp
- Switch off creating mesh files by changing “nn_msh” to 0 in the namelist_ref file
- Enable benchmark mode by changing “nn_bench” to 1 in the namelist_ref file.
- Set the following parameters in the “&namcfg” section:
jp_cfg = 70
jpidta = 2102
jpjdta = 1402
jpkdta = 31
jpiglo = 2102
jpjglo = 1402
- Switch off using the IO server in the iodef.xml file (“using_server = false”)
- Build the binary for ORCA025 workload:
- Change $base/nemo/NEMOGCM/CONFIG/ORCA2_LIM3/cpp_ORCA2_LIM3.fcm content to “bld::tool::fppkeys key_trabbl key_vvl key_dynspg_ts key_ldfslp key_traldf_c2d key_traldf_eiv key_dynldf_c3d key_zdfddm key_zdftmx key_mpp_mpi key_zdftke key_lim3 key_iomput”
- Change line “ORCA2_LIM3 OPA_SRC LIM_SRC_3 NST_SRC” to “ORCA2_LIM3 OPA_SRC LIM_SRC_3” in the file $base/nemo/NEMOGCM/CONFIG/cfg.txt
- ./makenemo -n ORCA2_LIM3 -m mpiifort_linux -j 4
- Go to the Barcelona Supercomputing Center website (in Spanish), and in section 9 locate the paragraph “PREGUNTAS Y RESPUESTAS:”, which gives the path to the FTP server and the credentials to log in.
- Download the BenchORCA025L75.tar.gz file from the Benchmarks_aceptacion/NEMO/ directory
- Extract the contents of the tarball file to $base/nemo/orca-exp
- Copy the NEMO binary to the sandbox directory:
cp $base/nemo/NEMOGCM/CONFIG/ORCA2_LIM3/BLD/bin/nemo.exe $base/nemo/orca-exp
- Edit the file $base/nemo/orca-exp/iodef.xml and add the following lines into the “<context id="xios"> <variable_definition>” section:
<variable id="min_buffer_size" type="int">994473778</variable>
<variable id="buffer_size" type="int">994473778</variable>
- In the file namelist_ref in section “&namrun” set the following variables:
nn_itend = 10
nn_stock = 10
nn_write = 10
- Copy the $base/nemo/NEMOGCM/CONFIG/SHARED/namelist_ref file to the $base/nemo/orca-exp directory
- Switch off the IO server in the iodef.xml file (“using_server = false”)
- To build the KNL binaries, change “-xCORE-AVX2” to “-xMIC-AVX512”, change $base to another directory, and repeat all of the steps.
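The namelist changes in this section (for example “nn_msh”, “nn_bench”, “nn_itend”) can be applied with a small helper instead of a text editor. This is a sketch, assuming each variable sits on its own `name = value` line, optionally followed by a `!` comment, as is usual in namelist_ref. `set_nml_var` is a hypothetical name, and it does not check which namelist section the variable belongs to.

```shell
# Sketch: set "var = value" in a Fortran namelist file, keeping the
# original indentation and any trailing "!" comment.
#   $1 = namelist file, $2 = variable name, $3 = new value
# set_nml_var is an illustrative name, not part of NEMO.
set_nml_var() {
    file=$1
    var=$2
    val=$3
    tmp=$(mktemp) || return 1
    # Replace only the value token that follows "<var> =".
    sed "s/^\([[:space:]]*$var[[:space:]]*=[[:space:]]*\)[^[:space:]!]*/\1$val/" \
        "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back rather than move, so the file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For the GYRE setup this would be `set_nml_var namelist_ref nn_msh 0` followed by `set_nml_var namelist_ref nn_bench 1`.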
Running the GYRE Workload with the Intel Xeon Phi Processor
- Go to $base/nemo/gyre-exp
- Source the environment variables for the compiler and Intel MPI Library:
source /opt/intel/compiler/latest/bin/compilervars.sh intel64
source /opt/intel/impi/latest/bin/compilervars.sh intel64
- Add the libraries to LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$base/libraries/lib/:$LD_LIBRARY_PATH
- Set additional variables for the Intel MPI Library:
export I_MPI_FABRICS=shm:tmi
export I_MPI_PIN_CELL=core
- Run NEMO:
mpiexec.hydra -genvall -f <hostfile> -n <number of ranks> -perhost <ppn> ./nemo.exe
Running the ORCA025 Workload with the Intel Xeon Phi Processor
- Go to $base/nemo/orca-exp
- Source environment variables for the compiler and Intel MPI Library:
source /opt/intel/compiler/latest/bin/compilervars.sh intel64
source /opt/intel/impi/latest/bin/compilervars.sh intel64
- Add libraries to LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$base/libraries/lib/:$LD_LIBRARY_PATH
- Set additional variables for the Intel MPI Library:
export I_MPI_FABRICS=shm:tmi
export I_MPI_PIN_CELL=core
- Run NEMO:
mpiexec.hydra -genvall -f <hostfile> -n <number of ranks> -perhost <ppn> ./nemo.exe
- If the application hangs while running, you can run NEMO with the XIOS server in detached mode:
- Copy xios_server.exe from $base/xios/bin to $base/nemo/orca-exp
- Edit the iodef.xml file and set “using_server = true”
- mpiexec.hydra -genvall -f <hostfile> -n <number of ranks> -perhost <ppn> ./nemo.exe : -n 2 ./xios_server.exe
Configuring Test Systems
Component | Intel® Xeon® processor | Intel® Xeon Phi™ processor |
---|---|---|
CPU | Dual-socket Intel® Xeon® processor E5-2697 v4, 2.3 GHz (turbo OFF), 18 cores/socket, 36 cores, 72 threads (HT on) | Intel® Xeon Phi™ processor 7250, 68 cores, 136 threads, 1400 MHz core freq. (turbo OFF), 1700 MHz uncore freq. |
RAM | 128 GB (8 x 16 GB) DDR4 2400 MHz DIMMs | 96 GB (6 x 16 GB) DDR4 2400 MHz RDIMMs |
Cluster File System Abstract | Intel® Enterprise Edition for Lustre* software (Intel® EE for Lustre* software), SSD (136 TB storage) | Intel® Enterprise Edition for Lustre* software (Intel® EE for Lustre* software), SSD (136 TB storage) |
Interconnect | Intel® Omni-Path Architecture (Intel® OPA) Si 100 series | Intel® Omni-Path Architecture (Intel® OPA) Si 100 series |
OS / Kernel / IB stack | Oracle Linux* server release 7.2, Kernel: 3.10.0-229.20.1.el6.x86_64.knl2, OFED version: 10.2.0.0.158_72 | Oracle Linux* server release 7.2, Kernel: 3.10.0-229.20.1.el6.x86_64.knl2, OFED version: 10.2.0.0.158_72 |
- NEMO configuration: V3.6 r6939 with XIOS 1.0 r703, Intel® Parallel Studio XE 17.0.0.098, Intel MPI Library 2017 for Linux*
- MPI configuration:
- I_MPI_FABRICS=shm:tmi
- I_MPI_PIN_CELL=core
Performance Results for the Intel Xeon Processor and Intel Xeon Phi Processor
1. Time of the second time step for the GYRE workload, in seconds:
# nodes | Intel® Xeon® processor | Intel® Xeon Phi™ processor |
---|---|---|
1 | 6.546229 | 3.642156 |
2 | 3.011352 | 2.075075 |
4 | 1.326501 | 0.997129 |
8 | 0.640632 | 0.492369 |
16 | 0.321378 | 0.284348 |
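To compare the two platforms at each node count, the ratio of the Intel Xeon time to the Intel Xeon Phi time can be computed from the GYRE rows above. The sketch below embeds the table values as published; the function names `gyre_times` and `speedup_table` are hypothetical, and the ratio is a derived figure, not one reported in the recipe.

```shell
# Sketch: speedup of Xeon Phi over Xeon per node count for GYRE.
# Columns emitted by gyre_times: nodes, Xeon time (s), Xeon Phi time (s),
# copied from the table above.
gyre_times() {
    cat <<'EOF'
1 6.546229 3.642156
2 3.011352 2.075075
4 1.326501 0.997129
8 0.640632 0.492369
16 0.321378 0.284348
EOF
}

# Read "nodes t_xeon t_phi" lines on stdin and print "nodes ratio",
# where ratio = t_xeon / t_phi, rounded to two decimals.
speedup_table() {
    awk '{ printf "%s %.2f\n", $1, $2 / $3 }'
}
```

Running `gyre_times | speedup_table` prints one line per node count. A ratio above 1 means the Intel Xeon Phi processor completed the step faster.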
2. Time of the second time step for the ORCA025 workload, in seconds:
# nodes | Intel® Xeon® processor | Intel® Xeon Phi™ processor |
---|---|---|
2 | 5.764083 | |
4 | 2.642725 | 2.156876 |
8 | 1.305238 | 1.0546 |
16 | 0.67725 | 0.643372 |
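The same tables can be read as strong-scaling data. The sketch below computes parallel efficiency relative to the smallest node count in a series, defined here as (n0 × t0) / (n × t). This metric is an addition for illustration and does not appear in the recipe. `orca_xeon_times` and `scaling_efficiency` are hypothetical names, and the embedded values are the Intel Xeon column of the ORCA table above.

```shell
# Sketch: parallel efficiency relative to the first row of a series.
# Columns emitted by orca_xeon_times: nodes, time of second step (s).
orca_xeon_times() {
    cat <<'EOF'
2 5.764083
4 2.642725
8 1.305238
16 0.67725
EOF
}

# Read "nodes time" lines on stdin; the first line is the baseline.
# Prints "nodes efficiency", efficiency = (n0 * t0) / (n * t).
scaling_efficiency() {
    awk '
        NR == 1 { base = $1 * $2 }
        { printf "%s %.2f\n", $1, base / ($1 * $2) }
    '
}
```

Running `orca_xeon_times | scaling_efficiency` prints one efficiency value per node count. Values near 1 indicate that doubling the nodes roughly halves the step time.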