
Getting Started with Intel® Data Analytics Acceleration Library 2019 for macOS* (Beta)


Intel® Data Analytics Acceleration Library (Intel® DAAL) is the library of Intel® architecture optimized building blocks covering all stages of data analytics: data acquisition from a data source, preprocessing, transformation, data mining, modeling, validation, and decision making.

Intel DAAL is installed standalone and as part of the following suites:

Intel DAAL is also provided as a standalone package under the Community Licensing Program.

Prerequisites

System Requirements.

Install Intel DAAL on Your System

Intel DAAL installs in the directory <install dir>/daal.

By default, <install dir> is /opt/intel/compilers_and_libraries_2019.x.xxx/mac.

For installation details, refer to Intel DAAL Installation Guide.

Set Environment Variables

  1. Run the <install dir>/daal/bin/daalvars.sh script as appropriate to your target architecture:

    • IA-32 architecture:

      daalvars.sh ia32

    • Intel® 64 architecture:

      daalvars.sh intel64

  2. Optionally, specify a Java* compiler other than the default one:

    export JAVA_HOME=$PATH_TO_JAVA_SDK

    export PATH=$JAVA_HOME/bin:$PATH
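
You can quickly check that the script took effect: daalvars.sh sets DAALROOT, so echoing it should print a path inside the Intel DAAL installation. A minimal check, assuming the default install location (the version segment is illustrative):

source /opt/intel/compilers_and_libraries_2019.x.xxx/mac/daal/bin/daalvars.sh intel64
echo $DAALROOT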

C++ Language

Step 1: Choose the Compiler Option for Automatic Linking of Your Application with Intel DAAL

Decide on the variant of the -daal option of the Intel® C++ Compiler 16 or higher, or configure your project in the Integrated Development Environment (IDE):

Compiler Option: ‑daal or ‑daal=parallel
Tells the compiler to link with the standard threaded Intel DAAL.

Compiler Option: ‑daal=sequential
Tells the compiler to link with the sequential version of Intel DAAL.

IDE Equivalent (Xcode*):

  1. Go to Project> Build Settings> Intel C++ Compiler XE yy.y> Performance Library Build Components> Use Intel Data Analytics Acceleration Library.
  2. Select Use threaded Intel Data Analytics Acceleration Library or Use non-threaded Intel Data Analytics Acceleration Library, as appropriate.

For more information on the daal compiler option, see the Intel® Compiler User and Reference Guide.

Step 2: Create and Run Your First Application with Intel DAAL

This short application computes Cholesky decomposition with Intel DAAL.

/*******************************************************************************
!  Copyright(C) 2014-2017 Intel Corporation. All Rights Reserved.
!
!  The source code, information  and  material ("Material") contained herein is
!  owned  by Intel Corporation or its suppliers or licensors, and title to such
!  Material remains  with Intel Corporation  or its suppliers or licensors. The
!  Material  contains proprietary information  of  Intel or  its  suppliers and
!  licensors. The  Material is protected by worldwide copyright laws and treaty
!  provisions. No  part  of  the  Material  may  be  used,  copied, reproduced,
!  modified, published, uploaded, posted, transmitted, distributed or disclosed
!  in any way  without Intel's  prior  express written  permission. No  license
!  under  any patent, copyright  or  other intellectual property rights  in the
!  Material  is  granted  to  or  conferred  upon  you,  either  expressly,  by
!  implication, inducement,  estoppel or  otherwise.  Any  license  under  such
!  intellectual  property  rights must  be express  and  approved  by  Intel in
!  writing.
!
!  *Third Party trademarks are the property of their respective owners.
!
!  Unless otherwise  agreed  by Intel  in writing, you may not remove  or alter
!  this  notice or  any other notice embedded  in Materials by Intel or Intel's
!  suppliers or licensors in any way.
!
!*******************************************************************************
!  Content:
!    Cholesky decomposition sample program.
!******************************************************************************/

#include "daal.h"
#include <iostream>

using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;
using namespace daal::services;

const size_t dimension = 3;
double inputArray[dimension *dimension] =
{
    1.0,  2.0,  4.0,
    2.0, 13.0, 23.0,
    4.0, 23.0, 77.0
};

int main(int argc, char *argv[])
{
    /* Create input numeric table from array */
    SharedPtr<NumericTable> inputData = SharedPtr<NumericTable>(new Matrix<double>(dimension, dimension, inputArray));

    /*  Create the algorithm object for computation of the Cholesky decomposition using the default method */
    cholesky::Batch<> algorithm;

    /* Set input for the algorithm */
    algorithm.input.set(cholesky::data, inputData);

    /* Compute Cholesky decomposition */
    algorithm.compute();

    /* Get pointer to Cholesky factor */
    SharedPtr<Matrix<double> > factor =
        staticPointerCast<Matrix<double>, NumericTable>(algorithm.getResult()->get(cholesky::choleskyFactor));

    /* Print the first element of the Cholesky factor */
    std::cout << "The first element of the Cholesky factor: "<< (*factor)[0][0];

    return 0;
}
  1. Paste the application code into the editor of your choice.

  2. Save the file as my_first_daal_program.cpp.

  3. Compile with the following command, providing the selected variant of the ‑daal compiler option, for example, ‑daal=parallel:

    icc my_first_daal_program.cpp -daal=parallel -o my_first_daal_program

  4. Run the application.

Step 3 (Optional): Build Your Application with Different Compilers

List the following Intel DAAL libraries on a link line, depending on Intel DAAL threading mode and linking method:

 

  • Static linking, single-threaded (non-threaded) Intel DAAL: libdaal_core.a and libdaal_sequential.a
  • Static linking, multi-threaded (internally threaded) Intel DAAL: libdaal_core.a and libdaal_thread.a
  • Dynamic linking, single-threaded (non-threaded) Intel DAAL: libdaal_core.dylib and libdaal_sequential.dylib
  • Dynamic linking, multi-threaded (internally threaded) Intel DAAL: libdaal_core.dylib and libdaal_thread.dylib

These libraries are located in the directory <install dir>/daal/lib.

Regardless of the linking method, also add to your link line the library on which Intel DAAL libraries depend:

  • The Intel® Threading Building Blocks run-time library shipped with the Intel® compiler: libtbb.dylib

For example, to build your application by statically linking with multi-threaded Intel DAAL:

icc my_first_daal_program.cpp -o my_first_daal_program \
    $DAALROOT/lib/libdaal_core.a $DAALROOT/lib/libdaal_thread.a -ltbb -ldl
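
A dynamic-linking build follows the same pattern. Here is a minimal sketch, assuming the .dylib files from the list above are found through -L and resolved at run time from <install dir>/daal/lib (the exact flag set is illustrative):

icc my_first_daal_program.cpp -o my_first_daal_program \
    -L$DAALROOT/lib -ldaal_core -ldaal_thread -ltbb -ldl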

Step 4: Build and Run Intel DAAL Code Examples

  1. Build an example:

    Go to the C++ examples directory and execute the make command:

    cd <install dir>/daal/examples/cpp

    make {libia32|dylibia32|libintel64|dylibintel64} \
            example=<example_name> \
            compiler={intel|gnu|clang} \
            threading={parallel|sequential} \
            mode=build

    Among the {libia32|dylibia32|libintel64|dylibintel64} parameters, choose the one that matches the architecture parameter you provided to the daalvars.sh script and has the prefix that matches the type of executables you want to build: lib for static and dylib for dynamic executables.

    The names of the examples are available in the daal.lst file.

    The command creates a directory for the chosen compiler, architecture, and library extension (a or dylib). For example: _results/intel_intel64_a.

  2. Run an example:

    Go to the C++ examples directory and execute the make command in the run mode. For example, if you ran the daalvars script with the intel64 target:

    cd <install dir>/daal/examples/cpp

    make libintel64 example=cholesky_batch.cpp mode=run

    The make command builds the cholesky_batch.cpp example for the Intel 64 architecture, statically linked against Intel DAAL and compiled with the Intel® compiler (the default), and then runs the resulting executable.

Java* Language

Build and Run Intel DAAL Code Examples

To build and run Java code examples, use the version of the Java Virtual Machine* that corresponds to the architecture parameter you provided to the daalvars.sh script when setting the environment variables.

  1. Make sure at least 4 gigabytes of memory are free on your system.
  2. Build examples:

    Go to the Java examples directory and execute the launcher command with the build parameter:

    cd <install dir>/daal/examples/java

    launcher.sh build $PATH_TO_JAVAC

    The command builds executables *.class (for example, CholeskyBatch.class) in the

    <install dir>/daal/examples/java/com/intel/daal/examples/<example name> directory.

  3. Run examples:

    Go to the Java examples directory and execute the launcher command with the run parameter:

    cd <install dir>/daal/examples/java

    launcher.sh {ia32|intel64} run $PATH_TO_JAVAC

    Choose the same architecture parameter as you provided to the daalvars.sh script.

    The output for each example is written to the file <example name>.res located in the ./_results/ia32 or ./_results/intel64 directory, depending on the specified architecture.

Python* Language

Step 1: Set Up the Build Environment

Set up the C++ build environment as explained under C++ Language.

Step 2: Install Intel DAAL for Python

Go to the directory with Python sources of Intel DAAL and run the install script:

cd <install dir>/pydaal_sources

<python home>/python setup.py install

This script compiles code using Intel DAAL for C++. It builds and installs the pyDAAL package for using Intel DAAL in Python programs.
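
To confirm the installation, you can try importing the package with the same interpreter; the daal module name is the one used by the bundled examples (treat this as an illustrative sanity check):

<python home>/python -c "import daal"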

Step 3: Run Intel DAAL Code Examples

To run Intel DAAL code examples, use the same version of Python as you used to install pyDAAL.

  • Go to the directory with Intel DAAL Python examples:

    cd <install dir>/examples/python

  • To run all the examples, execute the command:

    <python home>/python run_examples.py

    The output for each example is written to the ./_results/intel64/<example name>.res file.

  • To run one specific example, execute the command:

    <python home>/python <algorithm name>/<example name>.py

    For example: /usr/local/bin/python3.5.1/python cholesky/cholesky_batch.py

    This command prints the output to your console.

 

Training and Documentation

To learn more about the product, see the following resources:

Resource

Description

Online Training

Get access to Intel DAAL in-depth webinars and featured articles.

Developer Guide for Intel® Data Analytics Acceleration Library:

Find recommendations on programming with Intel DAAL, including performance tips.

Intel® Data Analytics Acceleration Library API Reference

View detailed Application Programming Interface (API) descriptions for the following programming languages:

  • C++
  • Java*
  • Python*

Intel® Data Analytics Acceleration Library Installation Guide

Learn about installation options available for the product and get installation instructions.

Intel® Data Analytics Acceleration Library Release Notes

Learn about:

  • New features of the product
  • Directory layout
  • Hardware and software requirements

<install dir>/daal/examples folder

Get access to the collection of programs that demonstrate usage of Intel DAAL application programming interfaces.

Intel® Data Analytics Acceleration Library code samples

Get access to the collection of code samples for various algorithms that you can include in your program and immediately use with Hadoop*, Spark*, message-passing interface (MPI), or MySQL*.

Intel® Software Documentation Library

View full documentation library for this and other Intel software products.

Legal Information

Intel, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Java is a registered trademark of Oracle and/or its affiliates.

Copyright 2014-2018 Intel Corporation.

This software and the related documents are Intel copyrighted materials, and your use of them is governed by the express license under which they were provided to you (License). Unless the License provides otherwise, you may not use, modify, copy, publish, distribute, disclose or transmit this software or the related documents without Intel's prior written permission.

This software and the related documents are provided as is, with no express or implied warranties, other than those that are expressly stated in the License.

 

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

 


Getting Started with Intel® Data Analytics Acceleration Library 2019 for Windows* (Beta)


Intel® Data Analytics Acceleration Library (Intel® DAAL) is the library of Intel® architecture optimized building blocks covering all stages of data analytics: data acquisition from a data source, preprocessing, transformation, data mining, modeling, validation, and decision making.

Intel DAAL is installed standalone and as part of the following suites:

Intel DAAL is also provided as a standalone package under the Community Licensing Program.

Prerequisites

System Requirements.

Install Intel DAAL on Your System

Intel DAAL installs in the directory <install dir>\daal.

By default, <install dir> is C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.x.xxx\windows.

For installation details, refer to Intel DAAL Installation Guide.

Set Environment Variables

  1. Run the <install dir>\daal\bin\daalvars.bat script as appropriate to your target architecture:

    • IA-32 architecture:

      daalvars.bat ia32

    • Intel® 64 architecture:

      daalvars.bat intel64

  2. Optionally, specify a Java* compiler other than the default one:

    set JAVA_HOME=%PATH_TO_JAVA_SDK%

    set PATH=%JAVA_HOME%\bin;%PATH%

C++ Language

Step 1: Configure Automatic Linking of Your Application with Intel DAAL

In Microsoft Visual Studio* Integrated Development environment (IDE), open or create a C++ project for your Intel DAAL application to build.

Specify the /Qdaal option of the Intel® C++ Compiler 16 or higher or configure your project in the IDE:

Compiler Option: /Qdaal or /Qdaal:parallel
Tells the compiler to link with the standard threaded Intel DAAL.

Compiler Option: /Qdaal:sequential
Tells the compiler to link with the sequential version of Intel DAAL.

IDE Equivalent (Visual Studio*):

  1. In Solution Explorer, go to Project> Properties> Configuration Properties> Intel Performance Libraries.
  2. From the Use Intel DAAL drop-down menu, select the appropriate linking method. For example: Multi-threaded Static Library.

For more information on the /Qdaal compiler option, see the Intel® Compiler User and Reference Guide.

Step 2: Create and Run Your First Application with Intel DAAL

This short application computes Cholesky decomposition with Intel DAAL.

/*******************************************************************************
!  Copyright(C) 2014-2017 Intel Corporation. All Rights Reserved.
!
!  The source code, information  and  material ("Material") contained herein is
!  owned  by Intel Corporation or its suppliers or licensors, and title to such
!  Material remains  with Intel Corporation  or its suppliers or licensors. The
!  Material  contains proprietary information  of  Intel or  its  suppliers and
!  licensors. The  Material is protected by worldwide copyright laws and treaty
!  provisions. No  part  of  the  Material  may  be  used,  copied, reproduced,
!  modified, published, uploaded, posted, transmitted, distributed or disclosed
!  in any way  without Intel's  prior  express written  permission. No  license
!  under  any patent, copyright  or  other intellectual property rights  in the
!  Material  is  granted  to  or  conferred  upon  you,  either  expressly,  by
!  implication, inducement,  estoppel or  otherwise.  Any  license  under  such
!  intellectual  property  rights must  be express  and  approved  by  Intel in
!  writing.
!
!  *Third Party trademarks are the property of their respective owners.
!
!  Unless otherwise  agreed  by Intel  in writing, you may not remove  or alter
!  this  notice or  any other notice embedded  in Materials by Intel or Intel's
!  suppliers or licensors in any way.
!
!*******************************************************************************
!  Content:
!    Cholesky decomposition sample program.
!******************************************************************************/

#include "daal.h"
#include <iostream>

using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;
using namespace daal::services;

const size_t dimension = 3;
double inputArray[dimension *dimension] =
{
    1.0,  2.0,  4.0,
    2.0, 13.0, 23.0,
    4.0, 23.0, 77.0
};

int main(int argc, char *argv[])
{
    /* Create input numeric table from array */
    SharedPtr<NumericTable> inputData = SharedPtr<NumericTable>(new Matrix<double>(dimension, dimension, inputArray));

    /*  Create the algorithm object for computation of the Cholesky decomposition using the default method */
    cholesky::Batch<> algorithm;

    /* Set input for the algorithm */
    algorithm.input.set(cholesky::data, inputData);

    /* Compute Cholesky decomposition */
    algorithm.compute();

    /* Get pointer to Cholesky factor */
    SharedPtr<Matrix<double> > factor =
        staticPointerCast<Matrix<double>, NumericTable>(algorithm.getResult()->get(cholesky::choleskyFactor));

    /* Print the first element of the Cholesky factor */
    std::cout << "The first element of the Cholesky factor: "<< (*factor)[0][0];

    return 0;
}
  1. Add a new C++ file to your project and paste the application code into it.

  2. Save the file.

  3. Compile and run the application.

Step 3 (Optional): Build Your Application with Different Compilers

If you did not install the C++ Integration(s) in Microsoft Visual Studio* component of the Intel® Parallel Studio XE or need more control over Intel DAAL libraries to link with your application, directly configure your Visual Studio project.

Add the following libraries to your project, depending on Intel DAAL threading mode and linking method:

 

  • Static linking, single-threaded (non-threaded) Intel DAAL: daal_core.lib and daal_sequential.lib
  • Static linking, multi-threaded (internally threaded) Intel DAAL: daal_core.lib and daal_thread.lib
  • Dynamic linking, single-threaded (non-threaded) Intel DAAL: daal_core_dll.lib
  • Dynamic linking, multi-threaded (internally threaded) Intel DAAL: daal_core_dll.lib

Regardless of the linking method, also add to your project the library on which Intel DAAL libraries depend:

  • The Intel® Threading Building Blocks run-time library shipped with the Intel® compiler: tbb.lib

To configure your project, follow these steps, which may slightly differ in some versions of Visual Studio:

  1. In Solution Explorer, right-click your project and click Properties

  2. Select Configuration Properties> VC++ Directories

  3. Select Include Directories. Add the directory for the Intel DAAL include files, that is, %DAALROOT%\include

  4. Select Library Directories. Add the architecture-specific directory with Intel DAAL libraries that matches the architecture parameter provided to the daalvars.bat script.
    For example: %DAALROOT%\lib\intel64_win.

    Add the architecture-specific directory with the threading run-time library tbb.lib.
    For example: <install dir>\tbb\lib\intel64_win\vc_mt.

  5. Select Executable Directories. Add the architecture-specific directory with Intel DAAL dynamic-link libraries.
    For example: <install dir>\redist\intel64_win\daal.

    Add architecture-specific directories with threading run-time dynamic-link libraries.
    For example: <install dir>\redist\intel64_win\compiler and <install dir>\redist\intel64_win\tbb\vc_mt.

  6. Select Configuration Properties> Custom Build Setup> Additional Dependencies. Add the libraries required.
    For example: to build your application for Intel® 64 architecture by statically linking with multi-threaded Intel DAAL, add daal_core.lib daal_thread.lib tbb.lib.

 

In the case of dynamic linking with Intel DAAL, to change the default multi-threaded mode to single-threaded, precede the first call to Intel DAAL with the call right under the "Set single-threaded mode" comment:

...
int main(int argc, char *argv[])
{
    /* Set single-threaded mode */
    Environment::getInstance()->setDynamicLibraryThreadingTypeOnWindows(Environment::SingleThreaded); 

    /* Create input numeric table from array */
    SharedPtr<NumericTable> inputData = SharedPtr<NumericTable>(new Matrix<double>(dimension, dimension, inputArray));
...

Step 4: Build and Run Intel DAAL Code Examples

In Visual Studio*, use DAALExamples.sln solution file available in <install dir>\daal\examples\cpp.

Java* Language

Build and Run Intel DAAL Code Examples

To build and run Java code examples, use the version of the Java Virtual Machine* that corresponds to the architecture parameter you provided to the daalvars.bat script when setting the environment variables.

  1. Make sure at least 4 gigabytes of memory are free on your system.
  2. Build examples:

    Go to the Java examples directory and execute the launcher command with the build parameter:

    cd <install dir>\daal\examples\java

    launcher.bat build %PATH_TO_JAVAC%

    The command builds executables *.class (for example, CholeskyBatch.class) in the

    <install dir>\daal\examples\java\com\intel\daal\examples\<example name> directory.

  3. Run examples:

    Go to the Java examples directory and execute the launcher command with the run parameter:

    cd <install dir>\daal\examples\java

    launcher.bat {ia32|intel64} run %PATH_TO_JAVAC%

    Choose the same architecture parameter as you provided to the daalvars.bat script.

    The output for each example is written to the file <example name>.res located in the .\_results\ia32 or .\_results\intel64 directory, depending on the specified architecture.

Python* Language

Step 1: Set Up the Build Environment

Set up the C++ build environment as explained under C++ Language.

Step 2: Install Intel DAAL for Python

Go to the directory with Python sources of Intel DAAL and run the install script:

cd <install dir>\pydaal_sources

<python home>\python setup.py install

This script compiles code using Intel DAAL for C++. It builds and installs the pyDAAL package for using Intel DAAL in Python programs.

Step 3: Run Intel DAAL Code Examples

To run Intel DAAL code examples, use the same version of Python as you used to install pyDAAL.

  • Go to the directory with Intel DAAL Python examples:

    cd <install dir>\examples\python

  • To run all the examples, execute the command:

    <python home>\python run_examples.py

    The output for each example is written to the .\_results\intel64\<example name>.res file.

  • To run one specific example, execute the command:

    <python home>\python <algorithm name>\<example name>.py

    For example: C:\python3.5.1\python cholesky\cholesky_batch.py

    This command prints the output to your console.

Training and Documentation

To learn more about the product, see the following resources:

Resource

Description

Online Training

Get access to Intel DAAL in-depth webinars and featured articles.

Developer Guide for Intel® Data Analytics Acceleration Library:

Find recommendations on programming with Intel DAAL, including performance tips.

Intel® Data Analytics Acceleration Library API Reference

View detailed Application Programming Interface (API) descriptions for the following programming languages:

  • C++
  • Java*
  • Python*

Intel® Data Analytics Acceleration Library Installation Guide

Learn about installation options available for the product and get installation instructions.

Intel® Data Analytics Acceleration Library Release Notes

Learn about:

  • New features of the product
  • Directory layout
  • Hardware and software requirements

<install dir>\daal\examples folder

Get access to the collection of programs that demonstrate usage of Intel DAAL application programming interfaces.

Intel® Data Analytics Acceleration Library code samples

Get access to the collection of code samples for various algorithms that you can include in your program and immediately use with Hadoop*, Spark*, message-passing interface (MPI), or MySQL*.

Intel® Software Documentation Library

View full documentation library for this and other Intel software products.

Legal Information

Intel, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Java is a registered trademark of Oracle and/or its affiliates.

Copyright 2014-2018 Intel Corporation.

This software and the related documents are Intel copyrighted materials, and your use of them is governed by the express license under which they were provided to you (License). Unless the License provides otherwise, you may not use, modify, copy, publish, distribute, disclose or transmit this software or the related documents without Intel's prior written permission.

This software and the related documents are provided as is, with no express or implied warranties, other than those that are expressly stated in the License.

 

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

 

Getting Started with Intel® MPI Library 2019 for Linux* OS (Beta)


Intel® MPI Library is a multi-fabric message passing library that implements the Message Passing Interface, version 3.1 (MPI-3.1) specification. Use the library to develop applications that can run on multiple cluster interconnects.

The Intel® MPI Library has the following features:

  • High scalability
  • Low overhead, enables analyzing large amounts of data
  • MPI tuning utility for accelerating your applications
  • Interconnect independence and flexible runtime fabric selection

Intel® MPI Library is available as a standalone product and as part of the Intel® Parallel Studio XE Cluster Edition.

Product Contents

The product comprises the following main components:

  • Runtime Environment (RTO) includes the tools you need to run programs, including the Hydra process manager, supporting utilities, shared (.so) libraries, and documentation.

  • Software Development Kit (SDK) includes all of the Runtime Environment components plus compilation tools, including compiler wrappers such as mpiicc, include files and modules, static (.a) libraries, debug libraries, and test codes.

Besides the SDK and RTO components, Intel® MPI Library also includes Intel® MPI Benchmarks, which enable you to measure MPI operations on various cluster architectures and MPI implementations. For details, see the Intel® MPI Benchmarks User's Guide.

Prerequisites

Before you start using Intel® MPI Library make sure to complete the following steps:

  1. Source the mpivars.[c]sh script to establish the proper environment settings for the Intel® MPI Library. It is located in the <installdir_MPI>/intel64/bin directory, where <installdir_MPI> refers to the Intel MPI Library installation directory (for example, /opt/intel/compilers_and_libraries_<version>.<update>.<package#>/linux/mpi).

  2. Create a hostfile text file that lists the nodes in the cluster using one host name per line. For example:
    clusternode1
    clusternode2
  3. Make sure a passwordless SSH connection is established among all nodes of the cluster; this ensures proper communication of MPI processes among the nodes. To establish the connection, you can use the sshconnectivity.exp script located at <installdir>/parallel_studio_xe_<version>.<update>.<package>/bin.

After completing these steps, you are ready to use Intel® MPI Library.
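
For reference, a typical first-time session combining these steps might look like the following (the installation path and node names are illustrative):

$ source /opt/intel/compilers_and_libraries_<version>.<update>.<package#>/linux/mpi/intel64/bin/mpivars.sh
$ printf "clusternode1\nclusternode2\n" > hostfile
$ ssh clusternode2 hostname    # should return without prompting for a password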

For detailed system requirements, see the System Requirements section in Release Notes.

Building and Running MPI Programs

Compiling an MPI program

If you have the SDK component installed, you can build your MPI programs with Intel® MPI Library. Do the following:

  1. Make sure you have a compiler in your PATH. To check this, run the which command on the desired compiler. For example:

    $ which icc
    /opt/intel/compilers_and_libraries_2018.<update>.<package#>/linux/bin/intel64/icc
  2. Compile a test program using the appropriate compiler wrapper. For example, for a C program:

    $ mpiicc -o myprog <installdir>/test/test.c

Running an MPI program

Use the mpirun command to run your program. Use the previously created hostfile with the -f option to launch the program on the specified nodes:

$ mpirun -n <# of processes> -ppn <# of processes per node> -f ./hostfile ./myprog

The test program above produces output in the following format:

Hello world: rank 0 of 2 running on clusternode1
Hello world: rank 1 of 2 running on clusternode2

This output indicates that you properly configured your environment and Intel® MPI Library successfully ran the test MPI program on the cluster.
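
For instance, to launch four processes, two per node, across the two nodes listed in the hostfile (the process counts are illustrative):

$ mpirun -n 4 -ppn 2 -f ./hostfile ./myprog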

Key Features

Intel® MPI Library has the following major features:

  • MPI-1, MPI-2.2 and MPI-3.1 specification conformance
  • Support for any combination of the following interconnection fabrics:
    • Shared memory
    • Network fabrics with tag matching capabilities through Tag Matching Interface (TMI), such as Intel® True Scale Fabric, InfiniBand*, Myrinet* and other interconnects
    • Native InfiniBand* interface through OFED* verbs provided by Open Fabrics Alliance* (OFA*)
    • OpenFabrics Interface* (OFI*)
    • RDMA-capable network fabrics through DAPL*, such as InfiniBand* and Myrinet*
    • Sockets, for example, TCP/IP over Ethernet*, Gigabit Ethernet*, and other interconnects
  • Support for the 2nd Generation Intel® Xeon Phi™ processor.
  • (SDK only) Support for Intel® 64 architecture and Intel® MIC Architecture clusters using:
    • Intel® C++ Compiler version 15.0 and higher
    • Intel® Fortran Compiler version 15.0 and higher
    • GNU* C, C++ and Fortran 95 compilers
  • (SDK only) C, C++, Fortran* 77, Fortran 90 language bindings and Fortran 2008 bindings
  • (SDK only) Dynamic linking

Online Resources

See Also

Getting Started with Intel® MPI Library 2019 for Windows* OS (Beta)


Intel® MPI Library is a multi-fabric message passing library that implements the Message Passing Interface, version 3.1 (MPI-3.1) specification. Use the library to develop applications that can run on multiple cluster interconnects.

The Intel® MPI Library has the following features:

  • Low overhead, enables analyzing large amounts of data
  • MPI tuning utility for accelerating your applications
  • Interconnect independence and flexible runtime fabric selection

Intel® MPI Library is available as a standalone product and as part of the Intel® Parallel Studio XE Cluster Edition.

Product Contents

The product comprises the following main components:

  • Runtime Environment (RTO) has the tools you need to run programs, including the Hydra process manager, supporting utilities, dynamic (.dll) libraries, and documentation.

  • Software Development Kit (SDK) includes all of the Runtime Environment components plus compilation tools, including compiler drivers such as mpiicc, include files and modules, debug libraries, program database (.pdb) files, and test codes.

Besides the SDK and RTO components, Intel® MPI Library also includes Intel® MPI Benchmarks, which enable you to measure MPI operations on various cluster architectures and MPI implementations. For details, see the Intel® MPI Benchmarks User's Guide.

Prerequisites

Before you start using Intel® MPI Library make sure to complete the following steps:

  1. Set the environment variables: from the installation directory <installdir>\mpi\<package number>\intel64\bin, run the mpivars.bat batch file:
    > <installdir>\mpi\<package number>\intel64\bin\mpivars.bat
    where <installdir> is the Intel MPI Library installation directory (by default, C:\Program Files (x86)\IntelSWTools).
  2. Install and run the Hydra services on the compute nodes. In the command prompt, enter:
    > hydra_service -install
    > hydra_service -start
  3. Register your credentials, enter:
    > mpiexec -register

For detailed system requirements, see the System Requirements section in Release Notes.

Building and Running MPI Programs

Compiling an MPI program

If you have the developer license and have the SDK component installed, you can build your MPI programs with Intel® MPI Library. Compile the program using the appropriate compiler driver. For example, for a test program:

> mpiicc -o test.exe <installdir>\test\test.c

Running an MPI program

Execute the program using the mpiexec command. For example, for the test program:

> mpiexec -n <# of processes> test.exe 

To specify the hosts to run the program on, use the -hosts option:

> mpiexec -n <# of processes> -ppn <# of processes per node> -hosts <host1>,<host2>,...,<hostN> test.exe

Key Features

Intel® MPI Library has the following major features:

  • MPI-1, MPI-2.2 and MPI-3.1 specification conformance
  • Support for any combination of the following interconnection fabrics:
    • Shared memory
    • RDMA-capable network fabrics through DAPL*, such as InfiniBand* and Myrinet*
    • Sockets, for example, TCP/IP over Ethernet*, Gigabit Ethernet*, and other interconnects
  • (SDK only) Support for Intel® 64 architecture clusters using:
    • Intel® C++ Compiler version 15.0 and higher
    • Intel® Fortran Compiler version 15.0 and higher
    • Microsoft* Visual C++* Compilers
  • (SDK only) C, C++, Fortran* 77 and Fortran 90 language bindings
  • (SDK only) Dynamic linking

Online Resources

See Also

Getting Started with Intel® Data Analytics Acceleration Library 2019 for Linux* (Beta)


Intel® Data Analytics Acceleration Library (Intel® DAAL) is the library of Intel® architecture optimized building blocks covering all stages of data analytics: data acquisition from a data source, preprocessing, transformation, data mining, modeling, validation, and decision making.

Intel DAAL is installed standalone and as part of the following suites:

Intel DAAL is also provided as a standalone package under the Community Licensing Program.

Prerequisites

System Requirements.

Install Intel DAAL on Your System

Intel DAAL installs in the directory <install dir>/daal.

By default, <install dir> is /opt/intel/compilers_and_libraries_2019.x.xxx/linux.

For installation details, refer to Intel DAAL Installation Guide.

Set Environment Variables

  1. Run the <install dir>/daal/bin/daalvars.sh script as appropriate to your target architecture:

    • IA-32 architecture:

      daalvars.sh ia32

    • Intel® 64 architecture:

      daalvars.sh intel64

  2. Optionally, specify a Java* compiler other than the default one:

    export JAVA_HOME=$PATH_TO_JAVA_SDK

    export PATH=$JAVA_HOME/bin:$PATH
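
You can quickly check that the script took effect: daalvars.sh sets DAALROOT, so echoing it should print a path inside the Intel DAAL installation. A minimal check, assuming the default install location (the version segment is illustrative):

source /opt/intel/compilers_and_libraries_2019.x.xxx/linux/daal/bin/daalvars.sh intel64
echo $DAALROOT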

C++ Language

Step 1: Choose the Compiler Option for Automatic Linking of Your Application with Intel DAAL

Decide on the variant of the -daal option of the Intel® C++ Compiler 16 or higher, or configure your project in the Integrated Development Environment (IDE):

Compiler Option: ‑daal or ‑daal=parallel
Tells the compiler to link with the standard threaded Intel DAAL.

Compiler Option: ‑daal=sequential
Tells the compiler to link with the sequential version of Intel DAAL.

IDE Equivalent (Eclipse*):

  1. Go to Project> Properties> C/C++ Build> Settings> Intel C++ Compiler> Performance Library Build Components> Use Intel(R) Data Analytics Acceleration Library.
  2. Select Use threaded Intel DAAL (‑daal=parallel) or Use non-threaded Intel DAAL (‑daal=sequential), as appropriate.

For more information on the daal compiler option, see the Intel® Compiler User and Reference Guide.

Step 2: Create and Run Your First Application with Intel DAAL

This short application computes Cholesky decomposition with Intel DAAL.

/*******************************************************************************
!  Copyright(C) 2014-2017 Intel Corporation. All Rights Reserved.
!
!  The source code, information  and  material ("Material") contained herein is
!  owned  by Intel Corporation or its suppliers or licensors, and title to such
!  Material remains  with Intel Corporation  or its suppliers or licensors. The
!  Material  contains proprietary information  of  Intel or  its  suppliers and
!  licensors. The  Material is protected by worldwide copyright laws and treaty
!  provisions. No  part  of  the  Material  may  be  used,  copied, reproduced,
!  modified, published, uploaded, posted, transmitted, distributed or disclosed
!  in any way  without Intel's  prior  express written  permission. No  license
!  under  any patent, copyright  or  other intellectual property rights  in the
!  Material  is  granted  to  or  conferred  upon  you,  either  expressly,  by
!  implication, inducement,  estoppel or  otherwise.  Any  license  under  such
!  intellectual  property  rights must  be express  and  approved  by  Intel in
!  writing.
!
!  *Third Party trademarks are the property of their respective owners.
!
!  Unless otherwise  agreed  by Intel  in writing, you may not remove  or alter
!  this  notice or  any other notice embedded  in Materials by Intel or Intel's
!  suppliers or licensors in any way.
!
!*******************************************************************************
!  Content:
!    Cholesky decomposition sample program.
!******************************************************************************/

#include "daal.h"
#include <iostream>

using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;
using namespace daal::services;

const size_t dimension = 3;
double inputArray[dimension *dimension] =
{
    1.0,  2.0,  4.0,
    2.0, 13.0, 23.0,
    4.0, 23.0, 77.0
};

int main(int argc, char *argv[])
{
    /* Create input numeric table from array */
    SharedPtr<NumericTable> inputData = SharedPtr<NumericTable>(new Matrix<double>(dimension, dimension, inputArray));

    /*  Create the algorithm object for computation of the Cholesky decomposition using the default method */
    cholesky::Batch<> algorithm;

    /* Set input for the algorithm */
    algorithm.input.set(cholesky::data, inputData);

    /* Compute Cholesky decomposition */
    algorithm.compute();

    /* Get pointer to Cholesky factor */
    SharedPtr<Matrix<double> > factor =
        staticPointerCast<Matrix<double>, NumericTable>(algorithm.getResult()->get(cholesky::choleskyFactor));

    /* Print the first element of the Cholesky factor */
    std::cout << "The first element of the Cholesky factor: "<< (*factor)[0][0];

    return 0;
}
  1. Paste the application code into the editor of your choice.

  2. Save the file as my_first_daal_program.cpp.

  3. Compile with the following command, providing the selected variant of the ‑daal compiler option, for example, ‑daal=parallel:

    icc my_first_daal_program.cpp -daal=parallel -o my_first_daal_program

  4. Run the application.

Step 3 (Optional): Build Your Application with Different Compilers

List the following Intel DAAL libraries on a link line, depending on Intel DAAL threading mode and linking method:

 

  • Static linking, single-threaded (non-threaded) Intel DAAL: libdaal_core.a and libdaal_sequential.a
  • Static linking, multi-threaded (internally threaded) Intel DAAL: libdaal_core.a and libdaal_thread.a
  • Dynamic linking, single-threaded (non-threaded) Intel DAAL: libdaal_core.so and libdaal_sequential.so
  • Dynamic linking, multi-threaded (internally threaded) Intel DAAL: libdaal_core.so and libdaal_thread.so

These libraries are located in the architecture-specific directory <install dir>/daal/lib/{ia32|intel64}_lin, where the architecture parameter ia32 or intel64 is the same as you provided to the daalvars.sh script when setting the environment variables.

Important

Do not change the above order of listing the libraries on a link line.

Regardless of the linking method, also add to your link line the library on which Intel DAAL libraries depend:

  • The Intel® Threading Building Blocks run-time library shipped with the Intel® compiler: libtbb.so

For example, to build your application for Intel® 64 architecture by statically linking with multi-threaded Intel DAAL:

icc my_first_daal_program.cpp -o my_first_daal_program \
    $DAALROOT/lib/intel64_lin/libdaal_core.a $DAALROOT/lib/intel64_lin/libdaal_thread.a -ltbb -lpthread -ldl
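
A dynamic-linking build follows the same pattern. Here is a minimal sketch, assuming the .so files from the list above are found through -L and resolved at run time from the library path set up by daalvars.sh (the exact flag set is illustrative):

icc my_first_daal_program.cpp -o my_first_daal_program \
    -L$DAALROOT/lib/intel64_lin -ldaal_core -ldaal_thread -ltbb -lpthread -ldl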

Step 4: Build and Run Intel DAAL Code Examples

  1. Build an example:

    Go to the C++ examples directory and execute the make command:

    cd <install dir>/daal/examples/cpp

    make {libia32|soia32|libintel64|sointel64} \
            example=<example_name> \
            compiler={intel|gnu} \
            threading={parallel|sequential} \
            mode=build

    Among the {libia32|soia32|libintel64|sointel64} parameters, choose the one that matches the architecture parameter you provided to the daalvars.sh script and has the prefix that matches the type of executables you want to build: lib for static and so for dynamic executables.

    The names of the examples are available in the daal.lst file.

    The command creates a directory for the chosen compiler, architecture, and library extension (a or so). For example: _results/intel_intel64_a.

  2. Run an example:

    Go to the C++ examples directory and execute the make command in the run mode. For example, if you ran the daalvars script with the intel64 target:

    cd <install dir>/daal/examples/cpp

    make libintel64 example=cholesky_batch.cpp mode=run

    The make command builds the cholesky_batch.cpp example for the Intel 64 architecture, statically linked against Intel DAAL and compiled with the Intel® compiler (the default), and then runs the resulting executable.

Java* Language

Build and Run Intel DAAL Code Examples

To build and run Java code examples, use the version of the Java Virtual Machine* that corresponds to the architecture parameter you provided to the daalvars.sh script when setting the environment variables.

  1. Make sure at least 4 gigabytes of memory are free on your system.
  2. Build examples:

    Go to the Java examples directory and execute the launcher command with the build parameter:

    cd <install dir>/daal/examples/java

    launcher.sh build $PATH_TO_JAVAC

    The command builds executables *.class (for example, CholeskyBatch.class) in the

    <install dir>/daal/examples/java/com/intel/daal/examples/<example name> directory.

  3. Run examples:

    Go to the Java examples directory and execute the launcher command with the run parameter:

    cd <install dir>/daal/examples/java

    launcher.sh {ia32|intel64} run $PATH_TO_JAVAC

    Choose the same architecture parameter as you provided to the daalvars.sh script.

    The output for each example is written to the file <example name>.res located in the ./_results/ia32 or ./_results/intel64 directory, depending on the specified architecture.

Python* Language

Step 1: Set Up the Build Environment

Set up the C++ build environment as explained under C++ Language.

Step 2: Install Intel DAAL for Python

Go to the directory with Python sources of Intel DAAL and run the install script:

cd <install dir>/pydaal_sources

<python home>/python setup.py install

This script compiles code using Intel DAAL for C++. It builds and installs the pyDAAL package for using Intel DAAL in Python programs.

Step 3: Run Intel DAAL Code Examples

To run Intel DAAL code examples, use the same version of Python as you used to install pyDAAL.

  • Go to the directory with Intel DAAL Python examples:

    cd <install dir>/examples/python

  • To run all the examples, execute the command:

    <python home>/python run_examples.py

    The output for each example is written to the ./_results/intel64/<example name>.res file.

  • To run one specific example, execute the command:

    <python home>/python <algorithm name>/<example name>.py

    For example: /usr/local/bin/python3.5.1/python cholesky/cholesky_batch.py

    This command prints the output to your console.

Training and Documentation

To learn more about the product, see the following resources:

Resource

Description

Online Training

Get access to Intel DAAL in-depth webinars and featured articles.

Developer Guide for Intel® Data Analytics Acceleration Library:

Find recommendations on programming with Intel DAAL, including performance tips.

Intel® Data Analytics Acceleration Library API Reference

View detailed Application Programming Interface (API) descriptions for the following programming languages:

  • C++
  • Java*
  • Python*

Intel® Data Analytics Acceleration Library Installation Guide

Learn about installation options available for the product and get installation instructions.

Intel® Data Analytics Acceleration Library Release Notes

Learn about:

  • New features of the product
  • Directory layout
  • Hardware and software requirements

<install dir>/daal/examples folder

Get access to the collection of programs that demonstrate usage of Intel DAAL application programming interfaces.

Intel® Data Analytics Acceleration Library code samples

Get access to the collection of code samples for various algorithms that you can include in your program and immediately use with Hadoop*, Spark*, message-passing interface (MPI), or MySQL*.

Intel® Software Documentation Library

View full documentation library for this and other Intel software products.

Legal Information

Intel, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Java is a registered trademark of Oracle and/or its affiliates.

Copyright 2014-2018 Intel Corporation.

This software and the related documents are Intel copyrighted materials, and your use of them is governed by the express license under which they were provided to you (License). Unless the License provides otherwise, you may not use, modify, copy, publish, distribute, disclose or transmit this software or the related documents without Intel's prior written permission.

This software and the related documents are provided as is, with no express or implied warranties, other than those that are expressly stated in the License.

 

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

 

Innovative Media and Video Solutions Showcase


New, Inventive Media Solutions Made Possible with Intel Media Software Tools

With Intel media software tools, video solutions providers can create inspiring, innovative products that capitalize on next gen capabilities like real-time 4K HEVC, virtual reality, simultaneous multi-camera streaming, high-dynamic range (HDR) content delivery, video security solutions with smart analytics, and more. Check these out. Envision using Intel's advanced media tools to transform your media, video, and broadcasting solutions for a competitive edge with high performance and efficiency, and room for higher profits, market growth, and more reach.

NetUP Uses Intel® Media SDK to Help Bring the Rio Olympic Games to a Worldwide Audience of Millions

Half a million fans came to Rio de Janeiro to witness the Summer Olympics. At the same time, millions more people all over the world enjoyed the competition live on TV. Arranging a live TV broadcast to another continent is a daunting task.

Thomson Reuters used the NetUP Transcoder for delivering live broadcasts from Rio de Janeiro to its New York and London offices, optimized by Intel® Media SDK.

Get the whole story in our new case study.

 


Amazing Video Solution Enables Game-changing Sports Calls

Slomo.tv innovated its videoReferee* systems, which provide instant high-quality video replays from up to 18 cameras direct to referee viewing systems. Referees can view video from 4 cameras simultaneously at different angles, in slow motion, or using zoom for objective, error-free gameplay analysis. The Kontinental Hockey League; basketball leagues in Korea, Russia, Lithuania; and Rio Olympics used videoReferee. Read More.

 


Immersive Experiences with Real-time 4K HEVC Streaming

See how Wowza, Rivet VR and Intel worked together to deliver a live-streamed 360-degree virtual-reality jazz concert at the legendary Blue Note Jazz Club in New York using hardware-assisted 4K video​. See just how: Video | Article

 


    Mobile Viewpoint Delivers HEVC HDR Live Broadcasting

    Mobile Viewpoint delivers live HEVC HDR broadcasting at the scenes of breaking news action. The company developed a mobile encoder running on 6th generation Intel® processors using the graphics-accelerated codec to create low power encoding and transmission, and optimized by Intel® Media Server Studio Pro Edition for HEVC compression and quality. The results: fast, high-quality, video broadcasting on-the-go so the world stays informed of fast-changing events. Read more.

     


    Sharp's Innovative Security Camera is Built with Intel® Media Technologies

    With security concerns now part of everyday life, SHARP built an omnidirectional wireless, intelligent, digital surveillance camera for these needs. Built with an Intel® Celeron® processor (N3160), SHARP 12 megapixel image sensors, and by using the Intel® Media SDK for hardware-accelerated encoding, the QG- B20C camera can capture video in 4Kx3K resolution, provide all-around views, and includes intelligent automatic detection functions. Read more.

     

    MAGIX's Video Editing Software Provides HEVC to Broad Users

    While elite video pros have access to high-powered video production apps with bells and whistles available mostly only to enterprise, MAGIX unveiled Video Pro X, a video editing software for semi-pro video production. Optimized with Intel Media Server Studio, Video Pro X provides HEVC encoding to prosumers and semi-pros to help alleviate a bandwidth-constrained internet where millions of videos are shared. Read more.

     


    JPEG2000 Codec Now Native for Intel Media Server Studio

    Comprimato worked with Intel to provide additional video encoding technology as part of Intel Media Server Studio through a software plug-in for high quality, low latency JPEG2000 encoding. This  powerful encoding option allows users to transcode JPEG2000 contained in IMF, AS02 or MXF OP1a files to distribution formats like AVC and HEVC, and enables software-defined processes of IP video streams in broadcast applications. By using Media Server Studio to access hardware-acceleration and programmable graphics in Intel GPUs, encoding can run fast and reduce latency, which is important in live broadcasting. Read more.

     

    SPB TV AG Showcases Innovative Mobile TV/On-demand Transcoder

    SPB TV AG innovated its single-platform Astra* transcoder, a pro solution for fast, high-quality processing of linear TV broadcast and on-demand video streams from a single head-end to any mobile, desktop or home device. The transcoder uses Intel® Core™ i7 processors with media accelerators and delivers high-density transcoding optimized by Intel Media Server Studio. “We are delighted that our collaboration with Intel ensures faster and high quality transcoding, making our new product performance remarkable,” said CEO of SPB TV AG Kirill Filippov. Read more.

     

    SURF Communications collaborates with Intel for NFV & WebRTC all-inclusive platforms

    SURF Communication Solutions announced SURF ORION-HMP* and SURF MOTION-HMP*. The SURF-HMP architecture delivers fast, high-quality media acceleration - facilitating up to 4K video resolutions and ultra-high capacity HD voice and video processing. The system runs on Intel® processors with integrated graphics and is optimized by Intel Media Server Studio. SURF-HMP is driven by a powerful processing engine that supports all major video and voice codecs and protocols in use, and delivers a multitude of applications for transcoding, conferencing/mixing, MRF, playout, recording, messaging, video surveillance, encryption and more. Read more.

     


    More about Intel Media Software Tools

    Intel Media Server Studio - Provides an Intel® Media SDK, runtimes, graphics drivers, media/audio codecs, and advanced performance and quality analysis tools to help video solution providers deliver fast, high-density media transcoding. Download free Community Edition

    Intel Media SDK - A cross-platform API for developing client and media applications for Windows*. Achieve fast video playback, encode, processing, media format conversion, and video conferencing. Accelerate RAW video and image processing. Get audio decode/encode support. Download the free SDK now

    Accelerating Media Processing: Which Media Software Tool do I use? English | Chinese

     

    Using Docker* Containers with Open vSwitch* and DPDK on Ubuntu* 17.10


    Overview

    This article describes how to configure containers to take advantage of Open vSwitch* with the Data Plane Development Kit (OvS-DPDK). With the rise of cloud computing, it’s increasingly important to get the most out of your server’s resources, and these technologies allow us to achieve that goal. To begin, we will install Docker* (v1.13.1), DPDK (v17.05.2) and OvS (v2.8.0) on Ubuntu* 17.10. We will also configure OvS to use the DPDK. Finally, we will use iPerf3 to benchmark an OvS run versus OvS-DPDK run to test network throughput.

    We configure Docker to create a logical switch using OvS-DPDK, and then connect two Docker containers to the switch. We then run a simple iPerf3 OvS-DPDK test case. The following diagram captures the setup.

    Test configuration diagram

    Installing Docker and OvS-DPDK

    Run the following commands to install Docker and OvS-DPDK.

    sudo apt install docker.io
    sudo apt install openvswitch-switch-dpdk
    

    After installing OvS-DPDK, we will update Ubuntu to use OvS-DPDK and restart the OvS service.

    sudo update-alternatives --set ovs-vswitchd /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk
    sudo systemctl restart openvswitch-switch.service
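
    To confirm that the DPDK-enabled binary is now the active alternative, you can inspect the alternative entry and query the daemon version (the exact output depends on your package versions):

    update-alternatives --display ovs-vswitchd
    sudo ovs-vswitchd --version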
    

    Configure Ubuntu* 17.10 for OvS-DPDK

    The system used in this demo is a two-socket, 22 core-per-socket server enabled with Intel® Hyper-Threading Technology (Intel® HT Technology), giving us a total of 44 physical cores. The CPU model used is an Intel® Xeon® processor E5-2699 v4 @ 2.20 GHz. To configure Ubuntu for optimal use of OvS-DPDK, we will change the GRUB* command-line options that are passed to Ubuntu at boot time for our system. To do this we edit the following config file:

    /etc/default/grub

    Change the setting GRUB_CMDLINE_LINUX_DEFAULT to the following:

    GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=1G hugepagesz=1G hugepages=16 hugepagesz=2M hugepages=2048 iommu=pt intel_iommu=on isolcpus=1-21,23-43,45-65,67-87"

    This makes GRUB aware of the new options to pass to Ubuntu during boot time. We set isolcpus so that the Linux* scheduler will run on only two physical cores. Later, we will allocate the remaining cores to the DPDK. Also, we set the number of pages and page size for hugepages. For details on why hugepages are required and how they can help to improve performance, refer to the section called Use of Hugepages in the Linux Environment in the System Requirements chapter of the Getting Started Guide for Linux at dpdk.org.

    Note: The isolcpus setting varies depending on how many cores are available per CPU.

    Also, we edit /etc/dpdk/dpdk.conf to specify the number of hugepages to reserve on system boot. Uncomment and change the setting NR_1G_PAGES to the following:

    NR_1G_PAGES=8

    Depending on the system memory size and the problem size of your DPDK application, you may increase or decrease the number of 1G pages. Use of hugepages increases performance because fewer pages are needed and less time is spent doing Translation Lookaside Buffer (TLB) lookups.

    After both files have been updated, run the following commands:

    sudo update-grub
    sudo reboot
    

    Reboot to apply the new settings. If needed, enter the BIOS during boot and enable:

    • Intel® Virtualization Technology (Intel® VT) for IA-32, Intel® 64 and Intel® architecture
    • Intel VT for Directed I/O

    After logging back into your Ubuntu session, create a mount path for your hugepages:

    sudo mkdir -p /mnt/huge
    sudo mkdir -p /mnt/huge_2mb
    sudo mount -t hugetlbfs none /mnt/huge
    sudo mount -t hugetlbfs none /mnt/huge_2mb -o pagesize=2MB
    sudo mount -t hugetlbfs none /dev/hugepages
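
    Optionally, to make these hugepage mounts persist across reboots, corresponding entries can be added to /etc/fstab. The following is only a sketch, assuming the mount points created above:

    # /etc/fstab (excerpt): remount the hugepage filesystems automatically at boot
    nodev /mnt/huge      hugetlbfs defaults      0 0
    nodev /mnt/huge_2mb  hugetlbfs pagesize=2MB  0 0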
    

    To ensure that the changes are in effect, run the commands below:

    grep HugePages_ /proc/meminfo
    cat /proc/cmdline
    

    If the changes took place, your output from the above commands should look similar to the image below:

    grep hugepages output

    cat /proc/cmdline output

    Configuring OvS-DPDK Settings

    To initialize the ovs-vsctl database (a one-time step), we will run the command sudo ovs-vsctl --no-wait init. The OvS database will contain user-set options for OvS and the DPDK. To pass arguments to the DPDK, we will use the command-line utility as follows:

    sudo ovs-vsctl set Open_vSwitch . <argument>

    Additionally, the OvS-DPDK package relies on the following config files:

    • /etc/dpdk/dpdk.conf – Configures hugepages
    • /etc/dpdk/interfaces – Configures/assigns network interface cards (NICs) for DPDK use

    Next, we configure OvS to use DPDK with the following command:

    sudo ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true

    After the OvS is set up to use DPDK, we will change one OvS setting, two important DPDK configuration settings, and bind our NIC devices to the DPDK.

    DPDK settings

    • dpdk-lcore-mask: Specifies the CPU cores on which dpdk lcore threads should be spawned. A hex string is expected.
    • dpdk-socket-mem: Comma-separated list of memory to preallocate from hugepages on specific sockets.

    OvS settings

    • pmd-cpu-mask: PMD (poll-mode driver) threads can be created and pinned to CPU cores by explicitly specifying pmd-cpu-mask. These threads poll the DPDK devices for new packets instead of having the NIC driver send an interrupt when a new packet arrives.

    Use the following commands to configure these settings:

    sudo ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xfffffbffffe
    sudo ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
    sudo ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x800002
    

    For dpdk-lcore-mask we used a mask of 0xfffffbffffe to specify the CPU cores on which dpdk-lcore should spawn. For more information about masking, read Mask (computing). In our system, we have the dpdk-lcore threads spawn on all cores except cores 0, 21, 22, and 43. Those cores are reserved for the Linux scheduler. Similarly, for the pmd-cpu-mask, we used the mask 0x800002 to pin PMD threads to a core on non-uniform memory access (NUMA) node 0 and a core on NUMA node 1. Finally, since we have a two-socket system, we allocate 1 GB of memory per NUMA node; that is, “1024,1024”. For a single-socket system, the string would be “1024”.
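
    To confirm that these values were stored, the other_config column can be read back from the OvS database (the exact output format may vary by OvS version):

    sudo ovs-vsctl get Open_vSwitch . other_config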

    Binding Devices to DPDK

    To bind your NIC device to the DPDK, run the dpdk-devbind command. For example, to unbind eth1 from its current driver and bind it to the vfio-pci driver, run dpdk-devbind --bind=vfio-pci eth1. To use the vfio-pci driver, run modprobe to load it and its dependencies.

    This is what it looked like on my system, with 2 x 10 Gb interfaces available:

    sudo modprobe vfio-pci
    sudo dpdk-devbind --bind=vfio-pci enp3s0f0
    sudo dpdk-devbind --bind=vfio-pci enp3s0f1
    

    To check whether the NIC cards you specified are bound to the DPDK, run the command:

    sudo dpdk-devbind --status

    If all is correct, you should have an output similar to the image below:

    Binding of NIC
    Binding of NIC

    Configuring Open vSwitch with Docker

    To have Docker use OvS as a logical networking switch, we must complete a few more steps. First, we must install Open Virtual Network (OVN) and Consul*, a distributed key-value store, by running the commands below. With OVN and Consul we will run in overlay mode, which makes a multi-tenant, multi-host network possible.

    sudo apt install ovn-common
    sudo apt install ovn-central
    sudo apt install ovn-host
    sudo apt install ovn-docker
    sudo apt install python-openvswitch
    wget https://releases.hashicorp.com/consul/1.0.6/consul_1.0.6_linux_amd64.zip
    unzip consul_1.0.6_linux_amd64.zip
    sudo mv consul /usr/local/bin
    

    The Open Virtual Network* (OVN) is a system that supports virtual network abstraction. This system complements the existing capabilities of OVS to add native support for virtual network abstractions, such as virtual L2 and L3 overlays and security groups. Services such as DHCP are also desirable features. Just like OVS, OVN’s design goal is to have a production-quality implementation that can operate at significant scale.

    Once we have OVN and Consul installed we will:

    1. Stop the docker daemon.
    2. Set up the consul agent.
    3. Restart the docker daemon.
    4. Configure OVN in overlay mode.

    Consul key-value store

    To start a Consul key-value store, open another terminal and run the following command:

    mkdir /tmp/consul
    sudo consul agent -ui -server -data-dir /tmp/consul -advertise 127.0.0.1 -bootstrap-expect 1
    

    This starts a Consul agent that advertises only locally, expects only one server in the Consul cluster, and uses /tmp/consul as the directory where the agent saves its state. A Consul key-value store is needed for service discovery in multi-host, multi-tenant configurations.

    An agent is the core process of Consul. The agent maintains membership information, registers services, runs checks, responds to queries, and more. The agent must run on every node that is part of a Consul cluster.
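
    To verify that the agent came up, cluster membership can be queried from another terminal (for this setup the output should list a single server node):

    consul members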

    Restart docker daemon

    Once the consul agent has started, open another terminal and restart the Docker daemon with the following arguments and with $HOST_IP set to the local IPv4 address:

    sudo dockerd --cluster-store=consul://127.0.0.1:8500 --cluster-advertise=$HOST_IP:0

    The --cluster-store option tells the Docker Engine the location of the key-value store for the overlay network. More information can be found on Docker Docs.
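
    If you prefer not to start dockerd by hand with these flags, the same settings can instead be placed in /etc/docker/daemon.json; the following is only a sketch under that assumption (replace HOST_IP with your local IPv4 address, as above):

    {
        "cluster-store": "consul://127.0.0.1:8500",
        "cluster-advertise": "HOST_IP:0"
    }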

    Configure ovn in overlay mode

    To configure OVN in overlay mode, follow the instructions at:

    http://docs.openvswitch.org/en/latest/howto/docker/#the-overlay-mode

    After OVN has been set up in overlay mode, install Python* Flask and start the ovn-docker-overlay-driver.

    The ovn-docker-overlay-driver uses the Python Flask module to listen for Docker's networking API calls.

    sudo pip install flask
    sudo ovn-docker-overlay-driver --detach
    

    To create the logical switch using OvS-DPDK that Docker will use, run the following command:

    sudo docker network create -d openvswitch --subnet=192.168.22.0/24 ovs

    To list the logical network switch run:

    sudo docker network ls

    Pull iperf3 Docker Image

    To download the iperf3 docker container, run the following command:

    sudo docker pull networkstatic/iperf3

    Later we will create two iperf3 containers for a multi-tenant, single-host benchmark.

    Default versus OvS-DPDK Networking Benchmark

    First we will generate an artificial workload on our server, to simulate servers in production. To do this we install the following:

    sudo apt install stress

    After installation, run stress with the following arguments, or a set of arguments, to put a meaningful load on your server:

    sudo stress -c 6 -i 6 -m 6 -d 6 
    -c, --cpu N        spawn N workers spinning on sqrt()  
    -i, --io N         spawn N workers spinning on sync()  
    -m, --vm N         spawn N workers spinning on malloc()/free()  
    -d, --hdd N        spawn N workers spinning on write()/unlink()      
        --hdd-bytes B  write B bytes per hdd worker (default is 1GB)
    

    Default docker networking test

    With our system under load, our first test will be to run the iperf3 container using Docker’s default network configuration, which uses Linux bridges. To benchmark throughput using Docker's default networking, open two terminals. In the first terminal, run the iperf3 container in server mode.

    sudo docker run -it --rm --name=iperf3-server -p 5201:5201 networkstatic/iperf3 -s

    Open another terminal, and then run iperf3 in client mode.

    docker inspect --format "{{ .NetworkSettings.IPAddress }}" iperf3-server
    sudo docker run  -it --rm networkstatic/iperf3 -c ip_address
    

    The first command obtains the Docker-assigned IP address of the container named "iperf3-server"; we then create another container in client mode, passing that IP address as the argument.

    Using Docker's default networking, we were able to achieve the network throughput shown below:

    iperf3 default console
    iperf3 benchmark default networking.

    OvS-DPDK docker networking test

    With our system still under load, we will repeat the same test as above, but will have our containers connect to the OvS-DPDK network.

    sudo docker run -it --net=ovs --rm --name=iperf3-server -p 5201:5201 networkstatic/iperf3 -s
    

    Open another terminal and run iperf3 in client mode.
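
    As in the previous test, the client needs the server container's IP address. For a container attached to a user-defined network such as ovs, the address is nested under Networks in the inspect output, so a format string along the following lines can be used (a sketch; adjust the network name if yours differs):

    sudo docker inspect --format "{{ .NetworkSettings.Networks.ovs.IPAddress }}" iperf3-server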

    sudo docker run -it --net=ovs --rm networkstatic/iperf3 -c ip_address
    

    Using Docker with OvS-DPDK networking, we were able to achieve the network throughput shown below:

    iperf3 terminal
    iperf3 benchmark OvS-DPDK networking

    Summary

    By setting up Docker to use OvS-DPDK on our server, we achieved an approximately 1.5x increase in network throughput compared to Docker’s default networking. It is also possible to use OvS-DPDK with OVN to create a multi-tenant, multi-host swarm/cluster in your server environment using overlay mode.

    About the Author

    Yaser Ahmed is a software engineer at Intel Corporation and has an MS degree in Applied Statistics from DePaul University and a BS degree in Electrical Engineering from the University of Minnesota.

    Using Intel® Optane™ Technology with Ceph* to Build High-Performance Cloud Storage Solutions on Intel® Xeon® Scalable Processors


    Download Ceph* configuration file [2KB]

    Introduction

    Ceph* is the most popular block and object storage backend. It is an open source distributed storage software solution, widely adopted in the public and private cloud. As solid-state drives (SSDs) become more affordable, and cloud providers are working to provide high-performance, highly reliable, all-flash–based storage for their customers, there is a strong demand for Ceph-based, all-flash reference architectures, performance numbers, and optimization best-known methods.

    Intel® Optane™ technology provides an unparalleled combination of high throughput, low latency, high quality of service, and high endurance. It is a unique combination of 3D XPoint™ Memory Media, Intel® Memory Controllers and Intel® Storage Controllers, Intel® Interconnect IP, and Intel® software. Together, these building blocks deliver a revolutionary leap forward in decreasing latency and accelerating systems for workloads demanding large capacity and fast storage.

    The Intel® Xeon® Scalable processors with Intel® C620 series chipsets are workload-optimized to support hybrid cloud infrastructures and the most high-demand applications, providing high data throughput and low latency. Ideal for storage and data-intensive solutions, Intel Xeon Scalable processors offer a range of performance, scalability, and feature options to meet a wide variety of workloads in the data center, from the entry-level (Intel® Xeon® Bronze 3XXX processor family) to the most advanced (Intel® Xeon® Platinum 8XXX processor family).

    As a follow up to the work of our previous article, Use Intel® Optane™ Technology and Intel® 3D NAND SSDs Technology with Ceph to Build High-Performance Cloud Storage Solutions, we’d like to share the progress on Ceph all-flash storage system reference architectures and software optimizations based on Intel Xeon Scalable processors. In this paper, we present the latest Ceph reference architectures and performance results with the RADOS Block Device (RBD) interface using Intel Optane technology with the Intel Xeon Scalable processors family (Intel Xeon Platinum 8180 processor and Intel® Xeon® Gold 6140 processor). Moreover, we include several Ceph software tunings that resulted in significant performance improvement for random workloads.

    Ceph* Performance Optimization History

    Working closely with the community, ecosystem, and partners, Intel has kept track of Ceph performance since the Ceph Giant release. Figure 1 shows the performance optimization history for 4K random write workloads across Ceph major releases and different Intel platforms. With new Ceph major releases, backend storage changes combined with core platform changes and SSD upgrades, the 4K random write performance of a single node was improved by 27x (from 3,673 input/output operations per second (IOPS) per node to 100,052 IOPS per node)! This makes it possible to use Ceph to build high-performance storage solutions.

    Ceph* 4K RW Per Node Performance Optimization History

    Figure 1. Ceph 4K RW per node performance optimization history.

    Intel® Optane™ Technology with Ceph AFA on Intel® Xeon® Scalable Processors

    In this section, we present Intel Optane technology with a Ceph all-flash array (AFA) reference architecture on Intel Xeon Scalable processors, together with performance results and system characteristics for typical workloads.

    Configuration with Intel® Xeon® Platinum 8180 processor

    The Intel Xeon Platinum 8180 processor is the most advanced processor in the Intel Xeon Scalable processor family. Using an Intel® Optane™ Solid State Drive (SSD) as the WAL device, NAND-based SSDs for the data drives, and Mellanox* 40 GbE network interface cards (NICs) as high-speed Ethernet data ports provides the best performance (throughput and latency) configuration. It is ideally suited for heavily input/output-intensive workloads.

    Cluster topology

    Figure 2. Cluster topology.

    Table 1. Cluster configuration.

    Ceph Configuration with Intel Xeon Platinum 8180 Processor
    CPU: Intel Xeon Platinum 8180 processor @ 2.50 GHz
    Memory: 384 GB
    NIC: Mellanox 2x 40 GbE (80 Gb for Ceph nodes), Mellanox 1x 40 GbE (40 Gb for client nodes)
    Storage: Data: 4x Intel® SSD DC P3520 2.0 TB; WAL: 1x Intel® Optane™ SSD DC P4800X 375 GB
    Software configuration: Ubuntu* 16.04, Linux* Kernel 4.8, Ceph version 12.2.2

    The test system consists of five Ceph storage servers and five client nodes. Each storage node is configured with an Intel Xeon Platinum 8180 processor and 384 GB memory, using 1x Intel Optane SSD DC P4800X 375 GB as the BlueStore WAL device, 4x Intel® SSD DC P3520 2 TB as data drives, and 2x Mellanox 40 GbE NIC as separate cluster and public networks for Ceph.

    For clients, each node is set up with an Intel Xeon Platinum 8180 processor, 384 GB memory, and 1x Mellanox 40GbE NIC.

    Ceph 12.2.2 was used, and each Intel® SSD DC P3520 Series drive ran one object storage daemon (OSD). The RBD pool used for the testing was configured with two replications; the system topology is described in Figure 2.

    Testing methodology

    We designed four different workloads to simulate a typical all-flash Ceph cluster in the cloud, based on fio with librbd, including 4K random read and write, and 64K sequential read and write, to simulate the random workloads and sequential workloads, respectively. For each test case, the throughput (IOPS or bandwidth) was measured with the number of volumes scaling (to 100), with each volume configured to be 30 GB. The volumes were pre-allocated to eliminate the Ceph thin-provision mechanism’s impact to generate stable and reproducible results. The OSD page cache was dropped before each run to eliminate page cache impact. For each test case, fio was configured with a 300-second warm up and 300-second data collection. Detailed fio testing parameters are included in the downloadable Ceph configuration file.
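
    The exact fio parameters are included in the downloadable Ceph configuration file. As a rough sketch only (assuming fio's rbd ioengine, the admin client, and a pre-created volume named volume0 in the rbd pool), a 4K random write job resembles the following:

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rw=randwrite
    bs=4k
    iodepth=64
    direct=1
    ramp_time=300
    runtime=300
    time_based

    [volume0]
    rbdname=volume0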

    Performance overview

    The Intel Optane technology-based Ceph AFA cluster demonstrated excellent throughput and latency. The 64K sequential read and write throughput is 21,949 MB/s and 8,714 MB/s, respectively (maximums with 40 GbE NIC). The 4K random read throughput is 2,453K IOPS with 5.36 ms average latency, while 4K random write throughput is 500K IOPS with 12.79 ms average latency.

    Table 2. Performance overview.

                          Peak Performance   Avg. Latency (ms)   Avg. CPU %   IOPS/CPU
    4K Random Write       500,259 IOPS       12.79               50           10005
    4K Random Read        2,453,200 IOPS     5.36                60.87        40302
    64K Sequential Read   21,949 MB/s        36.78               30.4         722
    64K Sequential Write  8,714 MB/s         45.87               18.4         474

    System characteristics

    To better understand the system characteristics and for performance projection, we did a deep investigation of system-level characteristics including CPU utilization, memory utilization, network bandwidth, disk IOPS, and latency.

    For the random workloads, the CPU utilization is 50 percent for 4K random writes and 60 percent for 4K random reads, while memory and network consumption are relatively low. The average IOPS on each P3520 drive is 20K for random write and 80K for random read, which still has lots of headroom for further performance improvement. For sequential workloads, CPU utilization and memory consumption of sequential write is quite low, and it is obvious that the NIC bandwidth is the bottleneck for sequential read cases.

    4K Random write characteristics

    CPU utilization of user-space consumption is 37 percent, which is 75 percent of total CPU utilization. Profiling results showed that most of the CPU cycles are consumed by the Ceph OSD process; the suspected reason for the remaining CPU headroom is that the software threading and locking model implementation limits Ceph's scale-up ability on a single node, which remains as a next-step optimization task.

    System metrics for 4K random write charts

    Figure 3. System metrics for 4K random write.

    4K Random read characteristics

    CPU utilization is about 60 percent, of which IOWAIT takes about 15 percent, so the real CPU consumption is about 45 percent, similar to the random write case. The OSD disks' read IOPS is quite steady at 80K, and 40 GbE NIC bandwidth is about 2.1 GB/s. No obvious hardware bottlenecks were observed; the suspected software bottleneck is similar to the 4K random write case and needs further investigation.

    System metrics for 4K random read charts

    Figure 4. System metrics for 4K random read.

    64K Sequential write characteristics

    CPU utilization and memory consumption for sequential write are quite low. Since the OSD replication number is 2, the NIC transmit bandwidth is twice the receive bandwidth; the transmit traffic is spread across two NICs, one for the public network and one for the cluster network, with each NIC carrying about 1.8 GB/s per port. The OSD disk AWAIT time fluctuates severely, with the highest disk latency over 4 seconds, while the disk IOPS is quite steady.

    System metrics of 64K sequential write charts

    Figure 5. System metrics of 64K sequential write.

    64K Sequential read characteristics

    For the sequential read case, we observed that the bandwidth of one NIC reaches 4.4 GB/s, which is about 88 percent of its total bandwidth. CPU utilization and memory consumption are quite low. OSD disk read IOPS and latency are steady.

    System metrics of 64K sequential read charts

    Figure 6. System metrics of 64K sequential read.

    Performance comparison with Intel® Xeon® processor E5 2699

    Table 3. Intel® Xeon® processor E5 2699 cluster configuration.

    Ceph Configuration with Intel® Xeon® Processor E5 2699
    CPU: Intel Xeon processor E5-2699 v4 @ 2.2 GHz
    Memory: 128 GB
    NIC: Mellanox 2x 40 GbE (80 Gb for Ceph nodes), Mellanox 1x 40 GbE (40 Gb for client nodes)
    Storage: Data: 4x Intel® SSD DC P3520 2.0 TB; WAL: 1x Intel® Optane™ SSD DC P4800X 375 GB
    Software configuration: Ubuntu* 16.04, Linux* Kernel 4.8, Ceph version 12.2.2

    The test system with Intel® Xeon® processor E5 2699 shares the same cluster topology and hardware configuration as the test system with an Intel Xeon Platinum 8180 processor. The only difference is that each server or client node is set up with Intel Xeon processor E5-2699 v4 and 128 GB memory.

    For the software configuration, Ceph 12.0.3 was adopted in the Intel Xeon processor E5 2699 test, and each Intel® SSD DC P3520 Series runs four OSD daemons. This differs from the Intel Xeon Platinum 8180 processor configuration, which runs only one OSD daemon.

    Table 4. Performance comparison overview.

                          Intel Xeon processor E5 2699 + 12.0.3   Intel Xeon Platinum 8180 Processor + 12.2.2
    4K Random Write       452,760 IOPS                            500,259 IOPS
    4K Random Read        2,037,400 IOPS                          2,453,200 IOPS
    64K Sequential Write  7,324 MB/s                              8,714 MB/s
    64K Sequential Read   21,264 MB/s                             21,949 MB/s

    As shown in Table 4, all four input/output pattern test results with Intel Xeon Platinum 8180 processors are better than with Intel Xeon processor E5. Especially for 4K random write and read test, the throughputs using Intel Xeon Platinum 8180 processors improved by 10 percent and 20 percent, respectively.

    Performance comparison with Intel® Xeon® Gold 6140 processor

    Table 5. Cluster configuration.

    Ceph Configuration with Intel Xeon Gold 6140 Processor
    CPU: Intel Xeon Gold 6140 processor @ 2.30 GHz
    Memory: 192 GB
    NIC: Mellanox 2x 40 GbE (80 Gb for Ceph nodes), Mellanox 1x 40 GbE (40 Gb for client nodes)
    Storage: Data: 4x Intel® SSD DC P3520 2.0 TB; WAL: 1x Intel® Optane™ SSD DC P4800X 375 GB
    Software configuration: Ubuntu* 16.04, Linux* Kernel 4.8, Ceph version 12.2.2

    The test system consists of five Ceph storage servers and five client nodes. For servers, each node is set up with an Intel Xeon Gold 6140 processor and 192 GB memory, using 1x Intel Optane SSD DC P4800X 375 GB as a BlueStore WAL device, 4x Intel SSD DC P3520 2 TB as data drives, and 2x Mellanox 40 GbE NIC as separate cluster and public networks for Ceph.

    For clients, each node is set up with an Intel Xeon Gold 6140 processor with 192 GB memory and 1x Mellanox 40 GbE NIC.

    Ceph 12.2.2 was used, and each Intel SSD DC P3520 Series drive runs one OSD daemon. The RBD pool used for the testing was configured with two replications; the system topology is described in Figure 2.

    Table 6. Performance comparison.

                          Intel Xeon Platinum 8180 Processor   Intel Xeon Gold 6140 Processor
    4K Random Write       500,259 IOPS                         450,553 IOPS
    4K Random Read        2,453,200 IOPS                       2,025,400 IOPS
    64K Sequential Write  8,714 MB/s                           7,379 MB/s
    64K Sequential Read   21,949 MB/s                          22,182 MB/s

    As shown in Table 6, performance with Intel Xeon Platinum 8180 processors is better than with Intel Xeon Gold 6140 processors in 4K random read (1.21x), 4K random write (1.11x), and 64K sequential write (1.18x). Since 64K sequential read hits the 40 GbE hardware limit in both configurations, the bandwidth results for 64K sequential read are similar.

    Ceph Software Optimization

    Background

    A Ceph Block Device stripes a block device image over multiple objects in the Ceph Storage Cluster, where each object gets mapped to a placement group and distributed, and the placement groups are spread across separate Ceph OSD daemons throughout the cluster. That is, when object requests are processed, CRUSH (Controlled Replication Under Scalable Hashing) maps each object to a placement group separately, and requests to an OSD are sharded by their placement group identifier. Each shard has its own queue, and these queues neither interact nor share information with each other. The number of shards can be controlled with the configuration options "osd_op_num_shards" and "osd_op_num_threads_per_shard"; proper values make better use of CPU and memory and have a direct impact on overall performance.

    Table 7. Ceph OSD configuration description.

    Configuration                  Description
    osd_op_num_shards              Number of request queues
    osd_op_num_threads_per_shard   Number of threads for each queue

    Performance evaluation of osd_op_num_shards

    Tuning osd_op_num_shards chart

    Figure 7. Ceph OSD tuning performance comparison.

    From the performance evaluation results, we observed a 1.19x performance improvement after tuning osd_op_num_shards to 64, while continuing to increase osd_op_num_shards from 64 to 128 showed a slight performance regression.
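
    These options live in the [osd] section of ceph.conf and take effect when the OSD daemons are (re)started. A minimal sketch of the tuned setting follows; the threads-per-shard value shown is only a placeholder, not the value used in our tests:

    [osd]
    osd_op_num_shards = 64
    # placeholder; tune osd_op_num_threads_per_shard for your hardware
    osd_op_num_threads_per_shard = 2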

    Performance evaluation of osd_op_num_threads_per_shard

    Ceph OSD tuning performance comparison chart

    Figure 8. Ceph OSD tuning performance comparison.

    From performance evaluation results, we observed a 1.17x performance improvement after optimization.

    Ongoing and Future Optimizations

    Based on the above performance numbers and system metrics, we need further optimization on the Ceph software stack to resolve the OSD scale-up issues, to take full advantage of hardware capability and features.

    Better RDMA integration with Ceph Async Messenger

    From the CPU utilization flame graphs for 4K random read and 4K random write, 28.8 percent and 22.24 percent of CPU, respectively, is used to handle network-related work when using Ethernet. With the increasing demand to replace Ethernet with remote direct memory access (RDMA), and with an optimized RDMA integration in Ceph, this portion of CPU utilization will be brought down and freed for other applications.

    CPU profiling for 4K RR

    CPU profiling for 4K RW

    Figure 9. Ceph OSD CPU profiling.

    A new native SPDK/NVMe-Focused object store based on BlueStore

    A log-structured BTree object store is now under discussion by the Ceph community, which aims to improve performance significantly for small-object input/output when using non-volatile memory express (NVMe) devices. This would involve moving the fast paths of the OSD into a reactive framework (Seastar*) and eliminating the severe rewrite performance limit that comes from using RocksDB (a log-structured merge-tree) as a foundational building block. See New ObjectStore.

    A new async-osd

    Ceph OSD is now being refactored to become an async-osd for future generations. The goal is to integrate Seastar, a futures-based framework designed for shared-nothing user-space scheduling and networking, into the Ceph OSD code, so it works better with coming fast (non-volatile random-access memory speed) devices.

    Summary

    In this paper, we presented performance results of Intel Optane technology with Ceph AFA reference architecture on Intel Xeon Scalable processors. This configuration demonstrated excellent throughput and latency. The 64K sequential read and write throughput is 21,949 MB/s and 8,714 MB/s, respectively (maximums with 40 GbE NIC). 4K random read throughput is 2,453K IOPS with 5.36 ms average latency, while 4K random write throughput is 500K IOPS with 12.79 ms average latency.

    For read-intensive workloads, especially with small blocks, a top-bin processor from the Intel Xeon Scalable processor family, such as the Intel Xeon Platinum 8180 processor, is recommended. It provides up to 20 percent performance improvement compared with the Intel Xeon Gold 6140 processor.

    Software tuning and optimization also provided up to 19 percent performance improvement for both read and write compared to the default-configured Intel Optane technology with Ceph AFA cluster on Intel Xeon Scalable processors. Since hardware headroom remains with the current configuration, performance should continue to improve with ongoing Ceph optimizations such as the RDMA messenger, the NVMe-focused object store, and async-osd in the near future.

    About the Authors

    Chendi Xue is a member of the Cloud Storage Engineering team from Intel Asia-Pacific Research & Development Ltd. She has five years’ experience in Linux cloud storage system development, optimization, and benchmarking, including Ceph benchmarking and tuning, CeTune (a Ceph benchmarking tool) development, and HDCS (a hyper-converged distributed cache storage system) development.

    Jian Zhang manages the cloud storage engineering team in Intel Asia-Pacific Research & Development Ltd. The team’s focus is primarily on open source cloud storage performance analysis and optimization, and building reference solutions for customers based on OpenStack Swift and Ceph. Jian Zhang is an expert on performance analysis and optimization for many open source projects, including Xen, KVM, Swift and Ceph, and benchmarking workloads like SPEC*. He has worked on performance tuning and optimization for seven years and has authored many publications related to virtualization and cloud storage.

    Jianpeng Ma is a member of the Cloud Storage Engineering team from Intel Asia-Pacific Research & Development Ltd. He is currently focused on Ceph development and performance tuning for Intel platforms and reference architectures. Jianpeng gained software development and performance optimization experience for the md driver of linux kernel before joining Intel.

    Jack Zhang is currently a senior SSD Enterprise Architect in Intel’s NVM (non-volatile memory) solution group. He manages and leads SSD solutions and optimizations and next generation 3D XPoint solutions and enabling across various vertical segments. He also leads SSD solutions and optimizations for various open source storage solutions, including SDS, OpenStack, Ceph, and big data. Jack held several senior engineering management positions before joining Intel in 2005. He has many years’ design experience in firmware, hardware, software kernel and drivers, system architectures, as well as new technology ecosystem enabling and market developments.

    Reference

    1. Ceph website
    2. Architecture and technology: Intel® Optane™ Technology
    3. A 3D animation: Intel® 3D NAND Technology Transforms the Economics of Storage
    4. Our earlier article: Use Intel® Optane™ Technology and Intel® 3D NAND SSDs to Build High-Performance Cloud Storage Solutions
    5. The New ObjectStore
    6. Description of async-osd

    Different Approaches in Intel(R) IPP Resize


     

    The Intel® Integrated Performance Primitives (Intel® IPP) library has existed for more than 20 years. Over this long history, the Resize functionality has also changed continuously from release to release: the API became more convenient, and the scaling approach was changed.

    Difference between Resize APIs

    Historically there have been three base Resize APIs in the Intel® IPP library: ippiResizeSqrPixel, ippiResize, and ippiResizeLinear (and similarly for other interpolation types).

    The main difference was introduced by switching from ippiResize to ippiResizeSqrPixel, when the “pixel-point” approach was changed to the “square-pixel” approach.

    ippiResize “point-pixel” approach:

    This approach represents an image as a grid with a pixel at each node; when working with this format, we operate on the grid itself.

    This approach is not natural. For example, consider a 7x7-pixel image. If we assume that the distance between neighboring horizontal or vertical pixels is one, then the distance between the extreme pixels is six. If we want to double the image size, we must double the grid, so the resulting grid has a distance of 12 between extreme pixels. Thus, after such a resize transformation of a 7x7-pixel image we obtain a 13x13-pixel image. Many customers weren’t satisfied with this algorithmic difference from the popular Adobe® Photoshop® behavior.

    ippiResizeSqrPixel/ippiResize<Interpolation> “square-pixel” approach:

    This approach considers pixels as "squares" and resizes a number of such "squares". This model is more natural than the previous one. Thus, after doubling a 7x7-pixel image we obtain a 14x14-pixel image.
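
    Stated as simple size formulas (with an integer scale factor k), the two models give different destination sizes, which is exactly the 13-versus-14 difference in the example above:

    dstSize_point  = (srcSize - 1) * k + 1
    dstSize_square = srcSize * k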

    ippiResizeSqrPixel and ippiResizeLinear use the same scaling approach. The difference is in the supported border types (ippiResizeSqrPixel supports Replicate borders by default, while ippiResizeLinear supports Replicate and InMem border types). Also, ippiResizeSqrPixel can use an arbitrary (real) scale factor, whereas ippiResizeLinear supports integer scaling only.

    Difference between the coordinate systems used by ipprWarpAffine and ipprResize (2D and 3D cases)

    The function ipprResize considers an image as a set of cubic pixels. Each pixel is a cube with unit volume (1x1x1), so the function works with the center points of the pixels. The transformation that maps a source image to the destination one can be expressed by the following formula:

    The function ipprWarpAffine considers an image as a set of point pixels. Each pixel is a point with zero volume, and the distance between neighboring pixels is 1 along each dimension, so the function works directly with points. The transformation that maps a source image to the destination one can be expressed by the following formula:

    Related Link:

    Developer reference: https://software.intel.com/en-us/ipp-dev-reference-resizelinear

    Resize Changes in Intel® IPP 7.1

    Setting Up a Time Series for Sensor Data Using Amazon Web Services (AWS)* Greengrass and the UP Squared* Board


    Intro

    This article explores a method for setting up a time series for sensor data using the UP Squared* board and Grove shield, Amazon Web Services (AWS)* Greengrass, and Plotly*. First, we will collect data from the Grove UV sensor; then we will create a time series using Plotly. Finally, we will publish the time series URL to Greengrass’s IoT Cloud using an AWS Lambda function.

    Learn more about the AWS* Greengrass

    Learn more about the UP Squared board

    Learn more about Plotly

    Prerequisites

    UP Squared board:

    AWS Greengrass:

    Plotly:

    AWS Greengrass

    To install AWS Greengrass, follow these instructions

    Check that you have installed all the needed dependencies:

    sudo apt update
    git clone https://github.com/aws-samples/aws-greengrass-samples.git
    cd aws-greengrass-samples
    cd greengrass-dependency-checker-GGCv1.3.0
    sudo ./check_ggc_dependencies

    Code 1. Commands to Check AWS Dependencies

    On UP Squared board, start Greengrass:

    cd path-to-greengrass-folder/greengrass/ggc/core
    sudo ./greengrassd start

    Code 2. Commands to Start AWS Greengrass

    Time Series

    On UP Squared board, install Plotly and its dependencies:

    sudo pip install pandas
    sudo pip install flask
    sudo apt-get install sqlite3 libsqlite3-dev
    sudo pip install plotly

    Code 3. Commands to Install Plotly with Dependencies

    To update Plotly:

    sudo pip install plotly --upgrade

    Code 4. Commands to Update Plotly

    Go to your home directory and create a new directory:

    cd ~
    mkdir .plotly

    Code 5. Commands to Create a Directory

    Get the API key and your login info from the Plotly account page. Create .credentials file in the .plotly directory:

    {
        "username": "your-username",
        "stream_ids": [],
        "api_key": "your-api-key"
    }

    Code 6. .credentials file

    You will need your username and API key information in the next section to authenticate with Plotly.

    Grove Sensors

    To interface with Grove sensors, install MRAA and UPM libraries:

    sudo add-apt-repository ppa:mraa/mraa
    sudo apt-get update
    sudo apt-get install libmraa1 libmraa-dev mraa-tools python-mraa python3-mraa
    sudo apt-get install libupm-dev libupm-java python-upm python3-upm node-upm upm-example

    Code 7. Commands to Install Grove Dependencies

    UV Sensor

    This section shows how to collect data with the Grove UV sensor and save the sensor data in a CSV. The second script reads the CSV file and creates a time series using Plotly and Pandas Python* packages. The URL for time series is saved in a file that is used by the AWS Lambda function in the next section. Code used in this section was modified from the source code, which can be found here. Here’s the modified code we will use to retrieve and save the sensor data:

    from __future__ import print_function
    import time, sys, signal, atexit
    from upm import pyupm_si114x as upmSi114x
    import mraa
    
    def main():
        # Interface with sensors through the Grove shield
        mraa.addSubplatform(mraa.GROVEPI, "0")
    
        # Instantiate a SI114x UV Sensor on I2C bus 0
        myUVSensor = upmSi114x.SI114X(0)
    
        ## Exit handlers ##
        # This stops python from printing a stacktrace when you hit control-C
        def SIGINTHandler(signum, frame):
            raise SystemExit
    
        # This function lets you run code on exit,
        # including functions from myUVSensor
        def exitHandler():
            print("Exiting")
            sys.exit(0)
    
        # Register exit handlers
        atexit.register(exitHandler)
        signal.signal(signal.SIGINT, SIGINTHandler)
    
        # First initialize it
        myUVSensor.initialize()
    
        print("UV Index Scale:")
        print("---------------")
        print("11+        Extreme")
        print("8-10       Very High")
        print("6-7        High")
        print("3-5        Moderate")
        print("0-2        Low\n")
    
        # update every second and print the currently measured UV Index
        while (1):
            # update current value(s)
            myUVSensor.update()
            uv = myUVSensor.getUVIndex()
            # print detected value
            print("UV Index:", uv)
            
            with open("uv_data.csv", "a+") as uv_data_file:
                uv_data_file.write(str(uv) +',\n')
            time.sleep(10)
    
    if __name__ == '__main__':
        main()
    

    Code 8. uv_sensor.py, Python Code to Get and Record UV Sensor Data

    The Python code gets the data from the Grove UV sensor every 10 seconds and writes it, together with a timestamp, to uv_data.csv.

    Save the code as uv_sensor.py on the UP Squared board. To collect the UV sensor data, run the code:

    sudo python uv_sensor.py 

    Code 9. Command to Run Python Code

    After you’ve collected enough data, press Ctrl+C to stop running code.
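
    Before plotting, it can be useful to confirm that rows are accumulating in the CSV file (a quick sanity check, assuming the file is in the current directory):

    tail -n 5 uv_data.csv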

    Save create_plot.py:

    import plotly.plotly as py
    import plotly.graph_objs as go
    import pandas as pd
    from datetime import datetime
    
    
    # Plotly account authentication
    py.sign_in('plotly-username', 'plotly-api-key')
    
    # Reading CSV file using Pandas package
    df = pd.read_csv("uv_data.csv", header=None, parse_dates=True, infer_datetime_format=True, usecols=[0,1])
    
    # Creating a time series with Plotly package
    data = [go.Scatter(
              x=df[0],
              y=df[1])]
    url = py.plot(data)
    
    # Saving URL to a file
    with open("url_file", "a+") as url_file:
        url_file.write(str(url))

    Code 10. create_plot.py, Python Code to Create Time Series

    Replace plotly-username and plotly-api-key with your Plotly credentials.

    This code reads the UV sensor data stored in CSV file and creates a time series that is displayed online. The URL is written to a file, so AWS Lambda can use it later in the tutorial.

    Run the code:

    sudo python create_plot.py

    Code 11. Command to Run Python Code

    AWS Lambda

    This section shows you how to prepare for and create the AWS Lambda function. This function will read the time series URL from a file and then publish it via MQTT client to the Greengrass’s IoT Cloud. At the end, we will see the MQTT messages received and the time series with UV sensor data.

    Go to the AWS console and then to the AWS IoT page, and select Software from the bottom left. Download the AWS Greengrass Core SDK by clicking Configure Download. Choose Python* 2.7 and click Download Greengrass Core SDK. After the package has downloaded, untar it:

    tar -xzvf greengrass-core-python-sdk-1.0.0.tar.gz

    Code 12. Command to Untar a Package

    Go to the HelloWorld folder:

    cd aws_greengrass_core_sdk/examples/HelloWorld

    Code 13. Command to Go to the HelloWorld Folder

    Unzip the zip file:

    unzip greengrassHelloWorld.zip

    Code 14. Command to Unzip A Package

    Copy url_file to the same folder:

    cp /path-to-url-file/url_file .

    Code 15. Command to Copy a File

    Copy time_series.py and save it in the same folder:

    import greengrasssdk
    import platform
    from threading import Timer
    import time
    
    
    # Creating a greengrass core sdk client
    client = greengrasssdk.client('iot-data')
    
    # Retrieving platform information to send from Greengrass Core
    my_platform = platform.platform()
    
    
    def time_series_run():
        fopen = open("url_file", "r")
        url = fopen.read()
        if not my_platform:
            client.publish(topic='ts/uv', payload='View the time series: {} Sent from Greengrass Core.'.format(url))
        else:
            client.publish(topic='ts/uv', payload='View the time series: {} Sent from Greengrass Core running on platform'.format(url))
    
        # Asynchronously schedule this function to be run again in 10 seconds
        Timer(10, time_series_run).start()
    
    
    # Start executing the function above
    time_series_run()
    
    def function_handler(event, context):
        return

    Code 16. time_series.py, Python Code to Publish Time Series URL to AWS Greengrass

    Create a zip file, uv_timeseries.zip, which will later be uploaded to the AWS Lambda function:

    zip –r uv_timeseries.zip greengrass_common/ greengrass_ipc_python_sdk/ greengrasssdk/ time_series.py

    Code 17. Command to Create a Zip File

    Go to AWS console, click Services on top left, put Lambda in search bar and click on it. The Lambda Management Console will open.

    AWS Lambda Functions View
    Figure 1. AWS Lambda Functions View

    Click Create function.

    If not selected, select Author from scratch and fill out needed fields:

    AWS Lambda Create Function View
    Figure 2. AWS Lambda Create Function View

    Click Create function.

    Upload uv_timeseries.zip. Change the handler name to time_series.function_handler. Click Save:

    Creating AWS Lambda Function View
    Figure 3. Creating AWS Lambda Function View

    Click on Actions, select Create new version and call it the first version in description:

    Publishing New Version of AWS Lambda Function
    Figure 4. Publishing New Version of AWS Lambda Function

    Click Publish.

    Go to the IoT Core/AWS IoT console. Choose Greengrass from the left-side menu, select Groups underneath it, and select your group from the main window:

    AWS Greengrass Groups View
    Figure 5. AWS Greengrass Groups View

    Select Lambda from the left-side menu. Click Add Lambda in the top-right corner of the Lambdas screen:

    AWS Greengrass Group View
    Figure 6. AWS Greengrass Group View

    Select Use Existing Lambda:

    Adding AWS Lambda Function for the Greengrass Group
    Figure 7. Adding AWS Lambda Function for the Greengrass Group

    Select time_series from the menu and click Next:

    Using Existing AWS Lambda
    Figure 8. Using Existing AWS Lambda

    Choose Version 1 and click Finish:

    selecting Lambda version
    Figure 9. Selecting Lambda Version

    Click on dotted area and select Edit Configuration:

    select dotted area and select Edit Configuration
    Figure 10. Lambda Functions Within Greengrass Group View

    Change Timeout to 25 seconds and choose Lambda lifecycle to be a long-lived function:

    Editing Lambda function view
    Figure 11. Editing Lambda Function View

    Click Update on the bottom of the page.

    Click the little grey back button, select Subscriptions.

    Click Add Subscription or Add your first Subscription:

    click add subscription
    Figure 12. Adding Subscription View

    For the source, choose from Lambdas tab, select time_series. For the target, select IoT Cloud:

    edit subscription view
    Figure 13. Editing Subscription View

    Click Next. Add ts/uv for the topic:

    click next and add ts/uv for the topic
    Figure 14. Editing Topic for Subscription View

    Click Finish.

    On the group header, click Actions, select Deploy and wait until it is successfully completed:

    subscriptions view
    Figure 15. Subscriptions View

    Go to the AWS IoT console. Select Test from the left-side menu. Type ts/uv in the topic field, change the MQTT payload display to show payloads as strings, and click Subscribe to topic:

    MQTT client view
    Figure 16. MQTT Client View

    After some time, messages should display on the bottom of the screen:

    MQTT messages view
    Figure 17. MQTT Messages View

    Copy the URL and paste it in browser. The time series of UV sensor data is shown below:

    UV sensor data time series
    Figure 18. UV Sensor Data Time Series

    About the author

    Rozaliya Everstova is a software engineer at Intel in the Software and Services Group working on scale-enabling projects for the Internet of Things.

    More on UP Squared*

    Security vulnerability found in the bleach module


    A security vulnerability has recently been found in Mozilla's bleach library module, which is a common dependency for packages such as Jupyter Notebook. To mitigate the issue, it is recommended that all users of the package (as well as all Intel® Distribution for Python* users) update this module to the latest version.

    For more information about the security issue and the fixes that were made to the bleach module, please visit the link here: https://nvd.nist.gov/vuln/detail/CVE-2018-7753

    Instructions on how to update and install bleach to the latest version are below:

    To download the package manually, please go to https://anaconda.org/intel/bleach/files


    Conda

    • For Unix platforms:
      • Online mode: <install_location>/bin/conda install -c intel bleach=2.1.3                              
      • Offline mode: <install_location>/bin/conda install <absolute_path_to_conda_pkg>              
    • For Windows platforms:
      • Online mode: <install_location>\Scripts\conda install -c intel bleach=2.1.3                     
      • Offline mode: <install_location>\Scripts\conda install <absolute_path_to_conda_pkg>       

    pip

    • For Unix platforms:
      • <install_location>/bin/pip uninstall bleach                             
      • Online mode: <install_location>/bin/pip install --no-deps bleach   
      • Offline mode: <install_location>/bin/pip install --no-deps <absolute_path_to_local_bleach_whl>
    • For Windows platforms:
      • <install_location>\Scripts\pip uninstall bleach                 
      • Online Mode: <install_location>\Scripts\pip install --no-deps bleach  
      • Offline Mode: <install_location>\Scripts\pip install --no-deps <absolute_path_to_local_bleach_whl>  

    Iceberg Classification Using Deep Learning on Intel® Architecture


    Abstract

    Poor detection and drifting icebergs are major threats to marine safety and physical oceanography. As a result of these factors, ships can sink, thereby causing a major loss of human life. To monitor and classify the object as a ship or an iceberg, Synthetic Aperture Radar (SAR) satellite images are used to automatically analyze with the help of deep learning. In this experiment, the Kaggle* iceberg dataset (images provided by the SAR satellite) was considered, and the images were classified using the AlexNet topology and Keras library. The experiments were performed on Intel® Xeon® Gold processor-powered systems, and a training accuracy of 99 percent and inference accuracy of 86 percent were achieved.

    Introduction

    An iceberg is a large chunk of ice that has been calved from a glacier. Icebergs come in different shapes and sizes. Because most of an iceberg’s mass is below the water surface, it drifts with the ocean currents. This poses risks to ships and their navigation and infrastructure. Currently, many companies and institutions are using aircrafts and shore-based support to monitor the risk from icebergs. This monitoring is challenging in harsh weather conditions and remote areas.

    To mitigate these risks, Statoil, an international energy company, is working closely with the Centre for Cold Ocean Resources Engineering (C-Core) to remotely sense icebergs and ships using SAR satellites. The SAR satellites are not light dependent and can capture images of the targets even in darkness, clouds, fog, and harsh weather conditions. The main objective of this experiment on Intel® architecture was to automatically classify a satellite image as an iceberg or a ship.

    For this experiment, AlexNet topology with the Keras library was used to train and inference an iceberg classification on an Intel® Xeon® Gold processor. The iceberg dataset was taken from Kaggle, and the approach was to train the model from scratch.

    Choosing the environment

    Hardware

    Experiments were performed on an Intel Xeon Gold processor-powered system, as described in Table 1.

    Table 1. Intel® Xeon® Gold processor configuration.

    Components           Details
    Architecture         x86_64
    CPU op-mode(s)       32-bit, 64-bit
    Byte order           Little-endian
    CPU(s)               24
    Core(s) per socket   6
    Socket(s)            2
    CPU family           6
    Model                85
    Model name           Intel® Xeon® Gold 6128 processor 3.40 GHz
    RAM                  92 GB

    Software

    The Keras framework along with the Intel® Distribution for Python* were used as the software configuration, as described in Table 2.

    Table 2. Software configuration.

    Software/Library             Version
    Keras*                       2.1.2
    Python* (Intel® optimized)   3.6

    Dataset

    The iceberg dataset was taken from the Statoil/C-CORE Iceberg Classifier Challenge. The machine-generated images were labeled by human experts with geographic knowledge of the target. All the images are 75x75 pixels.

    For train.json, each image consisted of the following fields:

    • Id: The ID of the image.
    • band_1, band_2: The flattened image data. Each band has 75x75 pixel values in the list, so each list has 5,625 elements. Unlike typical image files, the values are not non-negative integers; they are floating-point numbers with unit dB, because they have physical meaning. Band 1 and Band 2 are signals characterized by radar backscatter produced from different polarizations at a particular incidence angle. The polarizations correspond to HH (transmit and receive horizontally) and HV (transmit horizontally and receive vertically).
    • inc_angle: The incidence angle at which the image was taken. This field has missing data marked “na,” and those images with na incidence angles are all in the training data to prevent leakage.
    • is_iceberg: The target variable. It is set to “1” if the object is an iceberg and “0” if it is a ship. This field only exists in train.json.

    In inferencing, the trained AlexNet model is used to predict the is_iceberg field.

    AlexNet architecture

    AlexNet is one of the deep convolutional neural networks designed to deal with complex image classification tasks on the ImageNet dataset. AlexNet has five convolutional layers, three sub-sampling layers, and three fully connected layers, with roughly 60 million trainable parameters in total. The arrangement and configuration of all the layers of AlexNet are shown in Figure 1.

    AlexNet architecture
    Figure 1. AlexNet architecture (credit: CV-Tricks4).

    Execution steps

    This section explains the steps followed in the end-to-end process for training and inferencing iceberg classification model on AlexNet architecture. The steps include:

    1. Preparing the input
    2. Model training and inference

    Preparing input

    Handling missing values

    In the dataset, the inc_angle field had more than 130 missing values. Therefore, the k-nearest neighbors imputation technique was used (k = 3 neighbors). The charts in Figures 2 and 3 show the inc_angle values before and after applying the imputation technique.

    Before imputation
    Figure 2. Before imputation.

    After imputation
    Figure 3. After imputation.

    Model training and testing

    After installing Keras and the Intel® Distribution for Python, the next step was to train the model. The training technique adopted was to train all the layers from scratch.

    The following command was used to download the iceberg dataset and train the algorithm with imputed data toward classifying the iceberg images and producing the inference results:

    python ~/iceberg_kaggle/icebergchallenge.py

    Dataset download
    Figure 4. Dataset download.

    Training

    Training was performed in two runs. The first run was to decide on the optimizer to be used, and in the second run training was performed with the selected optimizer for 5,000 epochs.

    Run1:

    To achieve better accuracy, Keras optimizers, such as the stochastic gradient descent (SGD) and ADAM (adaptive moment estimation), were used to control the gradients. Table 3 lists the results for 250 epochs.

    Table 3. Results with optimizers.

    Optimizer   Result              Value
    SGD         Training accuracy   62.5 percent
                Loss                0.37
    ADAM        Training accuracy   61.6 percent
                Loss                6.0

    Because the loss was lower with SGD than with ADAM, we proceeded with the second run using the SGD optimizer.

    Run2:

    Training was performed on 1,074 images, see Figure 5, with a batch size of 128 images for 5,000 epochs. Table 4 shows the results.

    Table 4. Training results.

    Result     Value
    Accuracy   99 percent
    Loss       0.0009

    Training snapshot
    Figure 5. Training snapshot.

    Inferencing

    Inference was performed with two different datasets. The first inference measured the inference accuracy on the subset of train.json. The second inference was performed on test.json to submit the predictions to Kaggle.

    Inference1:

    Inferencing was performed on 530 images, see Figure 6. Table 5 shows the results.

    Table 5. Inference results.

    Result     Value
    Accuracy   87 percent
    Loss       0.1009

    Inferencing
    Figure 6. Inferencing.

    Inference 2:

    Inferencing on test.json was performed to predict the is_iceberg value and submitted to Kaggle.

    Because there are no ground truth values, we cannot calculate the accuracy.

    For each test ID, the probability that the image is an iceberg is as shown in Table 6.

    Table 6. Sample output.

    Sample output

    Conclusion

    In this paper, we showed how training from scratch and testing of the iceberg classification model were performed using the AlexNet topology with Keras and an iceberg dataset in the Intel Xeon Gold processor environment. The experiment was extended by applying an imputation technique to the inc_angle field because it had missing values. We also observed better accuracy using the Keras SGD optimization technique.

    About the Author

Manda Lakshmi Bhavani and Rajeswari Ponnuru are part of the Intel and Tata Consultancy Services relationship, working on AI evangelization for academia.

    References

    1. Keras tutorial:
    https://keras.io/models/model/

    2. AlexNet:
    http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf

3. Kaggle Statoil/C-CORE Iceberg Classifier Challenge:
    https://www.kaggle.com/c/statoil-iceberg-classifier-challenge

4. AlexNet architecture:
    http://cv-tricks.com/tensorflow-tutorial/understanding-alexnet-resnet-squeezenetand-running-on-tensorflow/

    Related Resources

    Keras optimizations: https://keras.io/optimizers/ 

    Understanding AlexNet: https://sushscience.wordpress.com/2016/12/04/understanding-alexnet/  

    Intel® VTune™ Amplifier Sampling Driver Downloads


    Intel® VTune™ Amplifier uses kernel drivers to enable hardware event-based sampling and collect event-based sampling data from Performance Monitoring Units on the CPU.

    The VTune Amplifier installer automatically uses the Sampling Driver Kit  included with the package to build drivers for your kernel with the default installation options.

Sometimes operating system updates may cause these drivers to fail to build or load. In those instances, users may need updated versions of the drivers before official product updates are available. The latest drivers are provided below.

    After downloading the file, extract the contents and replace the <install directory>/sepdk directory with this new sepdk directory. The default install directory is /opt/intel/vtune_amplifier/ . Then follow the instructions to build and install the driver here: https://software.intel.com/en-us/vtune-amplifier-help-building-and-installing-the-sampling-drivers-for-linux-targets

File           OS      VTune Amplifier Version   Targeted Audience
sepdk.tar.gz   Linux   2017 Update 4 or newer    Users with kernel version 4.4.73-5 and newer

     

     

     

      

    Speech Recognition Using Deep Learning on Intel® Architecture


    Abstract

    This paper demonstrates how to train and infer the speech recognition problem using deep neural networks on Intel® architecture. A scratch training approach was used on the Speech Commands dataset that TensorFlow* recently released. Inference was done using test audio clips to detect the label. The experiments were run on an Intel® Xeon® Gold processor system.

    Introduction

Audio classification tasks are divided into three subdomains: music classification, speech recognition (particularly the acoustic model), and acoustic scene classification. With the rapid development of mobile devices, speech-related technologies are becoming increasingly popular. For example, Google offers the ability to search by voice on Android* phones. In this study, we approach the speech recognition problem by building a basic speech recognition network that recognizes thirty different words, using a TensorFlow-based implementation.

To help with this experiment, TensorFlow recently released the Speech Commands dataset, which includes 65,000 one-second-long utterances of 30 short words by thousands of different people.

Continued research in the deep learning space has resulted in the evolution of many frameworks to solve the complex problem of speech recognition. These frameworks have been optimized for the hardware on which they run, for better accuracy, reduced loss, and increased speed. Along these lines, Intel has optimized the TensorFlow library for better performance on Intel® Xeon® processors. This paper discusses training and inferencing a speech recognition model built with a sample convolutional neural network (CNN) architecture and the TensorFlow framework on a cluster powered by Intel® processors. We adopted an approach of training the model from scratch.

    Document Content

    This section describes the end-to-end steps, from choosing the environment to running the tests on the trained speech recognition model.

    Choosing the environment

    Hardware
Experiments were performed on Intel Xeon Gold processor-powered systems. Table 1 lists the hardware details.

    Table 1. Intel Xeon Gold processor configuration.

Architecture           x86_64
CPU op-mode(s)         32-bit, 64-bit
Byte order             Little endian
CPU(s)                 24
Core(s) per socket     6
Socket(s)              2
CPU family             6
Model                  85
Model name             Intel Xeon Gold 6128 CPU at 3.40 GHz
RAM                    92 GB

    Software
Intel® Optimization for TensorFlow* framework, along with Intel® Distribution for Python*, was used as the software configuration. Table 2 lists the details of the software.

    Table 2. Software configuration – Intel Xeon Gold processor

TensorFlow     1.4.0 (optimized by Intel)
Python*        3.6
TensorBoard*   0.1.5

The software configurations listed in Table 2 were already available in the chosen hardware environment, so no source build of TensorFlow was necessary.

    Dataset

The Speech Commands dataset (TAR file) comprises 65,000 WAVE audio files (.wav) of people saying 30 different words. This data was collected by Google and released under a CC BY license; the archive is more than 1 GB. Each audio file is a 1-second clip labeled as silence, an unknown word, or one of: yes, no, up, down, left, right, on, off, stop, go. Twelve classes out of the 30 words in the full dataset were used for this experiment.

Total dataset: 23,701 files

Training: 80 percent -- 18,961
Validation: 10 percent -- 2,370
Testing: 10 percent -- 2,370

We used a hash-function-based split to prevent files from repeating between sets.

We maintained a list of all words, such as up, go, off, on, stop, and so on. The train/test split was done per word to ensure all classes were covered and there was no class imbalance.
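A minimal sketch of such a hash-based assignment, in the spirit of the which_set() helper in the TensorFlow speech_commands example (which additionally strips the _nohash_ suffix so that clips from the same speaker stay in the same set); this simplified version hashes the whole filename.

import hashlib

def assign_set(filename, validation_pct=10, testing_pct=10):
    """Deterministically map a WAV filename to 'training', 'validation' or 'testing'."""
    name_hash = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    bucket = int(name_hash, 16) % 100          # stable bucket in 0..99
    if bucket < validation_pct:
        return "validation"
    if bucket < validation_pct + testing_pct:
        return "testing"
    return "training"

print(assign_set("left/a5d485dc_nohash_0.wav"))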

    CNN-TRAD-POOL3 architecture

    The architecture used is based on the Convolutional Neural Networks for Small-footprint Keyword Spotting paper. TensorFlow provides different approaches to building neural network models. We chose CNN-TRAD-POOL3, because it is comparatively simple, quick to train, and easy to understand. The CNN-TRAD-POOL3 network is made of two convolution layers, max-pooling layers, one linear low-rank layer, one DNN layer, and one softmax layer. Figure 1 shows the CNN-TRAD-POOL3 architecture.

    Figure 1. CNN-TRAD-POOL3 model.
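A minimal TensorFlow sketch in the spirit of this architecture (not the exact create_conv_model() from the speech_commands example): two convolution layers with max pooling over an MFCC "fingerprint", a dense layer, and a final softmax over the 12 classes. The filter sizes and the 98 x 40 input shape are illustrative assumptions.

import tensorflow as tf

def conv_keyword_model(fingerprint, num_classes=12):
    # fingerprint: [batch, time, frequency, 1] MFCC features of a 1-second clip
    conv1 = tf.layers.conv2d(fingerprint, filters=64, kernel_size=(20, 8),
                             padding="same", activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling2d(conv1, pool_size=(2, 2), strides=(2, 2))
    conv2 = tf.layers.conv2d(pool1, filters=64, kernel_size=(10, 4),
                             padding="same", activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(conv2, pool_size=(2, 2), strides=(2, 2))
    flat = tf.layers.flatten(pool2)
    dense = tf.layers.dense(flat, units=128, activation=tf.nn.relu)
    return tf.layers.dense(dense, units=num_classes)   # logits; softmax is applied in the loss

fingerprint = tf.placeholder(tf.float32, [None, 98, 40, 1])
logits = conv_keyword_model(fingerprint)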

    Execution steps

    This section describes the steps we used in the end-to-end process for training, validation, and testing the speech recognition model on Intel® architecture.

    1. Setup for training
    2. Model training
    3. Inference

    Setup for training

1. Install the Intel® Optimization for TensorFlow* framework.
    2. Clone the TensorFlow repository from https://github.com/tensorflow/tensorflow.

    Model training

After cloning the TensorFlow repository, the next step is to train the model. We trained all layers from scratch.

    The following command downloads the speech commands dataset and trains the algorithm toward detecting audio samples:

    python tensorflow/examples/speech_commands/train.py

    Experimental runs with inference

    On the Intel Xeon Gold Processor – Intel® AI DevCloud Cluster

    To execute on the Intel AI DevCloud cluster, use the following command to submit the training job:

    qsub speech.sh -l walltime=24:00:00 

On this cluster, jobs are limited to a walltime of six hours by default; the maximum walltime that can be requested is 24 hours. As shown in the qsub command, the walltime is set to 24 hours.

    The job script speech.sh has the following code:

#!/bin/sh
#PBS -l walltime=24:00:00                              # request the 24-hour maximum walltime
which python                                           # log which Python interpreter is active
cd ~/tensorflow/
export PATH=/glob/intel-python/python3/bin/:$PATH      # put the Intel Distribution for Python first on PATH
numactl --interleave=all python ~/tensorflow/tensorflow/examples/speech_commands/train.py   # interleave memory across NUMA nodes while training

The training log captured the details of the steps and per-step accuracies for the run.

    TensorBoard* Graphs

TensorBoard is an effective tool for visualizing training progress. By default, the script saves events to /tmp/retrain_logs; load them by running the following command:

    tensorboard --logdir /tmp/retrain_logs

    Figure 2 shows the TensorBoard graphs for the Intel Xeon Gold processor.

    Figure 2. TensorBoard graphs - Intel Xeon Gold processor.

    The script used to export the trained model file for inference is as follows:

    echo python ~/tensorflow/tensorflow/examples/speech_commands/freeze.py --start_checkpoint=~/kaggle-speech/speech_commands_train/conv.ckpt-68000 --output_file=~/kaggle-speech/my_frozen_graph_68000.pb | qsub

After the frozen model has been created, test it with the label_wav.py script using the following command:

    echo python ~/tensorflow/tensorflow/examples/speech_commands/label_wav.py  --graph=~/kaggle-speech/my_frozen_graph_68000.pb --labels=~/kaggle-speech/speech_commands_train/conv_labels.txt --wav=~/kaggle-speech/speech_dataset/left/a5d485dc_nohash_0.wav
    | qsub
    
    left (score = 0.96563)
    right (score = 0.02616)
    _unknown_ (score = 0.00717)

left has the top score, which matches the correct label.
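For reference, the following is a hypothetical Python equivalent of the label_wav.py step: it loads the frozen graph and scores one clip. The tensor names wav_data:0 and labels_softmax:0 follow the speech_commands example but should be verified against the actual frozen graph.

import tensorflow as tf

def label_wav(wav_path, graph_path, labels_path, top_k=3):
    with tf.gfile.GFile(labels_path) as f:
        labels = [line.strip() for line in f]
    with tf.gfile.GFile(graph_path, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Session() as sess:
        tf.import_graph_def(graph_def, name="")
        with open(wav_path, "rb") as wav_file:
            wav_data = wav_file.read()
        softmax = sess.graph.get_tensor_by_name("labels_softmax:0")
        predictions, = sess.run(softmax, {"wav_data:0": wav_data})
    for i in predictions.argsort()[-top_k:][::-1]:
        print("%s (score = %.5f)" % (labels[i], predictions[i]))

label_wav("speech_dataset/left/a5d485dc_nohash_0.wav",
          "my_frozen_graph_68000.pb",
          "speech_commands_train/conv_labels.txt")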

    Intel® Xeon® Gold processor metrics

    Table 3. Intel Xeon Gold processor metrics.

Property                           Intel Xeon Gold Processor
Total amount of time               83,400 seconds
Total number of steps              68,000
Batch size                         100
Total WAV files                    6,800,000
WAV files per second
(total WAV files / total time)     81.53
Training accuracy                  93 percent
Validation accuracy                92 percent
Testing accuracy                   92.5 percent

    Conclusion

    In this paper, we showed how we trained and tested speech recognition from scratch using a sample CNN model and the TensorFlow audio recognition dataset on the Intel Xeon Gold processor-based environments. The experiment can be extended by applying different optimization algorithms, changing learning rates, and varying input sizes, further improving accuracy.

    About the Authors

    Rajeswari Ponnuru and Ravi Keron Nidamarty are members of the Intel team, working on evangelizing artificial intelligence in the academic environment.

    References

    1. Kaggle's TensorFlow speech recognition challenge
    2. TensorFlow for audio recognition tutorial

    Related Resources

    TensorFlow* Optimizations on Modern Intel® Architecture
    Build and Install TensorFlow* on Intel® Architecture

    Developer Success Stories Library


Intel® Parallel Studio XE | Intel® System Studio | Intel® Media Server Studio

Intel® Advisor | Intel® Computer Vision SDK | Intel® Data Analytics Acceleration Library

Intel® Distribution for Python* | Intel® Inspector XE | Intel® Integrated Performance Primitives

Intel® Math Kernel Library | Intel® Media SDK | Intel® MPI Library | Intel® Threading Building Blocks

Intel® VTune™ Amplifier

     


    Intel® Parallel Studio XE


    Altair Creates a New Standard in Virtual Crash Testing

    Altair advances frontal crash simulation with help from Intel® Software Development products.


    CADEX Resolves the Challenges of CAD Format Conversion

    Parallelism Brings CAD Exchanger* software dramatic gains in performance and user satisfaction, plus a competitive advantage.


    Envivio Helps Ensure the Best Video Quality and Performance

    Intel® Parallel Studio XE helps Envivio create safe and secured code.


    ESI Group Designs Quiet Products Faster

    ESI Group achieves up to 450 percent faster performance on quad-core processors with help from Intel® Parallel Studio.


    F5 Networks Profiles for Success

F5 Networks amps up its BIG-IP DNS* solution for developers with help from Intel® Parallel Studio and Intel® VTune™ Amplifier.


    Fixstars Uses Intel® Parallel Studio XE for High-speed Renderer

    As a developer of services that use multi-core processors, Fixstars has selected Intel® Parallel Studio XE as the development platform for its lucille* high-speed renderer.


    Golaem Drives Virtual Population Growth

    Crowd simulation is one of the most challenging tasks in computer animation―made easier with Intel® Parallel Studio XE.


    Lab7 Systems Helps Manage an Ocean of Information

    Lab7 Systems optimizes BioBuilds™ tools for superior performance using Intel® Parallel Studio XE and Intel® C++ Compiler.


    Mentor Graphics Speeds Design Cycles

    Thermal simulations with Intel® Software Development Tools deliver a performance boost for faster time to market.


    Massachusetts General Hospital Achieves 20X Faster Colonoscopy Screening

    Intel® Parallel Studio helps optimize key image processing libraries, reducing compute-intensive colon screening processing time from 60 minutes to 3 minutes.


    Moscow Institute of Physics and Technology Rockets the Development of Hypersonic Vehicles

    Moscow Institute of Physics and Technology creates faster and more accurate computational fluid dynamics software with help from Intel® Math Kernel Library and Intel® C++ Compiler.


    NERSC Optimizes Application Performance with Roofline Analysis

    NERSC boosts the performance of its scientific applications on Intel® Xeon Phi™ processors up to 35% using Intel® Advisor.


    Nik Software Increases Rendering Speed of HDR by 1.3x

    By optimizing its software for Advanced Vector Extensions (AVX), Nik Software used Intel® Parallel Studio XE to identify hotspots 10x faster and enabled end users to render high dynamic range (HDR) imagery 1.3x faster.


    Novosibirsk State University Gets More Efficient Numerical Simulation

    Novosibirsk State University boosts a simulation tool’s performance by 3X with Intel® Parallel Studio, Intel® Advisor, and Intel® Trace Analyzer and Collector.


    Pexip Speeds Enterprise-Grade Videoconferencing

    Intel® analysis tools enable a 2.5x improvement in video encoding performance for videoconferencing technology company Pexip.


    Schlumberger Parallelizes Oil and Gas Software

    Schlumberger increases performance for its PIPESIM* software by up to 10 times while streamlining the development process.


    Ural Federal University Boosts High-Performance Computing Education and Research

    Intel® Developer Tools and online courseware enrich the high-performance computing curriculum at Ural Federal University.


    Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

    Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


    Intel® System Studio


    CID Wireless Shanghai Boosts Long-Term Evolution (LTE) Application Performance

    CID Wireless boosts performance for its LTE reference design code by 6x compared to the plain C code implementation.


    GeoVision Gets a 24x Deep Learning Algorithm Performance Boost

    GeoVision turbo-charges its deep learning facial recognition solution using Intel® System Studio and Intel® Computer Vision SDK.


    NERSC Optimizes Application Performance with Roofline Analysis

    NERSC boosts the performance of its scientific applications on Intel® Xeon Phi™ processors up to 35% using Intel® Advisor.


    Daresbury Laboratory Speeds Computational Chemistry Software 

    Scientists get a speedup to their computational chemistry algorithm from Intel® Advisor’s vectorization advisor.


    Novosibirsk State University Gets More Efficient Numerical Simulation

    Novosibirsk State University boosts a simulation tool’s performance by 3X with Intel® Parallel Studio, Intel® Advisor, and Intel® Trace Analyzer and Collector.


    Pexip Speeds Enterprise-Grade Videoconferencing

    Intel® analysis tools enable a 2.5x improvement in video encoding performance for videoconferencing technology company Pexip.


    Schlumberger Parallelizes Oil and Gas Software

    Schlumberger increases performance for its PIPESIM* software by up to 10 times while streamlining the development process.


    Intel® Computer Vision SDK


    GeoVision Gets a 24x Deep Learning Algorithm Performance Boost

    GeoVision turbo-charges its deep learning facial recognition solution using Intel® System Studio and Intel® Computer Vision SDK.


    Intel® Data Analytics Acceleration Library


    MeritData Speeds Up a Big Data Platform

    MeritData Inc. improves performance—and the potential for big data algorithms and visualization.


    Intel® Distribution for Python*


    DATADVANCE Gets Optimal Design with 5x Performance Boost

    DATADVANCE discovers that Intel® Distribution for Python* outpaces standard Python.
     


    Intel® Inspector XE


    CADEX Resolves the Challenges of CAD Format Conversion

    Parallelism Brings CAD Exchanger* software dramatic gains in performance and user satisfaction, plus a competitive advantage.


    Envivio Helps Ensure the Best Video Quality and Performance

    Intel® Parallel Studio XE helps Envivio create safe and secured code.


    ESI Group Designs Quiet Products Faster

    ESI Group achieves up to 450 percent faster performance on quad-core processors with help from Intel® Parallel Studio.


    Fixstars Uses Intel® Parallel Studio XE for High-speed Renderer

    As a developer of services that use multi-core processors, Fixstars has selected Intel® Parallel Studio XE as the development platform for its lucille* high-speed renderer.


    Golaem Drives Virtual Population Growth

    Crowd simulation is one of the most challenging tasks in computer animation―made easier with Intel® Parallel Studio XE.


    Schlumberger Parallelizes Oil and Gas Software

    Schlumberger increases performance for its PIPESIM* software by up to 10 times while streamlining the development process.


    Intel® Integrated Performance Primitives


    JD.com Optimizes Image Processing

    JD.com Speeds Image Processing 17x, handling 300,000 images in 162 seconds instead of 2,800 seconds, with Intel® C++ Compiler and Intel® Integrated Performance Primitives.


    Tencent Optimizes an Illegal Image Filtering System

    Tencent doubles the speed of its illegal image filtering system using SIMD Instruction Set and Intel® Integrated Performance Primitives.


    Tencent Speeds MD5 Image Identification by 2x

    Intel worked with Tencent engineers to optimize the way the company processes millions of images each day, using Intel® Integrated Performance Primitives to achieve a 2x performance improvement.


    Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

    Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


    Intel® Math Kernel Library


    DreamWorks Puts the Special in Special Effects

    DreamWorks Animation’s Puss in Boots uses Intel® Math Kernel Library to help create dazzling special effects.


    GeoVision Gets a 24x Deep Learning Algorithm Performance Boost

    GeoVision turbo-charges its deep learning facial recognition solution using Intel® System Studio and Intel® Computer Vision SDK.

     


    MeritData Speeds Up a Big Data Platform

    MeritData Inc. improves performance―and the potential for big data algorithms and visualization.


    Qihoo360 Technology Co. Ltd. Optimizes Speech Recognition

    Qihoo360 optimizes the speech recognition module of the Euler platform using Intel® Math Kernel Library (Intel® MKL), speeding up performance by 5x.


    Intel® Media SDK


    NetUP Gets Blazing Fast Media Transcoding

    NetUP uses Intel® Media SDK to help bring the Rio Olympic Games to a worldwide audience of millions.


    Intel® Media Server Studio


    ActiveVideo Enhances Efficiency

    ActiveVideo boosts the scalability and efficiency of its cloud-based virtual set-top box solutions for TV guides, online video, and interactive TV advertising using Intel® Media Server Studio.


    Kraftway: Video Analytics at the Edge of the Network

Today’s sensing, processing, storage, and connectivity technologies enable the next step in distributed video analytics, where each camera itself is a server. With Kraftway* video software platforms, cameras can encode up to three 1080p60 streams at different bit rates with close to zero CPU load.


    Slomo.tv Delivers Game-Changing Video

    Slomo.tv's new video replay solutions, built with the latest Intel® technologies, can help resolve challenging game calls.


    SoftLab-NSK Builds a Universal, Ultra HD Broadcast Solution

    SoftLab-NSK combines the functionality of a 4K HEVC video encoder and a playout server in one box using technologies from Intel.


    Vantrix Delivers on Media Transcoding Performance

    HP Moonshot* with HP ProLiant* m710p server cartridges and Vantrix Media Platform software, with help from Intel® Media Server Studio, deliver a cost-effective solution that delivers more streams per rack unit while consuming less power and space.


    Intel® MPI Library


    Moscow Institute of Physics and Technology Rockets the Development of Hypersonic Vehicles

    Moscow Institute of Physics and Technology creates faster and more accurate computational fluid dynamics software with help from Intel® Math Kernel Library and Intel® C++ Compiler.


    Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

    Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


    Intel® Threading Building Blocks


    CADEX Resolves the Challenges of CAD Format Conversion

    Parallelism Brings CAD Exchanger* software dramatic gains in performance and user satisfaction, plus a competitive advantage.


    Johns Hopkins University Prepares for a Many-Core Future

    Johns Hopkins University increases the performance of its open-source Bowtie 2* application by adding multi-core parallelism.


Mentor Graphics Speeds Design Cycles

Thermal simulations with Intel® Software Development Tools deliver a performance boost for faster time to market.


    Pexip Speeds Enterprise-Grade Videoconferencing

    Intel® analysis tools enable a 2.5x improvement in video encoding performance for videoconferencing technology company Pexip.


    Quasardb Streamlines Development for a Real-Time Analytics Database

    To deliver first-class performance for its distributed, transactional database, Quasardb uses Intel® Threading Building Blocks (Intel® TBB), Intel’s C++ threading library for creating high-performance, scalable parallel applications.


    University of Bristol Accelerates Rational Drug Design

    Using Intel® Threading Building Blocks, the University of Bristol helps slash calculation time for drug development—enabling a calculation that once took 25 days to complete to run in just one day.


    Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

    Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


Intel® VTune™ Amplifier


    CADEX Resolves the Challenges of CAD Format Conversion

    Parallelism brings CAD Exchanger* software dramatic gains in performance and user satisfaction, plus a competitive advantage.


    F5 Networks Profiles for Success

F5 Networks amps up its BIG-IP DNS* solution for developers with help from Intel® Parallel Studio and Intel® VTune™ Amplifier.


    GeoVision Gets a 24x Deep Learning Algorithm Performance Boost

    GeoVision turbo-charges its deep learning facial recognition solution using Intel® System Studio and Intel® Computer Vision SDK.


    Mentor Graphics Speeds Design Cycles

    Thermal simulations with Intel® Software Development Tools deliver a performance boost for faster time to market.

     


    Nik Software Increases Rendering Speed of HDR by 1.3x

    By optimizing its software for Advanced Vector Extensions (AVX), Nik Software used Intel® Parallel Studio XE to identify hotspots 10x faster and enabled end users to render high dynamic range (HDR) imagery 1.3x faster.


    Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

    Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


     


    Science without Constraints


    Fueling the Next Great Wave of Data-Driven Innovation in the Life Sciences

    We are on the cusp of a revolution in the biological and medical sciences. A whole human genome can now be sequenced in a matter of hours and for as little as USD 1,000, and we are moving quickly toward the USD 100 genome. Meanwhile, technologies such as Cryo-Electron Microscopy (Cryo-EM) and Molecular Dynamics are helping researchers visualize and understand cellular processes at the molecular level.

These and other technologies are opening a window into the most fundamental processes of life. Biological pathways can be illuminated, disease mechanisms can be identified, and drug discovery can be transformed from a multiyear, multi-billion dollar, trial-and-error process into an efficient, data-driven workflow. Perhaps most importantly, precision medicine, with molecular-level profiles and personalized treatments, will ultimately transform the way we diagnose and treat injury and disease.

    Professor Knut Reinert, PhD, and his team at the Free University of Berlin are collaborating with Intel to accelerate genome analysis by optimizing critical algorithms so they run efficiently on multicore and many-core Intel® processors.

    Download complete Solution Brief (PDF).

    Intel® Data Analytics Acceleration Library 2019 Beta Installation Guide


    Please see the following links to the online resources and documents for the latest information regarding Intel DAAL:

• Intel® DAAL Product Page

• Intel® DAAL 2019 Beta Release Notes

• Intel® DAAL 2019 Beta System Requirements

    These instructions assume a standalone installation of Intel® Data Analytics Acceleration Library (Intel® DAAL). If your copy of Intel® DAAL was included as part of one of our "suite products" (e.g., Intel® Parallel Studio XE) your installation procedure may be different than that described below; in which case, please refer to the readme and installation guides for your "suite product" for specific installation details.

    Before installing Intel® DAAL, check the Product Downloads section of Intel® Registration Center to see if a newer version of the library is available. The version listed in your electronic download license letter may not be the most current version available.

    The installation of the product requires a valid license file or serial number. If you are evaluating the product, you can also choose the "Evaluate this product (no serial number required)" option during installation.
    If you have a previous version of Intel® DAAL installed you do not need to uninstall it before installing a new version. If you choose to uninstall the older version, you may do so at any time. 

    Note: Installation on 32-bit hosts is no longer supported. However, the 32-bit library continues to exist, and can be used on 64-bit hosts.

    Installing Intel® DAAL on Windows* OS

    You can install multiple versions of Intel® DAAL and any combination of 32-bit and 64-bit variations of the library on your development system.

These instructions assume you have an Internet connection. The installation program will automatically download a license key to your system. If you do not have an Internet connection, see the manual installation instructions below.

    Interactive installation on Windows* OS

    1. If you received the Intel® DAAL product as a download, double-click on the downloaded file to begin.
2. You will be asked to choose a target directory ("c:\Users\<Username>\Downloads\" by default) in which the contents of the self-extracting setup file will be placed before the actual library installation begins. You can choose to remove or keep the temporarily extracted files after installation is complete. You can safely remove the files in this "downloads" directory if you need to free up disk space; however, deleting these files will limit your ability to change your installation options later using the add/remove applet (you will always be able to uninstall).
    3. Click Next when the installation wizard appears.
    4. If you agree with the End User License Agreement, click Next to accept the license agreement.
    5. License Activation Options:
      • If you do have an Internet connection, skip this step and proceed to the next numbered step (below).
      • If you do not have an Internet connection, or require a floating or counted license installation, choose Alternative Activation and click Next; there will be two options to choose from:
        • Activate Offline:  requires a License File.
        • Use a License manager: Floating License activation
    6. Enter your serial number to activate and install the product.
    7. Activation completed. Click Next to continue.
    8. If there is package from another update of Parallel Studio XE installed, you will be able to select update mode on Choose Product Update Mode dialog:
      1. I want to apply this update to the existing version.
        Using this option will result in the original version being replaced by the updated version.
      2. I want to install this update separate from the existing version.
        Using this option will result in the update being installed in a different location, leaving the existing version unchanged.
9. The Installation Summary dialog box opens to show the summary of your installation options (chosen components, destination folder, etc.). Click Install to start installation (proceed to step 15) or click Customize to change settings. If you select Customize, follow steps 10-14.

    10. In the Architecture Selection dialog box, select the architecture of the platform where your software will run.
    11. In the Choose a Destination Folder dialog box, choose the installation directory. By default, it is C:\Program Files\IntelSWTools. You may choose a different directory. All files are installed into the Intel Parallel Studio XE 2019 subdirectory (if you chose I want to install this update separate from the existing version, all files are installed into the parallel_studio_xe_2019.0.xxx directory, where xxx is the package number).
12. The package contains components for integration into Microsoft Visual Studio*. You can select the Microsoft Visual Studio product(s) for integration in the Choose Integration Target dialog box.
13. If Microsoft Compute Cluster Pack* is present, and the installation detects that the installing system is a member of a cluster, a dialog box is shown that provides an option to install the product on all visible nodes of the cluster or on the current node only (by default, installation is performed on all visible nodes).
    14. The Installation Summary dialog box opens to show the summary of your installation options (chosen components, destination folder, etc.). Click Install to start installation.
    15. Click Finish in the final screen to exit the Intel Software Setup Assistant.

    Online Installation on Windows* OS

    The default electronic installation package for Intel® DAAL for Windows now consists of a smaller installation package that dynamically downloads and then installs packages selected to be installed. This requires a working internet connection and potentially a proxy setting if you are behind an internet proxy. Full packages are provided alongside where you download this online install package if a working internet connection is not available.

    Silent Installation on Windows* OS

    Silent installation enables you to install Intel® DAAL on a single Windows* machine in a batch mode, without input prompts. Use this option if you need to install on multiple similarly configured machines, such as cluster nodes.

    To invoke silent installation:

    1. Go to the folder where the Intel® DAAL package was extracted during unpacking; by default, it is the C:\Program Files\Intel\Download\w_daal_2019.y.xxx folder.
    2. Run setup.exe, located in this folder: setup.exe [command arguments]

    If no command is specified, the installation proceeds in the Setup Wizard mode. If a command is specified, the installation proceeds in the non-interactive (silent) mode.

The table below lists the possible commands and the corresponding arguments.

Command: install
  Required arguments: output=<file>, eula={accept|reject}
  Optional arguments: installdir=<installdir>, license=<license>, sn=<s/n>, log=<log file>
  Action: Installs the product as specified by the arguments.

Use the output argument to define the file where the output will be redirected. This file contains all installer's messages that you may need: general communication, warning, and error messages.

Explicitly indicate by eula=accept that you accept the End-user License Agreement.

Use the license argument to specify a file or folder with the license to be used to activate the product. If a folder is specified, the installation program searches for *.lic files in the specified folder. You can specify multiple files/folders by supplying this argument several times or by concatenating path strings with the ";" separator.

Use the sn argument to choose activation of the product through a serial number. This activation method requires Internet connection.

Do not use the sn and license arguments together because they specify alternative activation methods. If you omit both arguments, the installer only checks whether the product is already activated.

Use the log argument to specify the location for a log file. This file is used only for debugging. Support Engineers may request this file if your installation fails.

Command: remove
  Required arguments: output=<file>
  Optional arguments: log=<log file>
  Action: Removes the product. See the description of the install command for details of the output and log arguments.

Command: repair
  Required arguments: output=<file>
  Optional arguments: log=<log file>
  Action: Repairs the existing product installation. See the description of the install command for details of the output and log arguments.

    For example, the command line
     setup.exe install -output=C:\log.txt -eula=accept
    launches silent installation that prints output messages to the C:\log.txt file.

    License File Installation for Windows* OS

    If you have an evaluation license and decide to upgrade to a commercial license, you must complete the following steps after obtaining the commercial serial number:

1. Replace your evaluation license file (.lic file) with the commercial license file you received in the license file directory (the default license directory is "C:\Program Files (x86)\Common Files\Intel\Licenses").
    2. Register the new serial number at https://registrationcenter.intel.com.
    3. Re-installation of Intel® DAAL is not required.

    Uninstalling Intel® DAAL for Windows* OS

    To uninstall Intel® DAAL, select Add or Remove Programs from the Control Panel and locate the version of Intel® DAAL you wish to uninstall.

    Note: Uninstalling Intel® DAAL does not delete the corresponding license file.

    Installing Intel® DAAL on Linux* OS

    You can install multiple versions of Intel® DAAL and any combination of 32-bit and 64-bit variations of the library on your development system.

These instructions assume you have an Internet connection. The installation program will automatically download a license key to your system. If you do not have an Internet connection, see the manual installation instructions below.

    Interactive installation on Linux* OS

    1. If you received the product as a downloadable archive, first unpack the Intel® DAAL package
      tar -zxvf name_of_downloaded_file
    2. Change the directory (cd) to the folder containing unpacked files.
    3. Run the installation script and follow the instructions in the dialog screens that are presented:
      > ./install.sh
    4. The install script checks your system and displays any optional and critical prerequisites necessary for a successful install. You should resolve all critical issues before continuing the installation. Optional issues can be skipped, but it is strongly recommended that you fix all issues before continuing with the installation.

    GUI installation on Linux* OS

To install Intel® DAAL for Linux* OS in GUI mode, run the shell script install_GUI.sh. If a GUI is not supported (for example, if you are running from an ssh terminal), a command-line installation is provided instead.

    Silent Installation on Linux* OS

    To run the silent install, follow these steps:

    1.  If you received the product as a downloadable archive, first unpack the Intel® DAAL package
      >tar -zxvf name_of_downloaded_file
    2. Change the directory (cd) to the folder containing unpacked files.
3. Edit the configuration file silent.cfg, following the instructions in it:
  1. Accept the End User License Agreement by specifying ACCEPT_EULA=accept instead of the default "decline" value.
  2. Specify the activation option for the installation.
    • The default option is to use an existing license (ACTIVATION_TYPE=exist_lic); please make sure that a working product license file is in place before beginning. The file should be world-readable and located in a standard Intel license file directory, such as /opt/intel/licenses or ~/licenses.
    • To use another activation method, change the value of the ACTIVATION_TYPE variable. You may also need to change the values of the ACTIVATION_SERIAL_NUMBER and ACTIVATION_LICENSE_FILE variables for specific activation options.
    4. Run the silent install:
      >./install.sh --silent ./silent.cfg

Tip: You can run the install interactively and record all the options into a custom configuration file using the following command.
    >./install.sh  --duplicate "./my_silent_config.cfg"
    After this you can install the package on other machines with the same installation options using
    >./install.sh --silent "./my_silent_config.cfg"

    License File Installation for Linux* OS

    If you have an evaluation license and decide to upgrade to a commercial license, you must complete the following steps after obtaining the commercial serial number:

    1. Replace your evaluation license file (.lic file) with the commercial license file you received in the license file directory (the default license directory is /opt/intel/licenses).
    2. Register the new serial number at https://registrationcenter.intel.com.
    3. Re-installation of Intel® DAAL is not required.

    Online Installation on Linux* OS

    The default electronic installation package for Intel® DAAL for Linux consists of a smaller installation package that dynamically downloads and then installs packages selected to be installed. This requires a working internet connection and potentially a proxy setting if you are behind an internet proxy. Full packages are provided alongside where you download this online install package if a working internet connection is not available.

    Offline Installation on Linux* OS

If the system where Intel® DAAL will be installed is disconnected from the internet, the product may be installed in offline mode.
To install the product offline, you must provide the installer with the full path to a license file.

The license file (.lic file) is included as an attachment to the email sent after you purchase and register the product on the Intel® Registration Center (IRC). You can request that the .lic file be resent from IRC. To do this, go to the "My Intel Products" page and select the needed Intel® DAAL update in the "Download Latest Update" column. When the page with information about the selected product update opens, click the "Manage" link in the "Licenses" column. When the "Manage License" page opens, click the "Resend license file to my email" button.

1. If the product is installed in GUI mode: in the "Activation options" dialog, select the "Choose alternative activation" radio button and click "Next". In the following dialog, select the "Activate offline" radio button and click "Next". In the next dialog, type the full path to the license file and click "Next".
2. If the product is installed in interactive mode: at step 3, "Activation step", select option 4, "I want to activate by using a license file, or by using Intel(R) Software License Manager". At the next step, choose option 1, "Activate offline [default]", and type the full path to the license file.
3. If the product is installed in silent mode: in the file silent.cfg, set ACTIVATION_TYPE=license_file and set the full path to the license file in ACTIVATION_LICENSE_FILE.

    Uninstalling Intel® DAAL for Linux* OS

    If you installed as root, you will need to log in as root.

    To uninstall Intel® DAAL run the uninstall script: <DAAL-install-dir>/uninstall.sh.

Alternatively, you may use GUI mode to uninstall Intel® DAAL for Linux* OS. First, run the shell script install_GUI.sh, then select the Remove option from the menu and click the Next button.

If you installed in the default directory, <DAAL-install-dir> is:
/opt/intel/compilers_and_libraries_2017/linux/daal

    Uninstalling Intel® DAAL will not delete your license file(s).

    Installing Intel® DAAL on macOS*

There are several different product suites available, for example, Intel® Data Analytics Acceleration Library for macOS* and Intel® Parallel Studio XE Composer Edition for C++ macOS*, each of which includes Intel DAAL as one of its components. Please read the download web page carefully to determine which product is appropriate for you.

    If you will be using Xcode*, please make sure that a supported version of Xcode is installed. If you install a new version of Xcode in the future, you must reinstall the Intel DAAL afterwards.

    The installation of the product requires a valid license file or serial number. If you are evaluating the product, you can also choose the “Evaluate this product (no serial number required)” option during installation.

These instructions assume you have an Internet connection. The installation program will automatically download a license key to your system. If you do not have an Internet connection, see the manual installation instructions below.

    Interactive installation on macOS*

    1. If you received the Intel DAAL product as a download, double-click on the downloaded file to begin the installation.
    2. You will be asked to select installation mode. The option Install as root is recommended. Click Next and enter the password. The install wizard will proceed automatically.
    3. If you agree with the End User License Agreement, check the radio button of I accept the terms of the license agreement, and click Next
    4. License Activation Options:
      • Use serial number

        If you do have an Internet connection, skip this step and proceed to the next numbered step (below).

      • Evaluate this product (no serial number required or if you want to activate at a later time).

      • Alternative Activation

        If you do not have an Internet connection, choose Alternative Activation and click Next; there will be two options to choose from:

        • Activate Offline: requires a License File.
        • Use Intel® Software License manager: floating License activation

          Intel® Software License manager
    5. Enter your serial number to activate and install the product.
    6. Activation completed. Click Next to continue.
7. The Installation Summary dialog box opens to show the summary of your installation options (chosen components, destination folder, etc.). Click Install to start installation (proceed to step 10) or click Customize installation to change settings. If you select Customize, follow steps 8-10.

8. In the Choose a Destination Folder dialog box, choose the installation directory. By default, it is /opt/intel, but you may choose a different directory. All files are installed into the Intel Parallel Studio XE 2017 subdirectory (by default, /opt/intel/compilers_and_libraries_2017/mac/daal).
9. If you install Intel DAAL as part of a Parallel Studio XE product, the package contains components for integration into Xcode*. You can select the Xcode* integration in the Choose Integration Target dialog box.
    10. The Installation Summary dialog box opens to show the summary of your installation options (chosen components, destination folder, etc.). Click Install to start installation.
    11. Click Finish in the final screen to exit the Intel Software Setup Assistant.

    Silent installation on macOS*

Silent installation enables you to install Intel DAAL on a single macOS* machine in batch mode, without input prompts. Use this option if you need to install on multiple similarly configured machines, such as cluster nodes. For information on automated or "silent" install capability, please see http://intel.ly/1gcW0Bl.

    Support of Non-Interactive Custom Installation

    Intel DAAL can save user install choices during an ‘interactive’ install in a configuration file that can then be used for silent installs. This configuration file is created when the following option is used from the command line install:

    • export INTEL_SWTOOLS_DUPLICATE_MODE=config_file_name: it specifies the configuration file name. If the full path is specified, the INTEL_SWTOOLS_DOWNLOAD_DIR environment variable is ignored and the installable package is created in the directory with the configuration file.
    • export INTEL_SWTOOLS_DOWNLOAD_DIR=dir_name: optional, it specifies where the configuration file will be created. If this option is omitted, the installation package and the configuration file will be created in the default download directory: /tmp/intel/downloads/<package_id>

    License File Installation for macOS*

    If you have an evaluation license and decide to upgrade to a commercial license, you must complete the following steps after obtaining the commercial serial number:

    1. Replace your evaluation license file (.lic file) with the commercial license file you received in the license file directory (the default license directory is /opt/intel/licenses).
    2. Register the new serial number at https://registrationcenter.intel.com.
    3. Re-installation of Intel® DAAL is not required.

    Uninstalling Intel® DAAL for macOS*

    It is not possible to remove the compiler while leaving any of the performance library components installed.

    1. Open the file 
      <install_dir>/parallel_studio_xe_2019.<n>.<pkg>/uninstall.app
    2. Follow the prompts

    If you are not currently logged in as root you will be asked for the root password.

    Uninstalling Intel® DAAL will not delete your license file(s).

    Legal Information

    Intel, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

    *Other names and brands may be claimed as the property of others.

    Java is a registered trademark of Oracle and/or its affiliates.

    © Copyright 2018, Intel Corporation

    Intel® Data Analytics Acceleration Library 2019 Beta System Requirements


    Please see the following links to the online resources and documents for the latest information regarding Intel® DAAL:

• Intel® DAAL Product Page

• Intel® DAAL 2019 Beta Release Notes

• Intel® DAAL 2019 Beta Installation Guide

    System Requirements

    The Intel® DAAL supports the IA-32 and Intel® 64 architectures. For a complete explanation of these architecture names please read the following article:
    Intel Architecture Platform Terminology for Development Tools.

The lists below pertain only to the system requirements necessary to support application development with Intel® DAAL. Please review the hardware and software system requirements for your compiler (gcc*, Microsoft Visual Studio*, or Intel® C++ Compiler) in the documentation provided with that product to determine the minimum development system requirements necessary to support your compiler product.

    Supported Operating Systems

    • Windows 10* 
    • Windows 8*
    • Windows 8.1* 
    • Windows 7* - Note: SP1 is required for use of Intel® AVX instructions
    • Windows Server* 2012 
    • Windows Server* 2016
    • Red Hat* Enterprise Linux* 6 
    • Red Hat* Enterprise Linux* 7 
    • Red Hat Fedora* 25
    • Red Hat Fedora* 26
    • SUSE Linux Enterprise Server* 12 SP1
    • SUSE Linux Enterprise Server* 12 SP2
    • Debian* GNU/Linux 8 
    • Ubuntu* 16.04 
    • Ubuntu* 16.10 
    • Ubuntu* 17.04 
    • macOS* 10.12
    • macOS* 10.13

    Note: Intel® DAAL is expected to work on many more Linux distributions as well. Let us know if you have trouble with the distribution you use.

    Supported C/C++* compilers for Windows* OS:

    • Intel® C++ Compiler 17.0 for Windows* OS
    • Intel® C++ Compiler 18.0 for Windows* OS
    • Intel® C++ Compiler 19.0 Beta for Windows* OS
    • Microsoft Visual Studio* 2013 - help file and environment integration
    • Microsoft Visual Studio* 2015 - help file and environment integration
    • Microsoft Visual Studio* 2017 - help file and environment integration

    Supported C/C++* compilers for Linux* OS:

    • Intel® C++ Compiler 16.0 for Linux* OS
    • Intel® C++ Compiler 17.0 for Linux* OS
    • Intel® C++ Compiler 18.0 for Linux* OS
    • Intel® C++ Compiler 19.0 Beta for Linux* OS
    • GNU Compiler Collection 5.0 and later

    Supported C/C++* compilers for macOS*:

    • Intel® C++ Compiler 16.0 for macOS*
    • Intel® C++ Compiler 17.0 for macOS*
    • Intel® C++ Compiler 18.0 for macOS*
    • Intel® C++ Compiler 19.0 Beta for macOS*
    • Xcode* 8
    • Xcode* 9

    Supported Java* compilers:

    • Java* SE 7 from Sun Microsystems, Inc.
    • Java* SE 8 from Sun Microsystems, Inc.
    • Java* SE 9 from Sun Microsystems, Inc.

    Supported Python versions:

    • Intel® Distribution for Python 3.5 (64-bit) for Windows* OS
    • Intel® Distribution for Python 3.6 (64-bit) for Windows* OS
    • Intel® Distribution for Python 2.7 (64-bit) for Linux* OS
    • Intel® Distribution for Python 3.5 (64-bit) for Linux* OS
    • Intel® Distribution for Python 3.6 (64-bit) for Linux* OS
    • Intel® Distribution for Python 2.7 (64-bit) for macOS*
    • Intel® Distribution for Python 3.5 (64-bit) for macOS*
    • Intel® Distribution for Python 3.6 (64-bit) for macOS*
    • Python* 2.7 (64-bit) for Linux* OS
    • Python* 3.5 (64-bit) for Linux* OS
    • Python* 3.6 (64-bit) for Linux* OS
    • Python* 2.7 (64-bit) for macOS*
    • Python* 3.5 (64-bit) for macOS*
    • Python* 3.6 (64-bit) for macOS*

    MPI implementations that Intel® DAAL for Windows* OS has been validated against:

    MPI implementations that Intel® DAAL for Linux* OS has been validated against:

    Database

    • MySQL 5.x
    • KDB+ 3.4

    Hadoop* implementations that Intel® DAAL has been validated against:

    • Hadoop* 2.7

    Note: Intel® DAAL is expected to work on many more Hadoop* distributions as well. Let us know if you have trouble with the distribution you use.

    Spark* implementations that Intel® DAAL has been validated against:

    • Spark* 2.0

    Note: Intel® DAAL is expected to work on many more Spark* distributions as well. Let us know if you have trouble with the distribution you use.

    Intel® Parallel Computing Center at Purdue University


    Purdue University logo

    Principal Investigator

Alex Pothen is a professor of computer science at Purdue University. He led the founding of the Combinatorial Scientific Computing (CSC) research community, which now holds biennial conferences organized through the Society for Industrial and Applied Mathematics (SIAM). He directed the CSCAPES Institute, a pioneering multi-institutional research center for developing parallel graph algorithms on leadership-class supercomputers. He is currently involved in the Exascale Computing Project of the U.S. Department of Energy. He is an editor of the Journal of the ACM, and was editor of SIAM Review and SIAM Books. He has mentored more than twenty PhD students and postdoctoral scholars. He is a Fellow of SIAM.

    Description

    We consider a paradigm for designing parallel algorithms for computing significant subgraphs of graphs through approximation algorithms. Algorithms for solving these problems exactly are impractical for massive graphs, and possess little concurrency.

We explore this paradigm by considering a matching problem and an edge cover problem. Given a natural number b(v) for every vertex v in a graph, a maximum weight b-matching is a set of edges M such that at most b(v) edges in M have v as an endpoint, and subject to this restriction, the sum of the weights of the edges in M is maximum. Similarly, a minimum weight b-edge cover is a set of edges C such that at least b(v) edges in C have v as an endpoint, and subject to this restriction, the sum of the weights of the edges in C is minimum.

Algorithms for solving these problems exactly require only polynomial time, but they are still impractical on massive graphs. Approximation algorithms can deliver solutions that are guaranteed to be within a constant factor of the optimal solution, and can do so in time nearly linear in the size of the graph. However, these algorithms do not have much parallelism, and hence we explore the design of new approximation algorithms with high levels of concurrency.

    For b-matching, we have designed a b-Suitor algorithm that is based on vertices making proposals to match to their neighbors. This is related to a Suitor algorithm for computing 1-matchings designed by Fredrik Manne and Mahantesh Halappanavar. This algorithm is also related to the classical Gale-Shapley algorithm for the Stable Matching problem. Our implementations show that the b-Suitor algorithm is currently the fastest algorithm on serial, shared memory and distributed memory computers.
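For intuition only, the following is a minimal sketch of a simple sequential greedy approximation for maximum weight b-matching (consider edges in decreasing weight order and keep an edge if both endpoints still have capacity). It is a baseline illustration of the problem, not the parallel b-Suitor algorithm itself.

def greedy_b_matching(edges, b):
    """edges: list of (weight, u, v); b: dict mapping vertex -> capacity b(v)."""
    remaining = dict(b)                           # capacity left at each vertex
    matching, total = [], 0.0
    for w, u, v in sorted(edges, reverse=True):   # heaviest edges first
        if remaining[u] > 0 and remaining[v] > 0:
            matching.append((u, v))
            total += w
            remaining[u] -= 1
            remaining[v] -= 1
    return matching, total

edges = [(5.0, "a", "b"), (4.0, "b", "c"), (3.0, "a", "c"), (1.0, "c", "d")]
print(greedy_b_matching(edges, {"a": 1, "b": 2, "c": 1, "d": 1}))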

The b-edge cover problem has a Greedy approximation algorithm with an approximation ratio of 3/2; however, the effective weight of the edges needs to be dynamically updated, which limits the concurrency severely. We have reduced this problem to one of computing a b'-matching for a suitable value of b'(v), which avoids the dynamic weight update problem, although at a worse approximation ratio of 2. The minimum weight b-edge cover problem is also rich in the space of approximation algorithms, and we consider nine such algorithms in this project. We are implementing the new approximation algorithms on multicore shared-memory and distributed-memory multiprocessors.

The b-edge cover problem and the b-matching problem have applications in graph-based semi-supervised machine learning as well as adaptive data anonymization. We are exploring the effectiveness of these algorithms and comparing them to earlier approaches.

    Publications:

    Arif Khan, Alex Pothen and S M Ferdous, 2018, Designing Parallel Algorithms via Approximation: b-Edge Cover, Proceedings of International Parallel and Distributed Processing Symposium (IPDPS), 12 pp.

    S M Ferdous, Alex Pothen and Arif Khan, 2018, New Approximation Algorithms for Minimum Weighted Edge Cover, Proceedings of SIAM Workshop on Combinatorial Scientific Computing, 12 pp.

    Ariful Azad, Aydin Buluc and Alex Pothen, Jan. 2017, Computing maximum cardinality matchings in parallel on bipartite graphs via tree-grafting, IEEE Transactions on Parallel and Distributed Systems, 28(1) 44-59, (doi 10.1109/TPDS.2016.2546258)

    Arif Khan, Alex Pothen, Mostofa Ali Patwary, Mahantesh Halappanavar, Nadathur Satish, and Narayanan Sundaram, Nov. 2016, Designing Scalable b-Matching Algorithms on Distributed Memory Multiprocessors via Approximation, Proceedings of ACM/IEEE Supercomputing Conference (SC16), pp. 773-783. (doi 10.1109/SC.2016.65)

    Arif Khan, Alex Pothen, Mostofa Patwary, Nadathur Satish, Narayanan Sunderam, Fredrik Manne, Mahantesh Halappanavar and Pradeep Dubey, 2016, Efficient approximation algorithms for weighted b-Matching, SIAM Journal on Scientific Computing, 38(5), S593-S619. (doi 10.1137/15M1026304)

    Related Websites:

    https://www.cs.purdue.edu/homes/apothen/software.html

    Training an Agent to Play Pong* Using neon™ Framework

    Abstract

    The purpose of this article is to showcase the implementation of an agent to play the game Pong* using an Intel® architecture-optimized neon™ framework, and to serve as an introduction to the Policy Gradients algorithm.

    Introduction

    You may have noticed the recent buzz around AlphaGo Zero*, the latest evolution of AlphaGo*, the first computer program to defeat a world champion at the ancient Chinese game of Go. AlphaGo Zero is arguably the strongest Go player in history, a status it attained by using a novel form of Reinforcement Learning combined with search algorithms. (AlphaGo used Policy Gradients combined with Monte Carlo Tree Search.) While games like Go are an interesting platform for testing strategies, Atari* games have long been a widely accepted benchmark because of their simplicity: humans can understand and beat them with little effort.

    In this article, we'll implement a Policy Gradient-based algorithm to train an agent to play Pong* and explore the Autodiff interface.

    Why neon™ framework?

    The neon framework is optimized for Intel® architecture and is straightforward to work with. I did not run benchmark tests myself, but a report published by Nervana Systems showed the framework to be quite promising. I was curious to try it and was impressed by its ease of use.

    Experiment Setup

    [Figure: Pong game]

    Environment

    • Ubuntu* 16.04 LTS 64-bit.
    • Python* 2.7

    Dependencies

    • Numpy 1.12.1
    • Gym
    • neon framework version 2.2

    Network Topology and Model Training

    Policy Gradients

    Policy Gradient methods are Reinforcement Learning techniques that optimize parameterized policies with respect to the expected return (long-term cumulative reward) by gradient descent. At the current time step $k$, taking into account possible stochasticity in the model, the next state follows the probability distribution $x_{k+1} \sim p(x_{k+1} \mid x_k, u_k)$, where $u_k$ is the current action and $x_k, x_{k+1} \in \mathbb{R}^n$ denote the current and the next state, respectively. Actions are sampled from a probability distribution to incorporate exploration, $u_k \sim \Pi_\Theta(u_k \mid x_k)$. At each time step, the learning system receives a reward $r_k = r(x_k, u_k) \in \mathbb{R}$.

    Our main goal in Reinforcement Learning is to optimize the policy parameters $\Theta_k \in \mathbb{R}^K$ to maximize the expected return

    $$J(\Theta) = E\left\{ \sum_{k=0}^{H} a_k r_k \right\}$$

    where the $a_k$ are time-step-dependent weight factors, often set to $a_k = \gamma^k$ with $\gamma \in [0, 1]$ for discounted reinforcement learning. A discount factor closer to 1 lets the model learn longer-term policies, while a smaller discount factor makes it favor more immediate rewards. The gradient update rule for the policy parameters is

    $$\Theta_{h+1} = \Theta_h + \alpha_h \left. \nabla_\Theta J \right|_{\Theta = \Theta_h}$$

    where $\alpha_h \in \mathbb{R}^{+}$ is a learning rate and $h \in \{0, 1, \ldots\}$ is the current update number.

    The main problem in a Policy Gradient algorithm is finding a good estimator for $\left. \nabla_\Theta J \right|_{\Theta = \Theta_h}$. Different choices give rise to deterministic Policy Gradients and stochastic Policy Gradients, which this article won't discuss in much detail. To learn more, read an interesting, recently published paper here.
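    One standard choice, which the implementation below follows in spirit, is the likelihood-ratio (REINFORCE) estimator; this formula is added here for context and is not spelled out in the original text:

    $$\nabla_\Theta J \approx \frac{1}{m} \sum_{i=1}^{m} \sum_{k=0}^{H} \nabla_\Theta \log \Pi_\Theta\!\left(u_k^{(i)} \mid x_k^{(i)}\right) R_k^{(i)}, \qquad R_k^{(i)} = \sum_{k'=k}^{H} \gamma^{\,k'-k} r_{k'}^{(i)}$$

    where $m$ is the number of sampled episodes and $R_k^{(i)}$ is the discounted return from step $k$ onward, the quantity computed by discount_rewards in the code below.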

    With this quick overview of Policy Gradient methods, we are ready to look at some code. Several helper methods and an in-depth analysis are omitted, but the code below roughly follows the Policy Gradient algorithm.

    Goal: beat the game.
    State: raw pixels of the game.
    Actions: move up, move down.
    Rewards: +1 if the agent wins the game, -1 for losing the game.

    [Figure: game diagram]
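    To make the specification above concrete, here is a minimal interaction with the gym environment. The action codes 2 and 3 match the ones used later in the article; the snippet is illustrative only.

    import gym

    env = gym.make("Pong-v0")
    state = env.reset()                       # raw 210x160x3 pixel frame (the state)
    state, reward, done, info = env.step(2)   # action 2 = move up, action 3 = move down
    # reward is +1 when the agent wins a point, -1 when it loses one, and 0 otherwise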

    Code Walkthrough

    We'll start by importing all the necessary dependencies:

    import numpy as np
    import gym
    
    from neon.backends import gen_backend
    from neon.backends import Autodiff
    import random
    import os

    Next, we set up the backend and define the class containing our network. Here gamma (the discount factor) is set to 0.99; the closer gamma is to 1, the more the algorithm values rewards in the distant future, while a smaller gamma favors immediate rewards. The network consists of two layers whose weight tensors 'W1' and 'W2' are initialized randomly, with their gradients initialized to zero.
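    The class below references a backend object be whose creation is not shown. A minimal setup might look like the following; the backend name and batch size are illustrative assumptions, since the article does not show the exact arguments used.

    from neon.backends import gen_backend

    # Create the neon backend referenced as `be` in the snippets below.
    be = gen_backend(backend='cpu', batch_size=1)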

    class Network:
        def __init__(self, D=80*80, H = 200, gamma = 0.99, restore_model = False):
            """
            D: No. of Image pixels
            H: No. of hidden units in first layer of Neural Network
            gamma: discount factor
            """
            self.gamma = gamma
            self.ll = {}
            self.learning_rate = 0.00001
            
            if restore_model and os.path.exists('model_weights.npy'):
                self.ll['W1'] = np.load('model_weights.npy').item()['W1']
                self.ll['W2'] = np.load('model_weights.npy').item()['W2']
            else:
                self.ll['W1'] = be.array(np.random.randn(H,D) / np.sqrt(D)) #random initialization of weight parameters followed by scaling
                self.ll['W2'] = be.array(np.random.randn(H,1) / np.sqrt(H))
            self.dW1 = be.array(np.zeros((H,D))) #random initialization of gradients
            self.dW2 = be.array(np.zeros((H,1)))

    The forward propagation step generates a policy given a visual representation of the environment we're working on. A larger number of hidden units will enable the network to learn more states.

        def policy_forward(self, x):
            # map visual input to the first hidden layer of a neural network
            
            h = be.dot(self.ll['W1'], be.array(x))
            h = be.sig(h)
            dlogp = be.dot(h.transpose(), self.ll['W2'])
            
            p = be.sig(dlogp)
            
            p_val = be.empty((1,1))         # Initialize an empty tensor of size 1X1
            h_val = be.empty((200,1))
            p_val[:] = p         # Set values of the tensor to p
            h_val[:] = h
            return p_val.get(), h_val.get(), p, h

    The backpropagation function updates the policy parameters by modulating the loss function values with discounted rewards. We will use the Autodiff interface to perform automatic differentiation and obtain gradients from an op-tree.

    An op-tree is a graph representation of numerical operations. It is a tuple whose first element is an op dictionary (e.g. {'shape': (2, 2), 'op': 'add'}) describing the operation, its properties, and the shape of its output; the remaining nodes are numeric nodes containing tensors or constants.

    Automatic Differentiation exploits the fact that no matter how complex a function is, it executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.). By applying the chain rule repeatedly to these operations, derivatives of arbitrary order can be computed automatically and accurately to the working precision.
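    As a standalone illustration of the interface (a minimal sketch with made-up values; it uses only calls that also appear in the article's code), gradients of a small op-tree can be pulled back into pre-allocated buffers with back_prop_grad:

    import numpy as np
    from neon.backends import gen_backend, Autodiff

    be = gen_backend(backend='cpu')

    # Build a tiny op-tree: sigmoid(W . x), mirroring the shapes used in policy_forward.
    W = be.array(np.array([[1.0, 2.0]]))       # 1x2 parameter tensor
    x = be.array(np.array([[0.5], [0.5]]))     # 2x1 input tensor
    f = be.sig(be.dot(W, x))                   # op-tree evaluating to a 1x1 value

    f_val = be.empty((1, 1))
    f_val[:] = f                               # execute the forward pass, as policy_forward does

    # Pre-allocated buffers that will receive df/dW and df/dx.
    dW = be.array(np.zeros((1, 2)))
    dx = be.array(np.zeros((2, 1)))

    ad = Autodiff(op_tree=f, be=be, next_error=None)
    ad.back_prop_grad([W, x], [dW, dx])
    print(dW.get())                            # gradient with respect to W, as a NumPy array
    print(dx.get())                            # gradient with respect to x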

        def policy_backward(self, losses_op, episode_dlogps, episode_rewards):
            
            discounted_rewards = self.discount_rewards(episode_rewards)
            
            # to reduce the variance of the gradient estimator and avoid potential vanishing problems
            discounted_rewards -= np.mean(discounted_rewards)
            discounted_rewards /= np.std(discounted_rewards)
            
            episode_dlogps *= discounted_rewards        # Modulating gradients with discount factor 
            
            """
            Compute gradients using neon Backend
            """
            for i in range(len(losses_op)):
                ad = Autodiff(op_tree=losses_op[i]*be.array(episode_dlogps[i]), be = be, next_error=None)
                # compute gradients and assign them to self.dw1 and self.dw2
                ad.back_prop_grad([self.ll['W2'], self.ll['W1']], [self.dW2, self.dW1])
                # weights update:
                self.ll['W2'][:] = self.ll['W2'].get() -self.learning_rate *self.dW2.get()/len(losses_op)
                self.ll['W1'][:] = self.ll['W1'].get() -self.learning_rate *self.dW1.get()/len(losses_op)
            return

    We assign a reward > 0 if the agent won the game, a reward < 0 if the agent missed the ball and hence lost the game, and a reward of 0 while the game is in progress. The agent receives the rewards generated by the game, and we discount them backwards in time with an exponentially weighted running sum: rewards closer in time to an action carry more weight, and the running sum is reset to zero whenever a game ends. During preprocessing, we take the raw pixels as input and process them before feeding them into the network. We start by cropping the frame and down sampling it by a factor of 2. We then set all background pixels to 0 and the remaining pixels (the paddles and the ball) to 1. The sample_action function introduces stochasticity into our optimization objective: action 2 corresponds to the agent moving up, while action 3 corresponds to moving down.

        def discount_rewards(self, r):
            discounted_r = np.zeros_like(r)
            running_add = 0
            for t in reversed(range(0, r.size)):
                # if reward at index t is nonzero, then there is a positive/negative reward. This also marks a game boundary
                # for the sequence of game_actions produced by the agent
                if r[t] != 0.0: running_add = 0.0 
                # moving average given discount factor gamma, it assigns more weight to recent game actions
                running_add = running_add * self.gamma + r[t]
                discounted_r[t] = running_add
            return discounted_r
        
        # Preprocess a single frame before feeding it to the model
        def prepro(self, I):
            """
            Dimensions of the Image 210x160x3
            We'll downsample the image into a 6400 (80x80) 1D float vector
            """
            I = I[35:195]         # crop
            I = I[::2, ::2, 0]     # down sample by a factor of 2
            I[I == 144] = 0     # erase background type 1
            I[I == 109] = 0     # erase background type 2
            I[I!=0] = 1            # Everything else (paddles, ball) equals to 1
            return I.astype(np.float).ravel()    # Flattens
        
        # Stochastic process to choose an action ( moving up ) proportional to its predicted probability
        # Probability of choosing the opposite action is (1 - probability_up)
        # action == 2, moving up
        # action == 3, moving down
        def sample_action(self, up_probability):
            stochastic_value = np.random.uniform()
            action = 2 if stochastic_value < up_probability else 3
            return action

    We then move to initialization of variables. At each time step, the agent chooses an action, and the environment returns an observation and a reward.

    render = False                      # to visualize agent 
    restore_model = True        # to load a trained model when available
    
    random.seed(2017)
    
    D = 80 * 80                 # number of pixels in input
    H = 200                       # number of hidden layer neurons
    # Game environment
    env = gym.make("Pong-v0")
    network = Network(D=D, H=H, restore_model=restore_model)
    
    # Each time step, the agent chooses an action, and the environment returns an observation and a reward.
    # The process gets started by calling reset, which returns an initial observation
    observation = env.reset()
    prev_x = None
    
    # hidden state, gradient ops, gradient values, rewards
    hs, losses_op, dlogps, rewards = [],[],[], []
    running_reward = None       # exponentially averaged episode reward
    reward_sum = 0.0                 # sum of rewards in the current episode
    episode_number = 0
    
    game_actions = []
    game_rewards = []
    game_gradients = []

    Training

    Our objective is to train an agent to win Pong against its opponent. Rewards: +1 for winning, -1 for losing. Actions available: up/down. We don't have correct labels y_i, so as a "fake label" we substitute the action the policy happened to sample when it saw the input x_i. An optimal set of actions maximizes the rewards received throughout the game.

    while True:
        cur_x = network.prepro(observation)
        x = cur_x - prev_x if prev_x is not None else np.zeros(D)
        prev_x = cur_x
    
        up_probability, h_value, p, h = network.policy_forward(x)
        action = network.sample_action(up_probability)                              
    
        # assign a fake label, this decreases uncertainty and
        # this is one of the beauties of Reinforcement Learning
        y_fake = 1 if action == 2 else 0     
        
        # loss function gets closer to assigned label, the smaller difference
        # between probabilities the better
        # store gradients: derivative(log(p(x|theta)))       
        dlogp = np.abs(y_fake - up_probability)    
        # loss value
        dlogps.append(dlogp) 
        # loss op
        losses_op.append(be.absolute(y_fake - p))
        
        if render:
            env.render()
        
        #action: 
        #    0: no movement
        #    1: no movement
        #    2: up
        #    3: down
        #    4: up
        #    5: down
        observation, reward, done, info = env.step(action)
        
        # modifying rewards to favor longer games and thus to increase number of
        # positive rewards.
        reward = 0.0 if reward == 0.0 else reward
        reward = 1.0*len(game_rewards) if reward!=0.0 and len(game_rewards)>80 else reward
        reward = -1.0*len(game_rewards) if reward!=0.0 and len(game_rewards)<=50 else reward
    
        rewards.append(reward)
        reward_sum += reward
        
        game_actions.append(action)
        game_rewards.append(reward)
        game_gradients.append(dlogp[0][0])
    
        # end of a game
        # Pong has either +1 or -1 as reward when game ends.
        if reward != 0:  
            message = "Episode %d: game finished." % (episode_number)
            if reward < 0:
                message += "\x1b[0;31;40m  (RL loses)\x1b[0m"
            elif reward > 0:
                message += "\x1b[0;32;40m  (RL wins)\x1b[0m"
            print(message)
            print('Game duration: %d steps | Sum rewards: %f | Sum errors: %f' %(len(game_actions), np.sum(game_rewards), np.sum(game_gradients)))
            print('------------------------------------')
            game_actions = []
            game_rewards = []
            game_gradients = []
            
        # to save model
        if (episode_number+1)%10==0:
            np.save('model_weights.npy', network.ll)
            
        # end of an episode (minibatch of games)
        if done:
            episode_number +=1
            dlogps = np.vstack(dlogps)
            rewards = np.vstack(rewards)
            
            network.policy_backward(losses_op, dlogps, rewards)
            mean_loss = np.sum([x * x for x in dlogps])
            running_reward = reward_sum if running_reward is None else running_reward * 0.99 + reward_sum * 0.01
            print('-----------------------------------------------')
            print('Episode %d has finished, time to backpropagate.' % (episode_number - 1))
            print('Total reward was %f Running_reward: %f Mean_loss: %f' % (reward_sum, running_reward, mean_loss))
            print('-----------------------------------------------')
    
            # reset game environment
            observation = env.reset()  
            reward_sum = 0
            prev_x = None        
            dlogps, rewards = [], []
            losses_op = []

    The entire code for the article can be found here. This article is a modification of the excellent work of Andrej Karpathy, which can be found here. I hope this has been helpful. If you have any feedback or questions, I would love to answer them.
