Deployment Challenges
Deploying deep learning networks from the training environment to embedded platforms for inference is a complex task that introduces technical challenges, such as:
- Several deep learning frameworks are widely used in the industry, such as Caffe*, TensorFlow*, MXNet*, among others
- Training deep learning networks is typically performed in data centers or server farms, while inference often takes place on embedded platforms that are optimized for performance and power consumption. These platforms are typically limited from the software perspective:
- programming languages
- third party dependencies
- memory consumption
- supported operating systems
- different data types
- limited power envelope
For these reasons, ensuring the accuracy of the transformed networks can be a complex task.
Deployment Workflow
The Inference Engine deployment process assumes you used the Model Optimizer to convert your trained model to an Intermediate Representation. The scheme below illustrates the typical workflow for deploying a trained deep learning model.
A summary of the steps for optimizing and deploying a trained model:
- Configure the Model Optimizer for your framework.
- Convert a trained model to produce an optimized Intermediate Representation (IR) of the model based on the trained network topology, weights, and biases values.
- Test the model in the Intermediate Representation format using the Inference Engine in the target environment via provided Inference Engine Validation application or the sample applications.
- Integrate the Inference Engine in your application to deploy the model in the target environment.
Introduction to the Inference Engine
After you have used the Model Optimizer to create an Intermediate Representation, use the Inference Engine to infer input data.
The Inference Engine is a C++ library with a set of C++ classes to infer input data (images) and get a result. The C++ library provides an API to read the Intermediate Representation, set the input and output formats, and execute the model on devices.
NOTE:
- This section talks about API information. For more information about the APIs, see the offline documentation included in your package. To locate the current API:
  - Go to <INSTALL_DIR>/deployment_tools/documentation/, where <INSTALL_DIR> is the directory in which the Intel® CV SDK is installed.
  - Open index.html in an Internet browser.
  - Select Integrating Inference Engine in Your Application from the contents.
- This document refers to APIs from previous releases as the "legacy" API. It is best to stop using the legacy API because it will be removed in a future product release. To locate the legacy API:
  - Go to <INSTALL_DIR>/deployment_tools/documentation/ under the directory in which the Intel® CV SDK is installed.
  - Open index.html in an Internet browser.
  - Select Integrating Inference Engine in Your Application (legacy API) from the contents.
- Complete API documentation is also in the full offline package documentation:
  - Go to <INSTALL_DIR>/deployment_tools/documentation/ under the directory in which the Intel® CV SDK is installed.
  - Open index.html in an Internet browser.
  - Select Open Data Structures from the menu at the top of the screen.
Modules in the Inference Engine Package
Your application must link to the core Inference Engine library and include the C++ header files from the include directory.
The core library is:
- Linux:
libinference_engine.so
- Windows:
inference_engine.dll
Using Plugins, Depending on the Target
Each supported target device has a plugin. The Heterogeneous plugin is also available for distributing a calculation workload across devices. Each plugin is a DLL/shared library. Make sure these libraries are on your system path or in the location you specified in the plugin loader. Make sure each plugin's dependencies are listed in:
- Linux:
LD_LIBRARY_PATH
- Windows:
PATH
On Linux, use the script bin/setupvars.sh
to set the environment variables.
The table below shows the relationship between libraries and targets.
Target | Linux Library Name | Linux Dependency Libraries | Windows Library Name | Windows Dependency Libraries |
---|---|---|---|---|
CPU | libMKLDNNPlugin.so | libmklml_tiny.so, libiomp5md.so | MKLDNNPlugin.dll | mklml_tiny.dll, libiomp5md.dll |
Intel® Integrated Graphics | libclDNNPlugin.so | libclDNN64.so | clDNNPlugin.dll | clDNN64.dll |
FPGA | libdliaPlugin.so | libdla.so | Not supported | Not supported |
Intel® Movidius™ Myriad™ 2 Vision Processing Unit (VPU) | libmyriadPlugin.so | No dependencies | Not supported | Not supported |
Heterogeneous | libHeteroPlugin.so | Same as selected plugins | HeteroPlugin.dll | Same as selected plugins |
When using the Heterogeneous plugin, use the literal strings in the Target column in the getPluginByDevice
method. For more information, see the getPluginByDevice
API.
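For illustration, the sketch below shows one way to obtain a plugin through the dispatcher using such a device string. This is a minimal sketch assuming the plugin API described in this guide; the plugin directory path is a placeholder.

```cpp
#include <inference_engine.hpp>

using namespace InferenceEngine;

int main() {
    // Directories to search for plugin libraries; an empty string means the default search path.
    PluginDispatcher dispatcher({"../lib/intel64", ""});

    // Use the literal device string. "HETERO:FPGA,CPU" asks the Heterogeneous plugin
    // to run layers on the FPGA and fall back to the CPU for unsupported layers.
    InferenceEnginePluginPtr enginePtr = dispatcher.getPluginByDevice("HETERO:FPGA,CPU");
    return 0;
}
```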
Common Workflow for Using the Inference Engine API
- Read the Intermediate Representation - Using the InferenceEngine::CNNNetReader class, read an Intermediate Representation file into a CNNNetwork object. This class represents the network in host memory.
- Prepare input and output formats - After loading the network, specify the input and output precision and layout of the network. For these specifications, use the CNNNetwork::getInputsInfo() and CNNNetwork::getOutputsInfo() methods.
- Select a plugin - Select the plugin on which to load your network. Create the plugin with the InferenceEngine::PluginDispatcher load helper class. Pass the per-device loading configurations specific to this device and register extensions for this device.
- Compile and load - Use the plugin interface wrapper class InferenceEngine::InferencePlugin to call the LoadNetwork() API, which compiles and loads the network on the device. Pass in the per-target load configuration for this compilation and load operation.
- Set input data - With the network loaded, you have an ExecutableNetwork object. Use this object to create an InferRequest in which you signal the buffers to use for input and output. Either specify device-allocated memory and copy your data into it directly, or tell the device to use your application memory to save a copy.
- Execute - With the input and output memory defined, choose your execution mode:
  - Synchronous - the Infer() method. Blocks until inference finishes.
  - Asynchronous - the StartAsync() method. Check status with the Wait() method (0 timeout), wait, or specify a completion callback.
- Get the output - After inference completes, get the output memory or read the memory you provided earlier. Do this with the InferRequest GetBlob API. A minimal end-to-end sketch of this workflow follows the list.
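The following is a minimal sketch of this workflow under the current API described above. The model file names, the CPU device choice, and the input-filling step are placeholders, and error handling is omitted.

```cpp
#include <inference_engine.hpp>

using namespace InferenceEngine;

int main() {
    // 1. Read the Intermediate Representation into a CNNNetwork
    CNNNetReader networkReader;
    networkReader.ReadNetwork("Model.xml");
    networkReader.ReadWeights("Model.bin");
    CNNNetwork network = networkReader.getNetwork();

    // 2. Query input and output information (set precision/layout here if needed)
    InputsDataMap inputInfo = network.getInputsInfo();
    OutputsDataMap outputInfo = network.getOutputsInfo();

    // 3. Select a plugin for the target device
    InferencePlugin plugin(PluginDispatcher({""}).getPluginByDevice("CPU"));

    // 4. Compile and load the network on the device (empty per-target config)
    ExecutableNetwork executableNetwork = plugin.LoadNetwork(network, {});

    // 5. Create an infer request and set the input data
    InferRequest inferRequest = executableNetwork.CreateInferRequest();
    Blob::Ptr input = inferRequest.GetBlob(inputInfo.begin()->first);
    // ... fill input->buffer() with image data ...

    // 6. Execute synchronously (or use StartAsync()/Wait() for asynchronous execution)
    inferRequest.Infer();

    // 7. Get the output
    Blob::Ptr output = inferRequest.GetBlob(outputInfo.begin()->first);
    // ... process output->buffer() ...
    return 0;
}
```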
For more information about integrating the Inference Engine in your application, see How to Integrate the Inference Engine in Your Application.
Using Inference Engine Samples
The Inference Engine sample applications are simple console applications that demonstrate how to use Intel's Deep Learning Inference Engine in your applications.
Samples in the Samples Directory
The following sample applications are available in the samples directory in the Inference Engine installation directory:
Sample | Description |
---|---|
CPU Extensions | Library with topology-specific layers, like DetectionOutput used in the SSD |
Image Classification Sample | Inference of image classification networks like AlexNet and GoogLeNet (the sample supports only images as inputs) |
Image Classification Sample, pipelined | Maximizes performance via pipelined execution (the sample supports only images as inputs) |
Security Barrier Camera Sample | Vehicle Detection followed by the Vehicle Attributes |
Object Detection for Faster R-CNN Sample | Inference of object detection networks like Faster R-CNN (the sample supports only images as inputs) |
Image Segmentation Sample | Inference of image segmentation networks like FCN8 (the sample supports only images as inputs) |
Object Detection for SSD Demonstration, Async API Performance Showcase | Demonstration application for SSD-based Object Detection networks, new Async API performance showcase, and simple OpenCV interoperability (supports video and camera inputs) |
Object Detection for SSD Sample | Inference of object detection networks based on SSD; this sample is a simplified version that supports only images as inputs |
Neural Style Transfer Sample | Style Transfer sample (the sample supports only images as inputs) |
Hello Infer Request Classification Sample | Inference of image classification networks via Infer Request API (the sample supports only images as inputs) |
Interactive Face Detection Sample | Face Detection coupled with Age-Gender and Head-Pose, supports video and camera inputs |
Security Barrier Camera Example | Supports images/video and camera inputs |
Validation Application | Infers a pack of images, resulting in total accuracy (only images as inputs) |
Samples That Support Pre-Trained Models Shipped With the Product
Several pre-trained models are provided with the product. The table below shows the correlation between models and samples/devices. The samples are available in <INSTALL_DIR>/deployment_tools/inference_engine/samples.
Model | Sample Supported on the Model | CPU | Intel® Integrated Graphics | HETERO:FPGA,CPU | Intel® Movidius™ Myriad™ 2 VPU |
---|---|---|---|---|---|
face-detection-adas-0001 | Interactive Face Detection Sample | x | x | x | |
age-gender-recognition-retail-0013 | Interactive Face Detection Sample | x | x | x | x |
head-pose-estimation-adas-0001 | Interactive Face Detection Sample | x | x | x | x |
vehicle-license-plate-detection-barrier-0007 | Security Barrier Camera Sample | x | x | x | x |
vehicle-attributes-recognition-barrier-0010 | Security Barrier Camera Sample | x | x | x | x |
license-plate-recognition-barrier-0001 | Security Barrier Camera Sample | x | x | x | x |
person-detection-retail-0001 | Object Detection Sample | x | x | x | |
person-detection-retail-00012 | Any sample that supports SSD-based models | x | x | x | |
face-detection-retail-0004 | Any sample that supports SSD-based models | x | x | x | x |
person-vehicle-bike-detection-crossroad-0066 | Any sample that supports SSD-based models | x | x | x |
Inferring Your Model with the Inference Engine Samples
Building the Sample Applications on Linux
Supported Linux build environment:
- Ubuntu* 16.04 LTS 64-bit or CentOS* 7.4 64-bit
- GCC* 5.4.0 (for Ubuntu* 16.04) or GCC* 4.8.5 (for CentOS* 7.4)
- CMake* version 2.8 or higher.
- OpenCV* 3.3 or later (required for some samples and demonstrations). Use the Intel® CV SDK installation download and instructions to complete this installation.
Follow these steps to prepare your Linux computer for the samples:
- Go to the samples directory:
<INSTALL_DIR>/deployment_tools/inference_engine/samples/
- Create a directory. This example uses a directory named
build
mkdir build
- Go to the new directory:
cd build
- Run CMake to generate the Make files with or without debug information:
- Without debug information:
cmake -DCMAKE_BUILD_TYPE=Release <path_to_inference_engine_samples_directory>
- With debug information:
cmake -DCMAKE_BUILD_TYPE=Debug <path_to_inference_engine_samples_directory>
- Without debug information:
- Build the application:
make
The sample application binaries are in <INSTALL_DIR>/deployment_tools/inference_engine/samples/intel64/Release/
Building the Sample Applications on Windows*
Supported Windows build environment:
- Microsoft Windows* 10
- Microsoft Visual Studio* 2015
- CMake* 2.8 or later
- OpenCV* 3.3 or later. Use the Intel® CV SDK installation download and instructions to complete this installation.
- Intel C++ Compiler 2017 Redistributable package for Windows
Follow these steps to prepare your Windows computer for the samples:
- Go to the
samples
directory. - Double-click
create_msvc_solution.bat
- Open Microsoft Visual Studio* 2015
- Build
samples\build\Samples.sln
Set Your Environment Variables
Use these steps to make sure your application can find the Inference Engine libraries.
For Linux, execute the following command to set the environment variable:
source <INSTALL_DIR>/deployment_tools/inference_engine/bin/setupvars.sh
where <INSTALL_DIR>
is the Intel CV SDK installation directory.
Running the Samples
Image Classification Sample
Description
The Image Classification sample application does inference using image classification networks, like AlexNet* and GoogLeNet*. The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream.
Running the Application
Running the application with the -h
option results in the message:
$ ./classification_sample -h InferenceEngine: API version ............ <version> Build .................. <number> classification_sample [OPTION] Options: -h Print a usage message. -i "<path1>""<path3>" Required. Path to a directory with images or path to an image files: a .ubyte file for LeNet* and a .bmp file for the other networks. -m "<path>" Required. Path to an .xml file with a trained model. -l "<absolute_path>" Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so). Or -c "<absolute_path>" Optional. Absolute path to Intel® Integrated Graphics custom layers config (*.xml). -pp "<path>" Path to a plugin directory. -d "<device>" Specify the target device to infer on; CPU, Intel® Integrated Graphics, or MYRIAD is acceptable. Sample will look for a suitable plugin for device specified -nt "<integer>" Number of top results (default 10) -ni "<integer>" Number of iterations (default 1) -pc Enables per-layer performance report
Running the application with an empty list of options results in an error message and the usage list above.
To do inference on an image using a trained AlexNet network on Intel® Processors:
$ ./classification_sample -i <path_to_image>/cat.bmp -m <path_to_model>/alexnet_fp32.xml
Output Description
By default, the application outputs the top-10 inference results. Add the -nt option to the previous command to modify the number of top output results. For example, to get the top-5 results on Intel® HD Graphics, use the command:
$ ./classification_sample -i <path_to_image>/cat.bmp -m <path_to_model>/alexnet_fp32.xml -nt 5 -d GPU
Image Classification - Pipelined
Description
This sample demonstrates how to build and execute inference in pipelined mode, using classification networks as an example.
The pipelined mode can increase image throughput. The latency of a single inference is the same as for synchronous execution, but throughput increases for the following reasons:
- Some plugins are internally heterogeneous: data transfer, execution on a remote device, and pre- and post-processing on the host
- Use of an explicit heterogeneous plugin that executes different parts of the network on different devices
When two or more devices are involved in the inference of one picture, creating several infer requests and starting asynchronous inference utilizes the devices in the most efficient way. If two devices are involved in the execution, the optimal value for -nireq is 2.
To do this efficiently, the Classification Sample Async uses a round-robin algorithm for the inference requests. It starts by executing the current inference request and then switches to waiting for the results of the previous one. After the wait finishes, the application swaps the inference requests and repeats the procedure.
Another aspect required for good throughput is the number of iterations. Only with a large number of iterations can you emulate real application work and see meaningful performance results.
Batch mode is an attribute independent of the pipelined mode. The pipelined mode works efficiently with any batch size.
The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. The application then creates the number of infer requests specified by the -nireq parameter and loads pictures for inference.
In a loop, it starts inference for the current infer request and switches to waiting for another one. When results are ready, the inference requests are swapped, as shown in the sketch below.
When inference is done, the application outputs data to the standard output stream.
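The round-robin scheme described above can be sketched as follows. This is a simplified illustration assuming the request-based API from the previous sections; runPipelined and fillInput are hypothetical helper names, and executableNetwork is an already loaded ExecutableNetwork.

```cpp
#include <functional>
#include <utility>
#include <inference_engine.hpp>

using namespace InferenceEngine;

// Round-robin over two infer requests (-nireq 2): start the next request,
// wait for the previous one, then swap them and repeat.
void runPipelined(ExecutableNetwork &executableNetwork, int numIterations,
                  const std::function<void(InferRequest &)> &fillInput) {
    InferRequest current = executableNetwork.CreateInferRequest();
    InferRequest next = executableNetwork.CreateInferRequest();

    fillInput(current);
    current.StartAsync();                                     // start the first request

    for (int i = 0; i < numIterations; ++i) {
        fillInput(next);
        next.StartAsync();                                    // launch the next request
        current.Wait(IInferRequest::WaitMode::RESULT_READY);  // wait for the previous one
        // ... read the results of `current` here ...
        std::swap(current, next);                             // swap the requests and repeat
    }
    current.Wait(IInferRequest::WaitMode::RESULT_READY);      // drain the last in-flight request
}
```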
Running the Application
Running the application with the -h
option results in the message:
./classification_sample -h InferenceEngine: API version ............ <version> Build .................. <number> classification_sample [OPTION] Options: -h Print a usage message. -i "<path1>""<path3>" Required. Path to a directory with images or path to an image files: a .ubyte file for LeNet and a .bmp file for the other networks. -m "<path>" Required. Path to an .xml file with a trained model. -l "<absolute_path>" Optional. Absolute path to library with Intel® MKL-DNN (CPU) custom layers (*.so). Or -c "<absolute_path>" Optional. Absolute path to Intel® Integrated Graphics custom layers config (*.xml). -pp "<path>" Path to a plugin directory. -d "<device>" Specify the target device to infer on; CPU, Intel® Integrated Graphics or MYRIAD is acceptable. Sample will look for a suitable plugin for device specified -nt "<integer>" Number of top results (default 10) -ni "<integer>" Number of iterations (default 1) -pc Enables per-layer performance report
Running the application with an empty list of options results in an error message and the usage list above.
To do inference on an image using a trained AlexNet network on FPGA with a fallback to Intel® Processors:
$ ./classification_sample_async -i <path_to_image>/cat.bmp -m <path_to_model>/alexnet_fp32.xml -nt 5 -d HETERO:FPGA,CPU -nireq 2 -ni 200
Output Description
By default, the application outputs the top-10 inference results for each infer request. In addition, it reports the throughput value measured in frames per second.
Security Barrier Camera Sample
Description
Showcases Vehicle Detection, followed by Vehicle Attributes recognition and License Plate Recognition applied on top of the Vehicle Detection results. The corresponding pre-trained models are shipped in the intel_models directory:
- vehicle-license-plate-detection-barrier-0007: The primary detection network that finds the vehicles and license plates
- vehicle-attributes-recognition-barrier-0010: Executed on top of the results from vehicle-license-plate-detection-barrier-0007. Reports general vehicle attributes, such as vehicle type (car, van, bus) and color.
- license-plate-recognition-barrier-0001: Executed on top of the results from vehicle-license-plate-detection-barrier-0007. Reports a string for each recognized license plate.
For topology details, see the descriptions in the intel_models directory.
Other demonstration objectives:
- Show images/video/camera as inputs, via OpenCV*
- Show an example of simple network pipelining: Attributes and LPR networks are executed on top of the Vehicle Detection results
- Show vehicle attributes and licence plate information for each detected vehicle
How it Works
The application reads command line parameters and loads the specified networks. The Vehicle/License-Plate Detection network is required, and the other two are optional.
Upon getting a frame from the OpenCV* VideoCapture, the application runs inference of vehicles/license plates, then performs two more inferences using the Vehicle Attributes and LPR networks (if they are specified on the command line), and displays the results.
Running the Application
Running the application with the -h
option results in the message:
$ ./security_barrier_sample -h InferenceEngine: API version ............ 1.0 [ INFO ] Parsing input parameters interactive_vehicle_detection [OPTION] Options: -h Print a usage message. -i "<path>" Required. Path to a video or image file. Default value is "cam" to work with camera. -m "<path>" Required. Path to the Vehicle/License-Plate Detection model (.xml) file. -m_va "<path>" Optional. Path to the Vehicle Attributes model (.xml) file. -m_lpr "<path>" Optional. Path to the License-Plate Recognition model (.xml) file. -l "<absolute_path>" For Intel® MKL-DNN (CPU)-targeted custom layers, if any. Absolute path to a shared library with the kernels impl. Or -c "<absolute_path>" For Intel® Integrated Graphics-targeted custom kernels, if any. Absolute path to the xml file with the kernels desc. -d "<device>" Specify the target device for Vehicle Detection (CPU, Intel® Integrated Graphics, FPGA, MYRYAD, or HETERO). -d_va "<device>" Specify the target device for Vehicle Attributes (CPU, Intel® Integrated Graphics, FPGA, MYRYAD, or HETERO). -d_lpr "<device>" Specify the target device for License Plate Recognition (CPU, Intel® Integrated Graphics, FPGA, MYRYAD, or HETERO). -pc Enables per-layer performance statistics. -r Output Inference results as raw values. -t Probability threshold for Vehicle/Licence-Plate detections.
Running the application with an empty list of options results in an error message and the usage list above.
Demonstration Output
The demonstration uses OpenCV* to display the resulting frame with detections rendered as bounding boxes and text:
Object Detection for Faster R-CNN Sample
Description
VGG16-Faster-RCNN is a public CNN that can be easily obtained from GitHub.
The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream.
Downloading and Converting a Caffe* Model
- Download
test.prototxt
from https://raw.githubusercontent.com/rbgirshick/py-faster-rcnn/master/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt - Download the pretrained models from https://dl.dropboxusercontent.com/s/o6ii098bu51d139/faster_rcnn_models.tgz?dl=0
- Unpack the archive and make sure you have the file named VGG16_faster_rcnn_final.caffemodel.
To convert the source model correctly, run the Model Optimizer with the extension for the Python proposal layer:
python3 ${MO_ROOT_PATH}/mo_caffe.py --input_model <path_to_model>/VGG16_faster_rcnn_final.caffemodel --input_proto <path_to_model>/deploy.prototxt --extensions <path_to_object_detection_sample>/fasterrcnn_extensions
Running the Application
Running the application with the -h
option results in the message:
$ ./object_detection_sample -h InferenceEngine: API version ............ <version> Build .................. <number> object_detection_sample [OPTION] Options: -h Print a usage message. -i "<path>" Required. Path to an image file. -m "<path>" Required. Path to an .xml file with a trained model. -l "<absolute_path>" Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so). Or -c "<absolute_path>" Optional. Absolute path to Intel® Integrated Graphics custom layers config (*.xml). -pp "<path>" Path to a plugin directory. -d "<device>" Specify the target device to infer on; CPU or Intel® Integrated Graphics is acceptable. The sample looks for a suitable plugin for the device specified -ni "<integer>" Number of iterations (default 1) -pc Enables per-layer performance report
Running the application with an empty list of options results in an error message and the usage list above.
Use the following command to do inference on Intel® Processors on an image using a trained Faster R-CNN network:
$ ./object_detection_sample -i <path_to_image>/inputImage.bmp -m <path_to_model>/faster-rcnn.xml -d CPU
Output Description
The application outputs an image named out_0.bmp
with detected objects enclosed in rectangles. It outputs the list of classes of the detected objects along with the respective confidence values and the coordinates of the rectangles to the standard output stream.
Using this Sample with the Intel Person Detection Model
This model has a non-default (for Faster-RCNN) output layer name. To score it correctly, add the option --bbox_name detector/bbox/ave_pred
to the command line.
Usage example:
./object_detection_sample -i /home/user/people.jpg -m <ie_path>/intel_models/person-detection-retail-0001/FP32/person-detection-retail-0001.xml --bbox_name detector/bbox/ave_pred -d CPU
Object Detection SSD, Async API Performance Showcase Sample
Description
This demonstration showcases Object Detection with SSD and the new Async API. Async API usage can improve the overall frame rate of the application because, rather than waiting for inference to complete, the application can continue doing work on the host while the accelerator is busy. Specifically, this demonstration keeps two parallel infer requests: while the current one is processed, the input frame for the next one is being captured. This essentially hides the latency of capturing, so the overall frame rate is determined by MAXIMUM(detection time, input capturing time) rather than the SUM(detection time, input capturing time).
The technique can be generalized to any available parallel slack, such as doing inference while simultaneously encoding the resulting (previous) frames, or running further inference, like emotion detection on top of the face detection results.
Be aware of performance caveats, though. When running tasks in parallel, avoid over-using shared compute resources. For example, if you perform inference on the FPGA with a mostly idle CPU, it makes sense to run parallel tasks on the CPU. When doing inference on Intel® Integrated Graphics, there is little to gain from, for example, encoding the resulting video on the same device in parallel, because the device is already busy.
For more performance implications and tips for the Async API, see the Optimization Guide
Other demonstration objectives:
- Video as input support via OpenCV*
- Visualization of the resulting bounding boxes and text labels (from the
.labels
file) or class number (if no file is provided) - OpenCV* provides resulting bounding boxes, labels, and other information. You can copy and paste this code without pulling Inference Engine samples helpers into your application.
- Demonstrate the Async API in action. For this, the demonstration features two modes with a Tab key toggle.
- Old-style "Sync" way - The frame capturing with OpenCV* executes back-to-back with Detection
- "Truly Async" way - The Detection is performed on the current frame, while the OpenCV* captures the next frame.
How it Works
The application reads command line parameters and loads a network to the Inference Engine. Upon getting a frame from the OpenCV* VideoCapture, it performs inference and displays the results.
New "Async API" operates with new notion of the "Infer Request" that encapsulates the inputs/outputs and separates scheduling and waiting for result, next section. And here what makes the performance look different:
- In the default ("Sync") mode the frame is captured and then immediately processed, below in pseudo-code:
while(true) { capture frame populate CURRENT InferRequest start CURRENT InferRequest //this call is async and returns immediately wait for the CURRENT InferRequest display CURRENT result }
This is a reference implementation in which the new Async API is used in a serialized/synchronous fashion. - In the "true" Async mode, the NEXT frame is captured while the CURRENT request is being processed:
while(true) { capture frame populate NEXT InferRequest start NEXT InferRequest //this call is async and returns immediately wait for the CURRENT InferRequest (processed in a dedicated thread) display CURRENT result swap CURRENT and NEXT InferRequests }
In this case, the NEXT request is populated in the main (app) thread, while the CURRENT request is processed. This is handled in the dedicated thread, internal to the Inference Engine runtime.
Async API
In this release, the Inference Engine offers a new API based on the notion of Infer Requests. With this API, requests encapsulate input and output allocation. You access the blob with the GetBlob method.
You can execute a request asynchronously in the background and wait until you need the result. In the meantime your application can continue:
// load plugin for the device as usual auto enginePtr = PluginDispatcher({"../../../lib/intel64", ""}).getSuitablePlugin( getDeviceFromStr("GPU")); // load network CNNNetReader network_reader; network_reader.ReadNetwork("Model.xml"); network_reader.ReadWeights("Model.bin"); // populate inputs etc auto input = async_infer_request.GetBlob(input_name); ... // start the async infer request (puts the request to the queue and immediately returns) async_infer_request->StartAsync(); // Continue execution on the host until you need the request results //... async_infer_request.Wait(IInferRequest::WaitMode::RESULT_READY); auto output = async_infer_request.GetBlob(output_name);
There is no direct way to measure the execution time of an infer request that is running asynchronously, unless you measure the Wait executed immediately after the StartAsync. But this essentially serializes the execution.
This is what the sample does in the default "SYNC" mode and reports as the Detection time/fps
message on the screen. In the truly asynchronous ("ASYNC") mode, the host continues execution in the main thread, in parallel with the infer request. If the request completes before Wait is called in the main thread (that is, earlier than OpenCV* has decoded a new frame), reporting the time between StartAsync and Wait would obviously be incorrect. That is why the inference speed is not reported in the "ASYNC" mode.
For more information about the new request-based Inference Engine API, including Async execution, see Integrate with customer application New Request API.
Running the Application
Running the application with the -h
option results in the message:
$ ./object_detection_demo_ssd_async -h InferenceEngine: API version ............ [version] Build .................. object_detection_demo_ssd_async [OPTION] Options: -h Print a usage message. -i "[path]" Required. Path to an video file. Use "cam" to capture input from the camera). -m "[path]" Required. Path to an .xml file with a trained model. -l "[absolute_path]" Optional. Absolute path to library with Intel® MKL-DNN (CPU) custom layers (*.so). Or -c "[absolute_path]" Optional. Absolute path to Intel® Integrated Graphics custom layers config (*.xml). -d "[device]" Specify the target device to infer on; CPU, Intel® Integrated Graphics, FPGA, and Intel® Movidius™ Myriad™ 2 Vision Processing Unit are accepted. -pc Enables per-layer performance report. -t Probability threshold for detections (default is 0.5). -r Output inference results as raw values to the console.
Running the application with an empty list of options results in an error message and the usage list above.
Command Description
Use the following command to perform inference on Intel® Integrated Graphics with the example pre-trained GoogLeNet-based SSD* available at https://software.intel.com/file/609199/download:
$ ./object_detection_demo_ssd_async -i <path_to_video>/inputVideo.mp4 -m <path_to_model>/ssd.xml -d GPU
The network must be converted from the Caffe* format (*.prototxt + *.caffemodel) to the Inference Engine format (*.xml + *.bin) before using this command. See the Model Optimizer Developer Guide.
The only GUI control is the Tab key, which switches between the synchronized execution and the true Async mode.
Output Description
The output uses OpenCV* to display the resulting frame with detections rendered as bounding boxes and labels, if provided. In default mode, the sample reports:
- OpenCV* time: Frame decoding + time to render the bounding boxes, labels, and display of the results.
- Detection time: Inference time for the object detection network. This is reported in SYNC mode.
- Wallclock time: The combined application-level performance.
Object Detection with SSD-VGG Sample
Description
How to run the Object Detection sample application, which does inference using object detection networks like SSD-VGG on Intel® Processors and Intel® HD Graphics.
The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream.
Running the Application
Running the application with the -h
option results in the message:
$./object_detection_sample_ssd -h InferenceEngine: API version ............ <version> Build .................. <number> object_detection_sample_ssd [OPTION] Options: -h Print a usage message. -i "<path>" Required. Path to an image file. -m "<path>" Required. Path to an .xml file with a trained model. -l "<absolute_path>" Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so). Or -c "<absolute_path>" Optional. Absolute path to Intel® Integrated Graphics custom layers config (*.xml). -pp "<path>" Path to a plugin directory. -d "<device>" Specify the target device to infer on; CPU, Intel® Integrated Graphics or MYRIAD is acceptable. The sample looks for a suitable plugin for the specified device. -ni "<integer>" Number of iterations (default 1) -pc Enables per-layer performance report
Running the application with an empty list of options results in an error message and the usage list above.
Use the following command to do inference on Intel® Processors on an image using a trained SSD network:
$ ./object_detection_sample_ssd -i <path_to_image>/inputImage.bmp -m <path_to_model>/VGG_ILSVRC2016_SSD.xml -d CPU
Output Description
The application outputs an image named out_0.bmp
with detected objects enclosed in rectangles. It outputs the list of classes of the detected objects along with the respective confidence values and the coordinates of the rectangles to the standard output stream.
Neural Style Transfer Sample
Description
How to build and run the Neural Style Transfer sample (NST sample) application, which does inference using models of style transfer topology.
Running the Application
Running the application with the -h
option results in the message:
$ ./style_transfer_sample --h InferenceEngine: API version ............ <version> Build .................. <number> style_transfer_sample [OPTION] Options: -h Print a usage message. -i "<path1>""<path3>" Required. Path to a directory with images or path to an image files: a .ubyte file for LeNet and a .bmp file for the other networks. -m "<path>" Required. Path to an .xml file with a trained model. -l "<absolute_path>" Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so). Or -c "<absolute_path>" Optional. Absolute path to Intel® Integrated Graphics custom layers config (*.xml). -pp "<path>" Path to a plugin directory. -p "<name>" Plugin name. For example Intel® MKL-DNN. If this parameter is pointed, the sample looks for this plugin only -d "<device>" Specify the target device to infer on; CPU or Intel® Integrated Graphics is acceptable. The sample looks for a suitable plugin for the specified device. -nt "<integer>" Number of top results (default 10) -ni "<integer>" Number of iterations (default 1) -pc Enables per-layer performance report
Running the application with an empty list of options results in an error message and the usage list above.
To do inference on an image using a trained NST network on Intel® Processors, use the following command:
$ ./style_transfer_sample -i <path_to_image>/cat.bmp -m <path_to_model>/1_decoder_FP32.xml
Output Description
The application outputs one or more styled images, starting with one named out1.bmp, redrawn in the style of the model used for inference. The style of the output images depends on the model used by the sample.
Hello Infer Request Classification
Description
How to run the Hello Infer Classification sample application. This sample is a simplified version of the Image Classification Sample. It demonstrates the use of the new Infer Request API of the Inference Engine in applications. See Integrate with customer application New Request API for details.
Running the Application
To do inference on an image using a trained AlexNet network on Intel® Processors:
$ ./hello_request_classification <path_to_model>/alexnet_fp32.xml <path_to_image>/cat.bmp CPU
Output Description
The application outputs the top-10 inference results.
Interactive Face Detection
Description
Showcases the Object Detection task applied to face recognition using a sequence of neural networks. The Async API can improve the overall frame rate of the application because the application can continue operating while the accelerator is busy. This demonstration maintains two parallel infer requests for the Age/Gender and Head Pose detection networks that run simultaneously.
Other demonstration objectives:
- Video as input support via OpenCV*.
- Visualization of the resulting face bounding boxes from Face Detection network.
- Visualization of age gender and head pose information for each detected face.
- OpenCV* provides resulting bounding boxes, labels, and other information. You can copy and paste this code without pulling Inference Engine sample helpers into your application.
How it Works
- The application loads up to three networks, depending on the -d option.
- The application gets a frame from the OpenCV* video capture.
- The application performs inference of the frame with the face detection network.
- The application performs two simultaneous inferences using the Age/Gender and Head Pose detection networks, if these are specified on the command line.
- The application displays the results.
The new Async API operates with the notion of an Infer Request that encapsulates the inputs/outputs and separates scheduling from waiting for the result. This changes the performance as follows:
In the default mode (Sync mode), the frame is captured and immediately processed:
while(true) { capture frame populate FaceDetection InferRequest wait for the FaceDetection InferRequest populate AgeGender InferRequest using dyn batch technique populate HeadPose InferRequest using dyn batch technique wait AgeGender wait HeadPose display detection results }
Running the Application
Running the application with the -h
option results in the message:
$ ./interactive_face_detection -h InferenceEngine: API version ............ <version> Build .................. <number> interactive_face_detection [OPTION] Options: -h Print a usage message. -i "<path>" Optional. Path to an video file. Default value is "cam" to work with camera. -m "<path>" Required. Path to an .xml file with a trained face detection model. -m_ag "<path>" Optional. Path to an .xml file with a trained age gender model. -m_hp "<path>" Optional. Path to an .xml file with a trained head pose model. -l "<absolute_path>" Required for Intel® MKL-DNN (CPU)-targeted custom layers.Absolute path to a shared library with the kernels impl. Or -c "<absolute_path>" Required for Intel® Integrated Graphics-targeted custom kernels.Absolute path to the xml file with the kernels desc. -d "<device>" Specify the target device for Face Detection (CPU, Intel® Integrated Graphics, FPGA, or MYRYAD. The sample looks for a suitable plugin for the specified device. -d_ag "<device>" Specify the target device for Age Gender Detection (CPU, Intel® Integrated Graphics, FPGA, or MYRYAD. The sample looks for a suitable plugin for the specified device. -d_hp "<device>" Specify the target device for Head Pose Detection (CPU, Intel® Integrated Graphics, FPGA, or MYRYAD. The sample looks for a suitable plugin for the specified device. -pc Enables per-layer performance report. -r Inference results as raw values. -t Probability threshold for detections.
Running the application with an empty list of options results in an error message and the usage list above.
To do inference on Intel® Integrated Graphics with the example pre-trained GoogLeNet-based SSD*:
./object_detection_demo_ssd_async -i <path_to_video>/inputVideo.mp4 -m <path_to_model>/ssd.xml -d GPU
Before using this command, use the Model Optimizer to convert the network from the Caffe* format (*.prototxt + *.caffemodel) to the Inference Engine format (*.xml + *.bin).
Demonstration Output
The demonstration uses OpenCV* to display the resulting frame with detections that are rendered as bounding boxes. Labels are included if available. In default mode, the sample reports:
- OpenCV* time: frame decoding + time to render the bounding boxes, labels, and displaying the results
- Face detection time: inference time for the face Detection network
- Age Gender + Head Pose time: combined inference time of simultaneously executed age gender and head pose networks
Image Segmentation Sample
Description
How to run the Image Segmentation sample application, which does inference using image segmentation networks like FCN8.
The sample application reads command line parameters and loads a network and an image to the Inference Engine plugin. When inference is done, the application creates an output image.
Running the Application
Running the application with the -h
option results in the message:
$ ./segmentation_sample -h InferenceEngine: API version ............ <version> Build .................. <number> segmentation_sample [OPTION] Options: -h Print a usage message. -i "<path1>""<path3>" Required. Path to a directory with images or path to an image files: a .ubyte file for LeNet and a .bmp file for the other networks. -m "<path>" Required. Path to an .xml file with a trained model. -l "<absolute_path>" Optional. Absolute path to library with MKL-DNN (CPU) custom layers (*.so). Or -c "<absolute_path>" Optional. Absolute path to Intel® Integrated Graphics custom layers config (*.xml). -pp "<path>" Path to a plugin directory. -d "<device>" Specify the target device to infer on; CPU or Intel® Integrated Graphics is acceptable. The sample looks for a suitable plugin for the specified device. -ni "<integer>" Number of iterations (default 1) -pc Enables per-layer performance report
Running the application with an empty list of options results in an error message and the usage list above.
To do inference on an image on Intel® Processors using a trained FCN8 network:
$ ./segmentation_sample -i <path_to_image>/inputImage.bmp -m <path_to_model>/fcn8.xml
Output Description
The application output is a segmented image named out.bmp.
How to Integrate the Inference Engine in Your Application
- This section talks about API information. For more information about the APIs, see the offline documentation included in your package. To locate the current API:
  - Go to <INSTALL_DIR>/deployment_tools/documentation/, where <INSTALL_DIR> is the directory in which the Intel® CV SDK is installed.
  - Open index.html in an Internet browser.
  - Select Integrating Inference Engine in Your Application from the contents.
- This document refers to APIs from previous releases as the "legacy" API. It is best to stop using the legacy API because it will be removed in a future product release. To locate the legacy API:
  - Go to <INSTALL_DIR>/deployment_tools/documentation/ under the directory in which the Intel® CV SDK is installed.
  - Open index.html in an Internet browser.
  - Select Integrating Inference Engine in Your Application (legacy API) from the contents.
- Complete API documentation is also in the full offline package documentation:
  - Go to <INSTALL_DIR>/deployment_tools/documentation/ under the directory in which the Intel® CV SDK is installed.
  - Open index.html in an Internet browser.
  - Select Open Data Structures from the menu at the top of the screen.
Integration With the API
This section provides a high-level description of the process of integrating the Inference Engine into your application. See Using Inference Engine Samples for examples of using the Inference Engine in applications.
Using the Inference Engine API in Your Code
The core libinference_engine.so
library implements loading and parsing a model Intermediate Representation, and triggers inference using a specified plugin. The core library has the following API:
- InferenceEngine::IInferencePlugin - The main plugin interface. Every Inference Engine plugin implements this interface. Use it through an InferenceEngine::InferenceEnginePluginPtr instance.
- InferenceEngine::PluginDispatcher - This class finds a suitable plugin for a specified device in the given directories.
- InferenceEngine::CNNNetReader
- InferenceEngine::CNNNetwork
- InferenceEngine::Blob, InferenceEngine::TBlob
- InferenceEngine::BlobMap
- InferenceEngine::InputInfo, InferenceEngine::InputsDataMap
The Integration Process
- Load a plugin by creating an instance of
InferenceEngine::InferenceEnginePluginPtr
. Specify the plugin or let the Inference Engine choose it withInferenceEngine::PluginDispatcher
. See theselectPlugin()
function in the samples. InferenceEngine::PluginDispatcher dispatcher(pluginDirs); InferenceEngine::InferenceEnginePluginPtr enginePtr(dispatcher.getSuitablePlugin(TargetDevice::eCPU));
- Create an Intermediate Representation reader by creating an instance of
InferenceEngine::CNNNetReader
and read a model Intermediate Representation:auto netBuilder = new InferenceEngine::CNNNetReader(); netBuilder->ReadNetwork("Model.xml"); netBuilder->ReadWeights("Model.bin");
- Request information about inputs (an image and any other input data required), using the
InferenceEngine::CNNNetReader::getNetwork()
andInferenceEngine::CNNNetwork::getInputsInfo()
methods. Set the input number format (precision) usingInferenceEngine::InputInfo::setInputPrecision
to match the input data format (precision). Allocate input blobs of the appropriate types and feed an image and the input data to the blobs: /** Taking information about all topology inputs **/ InferenceEngine::InputsDataMap inputInfo(netBuilder.getNetwork().getInputsInfo()); /** Stores all input blobs data **/ InferenceEngine::BlobMap inputBlobs; /** Iterating over all input blobs **/ for (auto & item : inputInfo) { /** Creating input blob **/ item.second->setInputPrecision(Precision::U8); InferenceEngine::TBlob<unsigned char>::Ptr input; input = InferenceEngine::make_shared_blob<unsigned char, InferenceEngine::SizeVector>(Precision::U8, item.second->getDims()); input->allocate(); inputBlobs[item.first] = input; /** Fill input tensor with planes. First b channel, then g and r channels **/ ... }
- Request information about outputs, using the
InferenceEngine::CNNNetReader::getNetwork()
andInferenceEngine::CNNNetwork::getOutputsInfo()
methods. Allocate output blobs of the appropriate types: InferenceEngine::OutputsDataMap outputInfo(netBuilder.getNetwork().getOutputsInfo()); InferenceEngine::BlobMap outputBlobs; for (auto & item : outputInfo) { InferenceEngine::TBlob<float>::Ptr output; output = InferenceEngine::make_shared_blob<float, InferenceEngine::SizeVector>(Precision::FP32, item.second->dims); output->allocate(); outputBlobs[item.first] = output; }
- Load the model to the plugin using
InferenceEngine::IInferencePlugin::LoadNetwork()
:InferenceEngine::StatusCode status = enginePtr->LoadNetwork(netBuilder.getNetwork(), &resp); if (status != InferenceEngine::OK) { throw std::logic_error(resp.msg); }
- Do inference by calling the
InferenceEngine::IInferencePlugin::Infer method
:enginePtr->Infer(inputBlobs, outputBlobs, &resp);
- Go over the output blobs and process the results.
/** Pointer to the output blob **/ const TBlob<float>::Ptr fOutput = std::dynamic_pointer_cast<TBlob<float>>(outputBlobs.begin()->second); /** fOutput->data()[] - accessing output blob data **/
Building Your Application
For details about building your application, see the CMake files for the sample applications. All samples reside in the samples directory in the Inference Engine installation directory.
Running the Application
Before running compiled binary files:
- Make sure your application can find the Inference Engine libraries. On Linux* operating systems, the
LD_LIBRARY_PATH
environment variable specifies the library directories. UpdateLD_LIBRARY_PATH
with paths to the directories in the Inference Engine installation directory in which the libraries reside. - Add the path to the directory containing the core and plugin libraries:
- For Inference Engine installed within the Intel® CV SDK package:
$ export LD_LIBRARY_PATH=/opt/intel/computer_vision_sdk_<version>/inference_engine/lib/<linux_version>/intel64:$LD_LIBRARY_PATH
- For Intel's Deep Learning Deployment Toolkit installation:
$ export LD_LIBRARY_PATH=/opt/intel/deep_learning_sdk_<version>/deployment_tools/inference_engine/lib/<linux_version>/intel64:$LD_LIBRARY_PATH
- For Inference Engine installed within the Intel® CV SDK package:
- Add paths to the directories containing the required third-party libraries:
- For Inference Engine installed within the Intel® CV SDK package:
$ export LD_LIBRARY_PATH=/opt/intel/computer_vision_sdk_<version>/inference_engine/external/mklml_lnx/lib:$LD_LIBRARY_PATH $ export LD_LIBRARY_PATH=/opt/intel/computer_vision_sdk_<version>/inference_engine/external/cldnn/lib:$LD_LIBRARY_PATH
- For Intel Deep Learning Deployment Toolkit installation:
$ export LD_LIBRARY_PATH=/opt/intel/deep_learning_sdk_<version>/deployment_tools/external/mklml_lnx/lib:$LD_LIBRARY_PATH $ export LD_LIBRARY_PATH=/opt/intel/deep_learning_sdk_<version>/deployment_tools/external/cldnn/lib:$LD_LIBRARY_PATH
- For Inference Engine installed within the Intel® CV SDK package:
As an alternative, use the following scripts in the Inference Engine directory of the Intel® CV SDK and Deep Learning Deployment Toolkit installation directories, respectively:
/opt/intel/computer_vision_sdk_<version>/bin/setupvars.sh
/opt/intel/deep_learning_sdk_<version>/deployment_tools/inference_engine/bin/setvars.sh
To run compiled applications on Microsoft* Windows* OS, make sure that the Microsoft* Visual C++ 2015 Redistributable and Intel® C++ Compiler 2017 Redistributable packages are installed and the <INSTALL_DIR>/bin/intel64/Release/*.dll
files are placed in the application directory or are accessible via the %PATH% environment variable.
Integration With the Legacy API
NOTE: The subject of this section is Legacy APIs. Legacy APIs are deprecated and will be removed in a future release. It is best to use the current APIs.
This section provides a high-level description of the process of integrating the Inference Engine into your application. See Using Inference Engine Samples for examples of using the Inference Engine in applications.
Using the Inference Engine API in Your Code
The core libinference_engine.so
library implements loading and parsing a model Intermediate Representation, and triggers inference using a specified plugin. The core library has the following API:
- InferenceEngine::IInferencePlugin - The main plugin interface. Every Inference Engine plugin implements this interface. Use it through an InferenceEngine::InferenceEnginePluginPtr instance.
- InferenceEngine::PluginDispatcher - This class finds the suitable plugin for a specified device in given directories.
- InferenceEngine::CNNNetReader
- InferenceEngine::CNNNetwork
- InferenceEngine::Blob, InferenceEngine::TBlob
- InferenceEngine::BlobMap
- InferenceEngine::InputInfo, InferenceEngine::InputsDataMap
The Integration Process
- Load a plugin by creating an instance of
InferenceEngine::InferenceEnginePluginPtr
. - Specify the plugin or let the Inference Engine choose it with
InferenceEngine::PluginDispatcher
. See theselectPlugin()
function in the samples. InferenceEngine::PluginDispatcher dispatcher(pluginDirs); InferenceEngine::InferenceEnginePluginPtr enginePtr(dispatcher.getSuitablePlugin(TargetDevice::eCPU));
- Create an Intermediate Representation reader by creating an instance of
InferenceEngine::CNNNetReader
and read a model Intermediate Representation:auto netBuilder = new InferenceEngine::CNNNetReader(); netBuilder->ReadNetwork("Model.xml"); netBuilder->ReadWeights("Model.bin");
- Request information about inputs (an image and any other input data required), using the
InferenceEngine::CNNNetReader::getNetwork()
andInferenceEngine::CNNNetwork::getInputsInfo()
methods. - Set the input number format (precision) using
InferenceEngine::InputInfo::setInputPrecision
to match the input data format (precision). Allocate input blobs of the appropriate types and feed an image and the input data to the blobs: /** Taking information about all topology inputs **/ InferenceEngine::InputsDataMap inputInfo(netBuilder.getNetwork().getInputsInfo()); /** Stores all input blobs data **/ InferenceEngine::BlobMap inputBlobs; /** Iterating over all input blobs **/ for (auto & item : inputInfo) { /** Creating input blob **/ item.second->setInputPrecision(Precision::U8); InferenceEngine::TBlob<unsigned char>::Ptr input; input = InferenceEngine::make_shared_blob<unsigned char, InferenceEngine::SizeVector>(Precision::U8, item.second->getDims()); input->allocate(); inputBlobs[item.first] = input; /** Fill input tensor with planes. First b channel, then g and r channels **/ ... }
- Request information about outputs, using the
InferenceEngine::CNNNetReader::getNetwork()
andInferenceEngine::CNNNetwork::getOutputsInfo()
methods. Allocate output blobs of the appropriate types: InferenceEngine::OutputsDataMap outputInfo(netBuilder.getNetwork().getOutputsInfo()); InferenceEngine::BlobMap outputBlobs; for (auto & item : outputInfo) { InferenceEngine::TBlob<float>::Ptr output; output = InferenceEngine::make_shared_blob<float, InferenceEngine::SizeVector>(Precision::FP32, item.second->dims); output->allocate(); outputBlobs[item.first] = output; }
- Load the model to the plugin using
InferenceEngine::IInferencePlugin::LoadNetwork()
:InferenceEngine::StatusCode status = enginePtr->LoadNetwork(netBuilder.getNetwork(), &resp); if (status != InferenceEngine::OK) { throw std::logic_error(resp.msg); }
- Do inference by calling the
InferenceEngine::IInferencePlugin::Infer
method:enginePtr->Infer(inputBlobs, outputBlobs, &resp);
- Go over the output blobs and process the results.
/** Pointer to the output blob **/ const TBlob<float>::Ptr fOutput = std::dynamic_pointer_cast<TBlob<float>>(outputBlobs.begin()->second); /** fOutput->data()[] - accessing output blob data **/
Building Your Application
For details about building your application, see the CMake files for the sample applications. All samples reside in the samples directory in the Inference Engine installation directory.
Running the Application
Before running compiled binary files:
Make sure your application can find the Inference Engine libraries. On Linux* operating systems, the LD_LIBRARY_PATH
environment variable specifies the library directories.
Update LD_LIBRARY_PATH
with directory paths under the Inference Engine installation directory in which the libraries reside.
Add the path to the directory containing the core and plugin libraries:
- For Inference Engine installed within the Intel® CV SDK package:
$ export LD_LIBRARY_PATH=/opt/intel/computer_vision_sdk_<version>/inference_engine/lib/<linux_version>/intel64:$LD_LIBRARY_PATH
- For Intel's Deep Learning Deployment Toolkit installation:
$ export LD_LIBRARY_PATH=/opt/intel/deep_learning_sdk_<version>/deployment_tools/inference_engine/lib/<linux_version>/intel64:$LD_LIBRARY_PATH
Add paths to the directories containing the required third-party libraries:
- For Inference Engine installed within the Intel® CV SDK package:
$ export LD_LIBRARY_PATH=/opt/intel/computer_vision_sdk_<version>/inference_engine/external/mklml_lnx/lib:$LD_LIBRARY_PATH $ export LD_LIBRARY_PATH=/opt/intel/computer_vision_sdk_<version>/inference_engine/external/cldnn/lib:$LD_LIBRARY_PATH
- For Intel Deep Learning Deployment Toolkit installation:
$ export LD_LIBRARY_PATH=/opt/intel/deep_learning_sdk_<version>/deployment_tools/external/mklml_lnx/lib:$LD_LIBRARY_PATH $ export LD_LIBRARY_PATH=/opt/intel/deep_learning_sdk_<version>/deployment_tools/external/cldnn/lib:$LD_LIBRARY_PATH
As an alternative, use scripts under the Inference Engine directory for the Intel® CV SDK and Deep Learning Deployment Toolkit installations respectively:
/opt/intel/computer_vision_sdk_<version>/bin/setupvars.sh
/opt/intel/deep_learning_sdk_<version>/deployment_tools/inference_engine/bin/setvars.sh
To run compiled applications on Microsoft* Windows* OS, make sure that the Microsoft* Visual C++ 2015 Redistributable and Intel® C++ Compiler 2017 Redistributable packages are installed and the <INSTALL_DIR>/bin/intel64/Release/*.dll
files are in the application directory or accessible through the %PATH%
environment variable.
Adding Your Own Kernels in the Inference Engine
A layer is a CNN building block implemented in the training framework, such as "Convolution" in Caffe*. A kernel is the corresponding implementation in the Inference Engine.
Plug your kernel implementations into the Inference Engine and map them to the layers in the original framework. See the Model Optimizer guide for information about how the mapping between a framework's layers and Inference Engine kernels is registered.
The rest of the section covers custom kernels and how to integrate them into the Inference Engine.
Example of Custom Kernels Support in the Samples
Every sample uses the Inference Engine API to load custom kernels depending on the device type. Specifically, for the CPU this is a shared library that exports a certain interface and registers the kernels. For Intel® Integrated Graphics, it is an XML file that lists the kernels along with the parameters that the kernels accept and how these map to the specific Intermediate Representation values.
Example Custom Kernels
The "extension" directory in the "samples" dir comes with few real example of CPU-targeted kernels, like DetectionOutput (used in SSD*), etc.
Bunch the Intel® Integrated Graphics-targeted kernels to the binaries upon compiling the samples so the samples' applications can easily load them. See the cldnn_global_custom_kernels
directory in the GPU plugin installation directory.
How to Implement Custom Intel® Integrated Graphics Layers
You must provide the kernel code in OpenCL C and a configuration file that connects the kernel and its parameters to the parameters of the layer.
You have two options for using the custom layer configuration file.
- Include a section with your kernels into the global auto-loading file `cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml`
- Provide a separate configuration file and load it using the `IInferencePlugin::SetConfig()` method with the `PluginConfigParams::KEY_CONFIG_FILE` key and the configuration file name as the value, before loading the network that features the custom layers:
```
// Load the Intel® Integrated Graphics plugin
InferenceEngine::InferenceEnginePluginPtr plugin_ptr(selectPlugin({…, "GPU"}));
InferencePlugin plugin(plugin_ptr);
// Load the Intel® Integrated Graphics Extensions
plugin.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, "<path to the xml file>"}});
```
For details about the configuration parameters and the OpenCL kernel, see the tutorial at https://software.intel.com/en-us/cvsdk-custom-layers-support-in-inference-engine-tutorial-custom-layers-workflow
How to Implement Custom CPU Layers
The instructions below are a brief summary of the Custom Layers tutorial available at https://software.intel.com/en-us/cvsdk-custom-layers-support-in-inference-engine-tutorial-custom-layers-workflow
For more details, see the sample source.
- Create a custom layer factory class `CustomLayerFactory`:
```
// custom_layer.h
// A CustomLayerFactory class is an example layer, which raises each input element to the power of 2 and does not change the dimensions
class CustomLayerFactory {
};
```
- Inherit it from the abstract class `InferenceEngine::ILayerImplFactory`:
```
// custom_layer.h
class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
};
```
- Create a constructor, a virtual destructor, and a data member to keep the layer info:
```
// custom_layer.h
class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
public:
    explicit CustomLayerFactory(const CNNLayer *layer): cnnLayer(*layer) {}
private:
    CNNLayer cnnLayer;
};
```
- Overload and implement the abstract methods (`getShapes`, `getImplementations`) of the `InferenceEngine::ILayerImplFactory` class:
```
// custom_layer.h
class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
public:
    // ... constructor and destructor
    StatusCode getShapes(const std::vector<TensorDesc>& inShapes, std::vector<TensorDesc>& outShapes, ResponseDesc *resp) noexcept override {
        if (cnnLayer == nullptr) {
            std::string errorMsg = "Cannot get cnn layer!";
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
            return GENERAL_ERROR;
        }
        if (inShapes.size() != 1) {
            std::string errorMsg = "Incorrect input shapes!";
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
            return GENERAL_ERROR;
        }
        outShapes.clear();
        outShapes.emplace_back(inShapes[0]);
        return OK;
    }
    StatusCode getImplementations(std::vector<ILayerImpl::Ptr>& impls, ResponseDesc *resp) noexcept override {
        // You can pass cnnLayer to the implementation if it is necessary.
        impls.push_back(ILayerImpl::Ptr(new CustomLayerImpl()));
        return OK;
    }
};
```
- Create your custom layer implementation class `CustomLayerImpl`:
```
// custom_layer.h
// A CustomLayerImpl class is an example implementation
class CustomLayerImpl {
};
```
- Because the layer uses the `execute` method to change data, inherit it from the abstract class `InferenceEngine::ILayerExecImpl`, and overload and implement the abstract methods of this class:
```
// custom_layer.h
// A CustomLayerImpl class is an example implementation
class CustomLayerImpl: public ILayerExecImpl {
public:
    explicit CustomLayerImpl(const CNNLayer *layer): cnnLayer(*layer) {}
    StatusCode getSupportedConfigurations(std::vector<LayerConfig>& conf, ResponseDesc *resp) noexcept override;
    StatusCode init(LayerConfig& config, ResponseDesc *resp) noexcept override;
    StatusCode execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, ResponseDesc *resp) noexcept override;
private:
    CNNLayer cnnLayer;
};
```
- Implement the `getSupportedConfigurations` method to return all supported configurations for this implementation. To specify the data formats, use `InferenceEngine::TensorDesc`:
```
// custom_layer.cpp
StatusCode CustomLayerImpl::getSupportedConfigurations(std::vector<LayerConfig>& conf, ResponseDesc *resp) noexcept {
    try {
        // This layer can be in-place but not constant!!!
        if (cnnLayer == nullptr)
            THROW_IE_EXCEPTION << "Cannot get cnn layer";
        if (cnnLayer->insData.size() != 1 || cnnLayer->outData.empty())
            THROW_IE_EXCEPTION << "Incorrect number of input/output edges!";
        LayerConfig config;
        DataPtr dataPtr = cnnLayer->insData[0].lock();
        if (!dataPtr)
            THROW_IE_EXCEPTION << "Cannot get input data!";
        DataConfig dataConfig;
        dataConfig.inPlace = -1;
        dataConfig.constant = false;
        SizeVector order;
        for (size_t i = 0; i < dataPtr->getTensorDesc().getDims().size(); i++) {
            order.push_back(i);
        }
        // Planar formats for N dims
        dataConfig.desc = TensorDesc(dataPtr->getTensorDesc().getPrecision(),
                                     dataPtr->getTensorDesc().getDims(),
                                     {dataPtr->getTensorDesc().getDims(), order});
        config.inConfs.push_back(dataConfig);
        DataConfig outConfig;
        outConfig.constant = false;
        outConfig.inPlace = 0;
        order.clear();
        for (size_t i = 0; i < cnnLayer->outData[0]->getTensorDesc().getDims().size(); i++) {
            order.push_back(i);
        }
        outConfig.desc = TensorDesc(cnnLayer->outData[0]->getTensorDesc().getPrecision(),
                                    cnnLayer->outData[0]->getDims(),
                                    {cnnLayer->outData[0]->getDims(), order});
        config.outConfs.push_back(outConfig);
        config.dynBatchSupport = 0;
        conf.push_back(config);
        return OK;
    } catch (InferenceEngine::details::InferenceEngineException& ex) {
        std::string errorMsg = ex.what();
        errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
        return GENERAL_ERROR;
    }
}
```
- Implement the `init` and `execute` methods. `init` is necessary to get the selected configuration and check the parameters:
```
// custom_layer.cpp
StatusCode CustomLayerImpl::init(LayerConfig& config, ResponseDesc *resp) noexcept {
    StatusCode rc = OK;
    if (config.dynBatchSupport) {
        config.dynBatchSupport = 0;
        rc = NOT_IMPLEMENTED;
    }
    for (auto& input : config.inConfs) {
        if (input.inPlace >= 0) {
            input.inPlace = -1;
            rc = NOT_IMPLEMENTED;
        }
        for (auto& offset : input.desc.getBlockingDesc().getOffsetPaddingToData()) {
            if (offset) {
                return GENERAL_ERROR;
            }
        }
        if (input.desc.getBlockingDesc().getOffsetPadding()) {
            return GENERAL_ERROR;
        }
        for (size_t i = 0; i < input.desc.getBlockingDesc().getOrder().size(); i++) {
            if (input.desc.getBlockingDesc().getOrder()[i] != i) {
                if (i != 4 || input.desc.getBlockingDesc().getOrder()[i] != 1)
                    return GENERAL_ERROR;
            }
        }
    }
    for (auto& output : config.outConfs) {
        if (output.inPlace < 0) {
            // NOT in-place
        }
        for (auto& offset : output.desc.getBlockingDesc().getOffsetPaddingToData()) {
            if (offset) {
                return GENERAL_ERROR;
            }
        }
        if (output.desc.getBlockingDesc().getOffsetPadding()) {
            return GENERAL_ERROR;
        }
        for (size_t i = 0; i < output.desc.getBlockingDesc().getOrder().size(); i++) {
            if (output.desc.getBlockingDesc().getOrder()[i] != i) {
                if (i != 4 || output.desc.getBlockingDesc().getOrder()[i] != 1)
                    return GENERAL_ERROR;
            }
        }
    }
    return rc;
}

StatusCode CustomLayerImpl::execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, ResponseDesc *resp) noexcept {
    if (inputs.size() != 1 || outputs.empty()) {
        std::string errorMsg = "Incorrect number of input or output edges!";
        errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
        return GENERAL_ERROR;
    }
    const float* src_data = inputs[0]->buffer();
    float* dst_data = outputs[0]->buffer();
    for (size_t o = 0; o < outputs[0]->size(); o++) {
        if (dst_data == src_data) {
            dst_data[o] *= dst_data[o];
        } else {
            dst_data[o] = src_data[o] * src_data[o];
        }
    }
    return OK;
}
```
- Create a factory for your own primitives, inherited from the abstract class `InferenceEngine::IExtension`:
```
// custom_extension.h
class CustomExtention : public InferenceEngine::IExtension {
};
```
Implement the utility methods `Unload`, `Release`, `SetLogCallback`:
```
// custom_extension.h
class CustomExtention : public InferenceEngine::IExtension {
public:
    // could be used to cleanup resources
    void Unload() noexcept override {
    }
    // is used when destruction happens
    void Release() noexcept override {
        delete this;
    }
    // logging is used to track what is going on inside
    void SetLogCallback(InferenceEngine::IErrorListener &listener) noexcept override {}
};
```
- Implement the utility method `GetVersion`:
```
// custom_extension.h
class CustomExtention : public InferenceEngine::IExtension {
private:
    static InferenceEngine::Version ExtensionDescription = {
        {1, 0},             // extension API version
        "1.0",
        "CustomExtention"   // extension description message
    };
public:
    // gets extension version information
    void GetVersion(const InferenceEngine::Version *& versionInfo) const noexcept override {
        versionInfo = &ExtensionDescription;
    }
};
```
Implement the main extension methods:
```
// custom_extension.h
class CustomExtention : public InferenceEngine::IExtension {
public:
    // ... utility methods
    StatusCode getPrimitiveTypes(char**& types, unsigned int& size, ResponseDesc* resp) noexcept override {
        std::string type_name = "CustomLayer";
        types = new char *[1];
        size = 1;
        types[0] = new char[type_name.size() + 1];
        std::copy(type_name.begin(), type_name.end(), types[0]);
        types[0][type_name.size()] = '\0';
        return OK;
    }
    StatusCode getFactoryFor(ILayerImplFactory *&factory, const CNNLayer *cnnLayer, ResponseDesc *resp) noexcept override {
        if (cnnLayer->type != "CustomLayer") {
            std::string errorMsg = std::string("Factory for ") + cnnLayer->type + " wasn't found!";
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
            return NOT_FOUND;
        }
        factory = new CustomLayerFactory(cnnLayer);
        return OK;
    }
};
```
- To use your custom layers, compile the code as a shared library, and then use the `AddExtension` method of the general plugin interface to load your primitives:
```
auto extension_ptr = make_so_pointer<InferenceEngine::IExtension>("<shared lib path>");
// Add the extension to the plugin's list
plugin.AddExtension(extension_ptr);
```
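As a hedged end-to-end sketch of how the pieces above can be wired together (the paths are placeholders and the plugin setup mirrors the integration steps earlier in this guide), loading a model that uses the custom layer could look like this:
```
#include <inference_engine.hpp>
using namespace InferenceEngine;

// Get the CPU plugin (plugin search paths are left empty here for brevity).
PluginDispatcher dispatcher({""});
InferenceEnginePluginPtr enginePtr = dispatcher.getPluginByDevice("CPU");
InferencePlugin plugin(enginePtr);

// Register the shared library with the custom kernels before loading the network.
IExtensionPtr extension_ptr = make_so_pointer<IExtension>("<path to the shared library>");
plugin.AddExtension(extension_ptr);

// Read the Intermediate Representation that contains the custom layer.
CNNNetReader reader;
reader.ReadNetwork("<path to model>.xml");
reader.ReadWeights("<path to model>.bin");

// The factory registered by the extension is used when the network is loaded.
auto executable_network = plugin.LoadNetwork(reader.getNetwork(), {});
```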
Using the Validation Application to Check Accuracy on a Dataset
The Inference Engine Validation application lets you score common topologies that have a standard input and output configuration, such as AlexNet and SSD. The Validation application allows the user to collect simple validation metrics for the topologies. It supports Top-1/Top-5 counting for classification networks and the 11-point mAP calculation for object detection networks.
Possible Validation application uses:
- Check if the Inference Engine scores the public topologies well
- Verify if a user's custom topology is compatible with the default input/output configuration and compare its accuracy with that of the public topologies
- Use the Validation application as another sample: although its code is more complex than the classification and object detection samples, the source is open and can be reused
The application loads a network to the Inference Engine plugin. Then:
- The application reads the validation set (the `-i` option):
  - If `-i` specifies a directory, the application tries to load labels first. To do so, it searches for a file with the same base name as the model but with a `.labels` extension. The application then searches the specified directory and adds all images from sub-directories whose names are equal to a known label to the validation set. If there are no sub-directories whose names are equal to known labels, the validation set is considered empty.
  - If `-i` specifies a `.txt` file, the application reads the file, considering every line that has the format `<relative_path_from_txt_to_img> <ID>`, where `ID` is the image number that the network should classify.
- The application reads the number of images specified by `-b` and loads the images to the plugin. When all images are loaded, the plugin does inference and the Validation application collects the statistics.
NOTE: Image load time is not part of the inference time reported by the application.
As an option, use the `-dump` flag to retrieve the inference results. This option creates an inference report named `dumpfileXXXX.csv` in the following format, using semicolon-separated values:
- Image path
- Flag representing the correctness of the prediction
- ID of the Top-1 class
- Probability that the image belongs to the Top-1 class
- ID of the Top-2 class
- Probability that the image belongs to the Top-x class, where x is an integer
CLI Options
```
Usage: validation_app [OPTION]

Available options:
    -h                    Print a usage message
    -t                    Type of the network being scored ("C" by default)
        -t "C"            for classification
        -t "OD"           for object detection
    -i <path>             Required. Directory with validation images (directories grouped by labels) or a .txt file list for classification networks, or a VOC-formatted dataset for object detection networks
    -m <path>             Required. Path to an .xml file with a trained model
    -l <absolute_path>    Required for Intel® MKL-DNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernel implementations
    -c <absolute_path>    Required for Intel® Integrated Graphics-targeted custom kernels. Absolute path to the xml file with the kernel descriptions
    -d <device>           Specify the target device to infer on; CPU, Intel® Integrated Graphics, FPGA or MYRIAD is acceptable. The sample looks for a suitable plugin for the specified device. The plugin is CPU by default.
    -b N                  Batch size value. If not specified, the batch size value is determined from the IR
    -ppType               Preprocessing type. One of "None", "Resize", "ResizeCrop"
    -ppSize N             Preprocessing size (used with ppType="ResizeCrop")
    -ppWidth W            Preprocessing width (overrides -ppSize, used with ppType="ResizeCrop")
    -ppHeight H           Preprocessing height (overrides -ppSize, used with ppType="ResizeCrop")
    --dump                Dump filenames and inference results to a csv file

    Classification-specific options:
    -Czb true             "Zero is a background" flag. Some networks are trained with a modified dataset where the class IDs are enumerated from 1, but 0 is an undefined "background" class (which is never detected)

    Object detection-specific options:
    -ODkind               Kind of an object detection network: SSD
    -ODa <path>           Required for OD networks. Path to the directory containing .xml annotations for images
    -ODc                  Required for OD networks. Path to the file containing the classes list
    -ODsubdir             Directory between the image path (-i) and image name, specified in the .xml. Use JPEGImages for VOC2007
```
Option Categories
- Common options are usually named with a single letter or word, such as `-b` or `--dump`. These options have the same meaning in all `validation_app` modes.
- Network type-specific options are named as an acronym of the network type (such as `C` or `OD`), followed by a letter or a word addendum. These options are specific to the network type. For instance, `-ODa` makes sense only for an object detection network.
The next section shows how to use the Validation application in classification mode to score a classification CNN on a pack of images.
Running the Application in Classification Mode
This section demonstrates how to run the Validation application in classification mode to score a classification CNN on a pack of images.
To do inference of a chosen pack of images:
$ ./validation_app -t C -i <path to images main directory or .txt file> -m <model to use for classification> -d <CPU|Intel® Integrated Graphics>
Source dataset format: directories as classes
A correct list of files looks similar to:
```
<path>/dataset
    /apron
        /apron1.bmp
        /apron2.bmp
    /collie
        /a_big_dog.jpg
    /coral reef
        /reef.bmp
    /Siamese
        /cat3.jpg
```
To score this dataset, put the `-i <path>/dataset` option in the command line.
Source dataset format: a list of images
This example uses a single list file in the format image_name-tabulation-class_index
. The correct list of files:
```
<path>/dataset
    /apron1.bmp
    /apron2.bmp
    /a_big_dog.jpg
    /reef.bmp
    /cat3.jpg
    /labels.txt
```
where `labels.txt` contains:
```
apron1.bmp 411
apron2.bmp 411
cat3.jpg 284
reef.bmp 973
a_big_dog.jpg 231
```
To score this dataset, put the `-i <path>/dataset/labels.txt` option in the command line.
Output Description
A progress bar shows the inference progress. Upon completion, the common information is displayed.
```
Network load time: time spent on topology load in ms
Model: path to chosen model
Model Precision: precision of a chosen model
Batch size: specified batch size
Validation dataset: path to a validation set
Validation approach: Classification networks
Device: device type
```
You see statistics such as the average inference time, and top-1 and top-5 accuracy:
```
Average infer time (ms): 588.977 (16.98 images per second with batch size = 10)
Top1 accuracy: 70.00% (7 of 10 images were detected correctly, top class is correct)
Top5 accuracy: 80.00% (8 of 10 images were detected correctly, top five classes contain required class)
```
Using Object Detection with the Validation Application
Description
This section describes how to run the Validation application in object detection mode to score an SSD CNN on a pack of images.
Running SSD on the VOC Dataset
Use these steps to score SSD on the original dataset that was used to test it during its training.
./validation_app -d CPU -t OD -ODa "<...>/VOCdevkit/VOC2007/Annotations" -i "<...>/VOCdevkit" -m "<...>/vgg_voc0712_ssd_300x300.xml" -ODc "<...>/VOC_SSD_Classes.txt" -ODsubdir JPEGImages
- Go to the SSD author's github page to select the pre-trained SSD-300.
- From the same page, download the VOC2007 test dataset:
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
$ tar -xvf VOCtest_06-Nov-2007.tar
- Use the Model Optimizer to convert the model. For help, see https://software.intel.com/en-us/articles/CVSDK-ModelOptimizer
- Create a proper class file (made from the original labelmap_voc.prototxt):
```
none_of_the_above 0
aeroplane 1
bicycle 2
bird 3
boat 4
bottle 5
bus 6
car 7
cat 8
chair 9
cow 10
diningtable 11
dog 12
horse 13
motorbike 14
person 15
pottedplant 16
sheep 17
sofa 18
train 19
tvmonitor 20
```
- Save it as
VOC_SSD_Classes.txt
- Score the model on the dataset:
- You see a progress bar followed by your data:
```
Progress: [....................] 100.00% done
[ INFO ] Processing output blobs
Network load time: 27.70ms
Model: /home/user/models/ssd/withmean/vgg_voc0712_ssd_300x300/vgg_voc0712_ssd_300x300.xml
Model Precision: FP32
Batch size: 1
Validation dataset: /home/user/Data/SSD-data/testonly/VOCdevkit
Validation approach: Object detection network
Average infer time (ms): 166.49 (6.01 images per second with batch size = 1)

Average precision per class table:

Class   AP
1       0.796
2       0.839
3       0.759
4       0.695
5       0.508
6       0.867
7       0.861
8       0.886
9       0.602
10      0.822
11      0.768
12      0.861
13      0.874
14      0.842
15      0.797
16      0.526
17      0.792
18      0.795
19      0.873
20      0.773

Mean Average Precision (mAP): 0.7767
```
The Mean Average Precision value is also reported in a table on the SSD author's page and in the arXiv paper.
Advanced Topics
Key terms in this section
Acronym/Term | Description |
---|---|
C, CHW, NC | Tensor memory layout. For example, the CHW value at index (c,h,w) is physically located at index (c * H + h) * W + w, and similarly for the other layouts. See the small example after this table. |
DL | Deep Learning |
FP16 format | Half-precision floating-point format |
FP32 format | Single-precision floating-point format |
I16 format | 2-byte signed integer format |
NCHW, NHWC | Image data layout. Refers to the representation of batches of images. |
U16 format | 2-byte unsigned integer format |
U8 format | 1-byte unsigned integer format |
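To make the layout arithmetic from the table concrete, here is a small illustrative helper (not part of the Inference Engine API) that computes the flat offset of an element in a CHW tensor:
```
#include <cstddef>

// Flat offset of element (c, h, w) in a CHW tensor with dimensions C x H x W.
inline std::size_t chwOffset(std::size_t c, std::size_t h, std::size_t w,
                             std::size_t H, std::size_t W) {
    return (c * H + h) * W + w;
}

// Example: in a 3x224x224 (CHW) tensor, element (1, 0, 5) is at offset 1*224*224 + 0*224 + 5 = 50181.
```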
Supported Model Formats
Device | FP32 | FP16 |
---|---|---|
CPU | Supported and Preferred | Not Supported |
Intel® Integrated Graphics | Supported | Supported and Preferred |
FPGA | Supported | Supported |
Intel® Movidius™ Myriad™ 2 Vision Processing Unit | Not Supported | Supported |
Understanding Inference Engine Memory Primitives
Blobs
InferenceEngine::Blob
is the main class intended for working with memory. This class lets you read and write memory and get information about the memory structure, among other tasks.
To create Blob objects with a specific layout, use constructors with InferenceEngine::TensorDesc.
```
InferenceEngine::TensorDesc tdesc(FP32, {1, 3, 227, 227}, InferenceEngine::Layout::NCHW);
InferenceEngine::Blob::Ptr blob = InferenceEngine::make_shared_blob(tdesc);
```
Layouts
InferenceEngine::TensorDesc
is a special class that provides layout format description.
This class allows you to create planar layouts using the standard formats, such as `InferenceEngine::Layout::NCHW`, `InferenceEngine::Layout::NC`, and `InferenceEngine::Layout::C`, and non-planar layouts using `InferenceEngine::BlockingDesc`.
To create a complex layout, use `InferenceEngine::BlockingDesc`, which allows you to define blocked memory with offsets and strides.
Examples
- Define a blob with the dimensions {N: 1, C: 25, H: 20, W: 20} and the NHWC format:
```
InferenceEngine::BlockingDesc({1, 20, 20, 25}, {0, 2, 3, 1});
// or
InferenceEngine::BlockingDesc({1, 20, 20, 25}, InferenceEngine::Layout::NHWC);
```
- If you have memory with real dimensions {N: 1, C: 25, H: 20, W: 20}, but with channels that are blocked by 8, define the memory with the parameters:
```
InferenceEngine::BlockingDesc({1, 4, 20, 20, 8}, {0, 1, 2, 3, 1})
```
- Set strides and offsets if the layout contains them. If your blob layout is complex and you don't want to calculate the real offset to the data, use `InferenceEngine::TensorDesc::offset(size_t l)` or `InferenceEngine::TensorDesc::offset(SizeVector v)`. For example:
```
InferenceEngine::BlockingDesc blk({1, 4, 20, 20, 8}, {0, 1, 2, 3, 1});
InferenceEngine::TensorDesc tdesc(FP32, {1, 25, 20, 20}, blk);
tdesc.offset(0);            // = 0
tdesc.offset(1);            // = 8
tdesc.offset({0, 0, 0, 2}); // = 16
tdesc.offset({0, 1, 0, 2}); // = 17
```
- If you want to create a `TensorDesc` with a planar format for N dimensions (N can be 1, 2, 4, and so on), use `InferenceEngine::TensorDesc::getLayoutByDims`:
```
InferenceEngine::TensorDesc::getLayoutByDims({1});               // InferenceEngine::Layout::C
InferenceEngine::TensorDesc::getLayoutByDims({1, 2});            // InferenceEngine::Layout::NC
InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4});      // InferenceEngine::Layout::NCHW
InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3});         // InferenceEngine::Layout::BLOCKED
InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4, ...}); // InferenceEngine::Layout::BLOCKED
```
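Putting the above together, a short hedged sketch (it assumes the templated `make_shared_blob` overload that accepts a `TensorDesc`) creates a planar NCHW blob and writes to it through `buffer()`:
```
InferenceEngine::TensorDesc tdesc(InferenceEngine::Precision::FP32,
                                  {1, 3, 227, 227},
                                  InferenceEngine::Layout::NCHW);
InferenceEngine::Blob::Ptr blob = InferenceEngine::make_shared_blob<float>(tdesc);
blob->allocate();

// Access the raw data; for a planar layout the elements are laid out contiguously in NCHW order.
float *data = blob->buffer().as<float *>();
for (size_t i = 0; i < blob->size(); ++i) {
    data[i] = 0.0f;   // e.g. zero-initialize or copy preprocessed image data here
}
```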
Supported Devices
The Inference Engine can infer models in different formats with various input and output formats. This section provides supported and optimal configurations per device.
The Inference Engine provides unique capabilities to infer deep learning models on these device types:
- CPU
- Intel® Integrated Graphics
- FPGA
- Myriad
- Heterogeneous execution
Supported Input Precision
Device | FP32 | FP16 | U8 | U16 | I16 |
---|---|---|---|---|---|
CPU | Supported | Not Supported | Supported | Supported | Supported |
Intel® Integrated Graphics | Supported | Supported* - See NOTE below | Supported* | Supported* | Supported* |
FPGA | Supported | Supported* | Supported | Supported | Supported |
Intel® Movidius™ Myriad™ 2 Vision Processing Unit | Supported | Supported | Supported and Preferred | Not Supported | Not Supported |
* NOTE: Supported through SetBlob only. GetBlob returns FP32. Supported without a mean image.
Supported Output Precision
Plugin | FP32 | FP16 |
---|---|---|
CPU | Supported | Not Supported |
Intel® Integrated Graphics | Supported | Supported |
FPGA | Supported | Supported |
Intel® Movidius™ Myriad™ 2 Vision Processing Unit | Supported | Supported and Preferred |
Supported Input Layout
Plugin | FP32 | FP16 |
---|---|---|
CPU | Supported | Not Supported |
Intel® Integrated Graphics | Supported | Not Supported |
FPGA | Supported | Not Supported |
Intel® Movidius™ Myriad™ 2 Vision Processing Unit | Supported | Supported and Preferred |
Supported Output Layout
Number of Dimension | 4 | 3 | 2 | 1 |
---|---|---|---|---|
Layout | NCHW | CHW | NC | C |
Intel CPU Plugin
The Intel CPU plugin provides an opportunity for high-performance scoring of neural networks on the CPU, using the Intel® MKL-DNN library.
The Intel CPU plugin uses OpenMP* to parallelize calculations.
Supported Layers
- BatchNorm
- Clamp
- Concat
- Convolution
- Crop
- Deconvolution
- Eltwise
- ELU
- FullyConnected
- Logistic
- LRN
- Permute
- Pooling
- Power
- ReLU
- Reshape
- ROIPooling
- ScaleShift
- Softmax
- Split
- TanH
- Tile
The set of supported layers can be expanded with the extensibility library. To add a new layer in this library, use the extensibility mechanism.
Supported Platforms
The Intel® Computer Vision SDK is supported and validated on these platforms:
Host | 64-bit OS |
---|---|
Development |
|
Target |
|
The CPU plugin supports inference on Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel Atom® Processors with Intel® SSE.
Use the `-pc` flag with the samples to learn which configuration is used by a layer. `-pc` shows execution statistics that include the layer name, execution status, layer type, execution time, and the type of the execution primitive.
Internal Intel CPU Plugin Optimizations
The Intel CPU Plugin supports several graph optimization algorithms:
- Merging of group convolutions - If a topology contains the appropriate pipeline of convolutions, the Intel® MKL-DNN plugin merges it into one Convolution with the group parameter (the convolutions must have the same parameters).
- Fusing Convolution with ReLU or ELU - The Intel CPU plugin fuses a Convolution layer with a ReLU or ELU layer that immediately follows it.
- Removing the Power layer - The Intel CPU plugin removes a Power layer from the topology if it has the parameters power = 1, scale = 1, offset = 0.
- Fusing Convolution + Sum or Convolution + Sum + ReLU - To improve performance, the Intel CPU plugin fuses a Convolution layer with a following Eltwise Sum layer (and an optional ReLU), so that this part of the graph executes as a single fused primitive.
Supported Configuration Parameters
The plugin supports the configuration parameters listed below. All parameters must be set before calling InferenceEngine::IInferencePlugin::LoadNetwork()
.
Parameter Name | Parameter Values | Default | Description |
---|---|---|---|
KEY_CPU_BIND_THREAD | YES/NO | YES | Enables binding of OpenMP threads. If the value is YES, the number of OpenMP threads equals the number of hardware cores. |
KEY_DYN_BATCH_LIMIT | number | Network batch size | Sets a batch size limit for all following Infer calls. For example, if the input blob has the dimensions 32x3x224x224, then after applying plugin.SetConfig({KEY_DYN_BATCH_LIMIT, 10}) the Inference Engine primitives process only the first sub-blob of size 10x3x224x224. The value can be changed before any Infer call to specify a new batch limit. |
EXCLUSIVE_ASYNC_REQUESTS | YES/NO | NO | This key enables exclusive mode for async requests of different executable networks and the same plugin. |
KEY_PERF_COUNT | YES/NO | NO | Enables the performance counters option |
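A brief sketch of applying these keys before `LoadNetwork()`; it assumes `enginePtr` and `network` were obtained as in the earlier integration steps, and the chosen values are only examples:
```
#include "ie_plugin_config.hpp"
using namespace InferenceEngine::PluginConfigParams;

InferenceEngine::InferencePlugin plugin(enginePtr);   // enginePtr points to the CPU plugin
plugin.SetConfig({{KEY_CPU_BIND_THREAD, YES},         // bind OpenMP threads to hardware cores
                  {KEY_DYN_BATCH_LIMIT, "10"}});      // process only the first 10 images of each batch
auto executable_network = plugin.LoadNetwork(network, {});
```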
CPU Extensions
The CPU extensions library contains the code of important layers that do not come with the CPU plugin. Compile this library and use the `AddExtension` method in your application to load the extensions for models featuring layers from this library. See the other samples for `AddExtension` code examples.
When you compile the entire list of the samples, the cpu_extension
library is also compiled.
For performance, the library's cmake
script detects your computer configuration and enables platform optimizations. Alternatively, you can explicitly use cmake flags: -DENABLE_AVX2=ON
, -DENABLE_AVX512F=ON
or -DENABLE_SSE42=ON
when cross-compiling this library for another platform.
List of layers that come in the library:
- ArgMax
- CTCGreedyDecoder
- DetectionOutput
- GRN
- Interp
- MVN
- Normalize
- PowerFile
- PReLU
- PriorBox
- PriorBoxClustered
- Proposal
- PSROIPooling
- Resample
- SimplerNMS
- SpatialTransformer
Use the extensibility mechanism to add a layer. For information, see Adding Your Own Kernels in the Inference Engine.
Intel® Integrated Graphics Plugin
The Intel® Integrated Graphics plugin uses the Intel® Compute Library for Deep Neural Networks to infer deep neural networks. This is an open source performance library for Deep Learning applications intended for acceleration of deep learning inference on Intel® Processor Graphics, including HD Graphics and Iris® Graphics.
Supported Layers
- Activation (ReLU, Sigmoid, Logistic, TanH, ELU, Clamp)
- BatchNormalization
- Concatenate
- Convolution
- Copy
- Crop
- Deconvolution
- DetectionOutput
- Eltwise
- Flatten
- FullyConnected
- LRN
- Normalize
- Permute
- Pooling
- Power
- PReLU
- PriorBox
- Proposal
- PSROIPooling
- Reshape
- ROIPooling
- ScaleShift
- SimplerNMS
- SoftMax
- Split
- Upsampling
Supported Optimizations
- Fused layers:
- Convolution - Activation
- Deconvolution - Activation
- Eltwise - Activation
- Fully Connected - Activation
- Layers optimized out when conditions allow:
- Crop
- Concatenate
- Reshape
- Flatten
- Split
- Copy
- Layers executed during load time (not during inference):
- PriorBox
CPU Executed Layers
The following layers aren't accelerated on the Intel® Integrated Graphics and instead are executed on the host CPU.
- Proposal
- SimplerNMS
- PriorBox
- DetectionOutput
Supported Configuration Parameters
The plugin supports the configuration parameters listed below. All parameters must be set before calling InferenceEngine::IInferencePlugin::LoadNetwork()
.
Name | Value | Default | Description |
---|---|---|---|
KEY_PERF_COUNT | YES / NO | NO | Collect performance counters during inference |
KEY_CONFIG_FILE | "file1 [file2 ...]" | "" | Load custom layer configuration files |
KEY_DUMP_KERNELS | YES / NO | NO | Dump the final kernels used for custom layers |
KEY_TUNING_MODE | TUNING_DISABLED TUNING_CREATE TUNING_USE_EXISTING | TUNING_DISABLED | Disable inference kernel tuning / Create a tuning file (expect a much longer runtime) / Use an existing tuning file |
KEY_TUNING_FILE | "filename" | "" | Tuning file to create / use |
KEY_PLUGIN_PRIORITY | <0-3> | 0 | OpenCL queue priority |
KEY_PLUGIN_THROTTLE | <0-3> | 0 | OpenCL queue throttling |
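As a hedged illustration of the tuning workflow, a first run can create a tuning file that later runs reuse. The key and value identifiers below follow the `PluginConfigParams` naming used elsewhere in this guide; verify them against your release, and note that `plugin` and `network` are assumed to be set up as in the earlier sections:
```
#include "ie_plugin_config.hpp"
using namespace InferenceEngine::PluginConfigParams;

// First run: create a tuning file for this network (expect a much longer runtime).
plugin.SetConfig({{KEY_TUNING_MODE, TUNING_CREATE},
                  {KEY_TUNING_FILE, "network_tuning.bin"}});   // the file name is an example
auto executable_network = plugin.LoadNetwork(network, {});

// Subsequent runs: reuse the existing tuning file to get the tuned kernels without re-tuning.
// plugin.SetConfig({{KEY_TUNING_MODE, TUNING_USE_EXISTING},
//                   {KEY_TUNING_FILE, "network_tuning.bin"}});
```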
Debug Capabilities in the Intel® Integrated Graphics Plugin
The Intel® Integrated Graphics plugin can dump user custom OpenCL™ kernels to a file so that you can debug compilation issues in your custom kernels.
The application can use the SetConfig()
function with the key PluginConfigParams::KEY_DUMP_KERNELS
and value: PluginConfigParams::YES
. Then during network loading, all custom layers print their OpenCL kernels with the JIT instrumentation added by the plugin. The kernels are stored in the working directory under files named in the format: clDNN_program0.cl
, clDNN_program1.cl
The Debug option is disabled by default. Additionally, the application can call the SetConfig()
function with the key PluginConfigParams::KEY_DUMP_KERNELS
and value: PluginConfigParams::NO
before network loading.
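A short sketch of that sequence, assuming a `plugin` object for the Intel® Integrated Graphics device and a `network` have already been created as shown in the earlier sections:
```
#include "ie_plugin_config.hpp"
using namespace InferenceEngine::PluginConfigParams;

// Enable dumping of custom OpenCL kernels; during the next network load the plugin
// writes clDNN_program*.cl files to the working directory.
plugin.SetConfig({{KEY_DUMP_KERNELS, YES}});
auto executable_network = plugin.LoadNetwork(network, {});

// To switch the dump off again before a later network load:
// plugin.SetConfig({{KEY_DUMP_KERNELS, NO}});
```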
How to Verify Debug is Disabled
- Delete all
clDNN_program*.cl
files from the current directory - Run your application to load a network
- Examine the working directory for the presence of any kernel file, such as
clDNN_program0.cl
FPGA Plugin
The FPGA plugin provides an opportunity for high-performance scoring of neural networks on Intel® FPGA devices.
Supported Layers
- Batch_norm (converted by the Model Optimizer to a ScaleShift layer)
- Concat
- Convolution (dilated convolutions are supported, depthwise are not supported)
- Eltwise (operation sum is supported)
- Fully Connected
- LRN Normalization
- Pooling
- Power (scale and offset parameters are supported)
- ReLU (with negative slope)
- ScaleShift
NOTE: Support is limited to the specific parameters, and depends on the bitstream.
Heterogeneous Execution
If a topology contains layers that aren't supported on the FPGA, use the Heterogeneous plugin with a dedicated fallback device.
If a network has layers that aren't supported by either the FPGA plugin or the fallback plugin, implement a custom layer for the CPU or Intel® Integrated Graphics using the extensibility mechanism described in Inference Engine Kernels Extensibility. In addition to adding custom kernels, point to the CPU or Intel® Integrated Graphics plugin as the fallback device for the Heterogeneous plugin.
Supported Platforms
The Intel® Computer Vision SDK is officially supported and validated on the following FPGA setup:
Host | 64-bit OS | Platform |
---|---|---|
Development |
| 6th Generation Intel® Core™ Processors |
Target |
| Intel® Arria® 10GX/A10PL4 FPGA |
How to Interpret Performance Counters
After you collect performance counters using `InferenceEngine::IInferencePlugin::GetPerformanceCounts`, performance data is available for execution on the FPGA, for pre- and post-processing, and for transferring data to and from the FPGA card.
If part of your network executes on the CPU, performance data is also available for the Intel® MKL-DNN kernels, their types, and other useful information.
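A hedged sketch of reading these counters through the legacy interface named above; it assumes `enginePtr` is the plugin used for inference and that at least one Infer call has completed with performance counting enabled:
```
#include <iostream>
#include <map>
#include <string>

std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> perfCounts;
InferenceEngine::ResponseDesc resp;
enginePtr->GetPerformanceCounts(perfCounts, &resp);

// Print only the entries that were actually executed, with their wall-clock time and execution type.
for (const auto &item : perfCounts) {
    const auto &info = item.second;
    if (info.status == InferenceEngine::InferenceEngineProfileInfo::EXECUTED) {
        std::cout << item.first << ": " << info.realTime_uSec << " us, exec type: "
                  << info.exec_type << std::endl;
    }
}
```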
FPGA Support Limitations for CNN
The FPGA Beta release has limitations for the network topologies, kernel parameters, and batch size.
- Depending on the bitstream loaded on the target device, the FPGA actually performs calculations with precision rates ranging from FP11 to FP16. This may have accuracy implications. Use the Validation application to verify the network accuracy on the validation data set.
- Networks that have many layers not supported on the FPGA interleaved between supported layers may cause the graph to be divided into many subgraphs, which can lead to a `CL_OUT_OF_HOST_MEMORY` error. Such topologies are not FPGA-friendly in this release.
- When using the Heterogeneous plugin, the affinity and distribution of nodes across devices depends on the bitstream. Some layers, or some layer parameters, might not be supported by the loaded bitstream.
- Any fully-connected layer can only be followed by another fully-connected (possibly with the ReLU) layer. No convolution layer can follow a fully-connected layer, otherwise the graph verification fails and returns an error message.
- Single output from a fully-connected layer (potentially coupled with ReLU) is supported.
- Several outputs from a Convolution layer (and other layers except fully-connected) are supported, but these outputs cannot be passed to other layers on the FPGA.
- When executing on the FPGA, the first iteration is much slower than the next iterations. You can perform multiple iterations when assessing inference performance.
- Consider batching for performance conclusions. Depending on the bitstream loaded on the FPGA, the batch size is typically limited to 96.
Bitstream Availability
Various FPGA bitstreams that support CNN are available in Intel® CV SDK package for FPGA.
Intel® Movidius™ Myriad™ 2 Vision Processing Unit Stick Plugin
The plugin provides high-performance scoring of neural networks on the Intel® Movidius™ Myriad™ 2 Vision Processing Unit.
Supported Layers
- BatchNormalization
- Bias
- Concatenate
- Convolution
- Copy
- Crop
- CTCDecoder
- Deconvolution
- DepthwiseConvolution
- DetectionOutput
- Eltwise (SUM, MAX, MUL)
- ELU
- Flatten
- FullyConnected
- Leaky ReLU
- LRN
- Normalize
- Permute
- Pooling (MAX, AVG)
- Power
- PReLU
- PriorBox
- PriorBoxClustered
- ReLU
- Reshape
- Scale
- ScaleShift
- Sigmoid
- Slice
- SoftMax
- Split
- TanH
- Tile
Installing USB Rules
To do inference on the Intel® Movidius™ Myriad™ 2 Vision Processing Unit, install the USB rules by running these commands:
```
cat <<EOF > 97-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2150", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF
sudo cp 97-usbboot.rules /etc/udev/rules.d/
sudo udevadm control --reload-rules
sudo udevadm trigger
sudo ldconfig
rm 97-usbboot.rules
```
Supported Configuration Parameters
Name | Values | Default | Description |
---|---|---|---|
KEY_VPU_LOG_LEVEL | LOG_WARNING LOG_INFO LOG_DEBUG | LOG_NONE | Set log level for devices |
KEY_VPU_INPUT_NORM | real number | 1.0 | Normalization coefficient for the network input |
KEY_VPU_INPUT_BIAS | real number | 0.0 | Bias value that is added to each element of the network input |
KEY_VPU_PRINT_RECEIVE_TENSOR_TIME | YES/NO | NO | Add device-side time spent to receive input to PerformanceCounts |
Heterogeneous Plugin
The Heterogeneous plugin enables inference of one network on several devices. The purposes of executing networks in heterogeneous mode are:
- To utilize the power of accelerators by calculating the heaviest parts of the network on the accelerator while executing unsupported layers on fallback devices such as the CPU
- To utilize all available hardware more efficiently during one inference
The execution through the Heterogeneous plugin can be divided into two steps:
- Setting affinity to layers (binding them to devices in `InferenceEngine::ICNNNetwork`)
- Loading the network to the Heterogeneous plugin, splitting the network into parts, and executing them through the dedicated plugins
These steps are decoupled. The setting of affinity can be done automatically using the fallback policy or in manual mode.
The automatic fallback policy is greedy: it assigns every layer that can be executed on a certain device to that device, following the device priorities.
Some topologies are not friendly to heterogeneous execution on some devices, or cannot be executed in this mode at all. Such networks might have activation layers that aren't supported on the primary device. If transmitting data from one part of the network to another in heterogeneous mode is time-consuming, executing those parts heterogeneously on those devices may not make sense. In such cases, define the heaviest part manually and set the affinity to avoid sending data back and forth several times during one inference.
Annotation of Layers per Device and Default Fallback Policy
The default fallback policy decides which layer goes to which device automatically, according to the support in the dedicated plugins (FPGA, Intel® Integrated Graphics, CPU).
An alternative way to annotate a network is to set the affinity manually using the `CNNLayer::affinity` field. This field accepts string device names such as `"CPU"` or `"FPGA"`.
The fallback policy does not work if even one layer has an initialized affinity. The correct sequence is to call the automatic affinity assignment first and then fix the affinities manually.
```
// This example demonstrates how to do default affinity initialization and then
// correct affinity manually for some layers
InferenceEngine::PluginDispatcher dispatcher({ FLAGS_pp, archPath , "" });
InferenceEngine::InferenceEnginePluginPtr enginePtr;
enginePtr = dispatcher.getPluginByDevice("HETERO:FPGA,CPU");
HeteroPluginPtr hetero(enginePtr);
hetero->SetAffinity(network, { }, &resp);
network.getLayerByName("qqq")->affinity = "CPU";
InferencePlugin plugin(enginePtr);
auto executable_network = plugin.LoadNetwork(network, {});
```
If you rely on the default affinity distribution, you can avoid calling `IHeteroInferencePlugin::SetAffinity` by calling `ICNNNetwork::LoadNetwork` directly:
```
InferenceEngine::PluginDispatcher dispatcher({ FLAGS_pp, archPath , "" });
InferenceEngine::InferenceEnginePluginPtr enginePtr;
enginePtr = dispatcher.getPluginByDevice("HETERO:FPGA,CPU");
InferencePlugin plugin(enginePtr);
CNNNetReader reader;
reader.ReadNetwork("Model.xml");
reader.ReadWeights("Model.bin");
auto executable_network = plugin.LoadNetwork(reader.getNetwork(), {});
```
Splitting the Network and Execution
During loading of the network to the Heterogeneous plugin, the network is divided into separate parts and loaded to the dedicated plugins. Intermediate blobs between these subgraphs are allocated automatically in the most efficient way.
Execution Precision
Precision for inference in Heterogeneous plugin is defined by:
- Precision of the Intermediate Representation
- Ability of final plugins to execute in precision defined in the Intermediate Representation
Examples:
- To execute on Intel® Integrated Graphics with a CPU fallback using FP16 on Intel® Integrated Graphics, use only FP16 for the Intermediate Representation. The Heterogeneous plugin converts the weights from FP16 to FP32 for execution on the CPU.
- To execute on FPGA with a CPU fallback, use any precision for the Intermediate Representation. The execution on the FPGA is defined by the bitstream; the execution on the CPU happens in FP32.
Use these samples with the command:
./object_detection_sample_ssd -m <path_to_model>/ModelSSD.xml -i <path_to_pictures>/picture.jpg -d HETERO:FPGA,CPU
where:
`HETERO` is the Heterogeneous plugin, and `FPGA,CPU` is the fallback policy with priority on the FPGA and fallback to the CPU.
To point to more than two devices, use -d HETERO:FPGA,GPU,CPU
Analyzing With the Heterogeneous Execution
After you enable the `KEY_HETERO_DUMP_GRAPH_DOT` config key, the plugin dumps GraphViz .dot files with per-layer annotations of devices.
The Heterogeneous plugin can generate two files:
- `hetero_affinity.dot` - annotation of affinities per layer. This file is written to the disk only if the default fallback policy is executed.
- `hetero_subgraphs.dot` - annotation of affinities per graph. This file is written to the disk during the execution of `ICNNNetwork::LoadNetwork()` for the hetero plugin.
```
#include "ie_plugin_config.hpp"
#include "hetero/hetero_plugin_config.hpp"

using namespace InferenceEngine::PluginConfigParams;
using namespace InferenceEngine::HeteroConfigParams;

...
enginePtr = dispatcher.getPluginByDevice("HETERO:FPGA,CPU");
InferencePlugin plugin(enginePtr);
plugin.SetConfig({ {KEY_HETERO_DUMP_GRAPH_DOT, YES} });
```
Use the graphviz utility or converters to create png
formats. Ubuntu* utilities:
sudo apt-get install xdot
xdot hetero_subgraphs.dot
Use the `-pc` option with the samples to get performance data for each subgraph.
Output example for Googlenet v1 running on FPGA with a fallback to the CPU:
```
subgraph1: 1. input preprocessing (mean data/FPGA):EXECUTED  layerType:          realTime: 129   cpu: 129  execType:
subgraph1: 2. input transfer to DDR:EXECUTED                 layerType:          realTime: 201   cpu: 0    execType:
subgraph1: 3. FPGA execute time:EXECUTED                     layerType:          realTime: 3808  cpu: 0    execType:
subgraph1: 4. output transfer from DDR:EXECUTED              layerType:          realTime: 55    cpu: 0    execType:
subgraph1: 5. FPGA output postprocessing:EXECUTED            layerType:          realTime: 7     cpu: 7    execType:
subgraph1: 6. softmax/copy: EXECUTED                         layerType:          realTime: 2     cpu: 2    execType:
subgraph2: out_prob:   NOT_RUN                               layerType: Output   realTime: 0     cpu: 0    execType: unknown
subgraph2: prob:       EXECUTED                              layerType: SoftMax  realTime: 10    cpu: 10   execType: ref
Total time: 4212 microseconds
```
Known Issues
Multiple OpenMP Loadings
If the application uses the Inference Engine with third-party components that depend on Intel® OpenMP, multiple loadings of the libiomp library may occur and cause OpenMP runtime initialization conflicts. This might happen if the application uses the Intel® Math Kernel Library (Intel® MKL) through the “Single Dynamic Library” (libmkl_rt.so
) mechanism and calls Intel® MKL after loading the Inference Engine plugin.
Error log report:
```
OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, see http://www.intel.com/software/products/support/.
```
Possible workarounds:
- Preload the OpenMP runtime using the `LD_PRELOAD` variable:
```
LD_PRELOAD=<path_to_libiomp5.so> <path_to_your_executable>
```
This eliminates multiple loadings of `libiomp` and makes all components use this specific version of OpenMP.
- Set
KMP_DUPLICATE_LIB_OK=TRUE
. This option might result in performance degradation or incorrect results.
Legal Information
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at http://www.intel.com/ or from the OEM or retailer.
No computer system can be absolutely secure.
Intel, Arria, Core, Movidia, Movidius, Xeon, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
*Other names and brands may be claimed as the property of others.
Copyright © 2018, Intel Corporation. All rights reserved.