
Intel® Computer Vision SDK 2018 Overview


About the Intel® Computer Vision SDK  2018

This document describes the Intel® Computer Vision SDK, its key components, and its support for CPUs, FPGAs, and Intel® Integrated Graphics. At the end of this document is an in-depth glossary of the terms and concepts used by the Intel® CV SDK. This document does not give you instructions about how to install or use the Intel® CV SDK.

The Intel® Computer Vision SDK (Intel® CV SDK) is a comprehensive toolkit that you can use to develop and deploy vision-oriented solutions on Intel platforms. Vision-oriented means the solutions use images or videos to perform specific tasks. A few of the solutions include autonomous vehicles, digital surveillance cameras, robotics, and mixed-reality headsets.

The Intel® CV SDK:

  • Enables CNN-based deep learning inference on the edge
  • Supports heterogeneous execution across Intel computer vision accelerators—CPU, Intel® Integrated Graphics, Intel® Movidius™ Myriad™ 2 Vision Processing Unit, and FPGA using a common API
  • Speeds time-to-market via an easy-to-use library of CV functions and pre-optimized kernels
  • Includes optimized calls for computer vision standards including OpenCV*, OpenCL™, and OpenVX*

The Intel® CV SDK includes the Deep Learning Deployment Toolkit, a product that includes the Model Optimizer and the Inference Engine. In addition to the Deep Learning Deployment Toolkit, the Intel® CV SDK adds several other components.


Deep Learning Workflow

A simple deep learning workflow looks like this:

[Figure: Intel Computer Vision basic workflow]

A summary of the steps for optimizing and deploying a trained model:

  1. Configure the Model Optimizer for your framework.
  2. Convert a trained model to produce an optimized Intermediate Representation (IR) of the model, based on the trained network topology, weights, and bias values (a conversion sketch follows this list).
  3. Test the model in the Intermediate Representation format using the Inference Engine in the target environment via the provided Inference Engine validation application or the sample applications.
  4. Integrate the Inference Engine into your application to deploy the model in the target environment. 
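
For example, step 2 might look like the following minimal sketch, which calls the mo.py entry point described later in this document. The model file name is a placeholder, and the exact flags depend on your Model Optimizer version, so treat them as assumptions.

    import subprocess

    # Convert a trained Caffe model into IR files (an .xml/.bin pair).
    # File names are placeholders; verify the flags against your version.
    subprocess.run(
        [
            "python3", "mo.py",
            "--input_model", "my_model.caffemodel",  # trained model to convert
            "--data_type", "FP32",                   # target precision
            "--output_dir", "ir_output",             # destination for the IR files
        ],
        check=True,
    )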

The next diagram shows how the Deep Learning Deployment Toolkit and the Intel® CV SDK fit into an end-to-end computer vision workflow. The dark blue boxes indicate parts of the Intel® CV SDK, including the Deep Learning Deployment Toolkit. The light blue text indicates Intel tools that you can use for this process. The Intel tools are not the only tools you can use.

[Figure: Computer vision and deep learning toolkit in an end-to-end workflow]

The next sections go into more detail about the Intel® CV SDK and Deep Learning Deployment Toolkit.

Supported Frameworks

The Intel® CV SDK supports three frameworks: Caffe*, TensorFlow*, and MXNet*.

Caffe*

Caffe is a popular open-source framework that was developed at UC Berkeley. It can be used on both Linux and Windows. This framework provides a way to switch from a CPU to Intel® Integrated Graphics by setting a flag on a device that includes Intel® Integrated Graphics. 

For more information about Caffe, see http://caffe.berkeleyvision.org/ 

To work with your Caffe model, see https://software.intel.com/en-us/articles/CVSDK-Using-Caffe

TensorFlow*

TensorFlow is a popular open-source framework that was developed by Google. It can be used on both Linux and Windows.

For more information, see https://www.tensorflow.org/

The Model Optimizer accepts only a frozen TensorFlow* model as input. In a frozen model, all variables have been converted to constants, and operations related to training have been removed. A detailed explanation of freezing is provided in the TensorFlow* documentation.
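
As a rough illustration, freezing a TensorFlow 1.x checkpoint can look like the following sketch; the checkpoint paths and the output node name are placeholders.

    import tensorflow as tf

    with tf.Session() as sess:
        # Restore the trained graph and its variables from a checkpoint.
        saver = tf.train.import_meta_graph("model.ckpt.meta")
        saver.restore(sess, "model.ckpt")

        # Convert all variables to constants, keeping only the listed outputs.
        frozen_graph = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph_def, ["output"]  # "output" is a placeholder node name
        )

    with open("frozen_model.pb", "wb") as f:
        f.write(frozen_graph.SerializeToString())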

To work with your TensorFlow model, see https://software.intel.com/en-us/articles/CVSDK-Using-TensorFlow

MXNet*

MXNet is a popular open-source framework that is developed by the Apache Software Foundation. It can be used on both Linux and Windows. This framework provides a way to switch from a CPU to Intel® Integrated Graphics.

For more information, see https://mxnet.apache.org/

To work with your MXNet model, see https://software.intel.com/en-us/articles/CVSDK-Using-MXNet

Model Optimizer

The Model Optimizer is the first of the two key components of the Intel® CV SDK and the Deep Learning Deployment Toolkit. The Model Optimizer is a command-line tool for Windows* and Linux* that converts trained models into the Intermediate Representation (IR) files that are required by the Inference Engine. In the optimization process, the Model Optimizer:

  • Performs horizontal fusion of the network layers
  • Merges the network layers
  • Prunes unused branches in the network
  • Applies weight compression methods
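
As a conceptual illustration of the merging steps above (not the Model Optimizer's actual code), the following NumPy sketch folds a batch-normalization layer into the weights and bias of the preceding convolution, so that inference needs one layer instead of two:

    import numpy as np

    def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
        # weights: (out_channels, in_channels, kH, kW); the other arguments
        # are per-output-channel vectors taken from the batch-norm layer.
        scale = gamma / np.sqrt(var + eps)
        folded_weights = weights * scale[:, None, None, None]
        folded_bias = (bias - mean) * scale + beta
        return folded_weights, folded_bias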

The Model Optimizer has two main purposes:

  1. To produce a valid Intermediate Representation that the Inference Engine can use. The Model Optimizer's main responsibility is to produce the two files that form the Intermediate Representation: an .xml file that describes the network topology, and a .bin file that contains the weights and biases binary data.
  2. To produce an optimized Intermediate Representation. Pre-trained models can contain layers that are important for training but serve no value during inference and increase inference time. Such layers can be removed from the Intermediate Representation and represented as a single mathematical operation performed by one layer. The Model Optimizer tries to recognize these patterns and merges the layers, creating an Intermediate Representation with fewer layers than the original model and reducing the inference time.

How the Model Optimizer Works

NOTE: The Intel® CV SDK documentation discusses the Caffe, TensorFlow, and MXNet frameworks.

The Model Optimizer:

  • Loads a trained Caffe, TensorFlow, or MXNet model into memory
  • Reads the model
  • Builds an internal representation of the model
  • Optimizes the model
  • Produces Intermediate Representation files. The Intermediate Representation is the only format that the Inference Engine accepts.

The Model Optimizer uses three stages to process a model:

Stage 1: Learning

  • Iteratively runs the networks on a set of input samples and collects network statistics on each layer. This allows the Model Optimizer to estimate the dynamic range of all layer activations, weights, and biases. This is required only if the target data type differs from the original data type with which the network was trained.
  • Reports collected statistics for offline analysis. Statistics contain these metrics: min, max, standard deviation, mean, and percentiles (99%, 99.5%, 99.95%). A sketch of these metrics follows this list.
  • Builds an optimal configuration for the target precision network, and creates an inference network converted to run in the target data type.
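
The following NumPy sketch illustrates the statistics listed above; it is not the Model Optimizer's internal code.

    import numpy as np

    def layer_statistics(activations):
        # Flatten the layer's activations and compute the reported metrics.
        a = np.asarray(activations, dtype=np.float64).ravel()
        return {
            "min": a.min(),
            "max": a.max(),
            "mean": a.mean(),
            "std": a.std(),
            "p99": np.percentile(a, 99.0),
            "p99.5": np.percentile(a, 99.5),
            "p99.95": np.percentile(a, 99.95),
        }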

Stage 2: Feedback

  • Simulates the inference process to estimate potential accuracy loss. Each layer in the produced network has a bit-accurate implementation for a specific Intel® platform, which simulates the mode of operation of the hardware and the required data type precision.
  • Reports the network performance in terms of accuracy and loss. These metrics are identical to those that would have been reported using a dedicated scoring API, such as OpenVX*, the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), and others.

Stage 3: Deployment

Outputs an Intermediate Representation of the network. The Intermediate Representation is a required input to the Inference Engine and consists of two files:

  • A topology file: an .xml file that describes the network topology
  • A trained data file: a .bin file that contains the weights and biases binary data
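
As a small sketch of how you might inspect these two files, the following assumes the topology .xml contains <layer> elements with id, name, and type attributes; treat the exact schema as version-dependent.

    import os
    import xml.etree.ElementTree as ET

    def summarize_ir(xml_path, bin_path):
        # Report the size of the weights file and list the layers in the topology.
        print("weights file size:", os.path.getsize(bin_path), "bytes")
        root = ET.parse(xml_path).getroot()
        for layer in root.iter("layer"):  # assumed element and attribute names
            print(layer.get("id"), layer.get("name"), layer.get("type"))

    summarize_ir("model.xml", "model.bin")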

Layered Models

Frameworks and neural network topologies include known layers, such as convolution, pooling, and activation. When the Model Optimizer loads a trained model, it steps through the topology and tries to find each layer type in its list of known layers for the framework that was used to train the model. This list is different for each framework.

If the Model Optimizer does not find the layer, it looks for the layer in any custom layers that you provided. If the layer is still not found, you receive a failure message, the search stops, and the conversion to the Intermediate Representation fails.

To read the model and produce the Intermediate Representation, the Model Optimizer must be able to identify and use topology layers. If, like most users, your topology contains only supported layers, you do not need to perform any extra steps. However, if you use a topology that has layers that are not included in the list of supported layers, you need to take further action, depending on which framework you are using:

MXNet models with custom layers: The Model Optimizer fails. You have no options to work with the custom layers.

TensorFlow models with custom layers: You have three options:

  • Register the layers as Model Optimizer extensions, allowing the Model Optimizer to create a valid and optimized Intermediate Representation.
  • Use sub-graph replacement if you have some sub-graphs that should be expressed in the Intermediate Representation and others that should not.
  • Register model sub-graphs that can be offloaded to TensorFlow during inference. With this option, the Intermediate Representation cannot be inferred with Intel® Integrated Graphics or an FPGA, and the Model Optimizer reflects each such sub-graph as a single custom layer in the Intermediate Representation. This is an experimental feature, intended only for development.

Caffe models with custom layers: You have three options:

  • Do nothing. If you have the Caffe Python interface installed, the Model Optimizer uses Caffe to calculate the shapes of the custom layers and to generate the Intermediate Representation. These layers will not contain the original layer parameters in the Intermediate Representation file.
  • Register the layers to pass the original layer parameters to the Intermediate Representation by using CustomLayersMapping. This option also requires the Caffe Python interface.
  • Register the layers as Model Optimizer extensions. The Model Optimizer generates a valid and optimized Intermediate Representation.

To be successful with custom layers, it is important to understand two things:

  1. How to map a sub-graph in a framework model to a sub-graph that consists of Inference Engine layers. For Caffe, the mapping is 1-to-1 between the Caffe layer and the Inference Engine layer.
  2. How to infer shapes for unknown sub-graphs. This inference can occur at a stage when the internal representation consists of framework-specific layers, or at a stage when it already consists of Inference Engine layers.

You can use a framework fallback for unknown sub-graphs, in which the original framework is used to infer the output shapes of operations; when the framework is not available or should not be used, you must provide the shape inference yourself, as sketched below.
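
A hypothetical shape-inference helper of the kind needed for an unknown layer might look like the following sketch; this is an illustration only, not the Model Optimizer extension API.

    def conv2d_output_shape(input_shape, out_channels, kernel, stride=1, pad=0):
        # Given an NCHW input shape and convolution parameters, compute the
        # NCHW output shape.
        n, _, h, w = input_shape
        out_h = (h + 2 * pad - kernel) // stride + 1
        out_w = (w + 2 * pad - kernel) // stride + 1
        return (n, out_channels, out_h, out_w)

    print(conv2d_output_shape((1, 3, 224, 224), out_channels=64, kernel=7, stride=2, pad=3))
    # (1, 64, 112, 112)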

To see which topologies are supported with each framework, and to configure your framework to work with the Model Optimizer, see the Caffe*, TensorFlow*, and MXNet* guides.

Model Optimizer Directory Structure

|-- model_optimizer
    |-- extensions
        |-- front/caffe
            |-- CustomLayersMapping.xml.example - Example file for registering custom Caffe layers
    |-- mo
        |-- back - Back-end logic: contains the IR-emitting logic
        |-- front - Front-end logic: contains the matching between framework-specific layers and IR layers, and the calculation of output shapes for each registered layer
        |-- graph - Graph utilities for working with the internal IR representation
        |-- middle - Graph transformations: optimizations of the model
        |-- pipeline - Sequence of steps required to create the IR for each framework
        |-- utils - Utility functions
    |-- tf_call_ie_layer - Sources for the TensorFlow fallback in the Inference Engine during model inference
    |-- mo.py - Centralized entry point that can be used for any supported framework
    |-- mo_caffe.py - Entry point for Caffe
    |-- mo_mxnet.py - Entry point for MXNet
    |-- mo_tf.py - Entry point for TensorFlow
    |-- ModelOptimizer - Entry point

Inference Engine

The Inference Engine is the second of the two key components of the Intel® CV SDK.

The Inference Engine consumes the Intermediate Representation files that result from running the Model Optimizer and provides an optimized C++ API for your application. The Inference Engine helps application execution with computational graph analysis, scheduling, and model compression.

The Inference Engine has:

  • A core library
  • Hardware-specific plugin libraries, in addition to several third-party libraries:
      • A plugin for Intel® Xeon® and Intel® Core™ processors with Intel® AVX2
      • A plugin for Intel Atom® processors ("CPU Plugin")
      • A plugin for Intel® Integrated Graphics
      • A plugin for the Intel® Arria® A10 GX Development Kit ("FPGA Plugin")
      • A plugin for the Intel® Movidius™ Myriad™ 2 Vision Processing Unit ("Myriad Plugin")

Inference Engine Workflow

A brief description of the Inference Engine workflow is:

  1. Use the model as input. The model is in the form of the Intermediate Representation (IR) that the Model Optimizer produced.
  2. Optimize the inference execution for target hardware.
  3. Deliver the inference solution to one or more embedded inference platforms.
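
A minimal sketch of this workflow, assuming the OpenVINO-era Inference Engine Python API (IENetwork/IEPlugin); the device, file names, and input shape are example values.

    import numpy as np
    from openvino.inference_engine import IENetwork, IEPlugin

    net = IENetwork(model="model.xml", weights="model.bin")  # 1. Load the IR.
    plugin = IEPlugin(device="CPU")                          # 2. Pick target hardware.
    exec_net = plugin.load(network=net)

    input_blob = next(iter(net.inputs))                      # name of the input layer
    image = np.zeros((1, 3, 224, 224), dtype=np.float32)     # must match the model's input shape
    result = exec_net.infer({input_blob: image})             # 3. Run inference.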

To work with the Inference Engine or work with Inference Engine samples that are provided with the Intel® CV SDK, see the Inference Engine Developer Guide.

Supported Processor Types

CPU

You have three options for using the Intel® CV SDK: CPU, FPGA, and Intel® Integrated Graphics. Used alone, the CPU is the slowest of these options.

Intel® Integrated Graphics

Intel® Integrated Graphics is an electronic circuit that accelerates graphics processing. It offloads compute-intensive parts of applications to lessen the work required of the CPU, which makes your applications run faster. Intel® Integrated Graphics is used with a CPU, not instead of the CPU.

You have three options for using the Intel® CV SDK: CPU, FPGA, and Intel® Integrated Graphics. Intel® Integrated Graphics is the second fastest of these options.

FPGA

FPGA is an acronym for field-programmable gate array: an integrated circuit that can be configured after manufacturing and that greatly increases processing speed. You have three options for using the Intel® CV SDK: CPU, FPGA, and Intel® Integrated Graphics. FPGA is the fastest of these options.

 

Terminology

To help you understand the components and concepts that the Intel® CV SDK uses, take a brief glance at this terminology:

Note: Links open in a new window.

API

Application programming interface. You use an API to tell your program how to communicate with an image or video device and which actions to perform based on what the device sees in the image or video.

Caffe*

Caffe is a popular open-source framework that was developed at UC Berkeley. It can be used on both Linux and Windows. This framework provides a way to switch from a CPU to Intel® Integrated Graphics by setting a flag on a device that includes Intel® Integrated Graphics. 

For more information, see http://caffe.berkeleyvision.org/ 

CNN

CNN is an acronym for convolutional neural network. CNNs are successful in identifying specific objects in images or videos.

CHW, NC, C

Tensor memory layout.

Example: the CHW value at index (c, h, w) is physically located at index (c * H + h) * W + w; the other layouts follow by analogy.
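
A quick NumPy check of this formula (the shape and index values are arbitrary examples):

    import numpy as np

    C, H, W = 3, 4, 5
    t = np.arange(C * H * W).reshape(C, H, W)  # row-major CHW tensor
    c, h, w = 2, 1, 3
    assert t[c, h, w] == t.ravel()[(c * H + h) * W + w]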

Computer vision

Uses computers to get information from digital images or videos to automate tasks, such as counting vehicles driving past an intersection.

CPU

Central processing unit.

You have three options for using the Intel® CV SDK: CPU, FPGA, and Intel® Integrated Graphics. Used alone, the CPU is the slowest of these options.

Deep learning, sometimes called DL

A type of machine learning that uses multi-layered neural networks, such as CNNs, to learn tasks from large amounts of data.

Framework

A framework is composed of libraries and software that you use to create, train, and deploy your computer vision model. A framework provides a way to build and deploy models. For the Intel® CV SDK, a framework gives you a way to work with the images or videos from your digital camera or recorder.

Companies or individuals can provide custom frameworks to perform specific tasks by supporting specific programs, compilers, libraries, tools, and APIs. The Intel® CV SDK supports the Caffe*, TensorFlow*, and MXNet* frameworks.

FP16 format

Half-precision floating-point format

FP32 format

Single-precision floating-point format

FPGA

FPGA is an acronym for field-programmable gate array: an integrated circuit that can be configured after manufacturing and that greatly increases processing speed. You have three options for using the Intel® CV SDK: CPU, FPGA, and Intel® Integrated Graphics. FPGA is the fastest of these options.

For more information about the Intel® FPGA that is supported by the Intel® CV SDK, see https://www.altera.com/products/fpga/arria-series.html

Intel® Integrated Graphics

Graphics processing unit. Intel® Integrated Graphics is an electronic circuit that accelerates graphics processing. It offloads compute-intensive parts of applications to lessen the work required of the CPU, which makes your applications run faster. Intel® Integrated Graphics is used with a CPU, not instead of the CPU.

You have three options for using the Intel® CV SDK: CPU, FPGA, and Intel® Integrated Graphics. Intel® Integrated Graphics is the second fastest of these options.

I16 format

2-byte signed integer format

Inference Engine

The second of the two key components of the Intel® CV SDK.

Intel® Movidius™ Myriad 2 Vision Processing Unit

The Intel® Movidius™ Myriad 2 Vision Processing Unit gives you immediate access to its advanced vision processing core, while allowing you to develop proprietary capabilities that provide true differentiation.

For more information, see https://www.movidius.com/solutions/vision-processing-unit/

Intermediate Representation, sometimes referred to as IR

The Intermediate Representation consists of two files that are created by the Model Optimizer and used by the Inference Engine.

Model

A model uses data, equations, and instructions to make predictions.

Model Optimizer

The first of the two key components of the Intel® CV SDK. The Model Optimizer is a command-line tool that converts trained models into Intermediate Representation (IR) files.

MXNet*

MXNet is a popular open-source framework that is developed by the Apache Software Foundation. It can be used on both Linux and Windows. This framework provides a way to switch from a CPU to Intel® Integrated Graphics.

For more information, see https://mxnet.apache.org/

NCHW or NHWC

Image data layout. This refers to the representation of batches of images where:

  • N is the number of images in a batch
  • C is the number of channels
  • H is the number of pixels in the vertical dimension
  • W is the number of pixels in the horizontal dimension
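
A minimal NumPy sketch that converts a batch of images from NHWC to NCHW (the shape values are examples):

    import numpy as np

    nhwc = np.zeros((8, 224, 224, 3), dtype=np.float32)  # N, H, W, C
    nchw = nhwc.transpose(0, 3, 1, 2)                    # N, C, H, W
    assert nchw.shape == (8, 3, 224, 224)
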
OpenCL™

OpenCL is an acronym for Open Computing Language. Intel provides an implementation of OpenCL. OpenCL is useful for writing programs that run across different platforms, such as CPUs and Intel® Integrated Graphics.

For more information, see https://www.khronos.org/opencl/ and https://software.intel.com/en-us/intel-opencl

OpenCV*

OpenCV is an acronym for Open Source Computer Vision. OpenCV is an open-source library of programming functions that let you work with models that were created with specific frameworks, including Caffe. OpenCV works on both Windows and Linux.

For more information, see https://opencv.org/

OpenVX*

OpenVX is an open standard for accelerating computer vision applications, especially tasks such as video surveillance, facial recognition, and body and gesture tracking. OpenVX creates nodes that are optimized for your hardware. It handles memory management and figures out the best way to process your images.

For more information, see https://www.khronos.org/openvx/

Proto file

A proto file defines data structures and services and is compiled with protoc. A proto file is written in the format defined by the protocol buffer language.

Protobuf

Protobuf is a library for working with protocol buffers.

Protoc

A compiler that is used to generate code from proto files.

Protocol buffer

Data structures are saved in, and communicated from, protocol buffers. The primary purpose of protocol buffers is network communication. Protocol buffers are used because they are simple and fast.

TensorFlow*

TensorFlow is a popular open-source framework that was developed by Google. It can be used on both Linux and Windows.

For more information, see https://www.tensorflow.org/

Training

Training means teaching your software to correctly identify the objects that you want your model to recognize.

Training takes place before you use the Intel® CV SDK. If you are using the Intel® CV SDK and decide you need to make changes and re-train your model, you will need to return to the application that you originally used to train your model, and then return to the Intel® CV SDK after you are done training.

U16 format

2-byte unsigned integer format

U8 format

1-byte unsigned integer format

Note: The * next to some terms indicates trademarks that belong to someone else. OpenCL and the OpenCL logo are trademarks of Apple Inc. used with permission from Khronos.

 

Helpful Links

Note: Links open in a new window.

Intel® CV SDK Home Page: https://software.intel.com/en-us/computer-vision-sdk

Intel® CV SDK Documentation: https://software.intel.com/en-us/computer-vision-sdk/documentation/featured

Model Optimizer Developer Guide: https://software.intel.com/en-us/articles/CVSDK-ModelOptimizer

Inference Engine Developer Guide: https://software.intel.com/en-us/articles/CVSDK-InferEngine

 

Legal Information

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at http://www.intel.com/ or from the OEM or retailer.

No computer system can be absolutely secure.

Intel, Arria, Core, Movidius, Xeon, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used with permission from Khronos.

*Other names and brands may be claimed as the property of others.

Copyright © 2018, Intel Corporation. All rights reserved.

