Model Optimizer Developer Guide


Introduction

The Model Optimizer is a cross-platform command-line tool that facilitates the transition between the training and deployment environment, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices.

The Model Optimizer process assumes you have a network model trained using one of the supported frameworks. The diagram below illustrates the typical workflow for deploying a trained deep learning model:

Intel Computer Vision Basic Workflow

A summary of the steps for optimizing and deploying a trained model:

  1. Configure the Model Optimizer for your framework.
  2. Convert a trained model to produce an optimized Intermediate Representation (IR) of the model based on the trained network topology, weights, and bias values.
  3. Test the model in the Intermediate Representation format using the Inference Engine in the target environment via the provided Inference Engine validation application or the sample applications.
  4. Integrate the Inference Engine into your application to deploy the model in the target environment. See the Inference Engine Guide.

Model Optimizer Workflow

The Model Optimizer process assumes you have a network model that was trained with a supported framework. The workflow is:

  1. Configure the Model Optimizer for the framework that was used to train the network. To perform this configuration, use the configuration bash script for Linux* OS, or the batch file for Windows* OS. The script and batch file are in: <INSTALL_DIR>/deployment_tools/model_optimizer/install_prerequisites
    • For Linux* OS:
      install_prerequisites.sh
    • For Windows* OS:
      install_prerequisites.bat
    For more information about configuring the Model Optimizer, see Configuring the Model Optimizer.
  2. Provide as input a trained model that contains a specific topology and the adjusted weights and biases described in the framework-specific files.
  3. Convert the trained model to an optimized Intermediate Representation.

The Model Optimizer produces an Intermediate Representation (IR) of the network as output. The Inference Engine reads, loads, and infers the Intermediate Representation. The Inference Engine API offers a unified API across supported Intel® platforms. The Intermediate Representation is a pair of files that describe the whole model:

  • .xml: Describes the network topology
  • .bin: Contains the weights and biases binary data
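
For illustration, a typical conversion run looks like the command below (the model file names are hypothetical; the exact options are described in the framework-specific sections of this guide). The Model Optimizer typically names the resulting pair after the input model:

    python mo.py --input_model my_model.caffemodel --input_proto my_model.prototxt

This run is expected to produce my_model.xml (topology) and my_model.bin (weights and biases) for use with the Inference Engine.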

Configuring the Model Optimizer

You must configure the Model Optimizer for the framework that was used to train the model. This section tells you how to configure the Model Optimizer either through scripts or by using a manual process.

Using Configuration Scripts

You can either configure all three frameworks at the same time, or install an individual framework. The scripts install all required dependencies and provide the fastest and easiest way to configure the Model Optimizer.

To configure all three frameworks: Go to the <INSTALL_DIR>/deployment_tools/model_optimizer/install_prerequisites directory and run:

  • For Linux*:
    install_prerequisites.sh
  • For Windows*:
    install_prerequisites.bat

To configure a specific framework: Go to the <INSTALL_DIR>/deployment_tools/model_optimizer/install_prerequisites directory and run:

  • For Caffe on Linux:
    install_prerequisites_caffe.sh
  • For Caffe on Windows:
    install_prerequisites_caffe.bat
  • For TensorFlow on Linux:
    install_prerequisites_tf.sh
  • For TensorFlow on Windows:
    install_prerequisites_tf.bat
  • For MXNet on Linux:
    install_prerequisites_mxnet.sh
  • For MXNet on Windows:
    install_prerequisites_mxnet.bat

CAFFE* NOTE: By default, you do not need to install Caffe to create an Intermediate Representation for a Caffe model unless you use Caffe for custom layer shape inference and do not write Model Optimizer extensions. To learn more about implementing Model Optimizer custom operations and the limitations of using Caffe for shape inference, see Caffe Models with Custom Layers.

TENSORFLOW* NOTE: To offload part of the inference to the TensorFlow framework, additional configuration steps are required. For more information, see Offloading Computations to TensorFlow.

Using a Manual Configuration Process

If you prefer, you can manually configure the Model Optimizer for one framework at a time.

  1. Go to the Model Optimizer directory:
    cd <INSTALL_DIR>/deployment_tools/model_optimizer/
  2. Strongly recommended: Create and activate a virtual environment. While not required, this option is strongly recommended because the virtual environment creates a Python* sandbox, so the Model Optimizer dependencies do not affect the global Python configuration, installed libraries, or other components. In addition, the --system-site-packages flag makes the system-wide Python libraries available inside this sandbox:
    • Create a virtual environment:
      virtualenv -p /usr/bin/python3.5 .env3 --system-site-packages
    • Activate the virtual environment:
      source .env3/bin/activate
  3. Install all dependencies or only the dependencies for a specific framework:
    • To install dependencies for all frameworks:
      pip3 install -r requirements.txt
    • To install dependencies only for Caffe:
      pip3 install -r requirements_caffe.txt
    • To install dependencies only for TensorFlow:
      pip3 install -r requirements_tensorflow.txt
    • To install dependencies only for MXNet:
      pip3 install -r requirements_mxnet.txt
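
To check that the dependencies were installed into the active environment, a minimal sanity check is to import a few of the packages that the Model Optimizer relies on elsewhere in this guide (numpy, networkx, and protobuf; treat this list as an assumption, not the full requirements):

    # Minimal dependency check; run inside the activated virtual environment.
    import importlib

    for package in ("numpy", "networkx", "google.protobuf"):
        try:
            importlib.import_module(package)
            print(package, "OK")
        except ImportError as error:
            print(package, "MISSING:", error)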

Using the protobuf Library in the Model Optimizer, for Caffe* on Windows*

These procedures require:

  • Access to github and the ability to use git commands
  • Microsoft Visual Studio* 2013 for Win64*
  • C/C++

The Model Optimizer uses the protobuf library to load trained Caffe models. By default, the library executes the pure Python* language implementation, which is slow. These steps switch to the faster C++ implementation of the protobuf library on Windows* OS or Linux* OS.

Building the protobuf Library on Windows* OS

  1. Clone protobuf:
    git clone https://github.com/google/protobuf.git
    cd protobuf
  2. Create a Visual Studio solution file. Run these commands:
    mkdir C:\Path\to\protobuf\cmake\build\solution
    cd C:\Path\to\protobuf\cmake\build\solution
    cmake -G "Visual Studio 12 2013 Win64" ../..
  3. Change the runtime library option for libprotobuf and libprotobuf-lite:
    • Open the project's Property Pages dialog box.
    • Expand the C/C++ tab.
    • Select the Code Generation property page.
    • Change the Runtime Library property to Multi-thread DLL (/MD).
  4. Build the libprotoc, protoc, libprotobuf, and libprotobuf-lite projects in the Release configuration.
  5. Add a path to the build directory to the PATH environment variable:
    set PATH=%PATH%;C:\Path\to\protobuf\cmake\build\solution\Release
  6. Go to the python directory:
    cd C:\Path\to\protobuf\python
  7. Use a text editor to open and change these setup.py options:
    • Change from libraries = ['protobuf']
      to libraries = ['libprotobuf', 'libprotobuf-lite']
    • Change from extra_objects = ['../src/.libs/libprotobuf.a', '../src/.libs/libprotobuf-lite.a']
      to extra_objects = ['../cmake/build/solution/Release/libprotobuf.lib', '../cmake/build/solution/Release/libprotobuf-lite.lib']
  8. Build the Python package with the CPP implementation:
    python setup.py build --cpp_implementation
  9. Install the Python package with the CPP implementation:
    python -m easy_install dist/protobuf-3.5.1-py3.5-win-amd64.egg
  10. Set an environment variable to boost the protobuf performance:
    set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
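
To confirm that the C++ implementation is actually used, you can query the protobuf Python package. Note that api_implementation is an internal protobuf module, so the exact import path may differ between protobuf versions:

    from google.protobuf.internal import api_implementation

    # Expected to print 'cpp' when the C++ implementation is active,
    # and 'python' when the slow pure Python implementation is used.
    print(api_implementation.Type())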

How the Model Optimizer Works

The Model Optimizer loads a model into memory, reads it, builds the internal representation of the model, optimizes it and produces the Intermediate Representation. The Intermediate Representation is the only format the Inference Engine accepts.

NOTE: The Model Optimizer does not infer models. The Model Optimizer is an offline tool that runs before the inference takes place.

The Model Optimizer has two main purposes:

  • Produce a valid Intermediate Representation. If this main conversion artifact is not valid, the Inference Engine cannot run. The primary responsibility of the Model Optimizer is to produce the two files that form the Intermediate Representation.
  • Produce an optimized Intermediate Representation. Pretrained models contain layers that are important for training, such as the dropout layer. These layers are useless during inference and might increase the inference time.
    In many cases, these layers can be automatically removed from the resulting Intermediate Representation. However, if a group of layers can be represented as one mathematical operation, and thus as a single layer, the Model Optimizer recognizes such patterns and replaces these layers with one. The result is an Intermediate Representation that has fewer layers than the original model. This decreases the inference time.
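
As an illustration of the second point, a group of linear layers can be collapsed into a single layer that computes the same result. The sketch below is only a numerical analogy of this idea, not Model Optimizer code; the scale and shift values are made up:

    import numpy as np

    x = np.random.rand(4)
    a1, b1 = 2.0, 1.0     # first layer:  y1 = a1 * x + b1
    a2, b2 = 0.5, -3.0    # second layer: y2 = a2 * y1 + b2

    y_two_layers = a2 * (a1 * x + b1) + b2
    y_fused = (a2 * a1) * x + (a2 * b1 + b2)   # one fused layer computes the same values

    assert np.allclose(y_two_layers, y_fused)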

To produce a valid Intermediate Representation, the Model Optimizer must be able to read the original model layers and to handle their properties and represent them in Intermediate Representation format, while maintaining validity of the resulting Intermediate Representation.

For example, according to the catalog of Intermediate Representation layers, every layer must have an output. The layer output is represented in the Intermediate Representation by the output blob dimensions.

What You Need to Know About Your Model

Many common layers exist across known frameworks and neural network topologies. Examples of these layers are Convolution, Pooling, and Activation. To read the original model and produce the Intermediate Representation of a model, the Model Optimizer must be able to work with these layers.

The layer list varies by framework. See the Caffe*, TensorFlow* and MXNet* documentation for the topologies supported by each of these frameworks. If your topology contains only layers from the list of layers, as is the case for the topologies used by most users, the Model Optimizer easily creates the Intermediate Representation, after which you can proceed to working with the Inference Engine.

However, if your topology contains layers that the Model Optimizer does not recognize out of the box, see Custom Layers in the Model Optimizer to learn how to work with custom layers.

Model Optimizer Directory Structure

The Model Optimizer directory has the following structure:

|-- model_optimizer
    |-- extensions
        |-- front/caffe
            |-- CustomLayersMapping.xml.example - example of file for registering custom Caffe layers (compatible with the 2017R3 release)
    |-- mo
        |-- back - Back-End logic: contains IR emitting logic
        |-- front - Front-End logic: contains matching between framework-specific layers and IR-specific layers, and calculation of output shapes for each registered layer
        |-- graph - Graph utilities to work with internal IR representation
        |-- middle - Graph transformations - optimizations of the model
        |-- pipeline - Sequence of steps required to create IR for each framework
        |-- utils - Utility functions
    |-- tf_call_ie_layer - Source code that enables TensorFlow fallback in Inference Engine during model inference
    |-- mo.py - Centralized entry point that can be used for any supported framework
    |-- mo_caffe.py - Entry point particularly for Caffe
    |-- mo_mxnet.py - Entry point particularly for MXNet
    |-- mo_tf.py - Entry point particularly for TensorFlow
    |-- ModelOptimizer - Entry point particularly for Caffe that contains same CLI as 2017R3 publicly released Model Optimizer

Custom Layers in the Model Optimizer

The Model Optimizer searches for each layer of the input model in the list of known layers before building the model's internal representation, optimizing the model, and producing the Intermediate Representation.

The list of known layers is different for each of supported frameworks. To see the layers supported by your framework, see the Caffe*, TensorFlow* or MXNet* documentation.

Custom layers are layers that are not included in the list of known layers. If your topology contains any layers that are not in the list of known layers, the Model Optimizer classifies them as custom.

Caffe Models with Custom Layers

You have two options if your Caffe model has custom layers:

  • Register the custom layers as extensions to the Model Optimizer. For instructions, see Extending Model Optimizer with New Primitives. When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You only need to write a small chunk of Python* code that lets the Model Optimizer:
    • Generate a valid Intermediate Representation according to the rules you specified
    • Be independent from the availability of Caffe* on your computer
  • Register the custom layers as Custom and use the system Caffe to calculate the output shape of each Custom Layer, which is required by the Intermediate Representation format. For this method, the Model Optimizer requires the Caffe* Python* interface on your system. When registering the custom layer in the CustomLayersMapping.xml file, you can specify if layer parameters should appear in Intermediate Representation or if they should be skipped. To read more about the expected format and general structure of this file, see Legacy Mode for Caffe Custom Layers. This approach has several limitations:
    • If your layer output shape depends on dynamic parameters, input data or previous layers parameters, calculation of output shape of the layer via Caffe can be incorrect. In this case, you need to patch Caffe on your own.
    • If the calculation of output shape of the layer via Caffe fails inside the framework, Model Optimizer is unable to produce any correct Intermediate Representation and you also need to investigate the issue in the implementation of layers in the Caffe* and patch it.
    • You are not able to produce Intermediate Representation on any machine that does not have Caffe installed. If you want to use Model Optimizer on multiple machines, your topology contains Custom Layers and you use CustomLayersMapping.xml to fallback on Caffe, you need to configure Caffe on each new machine.
    For these reasons, using Model Optimizer extensions for custom layers is the preferable option: you do not depend on the framework and you fully control the workflow.

If your model contains custom layers, it is important to understand the internal workflow of the Model Optimizer. Consider the following example.

Example:

The example network has:

  • One input layer (#1)
  • One output Layer (#5)
  • Three internal layers (#2, 3, 4)

The custom and standard layer types are:

  • Layers 2 and 5 are implemented as Model Optimizer extensions
  • Layers 1 and 4 are supported in Model Optimizer out-of-the box
  • Layer 3 is neither in the list of supported layers nor in extensions, but is specified in CustomLayersMapping.xml

NOTE: If any of the layers are not in one of three categories described above, the Model Optimizer fails with an appropriate message and a link to the corresponding question in Model Optimizer FAQ.

The general process is as shown:

Example custom layer network

  1. The example model is fed to Model Optimizer that loads the model with the special parser, built on top of caffe.proto file. In case of failure, Model Optimizer asks you to prepare the parser that can read the model. For more information, refer to Model Optimizer, FAQ #1.
  2. The Model Optimizer extracts the attributes of all layers. In particular, it goes through the list of layers and attempts to find the appropriate extractor. In order of priority, the Model Optimizer checks whether the layer is:
    • Registered in CustomLayersMapping.xml
    • Registered as a Model Optimizer extension
    • Registered as a standard Model Optimizer layer.

    When the Model Optimizer finds the first satisfied condition from the list above, it extracts the attributes according to the following rules (see the sketch after this list):

    • For bullet #1 - either takes all parameters or no parameters, according to the content of CustomLayersMapping.xml
    • For bullet #2 - takes only those parameters specified in the extension
    • For bullet #3 - takes only those parameters specified in the standard extractor
  3. The Model Optimizer calculates the output shape of all layers. The logic is the same as it is for the priorities. Important: the Model Optimizer always takes the first available option.
  4. The Model Optimizer optimizes the original model and produces the Intermediate Representation.
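
The extraction priority described in step 2 can be written as a small sketch. This is illustrative pseudologic only, not the actual Model Optimizer source, and all names in it are made up:

    def find_extractor(layer, custom_layers_mapping, extensions, standard_extractors):
        """Illustrative priority: CustomLayersMapping.xml, then extensions, then standard layers."""
        if layer.type in custom_layers_mapping:
            return custom_layers_mapping[layer.type]   # all or no parameters, per the XML file
        if layer.type in extensions:
            return extensions[layer.type]              # only the parameters specified in the extension
        if layer.type in standard_extractors:
            return standard_extractors[layer.type]     # only the parameters of the standard extractor
        raise ValueError('Layer type {} is not supported'.format(layer.type))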

Extending the Model Optimizer with New Primitives

This section explains how to register a custom layer in the Model Optimizer, including how to register Proposal as a custom layer. This section also demonstrates how Proposal works as a custom layer.

The Model Optimizer loads the model, goes through the topology, and tries to find each layer type in the list of known layers. If the Model Optimizer does not find a layer in that list, it looks for the layer in the list of custom layers. If the Model Optimizer fails to find the layer among the defined custom layers, it registers a Caffe* fallback for the output shape inference. If the Model Optimizer does not find Caffe* and cannot infer shapes, the Model Optimizer fails with an appropriate message.

You must know two things about custom layers with the Model Optimizer:

  • How to map a subgraph in a FW model to a subgraph consisting of Inference Engine layers. For Caffe*, the subgraph is a 1-to-1 mapping of a Caffe layer to an Inference Engine layer.
  • How to infer shapes for unknown subgraphs. This can be either for a step in which the internal representation consists of framework-specific layers, or for a step in which the internal representation consists of Inference Engine layers.

You also have the option of a framework fallback for unknown subgraphs, for when the original framework is used for inference of output shapes of operations. The example below demonstrates the case in which the framework is not available or should not be used.

Preparing an Example Topology

NOTE: Skip this section if you have a topology with a layer that is not known to the Model Optimizer.

The information in this section prepares a Caffe model with the provided, deployment-ready prototxt for a well-known topology called Faster-R-CNN to demonstrate the workflow. To use this example, you must have weights and biases for inference.

  1. Download the .caffemodel file.
  2. Run the Model Optimizer on the .caffemodel file:
    python mo.py --input_model ZF_faster_rcnn_final.caffemodel --input_proto test.prototxt
    You will likely see the error message:
    Error parsing text-format caffe.NetParameter: 196:16: Message type "caffe.DropoutParameter" has no field named "scale_train".
    Whether you see the error depends on your Caffe version. For example, BVLC Caffe does not support the boolean parameter scale_train for the dropout layer. The error message does not matter because the dropout layer is needed only for training, and the Model Optimizer removes it.
  3. Comment out these lines in test.prototxt:
    ...
    layer {
      name: "drop6"
      type: "Dropout"
      bottom: "fc6"
      top: "fc6"
      dropout_param {
        dropout_ratio: 0.5
        # scale_train: false # <-------------- comment out this line
      }
    }
    ...
    layer {
      name: "drop7"
      type: "Dropout"
      bottom: "fc7"
      top: "fc7"
      dropout_param {
        dropout_ratio: 0.5
        # scale_train: false # <-------------- comment out this line
      }
    }
    ...
  4. Run the Model Optimizer on this model again:
    python mo.py --input_model ZF_faster_rcnn_final.caffemodel --input_proto test.prototxt
    

You will see the message:

[ ERROR ]  Found custom layer proposal. Model Optimizer does not support this layer. 
Please, register it in CustomLayersMapping.xml or implement extension. 
For more information please refer to Model Optimizer FAQ, question #45.

This message means the Model Optimizer can load the model, but is unable to infer the shape and handle the custom layer properties.

Registering a Custom Layer as a Model Optimizer Extension

In the following sections, you will learn how to make the Model Optimizer independent from Caffe* when processing a model that has a custom layer. In this example, the custom layer is referred to as the Proposal layer.

Use this section to implement the mapping rules for the Proposal layer attributes and the output shape calculation. As part of these steps, you must first create a class for the Proposal layer and inherit it from general-purpose Op that defines the interface of every new custom layer.

In this section, it is important to understand the Op class and its function. The Op class is implemented in PATH_TO_MO/mo/ops/op.py; the implementation shows that it expects a graph and attributes to be passed when it is initialized.

Op keeps the attributes for each operation and contains the logic for handling node creation for the internal model representation. Op is also responsible for dumping each particular operation to the XML format of the Intermediate Representation. By inheriting from it, the technical items are taken care of, and you can concentrate on the specifics of this layer: the attributes it supports and the rules for computing its output shape.

Follow these steps:

  1. Create the file python_proposal.py in the directory extensions/ops:
    from mo.ops.op import Op
    class PythonProposalOp(Op):
        pass
  2. Define the name of the operation and make a stub constructor:
    from mo.ops.op import Op
    class PythonProposalOp(Op):
        op = 'Python'
        def __init__(self, graph, attrs):
            super().__init__(graph)
  3. Every Op must have three specific fields defined: type, op, and infer. In most cases, the type and op names are the same, and infer is defined as a function to compute the output shape. Reflect these fields in your constructor:
    from mo.ops.op import Op
    class PythonProposalOp(Op):
        op = 'Python'
        def __init__(self, graph, attrs):
            mandatory_props = {
                'type': __class__.op,
                'op': __class__.op,
                'infer': None
            }
            super().__init__(graph, mandatory_props, attrs)
    According to the Intermediate Representation catalog, Proposal has the attributes:
    • pre_nms_topn
    • post_nms_topn
    • nms_thresh
    • feat_stride
    • min_size
    • base_size
    • ratio
    • scale
  4. In defining supported attribute names, it is best to use the same names that are used in the original model. However, these names are just strings with no direct connection to the model layer properties, so for clarity you could use a name such as my_ratio for ratio. In addition to defining the list of supported parameters, you can define only the parameters that should appear in the Intermediate Representation in the backend_attrs method.
    Define your attributes:
    class PythonProposalOp(Op):
        # ... constructor
         def supported_attrs(self):
                return [
                    'pre_nms_topn',
                    'post_nms_topn',
                    'nms_thresh',
                    'feat_stride',
                    'min_size',
                    'base_size',
                    'ratio',
                    'scale'
                ]
  5. The Model Optimizer now knows how to create the layer called Proposal when it is in the topology and the Model Optimizer knows what attributes this layer has. However, the Model Optimizer does not know how to calculate the output shape of this operation. Define a rule to calculate the output shape:
    import numpy as np
    from mo.graph.graph import Node
    from mo.ops.op import Op
    class PythonProposalOp(Op):
        def __init__(self, graph, attrs):
            mandatory_props = {
                'type': __class__.op,
                'op': __class__.op,
                'infer': PythonProposalOp.calculate_output_shape
            }
            super().__init__(graph, mandatory_props, attrs)
        # ... supported attrs
        @staticmethod
        def calculate_output_shape(node: Node):
            node.out_node().shape = (1, 1, 1, 1)  # for now, every Proposal has the same output shape
  6. According to the Intermediate Representation catalog, the output shape of Proposal dynamically depends on the post_nms_topn parameter: the output is a two-dimensional blob with input_batch * post_nms_topn rows and 5 columns.
    Implement this output shape calculation in Python:
    import numpy as np
    class PythonProposalOp(Op):
        # ... static fields
        # ... constructor
        # ... supported attrs
        @staticmethod
        def calculate_output_shape(node: Node):
            input_shape = node.in_node(0).shape
            out_shape = np.array([0, 0], dtype=np.int64)
            # rois blob: holds R regions of interest, each is a 5 - tuple
            # (n, x1, y1, x2, y2) specifying an image batch index n and a
            # rectangle(x1, y1, x2, y2)
            out_shape[0] = input_shape[0] * node.post_nms_topn
            out_shape[1] = 5
            node.out_node(0).shape = out_shape
    The node might not contain this parameter, which is why it should be initialized in the constructor along with the other parameters. The Inference Engine contains the implementation of a Caffe-like Proposal layer and works well with the default values from caffe.proto:
    // Message that stores parameters used by ProposalLayer
    message ProposalParameter {
      optional uint32 feat_stride = 1 [default = 16];
      optional uint32 base_size = 2 [default = 16];
      optional uint32 min_size = 3 [default = 16];
      repeated float ratio = 4;
      repeated float scale = 5;
      optional uint32 pre_nms_topn = 6 [default = 6000];
      optional uint32 post_nms_topn = 7 [default = 300];
      optional float nms_thresh = 8 [default = 0.7];
    }
  7. Change the constructor as follows:
    class PythonProposalOp(Op):
        # ... static fields
        def __init__(self, graph, attrs):
            mandatory_props = {
                'type': __class__.op,
                'op': __class__.op,
                'feat_stride': 16,
                'base_size': 16,
                'min_size': 16,
                'ratio': [0.5, 1, 2],
                'scale': [8, 16, 32],
                'pre_nms_topn': 6000,
                'post_nms_topn': 300,
                'nms_thresh': 0.7,
                'infer': PythonProposalOp.calculate_output_shape
            }
            super().__init__(graph, mandatory_props, attrs)
        # ... supported attrs
        # ... calculate output shape

Summary

In this section, you implemented support for a custom layer with type Python, which represents the Proposal layer in the topology, and you learned how to calculate the output shape of this layer.

The attribute values are hardcoded. In the next section, you will learn how to extract these values from the original framework model.

Registering Rules to pass Extension Layer Properties from a Caffe* Model to the Intermediate Representation

The Model Optimizer now knows how to set the shape of the PythonProposalOp operation, but it is incorrect to initialize the attributes with the same values for every operation. Instead, the values should be extracted from the original topology. However, the Model Optimizer does not know how to map the custom layer properties to the PythonProposalOp attributes. To provide this mapping, you must register a FrontExtractorOp instance.

NOTE: This step is required only if the layer requires parameters from the original model.

  1. Create the file python_proposal_ext.py in the folder PATH_TO_MO/extensions/front/caffe
    from mo.front.extractor import FrontExtractorOp
    class PythonProposalFrontExtractor(FrontExtractorOp):
        pass
  2. Specify the operation that the extractor refers to and a specific flag. The flag represents whether the operation should be used by the Model Optimizer or should be excluded from processing:
    from mo.front.extractor import FrontExtractorOp
    class PythonProposalFrontExtractor(FrontExtractorOp):
        op = 'Python'
        enabled = True
  3. Register a mapping rule between the original model and the PythonProposalOp attributes, by overriding the following function:
    from mo.front.extractor import FrontExtractorOp
    from mo.ops.op import Op
    class PythonProposalFrontExtractor(FrontExtractorOp):
        op = 'Python'
        enabled = True
        @staticmethod
        def extract(node):
            proto_layer = node.pb
            param = proto_layer.python_param # each layer has a specific parameter, take a look at caffe.proto
            python_params = str(param.param_str) # for Python layers, all params are in param_str
            attrs = {
                'feat_stride': int(python_params.split(':')[-1])
            }
            # update the attributes of the node
            Op.get_op_class_by_name(__class__.op).update_node_stat(node, attrs)
            return __class__.enabled
    You have successfully extracted the parameter feat_stride from prototxt, assuming it is the only parameter in this layer.
  4. To increase the implementation's flexibility:
    import ast
    from mo.front.extractor import FrontExtractorOp
    from mo.ops.op import Op
    class PythonProposalFrontExtractor(FrontExtractorOp):
        op = 'Python'
        enabled = True
        @staticmethod
        def extract(node):
            proto_layer = node.pb
            param = proto_layer.python_param
            attrs = PythonProposalFrontExtractor.parse_param_str(str(param.param_str))
            # update the attributes of the node
            Op.get_op_class_by_name(__class__.op).update_node_stat(node, attrs)
            return __class__.enabled
        @staticmethod
        def parse_param_str(param_str: str):
            if param_str[0] != '{' and param_str[-1] != '}':
                param_str = '{' + param_str + '}'
            return ast.literal_eval(param_str)
    You can now successfully convert the model. Open the .xml file and view the resulting layer:
    ...
    <layer id="42" name="proposal" precision="FP32" type="Python">
        <data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.7" post_nms_topn="300" pre_nms_topn="6000" ratio="[0.5, 1, 2]" scale="[8, 16, 32]"/>
        <input>
            <port id="0">
                <dim>1</dim>
                <dim>18</dim>
                <dim>15</dim>
                <dim>15</dim>
            </port>
            <port id="1">
                <dim>1</dim>
                <dim>36</dim>
                <dim>15</dim>
                <dim>15</dim>
            </port>
            <port id="2">
                <dim>1</dim>
                <dim>3</dim>
            </port>
        </input>
        <output>
            <port id="3">
                <dim>300</dim>
                <dim>5</dim>
            </port>
        </output>
    </layer>
    ...

Look at the output shape of the custom layer you implemented. The shape was calculated according to the rules specified in PythonProposalOp. The ratio and scale properties have the values [0.5, 1, 2] and [8, 16, 32]. They have square brackets because they are originally repeated parameters. You converted each parameter to a list in PythonProposalOp, and the Model Optimizer cast the value to a string. According to Python rules, the string representation of a list consists of opening and closing square brackets with the values joined by commas.
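
The difference between the two representations is easy to see in plain Python:

    >>> str([0.5, 1, 2])                    # default string form of a list: brackets and commas
    '[0.5, 1, 2]'
    >>> ','.join(map(str, [0.5, 1, 2]))     # the form the Intermediate Representation expects
    '0.5,1,2'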

This is not a valid notation for the Intermediate Representation specification, because repeated parameters must be separated by a comma but without the brackets. Therefore, you must override the Model Optimizer default behavior regarding how it handles those parameters during the Intermediate Representation emitting stage, after the optimizations are complete. To do so, implement backend_attrs() in the PythonProposalOp class:

class PythonProposalOp(Op):
    ... other methods
     def backend_attrs(self) -> list:
            """
            Gets list of attributes that should appear in resulting IR
            Returns:
                list of attributes names or list of tuples (name of attribute, pre-processing rule)
            """
            return [
                (  # a tuple per attribute
                    'ratio',  # name of attribute
                    # pre-processing rule in a form of lambda
                    # lambda takes a PythonProposalOp node with all defined properties
                    # it translates [1,2,3] -> "1,2,3"
                    lambda node: ','.join(map(str, node['ratio']))
                ),
                (
                    'scale',
                    lambda node: ','.join(map(str, node['scale']))
                ),
                'feat_stride',
                'base_size',
                'min_size',
                'pre_nms_topn',
                'post_nms_topn',
                'nms_thresh'
            ]

The model can now be successfully converted.

Open the .xml file. ratio and scale have the expected correct values 0.5,1,2 and 8,16,32.

NOTE: The Model Optimizer supports the Faster-R-CNN topology. Run the following command for the same Intermediate Representation:

Summary

In this section you learned how to:

  1. Create a framework-independent extension implementation of the Intermediate Representation custom layer with unified logic for calculating output shapes, specified set of attributes, and so on;
  2. Use the Framework-Specific property extractor to map original model custom layer properties to the expected properties of the Framework-Independent extension;
  3. Manipulate the custom layer properties representation in the resulting Intermediate Representation.

Files used in this section:

  • extensions/ops/python_proposal.py:
    import numpy as np
    from mo.graph.graph import Node
    from mo.ops.op import Op
    class PythonProposalOp(Op):
        op = 'Python'
        def __init__(self, graph, attrs):
            mandatory_props = {
                'type': __class__.op,
                'op': __class__.op,
                'feat_stride': 16,
                'base_size': 16,
                'min_size': 16,
                'ratio': [0.5, 1, 2],
                'scale': [8, 16, 32],
                'pre_nms_topn': 6000,
                'post_nms_topn': 300,
                'nms_thresh': 0.7,
                'infer': PythonProposalOp.calculate_output_shape
            }
            super().__init__(graph, mandatory_props, attrs)
        def supported_attrs(self):
            return [
                'pre_nms_topn',
                'post_nms_topn',
                'nms_thresh',
                'feat_stride',
                'min_size',
                'base_size',
                'ratio',
                'scale'
            ]
        def backend_attrs(self) -> list:
            """
            Gets list of attributes that should appear in resulting IR
            Returns:
                list of attributes names or list of tuples (name of attribute, pre-processing rule)
            """
            return [
                (  # a tuple per attribute
                    'ratio',  # name of attribute
                    # pre-processing rule in a form of lambda
                    # lambda takes a PythonProposalOp node with all defined properties
                    # it translates [1,2,3] -> "1,2,3"
                    lambda node: ','.join(map(str, node['ratio']))
                ),
                (
                    'scale',
                    lambda node: ','.join(map(str, node['scale']))
                ),
                'feat_stride',
                'base_size',
                'min_size',
                'pre_nms_topn',
                'post_nms_topn',
                'nms_thresh'
            ]
        @staticmethod
        def calculate_output_shape(node: Node):
            input_shape = node.in_node(0).shape
            out_shape = np.array([0, 0], dtype=np.int64)
            # rois blob: holds R regions of interest, each is a 5 - tuple
            # (n, x1, y1, x2, y2) specifying an image batch index n and a
            # rectangle(x1, y1, x2, y2)
            out_shape[0] = input_shape[0] * node.post_nms_topn
            out_shape[1] = 5
            node.out_node(0).shape = out_shape
  • extensions/front/caffe/python_proposal_ext.py:
    import ast
    from mo.front.extractor import FrontExtractorOp
    from mo.ops.op import Op
    class PythonProposalFrontExtractor(FrontExtractorOp):
        op = 'Python'
        enabled = True
        @staticmethod
        def extract(node):
            proto_layer = node.pb
            param = proto_layer.python_param
            attrs = PythonProposalFrontExtractor.parse_param_str(str(param.param_str))
            # update the attributes of the node
            Op.get_op_class_by_name(__class__.op).update_node_stat(node, attrs)
            return __class__.enabled
        @staticmethod
        def parse_param_str(param_str: str):
            if param_str[0] != '{' and param_str[-1] != '}':
                param_str = '{' + param_str + '}'
            return ast.literal_eval(param_str)

Legacy Mode for Caffe* Custom Layers

The Model Optimizer can register custom layers in a way that the output shape is calculated by the Caffe* framework installed on your system. This chapter covers this option.

NOTE: The Caffe Python API has an issue when a layer name does not correspond to the name of its top. The fix was implemented in BVLC Caffe*. The Caffe framework on your computer must contain this fix; otherwise, the Caffe framework can unexpectedly fail during the fallback procedure.

NOTE: The Caffe fallback feature was validated against this github revision. You may have issues with forks or later Caffe framework versions.

  1. Create a file CustomLayersMapping.xml:
    mv extensions/front/caffe/CustomLayersMapping.xml.example extensions/front/caffe/CustomLayersMapping.xml
  2. Add (register) custom layers to CustomLayersMapping.xml:
    <CustomLayer NativeType="${Type}" hasParam="${has_params}" protoParamName="${layer_param}"/>

Where:

  • ${Type} is the type of the layer in Caffe*.
  • ${has_params} is "true" if the layer has parameters, and "false" otherwise.
  • ${layer_param} is the name of the layer parameters in caffe.proto, if the layer has them.

Example:

  1. The Proposal layer has parameters, and they appear in the Intermediate Representation. The parameters are stored in the proposal_param property of the layer:
    <CustomLayer NativeType="Proposal" hasParam="true" protoParamName="proposal_param"/>
  2. The CustomLayer layer has no parameters:
    <CustomLayer NativeType="CustomLayer" hasParam="false"/>

For this feature, you need an appropriate version of Caffe installed on the computer on which you run the Model Optimizer.

Constraints of Using the Caffe Fallback

Several layers in the Caffe framework can have output shapes that depend dynamically on the input data, not only on the layers that precede the layer and on its parameters. For example, SimplerNMS filters out bounding boxes that do not satisfy a condition. Internally, the Caffe* fallback forwards the whole net on data with no meaning, just noise, so it is natural to get a single bounding box (0, 0, 0, 0) instead of the expected number (for example, 15). It is possible to patch Caffe* accordingly, but then successful Intermediate Representation generation depends on the patched Caffe* being available on the particular machine. To keep the solution independent from Caffe*, we recommend using the extensions mechanism for such layers.

Known cases like Proposal, DetectionOutput, SimplerNMS are implemented as extensions and can be used out of the box.

A detailed description of supported layers is in the Intermediate Representation Layers Notation Reference Catalog.

Building Caffe*
  1. Build Caffe* with Python* 3.5:
    export CAFFE_HOME=PATH_TO_CAFFE
    cd $CAFFE_HOME
    rm -rf  ./build
    mkdir ./build
    cd ./build
    cmake -DCPU_ONLY=ON -DOpenCV_DIR=<your opencv install dir> -DPYTHON_EXECUTABLE=/usr/bin/python3.5 ..
    make all # also builds pycaffe
    make install
    make runtest # optional
  2. Add Caffe* Python directory to PYTHONPATH to let it be imported from the Python program:
    export PYTHONPATH=$CAFFE_HOME/python:$PYTHONPATH
  3. Check the Caffe* installation:
    python3
    import caffe

If Caffe was installed correctly, the Caffe module is imported without errors.
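
A slightly more complete check (illustrative only) also verifies that the module is picked up from your Caffe* build rather than from another installation:

    # Run inside python3 after exporting PYTHONPATH as shown above.
    import caffe
    print(caffe.__file__)   # expected to point inside $CAFFE_HOME/python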

TensorFlow Models With Custom Layers

You have three options for TensorFlow* models with custom layers:

  • Register those layers as extensions to the Model Optimizer. In this case, the Model Optimizer generates a valid and optimized Intermediate Representation.
  • If you have sub-graphs that should not be expressed with the analogous sub-graph in the Intermediate Representation, but another sub-graph should appear in the model instead, the Model Optimizer provides such an option. This feature is helpful for many TensorFlow models. To read more, see Sub-graph Replacement in the Model Optimizer.
  • An experimental feature of registering certain sub-graphs of the model as those that should be offloaded to TensorFlow during inference. In this case, the Model Optimizer produces an Intermediate Representation that:
    • Can be inferred only on CPU
    • Reflects each sub-graph as a single custom layer in the Intermediate Representation

    For more information, see Offloading Computations to TensorFlow. This feature is intended for development only. It is expected to be used when the model has a complex structure and writing extensions for its internal sub-graphs is not an easy task. In this case, you offload the complex sub-graphs to TensorFlow to make sure that the Model Optimizer and the Inference Engine can successfully execute your model; however, for each such sub-graph the TensorFlow library is called, and it is not optimized for inference. You then replace each sub-graph with an extension and remove its offloading to TensorFlow* during inference, until the whole model is converted by the Model Optimizer and inferred by the Inference Engine alone with the maximum performance.

Sub-Graph Replacement in the Model Optimizer

Several reasons exist why the Model Optimizer might not generate an Intermediate Representation for a model. However, in some cases the Intermediate Representation can be generated after providing certain hints to the tool. The examples of hints below are mostly related to TensorFlow*, but could potentially apply to models created in any framework:

  • The topology contains an operation (or a sub-graph of operations) not known to the Model Optimizer, but the operation (sub-graph) can be expressed as a combination of known operations. The hint in this case is a description of this combination for the tool.
  • A sub-graph of operations in the topology expresses a single layer known to the Inference Engine.
  • TensorFlow* and the Inference Engine use different tensor layouts, NHWC and NCHW respectively. If some tensor in NHWC layout is flattened (for example, all the dimensions are squashed into a single dimension), it is not possible to convert it to the NCHW layout required by the Inference Engine, so the Model Optimizer cannot produce a correct Intermediate Representation (see the sketch after this list).
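
The third case can be illustrated with a small NumPy sketch (the shapes are made up):

    import numpy as np

    nhwc = np.zeros((1, 224, 224, 3))      # TensorFlow* layout: N, H, W, C
    nchw = nhwc.transpose(0, 3, 1, 2)      # Inference Engine layout: N, C, H, W

    flattened = nhwc.reshape(1, -1)        # shape (1, 150528): H, W, and C are squashed together,
                                           # so the NHWC -> NCHW permutation can no longer be applied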

The detailed solutions for the examples above are given later, but for now, let's look at what is common to all three examples.

Sub-graph Replacement

In these cases, a sub-graph (or a single node) of the initial graph is replaced with a new sub-graph (or a single node). The sub-graph replacement consists of the following steps:

  1. Identify an existing sub-graph for replacement.
  2. Generate a new sub-graph.
  3. Connect a new sub-graph to the graph (create input/output edges to the new sub-graph).
  4. Create output edges out of a new sub-graph to the graph.
  5. Do something with the original sub-graph (e.g. remove).

Model Optimizer provides several ways to perform most of the sub-graph replacement steps. The next subsections describe these methods.

Replace a Single Operation With a Sub-graph of Operations

For example, TensorFlow* has an operation "SquaredDifference" which calculates (a - b)^2, where a and b are input tensors. The Inference Engine does not support such an operation. However, SquaredDifference can be expressed using two Power operations and one Eltwise Add. The Power operation calculates scale * (a ^ power) + shift, where a is a tensor and scale, power, and shift are float values. The first Power operation negates the value of tensor b. The second one is used to square the result of a + (-b), which is calculated using the elementwise Add operation applied to tensor a and tensor -b.
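
A small NumPy sketch (the values are made up) shows that this decomposition is equivalent to the original operation:

    import numpy as np

    a = np.array([3.0, 5.0])
    b = np.array([1.0, 2.0])

    squared_difference = (a - b) ** 2       # what SquaredDifference computes
    negated_b = -1.0 * b                    # Power with scale = -1
    decomposed = (a + negated_b) ** 2       # Eltwise sum followed by Power with power = 2

    assert np.allclose(squared_difference, decomposed)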

Given that, we can replace all SquaredDifference operations in the initial model with two Power operations and one Eltwise operation. Now let's take a look at the implementation of that replacer in the file extensions/front/SquaredDifference.py:

import networkx as nx
from mo.front.common.replacement import FrontReplacementOp
from mo.graph.graph import Node
from mo.ops.eltwise import Eltwise
from mo.ops.power import Power
class SquaredDifference(FrontReplacementOp):
    """
    Example class illustrating how to implement replacement of a single op in the front-end of the MO pipeline.
    This class replaces a single op "SquaredDifference" by a sub-graph consisting of 3 lower-level ops.
    """
    op = "SquaredDifference"
    enabled = True
    def replace_op(self, graph: nx.MultiDiGraph, node: Node):
        negate = Power(graph, dict(scale=-1, name=node.name + '/negate_'))
        add = Eltwise(graph, dict(operation='sum', name=node.name + '/add_'))
        squared = Power(graph, dict(power=2, name=node.name + '/squared_'))
        out_node = squared.create_node([add.create_node([node.in_node(0), negate.create_node([node.in_node(1)])])])
        # Replace the edge from out port 0 of the matched node with an edge from node out_node.id with port 0.
        # The "explicit" version of the return value is: [(out_node.id, 0)]
        return [out_node.id]

The Model Optimizer internal representation of the graph uses the networkx module.

Key lines:

  • Line 1: Imports this module.
  • Line 3: Imports the class FrontReplacementOp that is used to replace an operation of a particular type with a new sub-graph. This class performs the first step of the sub-graph replacement (identify an existing sub-graph for replacement). It is important to mention that the replacement happens before shape inference and before the creation of data nodes representing tensors with values. At this stage of the model conversion pipeline, all nodes in the graph are operation nodes or nodes of type "Const" that produce a tensor with a fixed value embedded into the node.
  • Line 4: Imports class "Node" representing a single node in the computation graph.
  • Lines 5 - 6: Import classes representing the operations Power and Eltwise. These classes are inherited from the base class "mo.ops.Op" that represents an operation and stores its attributes.
  • Line 9: Defines the class SquaredDifference inherited from FrontReplacementOp. This is a replacer class that is automatically registered and executed by the Model Optimizer. Since the class is located in the common (not framework-specific) directory extensions/front, it is used for replacement in all supported frameworks.
  • Line 15: Defines the class variable op that stores the name of the operation to be replaced. In this case, it is "SquaredDifference".
  • Line 16: Defines the class variable enabled that controls whether the replacer is enabled or not. The only function that should be implemented in the class is replace_op. It gets the graph to operate on and an instance of the node of the desired operation ("SquaredDifference" in this case). This function performs steps two and three of the sub-graph replacement (generate a new sub-graph to replace with, and connect the new sub-graph to the graph).
  • Lines 19 - 21: Create instances of the operation classes with the required attributes.
  • Line 23: Creates a sub-graph from the operations defined above. The "create_node" method of the "Op" class generates a Node from the Op and uses a single mandatory argument: the list of input nodes (represented as instances of the Node class) to create input edges to the node being generated. Inputs of the SquaredDifference node are retrieved using the node.in_node(0) and node.in_node(1) method calls. The elementwise Add node gets, as its first input, the first input of the "SquaredDifference" node; the second input of Add is the result of negating the second input of the "SquaredDifference" node: [add.create_node([node.in_node(0), negate.create_node([node.in_node(1)])])]. Then the result of the Add node is squared. The "out_node" node performs this calculation.

The "replace_op" function returns a list of node names used to create output edges of the sub-graph to connect it with the rest of the graph. Each element of the list describes the mapping between the old output edge of the matched node and the new sub-graph node and output edge index. The i-th element of the list corresponds to the i-th output tensor of the matched node. In this case, "SquaredDifference" produces a single tensor through output port 0, so the returned list contains a single element. In the general case, each element is a tuple, where the first element is the name of the new node producing the required tensor and the second is the output port for that tensor. If the output port is 0, it is possible to use a shortcut: just the name of the node instead of a tuple. Line 26 uses this shortcut. The returned value is used to create the new sub-graph output edges (step 4 of the sub-graph replacement).

The default implementation of the FrontReplacementOp class removes the matched node and all its input/output edges (step 5 of the sub-graph replacement).

Another example of this kind of replacement is in the "extensions/front/Sub.py" class, where all instances of the Sub operation are replaced with two operations: a Power to negate the second argument and an Eltwise to perform the elementwise add.

Replace Sub-graph of Operations With a New Sub-graph of Operations

The previous example considered the situation when a single node of a specific type is replaced. When it is necessary to replace a sub-graph of operations, you must tell the Model Optimizer how to identify this sub-graph. There are three options to achieve that:

  1. Use the graph isomorphism pattern matching of the networkx module.
  2. Use a node name pattern to identify the "scope" (according to TensorFlow* terminology) to be replaced.
  3. Use sets of "start" and "end" node names to match all nodes "between" them.

Let's review each option based on real examples.

Replace Sub-graph of Operations Using Graph Isomorphism Pattern

The networkx Python module provides methods to find a graph isomorphic to a given one using node and edge matches: networkx.algorithms.isomorphism.categorical_node_match, networkx.algorithms.isomorphism.categorical_multiedge_match, and so on. The Model Optimizer uses these methods and provides a simple API to use that feature.
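
As a toy illustration of these networkx facilities (independent of the Model Optimizer API; the graph and attribute values are made up), the following sketch matches a two-node pattern by the "op" node attribute and the "in" edge attribute:

    import networkx as nx
    import networkx.algorithms.isomorphism as iso

    # Graph whose nodes carry an 'op' attribute and whose edges carry an 'in' attribute.
    graph = nx.MultiDiGraph()
    graph.add_node('n1', op='Mean')
    graph.add_node('n2', op='StopGradient')
    graph.add_edge('n1', 'n2', **{'in': 0})

    # Pattern to search for in the graph above.
    pattern = nx.MultiDiGraph()
    pattern.add_node('mean', op='Mean')
    pattern.add_node('stop_grad', op='StopGradient')
    pattern.add_edge('mean', 'stop_grad', **{'in': 0})

    matcher = iso.MultiDiGraphMatcher(
        graph, pattern,
        node_match=iso.categorical_node_match('op', None),
        edge_match=iso.categorical_multiedge_match('in', None))
    print(matcher.subgraph_is_isomorphic())  # True: the pattern is found in the graph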

For example, Caffe* has a layer called Mean-Variance Normalization (MVN), which is also supported by the Inference Engine. In TensorFlow*, this layer is implemented with low-level operations: Mean, StopGradient, SquaredDifference, Squeeze, and FusedBatchNorm. The Model Optimizer should replace the sub-graph of these operations with a single Inference Engine layer of type "MVN".

The file extensions/front/tf/mvn.py performs such a replacement. The first part of the file is:

class MVN(FrontReplacementSubgraph):
    enabled = True
    def pattern(self):
        log.debug('Enabled MVN replacement')
        return dict(
            nodes=[
                ('mean', dict(op='Mean')),
                ('stop_grad', dict(op='StopGradient')),
                ('sqdiff', dict(op='SquaredDifference')),
                ('variance', dict(op='Mean')),
                ('squeeze_mean', dict(op='Squeeze')),
                ('squeeze_variance', dict(op='Squeeze')),
                ('fbn', dict(op='FusedBatchNorm')),
            ],
            edges=[
                ('mean', 'stop_grad', {'in': 0}),
                ('stop_grad', 'sqdiff', {'in': 1}),
                ('sqdiff', 'variance', {'in': 0}),
                ('mean', 'squeeze_mean', {'in': 0}),
                ('variance', 'squeeze_variance', {'in': 0}),
                ('squeeze_mean', 'fbn', {'in': 3}),
                ('squeeze_variance', 'fbn', {'in': 4}),
            ],
            node_attrs=['op'],
            edge_attrs=['in'])

In this file:

  • Line 1: Defines class "MVN" inherited from class FrontReplacementSubgraph that performs sub-graph replacement using sub-graph isomorphism pattern.
  • Line 3: Sets class variable "enabled" to value True meaning that this replacer is enabled.
  • The function "pattern" defines the sub-graph constraints to be matched. It returns a dictionary with four keys:
    • the "nodes" defines a list of nodes to be matched. Each element in the list is a tuple. The first element is the alias name assigned for the matched node; the second element is a dictionary with desired attributes of the node.
    • the "edges" defines a list of edges to be matched. Each element in the list is a tuple. The first and the second elements are the start and end edge nodes alias names respectively. The third element is a dictionary with desired edge attributes.
    • the "node_attrs" contains the names of nodes attributes to use during sub-graph isomorphism search.
    • the "edge_attrs" contains the names of edges attributes to use during sub-graph isomorphism search.
      The sub-graph is matched if all provided constraints are satisfied. If at least one node with desired attributes is missing or at least one defined edge is absent then the sub-graph is not matched.
  • Line 9: Adds the constraint that the sub-graph should contain a node with the attribute "op" equal to "Mean". The matched node gets the alias name "mean". In the same way, line 10 adds a constraint for the node "StopGradient", whose match gets the alias name "stop_grad", and so on.
  • Now look at how the edge constraints are defined. Line 18: Defines an edge from the node with alias name "mean" to the node with alias name "stop_grad" having the attribute "in" equal to 0. This means that the output of node "mean" is connected to the node "stop_grad" as the first input (the Model Optimizer uses zero-based indexing, which is why "in" is 0). Another example is in line 25, where the edge from "squeeze_mean" is connected to the "fbn" node as the fourth input.
  • Lines 26 - 27: Specify the list of attributes to be checked. In fact, these lists are just the lists of all keys in the dictionaries for node and edge attributes.

Now that the Model Optimizer knows how to find the sub-graph (step 1 of the sub-graph replacement), it is necessary to implement the function that performs the actual sub-graph replacement (steps 2 and 3). The code for this function is:

def replace_sub_graph(self, graph: nx.MultiDiGraph, match: dict):
    fbn = match['fbn']
    input = fbn.in_node(0)
    log.debug('Found potential MVN pattern after {} with name {}'.format(input.op, input.name))
    if input.id != match['mean'].in_node(0).id or input.id != match['sqdiff'].in_node(0).id:
        return
    log.debug('Confirmed MVN pattern after {} with name {}'.format(input.op, input.name))
    MVN = Op.get_op_class_by_name('MVN')
    mvn = MVN(graph, dict(
        name=fbn.name + '/MVN_',
        eps=fbn.eps,
        required_reduction_indices=[1,2] if fbn.data_format == b'NHWC' else [2,3]
    ))
    mvn.attrs['old_infer'] = mvn.attrs['infer']
    mvn.attrs['infer'] = __class__.infer
    mul = Eltwise(graph, dict(operation='mul', name=fbn.name + '/Mul_'))
    add = Eltwise(graph, dict(operation='sum', name=fbn.name + '/Add_'))
    input_gamma = fbn.in_node(1)
    input_beta = fbn.in_node(2)
    mean_reduction = match['mean'].in_node(1)
    variance_reduction = match['variance'].in_node(1)
    new_subgraph = add.create_node([
        mul.create_node([
            mvn.create_node([input, mean_reduction, variance_reduction]),
            input_gamma
        ]),
        input_beta
    ])
    replace_node(fbn, new_subgraph)

The function accepts two arguments: the graph and the dictionary "match". The keys in the dictionary are the alias names of the matched nodes (defined in the "nodes" list in the function "pattern") and the values are the matched nodes of the graph (instances of the Node object).

The function generates a new sub-graph with a node of type "MVN" and two nodes of type Eltwise that calculate the sum and the product. There is nothing especially interesting in how the graph is generated or in the mathematics behind it, so attention is given to two aspects of this function.

The first one is the call to the function "replace_node" in line 36. The FusedBatchNorm node is replaced with the output node of the generated sub-graph: all input edges of the FusedBatchNorm node are re-connected to the "new_subgraph" node, and all consumers of the FusedBatchNorm node are updated to get inputs from the "new_subgraph" node. This action connects the newly generated sub-graph with the existing graph (step 4 of the sub-graph replacement).

The second one is that the default implementation of the inference function for the MVN operation is overwritten. In line 16, the default implementation of the inference function for MVN is saved to the attribute "old_infer". In line 17, the new inference function is saved to the instance of the MVN operation class. Let's take a look at the new inference function code:

@staticmethod
def infer(node: Node):
    if not(node.in_node(1).has_valid('value') and node.in_node(2).has_valid('value')):
        log.warning('Reduction indices for mean and variance for MVN node {} are not constants'.format(node.name))
        return
    if not(all(node.in_node(1).value == node.required_reduction_indices) and
        all(node.in_node(2).value == node.required_reduction_indices)):
        log.warning('Reduction indices for mean {} and variance {} do not match required ones {}'.format(
            node.in_node(1).value,
            node.in_node(2).value,
            node.required_reduction_indices
        ))
        return
    node.graph.remove_edge(node.in_node(1).id, node.id)
    node.graph.remove_edge(node.in_node(2).id, node.id)
    node.old_infer(node)

The infer function is needed to infer the value of the node (if possible) and to infer the shapes of the output tensors of the node (mandatory). The custom infer function performs additional checks that describe the limitations of the MVN layer implementation in the Inference Engine. For example, the reduction indices for mean and variance must be constants (line 10), while in TensorFlow* they could be computed during model inference. In addition, the function removes two edges from the graph (lines 17 and 18) because all required information is already stored in the MVN node attributes. This is due to a different MVN layer implementation in the Inference Engine and TensorFlow*: mean and variance are attributes of the node in the Inference Engine, while in TensorFlow* they are input tensors. The edges are not removed in the "replace_sub_graph" function because they are used in the "infer" function (lines 7-12).

The last action in the "infer" method (line 19) is to call the default infer function for the MVN, which is saved in the "old_infer" attribute of the node, to infer the output tensor shapes.

What about step 5 of the sub-graph replacement? What happens to the six matched nodes? All of them are automatically removed during the dead code elimination pass that is performed after applying the defined custom sub-graph replacements. The six matched nodes are no longer connected to the inputs of the network after the "fbn" node is replaced with the newly created sub-graph node. Since they are not marked as output nodes (using the --output command line parameter), they can be removed.

The replacement works for all sub-graph isomorphism instances found in the network.

Replace Sub-graph of Operations Using Nodes Name Pattern

TensorFlow* uses a mechanism of scopes to group related operation nodes. It is a good practice to put nodes performing a particular task into a scope. This approach divides the graph into logical blocks that are easier to review in TensorBoard*. A "scope", in fact, just defines a common prefix for the node names in that scope.
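
As a minimal illustration of this naming convention, the following TensorFlow* 1.x snippet (with a hypothetical scope name, input shape, and layer) creates nodes whose names all share the scope prefix:

import tensorflow as tf

with tf.variable_scope('InceptionV4/Mixed_5b'):
    # every node created inside this block gets the 'InceptionV4/Mixed_5b/' name prefix
    data = tf.placeholder(tf.float32, [1, 35, 35, 384], name='input')
    conv = tf.layers.conv2d(data, filters=96, kernel_size=1, name='Conv2d_0a_1x1')

print(conv.op.name)  # the printed name starts with 'InceptionV4/Mixed_5b/'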

For example, Inception topologies contain several types of so-called "Inception blocks". Some of them are exactly equal to each other, but are located in different places of the network. For example, Inception V4 from the tensorflow.contrib.slim module has Inception blocks "Mixed_5b", "Mixed_5c" and "Mixed_5d" with exactly the same nodes and the same attributes.

Now consider a situation where someone has implemented these Inception blocks extremely efficiently using a single Inference Engine custom layer called "InceptionBlock" and would like to replace these blocks with instances of this layer to decrease inference time. The Model Optimizer provides a mechanism to replace a sub-graph of operations defined by regular expressions for the node name prefixes (that is, scopes). In this particular case, the patterns are: ".*InceptionV4/Mixed_5b", ".*InceptionV4/Mixed_5c" and ".*InceptionV4/Mixed_5d". Each pattern starts with ".*" because the prefix "InceptionV4" is added to all node names during model freezing.

Sub-graph replacement using node name patterns is a bit trickier than the replacement of a single operation or by a networkx isomorphism pattern described above. The following additional steps should be done compared to the previously described replacements:

  1. Prepare a configuration file template defining the node name patterns and information about the custom layer attributes.
  2. Run the Model Optimizer with a command line parameter that adds information about the input and output nodes of the specified sub-graphs to the configuration file.

Consider the following possible configuration file for the Inception Block replacer:

[
    {
        "custom_attributes": {
            "attr1_key": "attr1_value",
            "attr2_key": 123456
        },
        "id": "InceptionBlockReplacer",
        "op": "InceptionBlock",
        "instances": [
            ".*InceptionV4/Mixed_5b",
            ".*InceptionV4/Mixed_5c",
            ".*InceptionV4/Mixed_5d"
        ],
        "match_kind": "scope"
    }
]

The JSON file contains a list of dictionaries. Each dictionary defines one replacement. Each replacement is defined with several keys:

  • "id" (mandatory) is the unique identifier of the replacer. It is used in the Python code that implements the sub-graph replacement to link the class and the replacement description from the configuration file.
  • "match_kind" (mandatory) is a string that specifies the matching algorithm. Currently "scope" and "points" are supported. In this example, the first one is considered. The "points" match kind is described below.
  • "instances" (mandatory) specifies the instances of the sub-graph to be matched. For the match kind "scope", it contains a list of node name prefix patterns.
  • "custom_attributes" (optional) is a dictionary with static attributes of the layer to be dumped to the Inference Engine Intermediate Representation .xml file.
  • "op" (optional) is used only if the sub-graph replacement Python code is not needed because the sub-graph should be replaced with a single node of type "op". If this attribute is not set, it is necessary to implement Python code with the sub-graph generation logic. Both options are considered in this example.

When the configuration file is ready, run the Model Optimizer with the regular command line parameters (path to the model file and input shapes, if necessary) plus the additional parameter --tensorflow_custom_operations_config_update pointing to the created configuration file.
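
For example, a hypothetical invocation for a frozen Inception V4 graph could look like this (the model file name, input shape, and configuration file name are placeholders):

python3 mo.py --input_model inception_v4_frozen.pb --input_shape [1,299,299,3] --tensorflow_custom_operations_config_update inception_block.json

If the configuration file is correct, the Model Optimizer adds two keys to the "InceptionBlockReplacer" dictionary, "inputs" and "outputs", with the following content: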

[
    {
        "id": "InceptionBlockReplacer",
        ...
        "inputs": [
            [
                {
                    "node": "Branch_2/Conv2d_0a_1x1/Conv2D$",
                    "port": 0
                },
                {
                    "node": "Branch_3/AvgPool_0a_3x3/AvgPool$",
                    "port": 0
                },
                {
                    "node": "Branch_1/Conv2d_0a_1x1/Conv2D$",
                    "port": 0
                },
                {
                    "node": "Branch_0/Conv2d_0a_1x1/Conv2D$",
                    "port": 0
                }
            ]
        ],
        "outputs": [
            {
                "node": "concat$",
                "port": 0
            }
        ]
    }
]

The value for key "inputs" is a list of lists describing input tensors of the sub-graph. Each element of the top-level list corresponds to one unique input tensor of the sub-graph. Each internal list describes list of nodes consuming this tensor and port numbers where the tensor is consumed. Model Optimizer generates regular expressions for the input nodes names to uniquely identify them in each instance of the sub-graph defined by the "instances". Denote these nodes as input nodes of the sub-graph.

In the InceptionV4 topology, the "InceptionV4/Mixed_5b" block has four input tensors coming from outside of the sub-graph, but all of them are produced by the node "InceptionV4/Mixed_5a/concat". Therefore, the top-level list of "inputs" contains one list corresponding to this tensor. Four input nodes of the sub-graph consume the tensor produced by the "InceptionV4/Mixed_5a/concat" node. In this case, all four input nodes consume the input tensor at port 0.

The order of items in an internal list describing nodes does not matter, but the order of elements in the top-level list is important. This order defines how the Model Optimizer attaches input tensors to a newly generated node if the sub-graph is replaced with a single node. The i-th input node of the sub-graph is obtained using the call "match.single_input_node(i)" in the sub-graph replacer code. More information about the API is given below. The configuration file can be edited in a text editor to change the order of input tensors if necessary.

The value for key "outputs" is a list describing nodes of the sub-graph producing tensor that goes outside of the sub-graph or do not have child nodes. Denote these nodes as output nodes of the sub-graph. The order of elements in the list is important. The i-th element of the list describes the i-th output tensor of the sub-graph which could be obtained using call "match.output_node(i)". The order of elements can be manually changed in the configuration file. Model Optimizer uses this order to connect output edges if the sub-graph is replaced with a single node.

Now when meaning of "inputs" and "outputs" attributes is clean return back to the replacer implementation. The replacer "InceptionBlockReplacer" contains attribute "op" with the value "InceptionBlock" that means that the identified sub-graph should be replaced with a single layer of type "InceptionBlock". Such a layer is not known for Model Optimizer so it is necessary to define it. See Extending the Model Optimizer with New Primitives. You must create file "extension/ops/InceptionBlock.py" with the following content:

import numpy as np
from mo.graph.graph import Node
from mo.ops.op import Op
class InceptionBlock(Op):
    op = "InceptionBlock"
    enabled = True
    def __init__(self, graph, attrs):
        super().__init__(graph, attrs, {
            'type': __class__.op,
            'op': __class__.op,
        })

The shape inference function is not defined. In this case, the Model Optimizer uses the TensorFlow* fallback to calculate the shapes of the sub-graph output tensors.

Run the Model Optimizer with the command line parameter --tensorflow_use_custom_operations_config pointing to the created configuration file, along with the regular command line parameters (path to the model file and input shape, if necessary). The Model Optimizer generates an Intermediate Representation .xml file with three sequential layers of type "InceptionBlock" like this:

<layer id="1658" name="InceptionBlock1877" precision="FP32" type="InceptionBlock">
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>384</dim>
            <dim>35</dim>
            <dim>35</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>384</dim>
            <dim>35</dim>
            <dim>35</dim>
        </port>
    </output>
</layer>

The implementation of the sub-graph replacement by scope with a single layer is complete. Now look at how the Model Optimizer replaces a sub-graph identified by start/end nodes (so-called "points") with another sub-graph.

Replace Sub-graph of Operations Using Points

In this scenario, the user defines the sub-graph for the matching algorithm via a set of "start" and "end" nodes. Given these sets, the Model Optimizer performs the following steps:

  1. Starts a graph traversal from every start node following the direction of the graph edges. The search stops at the end nodes or at nodes without further children. All visited nodes are added to the matched sub-graph.
  2. Starts another graph traversal from each non-start node of the sub-graph, that is, every node except the nodes from the "start" set. In this step, the edges are traversed in the opposite direction. All newly visited nodes are added to the matched sub-graph. This step is needed to add the nodes required to calculate the values of the internal nodes of the matched sub-graph.
  3. Checks that all "end" nodes were reached from the "start" nodes. If not, the Model Optimizer exits with an error.
  4. Checks that there are no "Placeholder" operations among the added nodes. If there are, some side branch of the sub-graph (added in step 2) depends on the inputs of the network. Such a configuration is not correct, so the Model Optimizer exits with an error.

This algorithm finds all nodes "between" the start and end nodes, plus the nodes needed to calculate the values of the internal nodes of the matched sub-graph; the latter produce constant values because they do not depend on the input of the network. This sub-graph match has a limitation: each start node must have only one input. Therefore, it is not possible to specify, for example, a Convolution node as a start node, because it has two inputs: the data tensor and the tensor with weights.

For an example of replacement with points, see Case Study: Converting SSD Models Created With TensorFlow Object Detection API.

Offloading Computations to TensorFlow

The Model Optimizer can't generate an Intermediate Representation from unsupported TensorFlow operations, as is the case with some custom layers. However, you can still successfully create an Intermediate Representation if you offload the unsupported operations to TensorFlow* for computation.

Limitations:

  • You can only offload operations to TensorFlow from a Linux computer.
  • The custom layer supports inference only on a CPU, not on Intel Integrated Graphics or on an FPGA.
  • The Inference Engine uses the NCHW layout for tensors, but TensorFlow* usually uses NHWC. The Model Optimizer performs conversion between these layouts to correctly infer the model.
    The Model Optimizer adds transpose operations to convert sub-graph 4D input tensors from the NCHW layout to NHWC, and vice versa for the output nodes. These operations are embedded in the protobuf string that describes the TensorFlow sub-graph in the Intermediate Representation .xml file. A minimal sketch of this layout conversion is shown after this list.
    Sometimes, this approach fails. For example, offloading a convolution alone to TensorFlow* fails because the layout of convolution weights in TensorFlow* does not correspond to the layout of weights in the Inference Engine. However, offloading convolution nodes together with the nodes that hold the weights succeeds, because the weight nodes are part of the offloaded sub-graph, so no transposes are inserted for the weights tensor. The weight nodes are usually of type Const.
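
The layout conversion mentioned above corresponds to simple 4D transposes. A minimal NumPy sketch (illustration only, not the actual Model Optimizer code):

import numpy as np

nchw = np.zeros((1, 3, 224, 224), dtype=np.float32)  # Inference Engine layout: batch, channels, height, width
nhwc = nchw.transpose(0, 2, 3, 1)                     # to the TensorFlow layout: batch, height, width, channels
back = nhwc.transpose(0, 3, 1, 2)                     # and back to NCHW for the sub-graph outputs
assert back.shape == nchw.shape
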
How to Build a Custom Layer to Offload Computations to TensorFlow

NOTE: You need to perform this step only once.

  1. Clone the TensorFlow r1.4 Git repository.
  2. Set the environment variable TF_ROOT_DIR to point to the cloned directory.
  3. Choose one of these options:
    • Run source <INSTALL_DIR>/bin/setupvars.sh
    • Set the environment variable INTEL_CVSDK_DIR to point to a directory containing the inference_engine/include/ directory.
  4. Build an Inference Engine layer with TensorFlow runtime. This might take about 20 minutes:
    ./tf_call_ie_layer/build.sh
  5. A shared library is generated:
    $TF_ROOT_DIR/bazel-bin/tensorflow/cc/inference_engine_layer/libtensorflow_call_layer.so
    This library is the Inference Engine custom layer, which is used to offload inference to TensorFlow*.
How to Run a Model With Operations Offloaded to TensorFlow*
  1. Compile extensibility_sample
  2. Run extensibility_sample:
    ./extensibility_sample -i <path_to_image_file> -m <path_to_IR.xml> -d CPU -l <path_to_libtensorflow_call_layer.so>

Three command-line options are available to offload part of the inference to TensorFlow*.

NOTE: Append these command-line options to the base conversion command:

python3 mo.py --input_model model-file.pb

Offload a sub-graph of operations selected by node name patterns, using the command-line option:

--tensorflow_subgraph_patterns

This option uses a comma-separated list of regular expressions to match node names. This offload has two primary characteristics:

  • All nodes that match a specific regular expression are merged into a single Inference Engine node that TensorFlow* executes.
  • All patterns are applied independently, which means that two nodes matching two different patterns are not merged into one node. For example, with the option --tensorflow_subgraph_patterns "Scope_1/.*,Scope_2.*", all nodes whose names start with Scope_1/ are merged into one new node, and all nodes whose names start with Scope_2 are merged into a different node.

Offload specific types of operations, using the command-line option:

--tensorflow_operation_patterns

This option specifies a comma-separated list of regular expressions to match node types. This offload has one primary characteristic: all nodes that match a specific regular expression are merged into a single Inference Engine node that TensorFlow* executes. For example, the following option value offloads all operations of type 'Concat', 'ConcatV2', 'Add', and 'BiasAdd' to TensorFlow*:

--tensorflow_operation_patterns "Concat.*,.*Add"

Offload all unsupported operations automatically, using the command-line option:

--offload_unsupported_operations_to_tf

With this option, the Model Optimizer analyzes a network graph and finds unsupported operations. The Model Optimizer finds and offloads the connected sub-graphs of unsupported operations. The unsupported operations are offloaded to TensorFlow*.

You can use all three options; for example (the pattern values below are taken from the examples above):

python3 mo.py --input_model model-file.pb --tensorflow_subgraph_patterns "Scope_1/.*,Scope_2.*"
python3 mo.py --input_model model-file.pb --tensorflow_operation_patterns "Concat.*,.*Add"
python3 mo.py --input_model model-file.pb --offload_unsupported_operations_to_tf

Case Study: Converting SSD Models Created With TensorFlow Object Detection API

As explained in Sub-graph Replacement in the Model Optimizer, there are multiple ways to set up the sub-graph matching. This example focuses on defining the sub-graph via a set of "start" and "end" nodes. The result of the matching is two buckets of nodes:

  • Nodes "between" start and end nodes.
  • Nodes connected to the first bucket, but only on the constant path (that is, these nodes are not connected to the inputs of the entire graph).

Now take a closer look at the SSD models from the TensorFlow* detection model zoo: SSD MobileNet and SSD InceptionV2.

A distinct layer of any SSD topology is the DetectionOutput layer. This layer is implemented with dozens of primitive operations in TensorFlow*, while in the Inference Engine it is a single layer. Thus, to convert an SSD model from TensorFlow*, the Model Optimizer should replace the entire sub-graph of operations that implement the DetectionOutput layer with a single well-known DetectionOutput node.

The Inference Engine DetectionOutput layer consumes three tensors in the following order:

  1. Tensor with locations of bounding boxes.
  2. Tensor with confidences for each bounding box.
  3. Tensor with prior boxes (aka anchors in TensorFlow* terminology).

The DetectionOutput layer produces one tensor with seven numbers for each actual detection: the image (batch) index, the class label, the confidence, and the four box coordinates. There are more output tensors in the TensorFlow* Object Detection API, but the values in them are consistent with the Inference Engine ones.

The difference from the other examples is that here the DetectionOutput sub-graph is replaced with a new sub-graph, not a single layer.

Look at the sub-graph replacement configuration file extensions/front/tf/ssd_support.json that is used to enable the two models listed above:

[
    {
        "custom_attributes": {
            "code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
            "confidence_threshold": 0.01,
            "keep_top_k": 200,
            "nms_threshold": 0.45,
            "pad_mode": "caffe.ResizeParameter.CONSTANT",
            "resize_mode": "caffe.ResizeParameter.WARP"
        },
        "id": "TFObjectDetectionAPIDetectionOutput",
        "include_inputs_to_sub_graph": true,
        "include_outputs_to_sub_graph": true,
        "instances": {
            "end_points": [
                "detection_boxes",
                "detection_scores",
                "num_detections"
            ],
            "start_points": [
                "Postprocessor/Shape",
                "Postprocessor/Slice",
                "Postprocessor/ExpandDims",
                "Postprocessor/Reshape_1"
            ]
        },
        "match_kind": "points"
    }
]

Lines 3-10 define static attributes that are saved as-is to the Intermediate Representation .xml file for the DetectionOutput layer.

Lines 12 and 13 define values for attributes that should always be set to "true" in this release of the Model Optimizer. These two attributes are specific to sub-graph matching by points only.

Lines 14-26 define one instance of the sub-graph to be matched. This is an important difference between sub-graph matching by scope and by points: several instances can be specified for matching by scope, but matching by points allows specifying just one instance. Therefore, full node names (not regular expressions, as in the case of matching by scope) are specified in the "instances" dictionary.

Now let's analyze the structure of the topologies generated with the Object Detection API. There are several blocks in the graph performing particular tasks:

  • The "Preprocessor" block resizes, scales, and subtracts mean values from the input image.
  • The "FeatureExtractor" block is a MobileNet* or another backbone that extracts features.
  • The "MultipleGridAnchorGenerator" block creates initial bounding box locations ("anchors").
  • The "Postprocessor" block acts as a DetectionOutput layer, so the "Postprocessor" block must be replaced with a DetectionOutput layer. It is necessary to add all input nodes of the "Postprocessor" scope to the "start_points" list. Consider the inputs of each of these nodes:
  • "Postprocessor/Shape" consumes the tensor with locations.
  • "Postprocessor/Slice" consumes the tensor with confidences.
  • "Postprocessor/ExpandDims" consumes the tensor with prior boxes.
  • "Postprocessor/Reshape_1" consumes the tensor with locations similarly to the "Postprocessor/Shape" node. Despite the fact that the last node "Postprocessor/Reshape_1" gets the same tensor as the "Postprocessor/Shape" node, it must be explicitly put into the list.

Object Detection API "Postprocessor" block generates output nodes: "detection_boxes", "detection_scores", "num_detections", "detection_classes".

Now consider the implementation of the sub-graph replacer, available in extensions/front/tf/SSDs.py. The file is rather big, so only some code snippets are shown:

class PostprocessorReplacement(FrontReplacementFromConfigFileSubGraph):
    replacement_id = 'TFObjectDetectionAPIDetectionOutput'

These lines define the new PostprocessorReplacement class, inherited from FrontReplacementFromConfigFileSubGraph. FrontReplacementFromConfigFileSubGraph is designed to replace a sub-graph of operations described in the configuration file. The following methods are overridden to implement the custom replacement logic:

  • generate_sub_graph performs the new sub-graph generation and returns a dictionary where the key is an alias name for the node and the value is a Node object. The dictionary has the same format as the match parameter in the replace_sub_graph method in the example with the networkx sub-graph isomorphism pattern. This dictionary is passed as an argument to the next three methods, so it should contain entries for the nodes that those functions need.
  • input_edges_match specifies the mapping between the input edges to the sub-graph before replacement and after replacement. The key of the dictionary is a tuple specifying an input tensor of the sub-graph before replacement: the sub-graph input node name and the input port number for this node. The value for this key is also a tuple specifying the node where this tensor should be attached after replacement: the node name (or the alias name of the node) and the input port for this node. If the port number is zero, the parameter can be omitted, so the key or value is just a node name (alias). The default implementation of the method returns an empty dictionary, so the Model Optimizer does not create new edges.
  • output_edges_match returns the mapping between the old output edges of the matched nodes and the new sub-graph node and output edge index. The format is similar to the dictionary returned by the "input_edges_match" method. The only difference is that instead of specifying input port numbers for the nodes, it is necessary to specify output port numbers. Of course, this mapping is needed for the output nodes only. The default implementation of the method returns an empty dictionary, so the Model Optimizer does not create new edges.
  • nodes_to_remove specifies the list of nodes that the Model Optimizer should remove after the sub-graph replacement. The default implementation of the method removes all sub-graph nodes.

Before reviewing the replacer code, consider the details of the DetectionOutput layer implementation in the Inference Engine. There are several constraints on the input tensors of the DetectionOutput layer:

  • The tensor with locations must be of shape [#batch, #prior_boxes * 4] or [#batch, #prior_boxes * 5], depending on whether locations are shared between different batches or not.
  • The tensor with confidences must be of shape [#batch, #prior_boxes * #classes], and the confidence values must be in the range [0, 1], that is, passed through a softmax layer.
  • The tensor with prior boxes must be of shape [#batch, 2, #prior_boxes * 4]. The Inference Engine expects it to contain variance values, which the TensorFlow* Object Detection API does not add.

To enable these models, add Reshape operations for locations and confidences tensors, and update the values for the prior boxes to include the variance constants (they are not there in TensorFlow* Object Detection API).

Look at the generate_sub_graph method:

def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
    log.debug('PostprocessorReplacement.generate_sub_graph')
    log.debug('matched_nodes = {}'.format(match.matched_nodes_names()))
    # softmax to be applied to the confidence
    softmax_conf_op = Softmax(graph, {'axis': 2, 'nchw_layout': True})
    softmax_conf_node = softmax_conf_op.add_node(dict(name='DetectionOutput_SoftMax_conf_'))
    # IE DetectionOutput layer consumes flattened tensors
    # reshape operation to flatten locations tensor
    reshape_loc_op = Reshape(graph, {'dim': np.array([0, -1])})
    reshape_loc_node = reshape_loc_op.add_node(dict(name='DetectionOutput_Reshape_loc_'))
    # IE DetectionOutput layer consumes flattened tensors
    # reshape operation to flatten confidence tensor
    reshape_conf_op = Reshape(graph, {'dim': np.array([0, -1])})
    reshape_conf_node = reshape_conf_op.add_node(dict(name='DetectionOutput_Reshape_conf_'))
    # create Node object from Op class
    detection_output_op = DetectionOutput(graph, match.custom_replacement_desc.custom_attributes)
    detection_output_op.attrs['old_infer'] = detection_output_op.attrs['infer']
    detection_output_op.attrs['infer'] = __class__.do_infer
    detection_output_node = detection_output_op.add_node(dict(name=detection_output_op.attrs['type'] + '_'))
    # create internal edges of the sub-graph. In this case we add edges to connect input port 0 and 1 of the
    # detection output with output of reshape of locations and reshape of confidence
    create_edge(softmax_conf_node, reshape_conf_node, 0, 0)
    create_edge(reshape_loc_node, detection_output_node, 0, 0)
    create_edge(reshape_conf_node, detection_output_node, 0, 1)
    return {'detection_output_node': detection_output_node, 'reshape_conf_node': softmax_conf_node,
            'reshape_loc_node': reshape_loc_node}

The method has two inputs: the graph to operate on and an instance of the SubgraphMatch object describing the matched sub-graph. The latter class has several useful methods to get a particular input/output Node of the sub-graph by input/output index or by node name pattern. Examples of using these methods are given below.

Lines 6 and 7 create a new instance of the Softmax operation and the graph Node object corresponding to that operation.

Lines 11-12 and 16-17 create new instances of the Reshape operation to reshape the locations and confidences tensors respectively.

Lines 20-23 create a new instance of the DetectionOutput operation and the graph Node object corresponding to that operation.

Lines 27-29 connect the softmax node with the reshape node and connect the two reshaped locations and confidences tensors with the DetectionOutput node.

Lines 30-31 define a dictionary with aliases for the detection output node and the reshaped locations and confidences nodes. These aliases are used in the "input_edges_match" and "output_edges_match" methods.

The input_edges_match method is the following:

def input_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
    locs_consumer_node, locs_consumer_node_port = match.input_nodes(0)[0]
    conf_consumer_node, conf_consumer_node_port = match.input_nodes(1)[0]
    priors_consumer_node, priors_consumer_node_port = match.input_nodes(2)[0]
    # create matching nodes for locations and confidence tensors using simple scheme "old_node_name: new_node_name"
    # which in fact means "(old_node_name, 0): (new_node_name, 0)", while first '0' means old_port and the second
    # zero defines 'new_port'.
    return {locs_consumer_node.id: new_sub_graph['reshape_loc_node'].id,
            conf_consumer_node.id: new_sub_graph['reshape_conf_node'].id,
            priors_consumer_node.id: (new_sub_graph['detection_output_node'].id, 2),
            }

The method has three parameters: the input graph, the match object describing the matched sub-graph, and the new_sub_graph dictionary with the alias names returned from the "generate_sub_graph" method.

Lines 2-4 initialize the Node objects and the input ports of these nodes where the input tensors of the sub-graph are consumed. The method match.input_nodes(ind) returns a list of tuples where the first element is a Node object and the second is the input port of this node that consumes the ind-th input tensor of the sub-graph. The "start_points" list in the configuration file defines the order of input tensors to the sub-graph. For example, "locs_consumer_node" is a Node object for the node that consumes the tensor with locations in the port with number "locs_consumer_node_port".

Lines 8-11 define the dictionary with the mapping of tensors as described above. Note that the "id" attribute of the Node object contains the name of the node in the graph.

The "output_edges_match" method is the following:

def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
    # the DetectionOutput in IE produces single tensor, but in TF it produces two tensors, so we need to create only
    # one output edge match
    return {match.output_node(0)[0].id: new_sub_graph['detection_output_node'].id}

The method has the same three parameters as the "input_edges_match" method. The returned dictionary contains the mapping for just one tensor, initially produced by the first output node of the sub-graph (which is "detection_boxes" according to the configuration file), to the single output tensor of the created DetectionOutput node. In fact, it is possible to use any output node of the initial sub-graph in the mapping, because the sub-graph output nodes are the output nodes of the whole graph (their outputs are not consumed by any other nodes).

Now the Model Optimizer knows how to replace the sub-graph. The last step to enable the model is to cut off some parts of the graph that are not needed during inference.

It is necessary to remove the Preprocessor block where the image is resized. The Inference Engine does not support dynamic input shapes, so the Model Optimizer must freeze the input image size, and thus resizing of the image is not necessary. This is achieved by specifying the input tensor of the model using the following command line parameter: "--input=1:Preprocessor/mul". This command line option is described in Cutting Off Parts of a Model.

There are several "Switch" operations in the Postprocessor block without output edges. For example, "Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/cond/switch_t", "Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/cond/switch_f", "Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_1/cond/switch_t", "Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_1/cond/switch_f" etc.

The Model Optimizer marks these nodes as output nodes of the topology. Because of that, some parts of the "Postprocessor" block are not removed during the sub-graph replacement. To fix this issue, it is necessary to specify the output nodes of the graph manually using the "--output" command line parameter.

Example Model Optimizer Command-Line for TensorFlow's SSD

The final command line to convert SSDs from the TensorFlow* Object Detection Zoo is:

./mo_tf.py --input_model=<path_to_frozen.pb> --input=1:Preprocessor/mul --input_shape="(1,300,300,3)" --tensorflow_use_custom_operations_config extensions/front/tf/ssd_support.json --output="detection_boxes,detection_scores,num_detections"

MXNet Models With Custom Layers

MXNet* models with custom layers are not supported: the Model Optimizer cannot process them because it has no way to guess how to handle these entities. The recommendation is to manually cut the model and provide the Model Optimizer with this cut model.

Advanced Topics About the Model Optimizer Internals

Cutting Off Parts of a Model

Sometimes some parts of a model must be removed while the Model Optimizer converts the model to the Intermediate Representation. This chapter describes how to cut off parts of a model using Model Optimizer command-line options. Model cutting applies mostly to TensorFlow* models, but is also useful for other frameworks. In this chapter, TensorFlow* examples are used for illustration.

Purpose of Model Cutting

The following examples are the situations when model cutting is useful or even required:

  • the model has pre- or post-processing parts that cannot be translated to existing Inference Engine layers;
  • the model has a training part that is convenient to keep in the model but is not used during inference;
  • the model is too complex (contains many unsupported operations that cannot be easily implemented as custom layers), so the complete model cannot be converted in one shot;
  • the model is one of the supported SSD models, in which case you need to cut the post-processing part off;
  • a problem with model conversion in the Model Optimizer or inference in the Inference Engine occurred; to localize the issue, limit the scope of conversion by iteratively searching for problematic places in the model;
  • a single custom layer or a combination of custom layers is isolated for debugging purposes.

Command-Line Options

Model Optimizer provides command line options --input and --output to specify new entry and exit nodes, while ignoring the rest of the model:

  • --input option accepts a comma-separated list of layer names of the input model that should be treated as new entry points to the model;
  • --output option accepts a comma-separated list of layer names of the input model that should be treated as new exit points from the model.

The --input option is also required for some cases unrelated to model cutting. For example, when the model contains several inputs and the --input_shape or --mean_values options are used, the --input option specifies the order of input nodes so that multiple items provided in --input_shape and --mean_values are correctly mapped to the inputs of the model. Details of this use are out of the scope of this chapter.

Model cutting is illustrated with the Inception V1 model from the models/research/slim repository. The rest of this chapter assumes that the model has already been prepared (frozen to a .pb file, inception_v1.pb in the examples below) for conversion with the Model Optimizer.

Default Behavior Without --input and --output

The input model is converted as a whole if neither --input nor --output command line options are used. All Placeholder operations in a TensorFlow graph are automatically identified as entry points. The Input layer type is generated for each of them. All nodes that have no consumers are automatically identified as exit points.

For Inception V1, there is one Placeholder: input. If the model is viewed in TensorBoard, the input operation is easy to find:

InceptionV1 placeholder

There is only one output operation, enclosed in a nested name scope InceptionV1/Logits/Predictions: the Reshape operation with the full name InceptionV1/Logits/Predictions/Reshape_1. In TensorBoard, it looks the following way together with some predecessors:

TensorBoard with predecessors

Convert this model:

mo.py --input_model=inception_v1.pb -b 1

The output .xml file with an Intermediate Representation contains the Input layer among other layers in the model:

<layer id="286" name="input" precision="FP32" type="Input">
    <output>
        <port id="0">
            <dim>1</dim>
            <dim>3</dim>
            <dim>224</dim>
            <dim>224</dim>
        </port>
    </output>
</layer>

The Input layer is converted from the TensorFlow* graph Placeholder operation input and has the same name.

The -b option is used here for conversion to override a possible undefined batch size (coded as -1 in TensorFlow models). If a model was frozen with a defined batch size, you may omit this option in all examples here.

The last layer in the model is InceptionV1/Logits/Predictions/Reshape_1, which matches an output operation in the TensorFlow graph:

<layer id="389" name="InceptionV1/Logits/Predictions/Reshape_1" precision="FP32" type="Reshape">
    <data axis="0" dim="1,1001" num_axes="-1"/>
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>1001</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>1001</dim>
        </port>
    </output>
</layer>

Due to automatic identification of inputs and outputs, you do not need to provide the --input and --output options to convert the whole model. The following commands are equivalent for the Inception V1 model:

mo.py --input_model=inception_v1.pb -b 1

mo.py --input_model=inception_v1.pb -b 1 --input=input --output=InceptionV1/Logits/Predictions/Reshape_1

The Intermediate Representations are identical for both conversions. The same is true if the model has multiple inputs and/or outputs.

Cut at the End

Now consider how to cut some parts of the model off. For the Inception V1 model, the first convolution block InceptionV1/InceptionV1/Conv2d_1a_7x7 is considered:

The first convolution block

The following command cuts the rest of the model off just after the InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu node, making this node the last one in the model:

mo.py --input_model=inception_v1.pb -b 1 --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu

The complete converted Intermediate Representation has three layers:

<?xml version="1.0" ?>
<net batch="1" name="model" version="2">
    <layers>
        <layer id="3" name="input" precision="FP32" type="Input">
            <output>
                <port id="0">
                    <dim>1</dim>
                    <dim>3</dim>
                    <dim>224</dim>
                    <dim>224</dim>
                </port>
            </output>
        </layer>
        <layer id="5" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution" precision="FP32" type="Convolution">
            <data dilation-x="1" dilation-y="1" group="1" kernel-x="7" kernel-y="7" output="64" pad-x="2" pad-y="2" stride="1,1,2,2" stride-x="2" stride-y="2"/>
            <input>
                <port id="0">
                    <dim>1</dim>
                    <dim>3</dim>
                    <dim>224</dim>
                    <dim>224</dim>
                </port>
            </input>
            <output>
                <port id="3">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>112</dim>
                    <dim>112</dim>
                </port>
            </output>
            <blobs>
                <weights offset="0" size="37632"/>
                <biases offset="37632" size="256"/>
            </blobs>
        </layer>
        <layer id="6" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu" precision="FP32" type="ReLU">
            <input>
                <port id="0">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>112</dim>
                    <dim>112</dim>
                </port>
            </input>
            <output>
                <port id="1">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>112</dim>
                    <dim>112</dim>
                </port>
            </output>
        </layer>
    </layers>
    <edges>
        <edge from-layer="3" from-port="0" to-layer="5" to-port="0"/>
        <edge from-layer="5" from-port="3" to-layer="6" to-port="0"/>
    </edges>
</net>

The TensorBoard picture shows that the original model has more nodes than the Intermediate Representation. The Model Optimizer has fused the batch normalization InceptionV1/InceptionV1/Conv2d_1a_7x7/BatchNorm into the convolution InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution, so it is not present in the final Intermediate Representation. This is not an effect of the --output option; it is the usual behavior of the Model Optimizer for batch normalizations and convolutions. The effect of --output is that the ReLU layer becomes the last one in the converted model.

Cut From the Beginning

If you want to go further and cut the beginning of the model and leave only the ReLU layer, you can use the following command line, where --input and --output specify the same node in the graph:

mo.py --input_model=inception_v1.pb -b 1 --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu --input=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu

The resulting Intermediate Representation looks like this:

<xml version="1.0">
<net batch="1" name="model" version="2">
    <layers>
        <layer id="0" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu/placeholder_port_0" precision="FP32" type="Input">
            <output>
                <port id="0">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>112</dim>
                    <dim>112</dim>
                </port>
            </output>
        </layer>
        <layer id="2" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu" precision="FP32" type="ReLU">
            <input>
                <port id="0">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>112</dim>
                    <dim>112</dim>
                </port>
            </input>
            <output>
                <port id="1">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>112</dim>
                    <dim>112</dim>
                </port>
            </output>
        </layer>
    </layers>
    <edges>
        <edge from-layer="0" from-port="0" to-layer="2" to-port="0"/>
    </edges>
</net>

The Input layer is automatically created to feed the layer that is converted from the node specified in --input, InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu in this case. The Model Optimizer does not replace the ReLU node with the Input layer; it produces such an Intermediate Representation to make the node the first executable node in the final Intermediate Representation. So the Model Optimizer creates enough Input layers to feed all input ports of the node that is passed in --input.

Even though --input_shape is not specified in the command line, the shapes for the layers are inferred from the beginning of the original TensorFlow* model to the point at which the new input is defined. The new input has the same shape, [1,64,112,112], as in the model converted as a whole or without cutting off the beginning.

Shape Override for New Inputs

The input shape can be overridden with --input_shape. In this case, the shape is applied to the node referenced in --input, not to the original Placeholder in the model. For example, this command line

mo.py --input_model=inception_v1.pb --input_shape=[1,5,10,20] --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu --input=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu

gives the following shapes in the Input and ReLU layers:

<layer id="0" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu/placeholder_port_0" precision="FP32" type="Input">
    <output>
        <port id="0">
            <dim>1</dim>
            <dim>20</dim>
            <dim>5</dim>
            <dim>10</dim>
        </port>
    </output>
</layer>
<layer id="3" name="InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu" precision="FP32" type="ReLU">
    <input>
        <port id="0">
            <dim>1</dim>
            <dim>20</dim>
            <dim>5</dim>
            <dim>10</dim>
        </port>
    </input>
    <output>
        <port id="1">
            <dim>1</dim>
            <dim>20</dim>
            <dim>5</dim>
            <dim>10</dim>
        </port>
    </output>
</layer>

An input shape [1,20,5,10] in the final Intermediate Representation differs from the shape [1,5,10,20] specified in the command line, because the original TensorFlow* model uses NHWC layout, but the Intermediate Representation uses NCHW layout. So usual NHWC to NCHW layout conversion occurred.

When --input_shape is specified, shape inference inside the Model Optimizer is not performed for the nodes at the beginning of the model that are not included in the translated region. This differs from the case when --input_shape is not specified (as noted in the previous section), where shape inference is still performed for such nodes to deduce the shapes of the layers that fall into the final Intermediate Representation. So --input_shape should be used for a model with a complex graph with loops, which are not supported by the Model Optimizer, to completely exclude such parts from the Model Optimizer shape inference process.

Inputs With Multiple Input Ports

There are operations that have more than one input port. In the example considered here, the convolution InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution is such an operation. When --input_shape is not provided, a new Input layer is created for each dynamic input port of the node. If a port evaluates to a constant blob, this constant remains in the model and a corresponding Input layer is not created. The TensorFlow* convolution used in this model contains two ports:

  • port 0: input tensor for convolution (dynamic);
  • port 1: convolution weights (constant).

Following this behavior, the Model Optimizer creates an Input layer for port 0 only, leaving port 1 as a constant. So the result of:

mo.py --input_model=inception_v1.pb -b 1 --input=InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution

is identical to the result of conversion of the model as a whole, because this convolution is the first executable operation in Inception V1.

A different behavior occurs when --input_shape is also used in an attempt to override the input shape:

mo.py --input_model=inception_v1.pb --input=InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution --input_shape=[1,224,224,3]

An error occurs:

[ ERROR ]  Node InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution has more than 1 input and input shapes were provided.
Try not to provide input shapes or specify input port with port:node notation, where port is an integer.

For more information see FAQ #30.

In this case, when --input_shape is specified and the node contains multiple input ports, you need to specify an input port index together with an input node name. The input port index is specified in front of the node name with ':' as a separator (PORT:NODE). In the considered case, the port index 0 of the node InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution should be specified as 0:InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution.

Here is a correct command line:

mo.py --input_model=inception_v1.pb --input=0:InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution --input_shape=[1,224,224,3]

Model Optimization Techniques

Optimization offers methods to accelerate inference with convolutional neural networks (CNN) that do not require model retraining.

Linear Operation Fusing

Many convolutional neural networks include BatchNormalization and ScaleShift layers (for example, Resnet*, Inception*) that can be fused into the previous Convolution or FullyConnected layers.

Usage

In the Model Optimizer, this optimization is turned on by default. To disable it, pass the --disable_fusing parameter to the Model Optimizer.

Optimization Description

This optimization method consists of three stages:

  1. BatchNormalization and ScaleShift decomposition: at this stage, the BatchNormalization layer is decomposed into a Mul → Add → Mul → Add sequence, and the ScaleShift layer is decomposed into a Mul → Add sequence.
  2. Linear operations merge: at this stage, sequences of Mul and Add operations are merged into a single Mul → Add instance.
    For example, if there is a BatchNormalization → ScaleShift sequence in the topology, it is replaced with Mul → Add by the first stage. At the next stage, the latter is replaced with a ScaleShift layer if there is no Convolution or FullyConnected layer available to fuse into (see the next stage).
  3. Linear operations fusion: at this stage, the tool fuses Mul and Add operations into Convolution or FullyConnected layers. Notice that it searches for Convolution and FullyConnected layers both backward and forward in the graph (except for the Add operation, which cannot be fused into a Convolution layer in the forward direction). The arithmetic behind this stage is sketched after this list.
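
The following is a simplified NumPy illustration (not the actual Model Optimizer code) of folding a per-channel Mul → Add pair, such as the one a decomposed BatchNormalization produces, into the weights and biases of a preceding Convolution:

import numpy as np

def fuse_mul_add_into_conv(weights, bias, scale, shift):
    # weights: [output_channels, input_channels, kernel_h, kernel_w]; bias: [output_channels]
    # scale and shift are the per-output-channel constants of the Mul and Add operations
    fused_weights = weights * scale.reshape(-1, 1, 1, 1)
    fused_bias = bias * scale + shift
    return fused_weights, fused_bias

# a BatchNormalization layer decomposes into such constants:
#   scale = gamma / sqrt(variance + eps)
#   shift = beta - mean * scale
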
Usage Examples

The first picture below shows the part of the Caffe* Resnet269 topology where the BatchNorm and ScaleShift layers will be fused into Convolution layers, as shown in the second picture.

Pic.1 Caffe Resnet269 block (from Netscope)

Part of Caffe Resnet269 topology

Pic.2 Fused Caffe Resnet269 block (from Netscope)

BatchNorm and ScaleShift layers fused to Convolution layers


Grouped Convolution Fusing

Grouped convolution fusing is a specific optimization that applies to TensorFlow* topologies. The main idea of this optimization is to combine the convolution results for the Split outputs and then recombine them using a Concat operation in the same order as they came out of the Split (Pic.3).

Pic.3 Split→Convolutions→Concat block from TensorBoard*

Split→Convolutions→Concat block from TensorBoard

Intermediate Representation Notation Reference Catalog 

Convolution Layer

Name: Convolution

Short description: Reference

Detailed description: Reference

Parameters: Convolution layer parameters should be specified in the convolution_data node, which is a child of the layer node.

  • Parameter name: stride (stride-x, stride-y)
    • Description: stride (stride-x, stride-y) is a distance (in pixels) to slide the filter on the feature map over the (x, y) axis. For example, stride equal 1 (1, 1) means sliding the filter 1 pixel at a time over the (x, y) axis
    • Range of values: integer values starting from 0
  • Parameter name: pad (pad-x, pad-y)
    • Description: pad (pad-x, pad-y) is a number of pixels to add to the left (top) of the input. For example, pad (pad-x, pad-y) equal 1 (1, 1) means adding 1 pixel to the left of the input. Right and bottom padding should be calculated from the expected output width (height)
    • Range of values: integer values starting from 0
  • Parameter name: kernel (kernel-x, kernel-y)
    • Description: kernel (kernel-x, kernel-y) is a width (height) of each filter. For example, kernel (kernel-x, kernel-y) equal 3 (3, 3) means that each filter has width (height) equals 3
    • Range of values: integer values starting from 0
  • Parameter name: output
    • Description: output is a number of output feature maps per whole output (when group > 1, output still matches the number of output features regardless of 'group' value). For example, output equals 1 means that there is 1 output feature map in a layer
    • Range of values: integer values starting from 0
  • Parameter name: group
    • Description: group denotes the number of groups to which output and input should be split. For example, group equal 1 means that all the filters are applied to full input (usual convolution), group equals 2 means that both input and output channels are separated into 2 groups and i-th output group is connected to i-th input group channels. group equals number of output feature maps denotes depth-wise separable convolution (Reference)
    • Range of values: integer values starting from 0
  • Parameter name: dilation (dilation-x, dilation-y)
    • Description: dilation (dilation-x, dilation-y) denotes the distance in width (height) between elements (weights) in the filter. For example, dilation-x and dilation-y equal 1 means that all the elements in the filter are neighbors, so it is the same as for the usual convolution. dilation-x and dilation-y equal 2 means that all the elements in the filter are matched not to adjacent elements in the input matrix, but to those that are adjacent with distance 1
    • Range of values: integer values starting from 0

Weights Layout: Weights layout is GOIYX, which means that X is changing the fastest, then Y, then Input, Output, then Group.

Mathematical Formulation

  • For the convolutional layer, the number of output features in each dimension is calculated using the standard formula:
    \[ n_{out} = \left \lfloor \frac{n_{in} + 2p - k}{s} \right \rfloor + 1 \]
  • The receptive field in each layer is calculated using the formulas (a small worked sketch in Python follows this list):
    • Jump in the output feature map:
      \[ j_{out} = j_{in} * s \]
    • Size of the receptive field of output feature:
      \[ r_{out} = r_{in} + \left ( k - 1 \right ) * j_{in} \]
    • Center position of the receptive field of the first output feature:
      \[ start_{out} = start_{in} + \left ( \frac{k - 1}{2} - p \right ) * j_{in} \]
    • Output is calculated using the following formula:
      \[ out = \sum_{i = 0}^{n}w_{i}x_{i} + b \]
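
These formulas can be applied layer by layer. A minimal Python sketch with hypothetical layer parameters (a 3x3 convolution with stride 2 and padding 1 over a 224-pixel-wide input):

def conv_output_geometry(n_in, j_in, r_in, start_in, k, s, p):
    # number of output features, jump, receptive field size, and center of the first output feature
    n_out = (n_in + 2 * p - k) // s + 1
    j_out = j_in * s
    r_out = r_in + (k - 1) * j_in
    start_out = start_in + ((k - 1) / 2 - p) * j_in
    return n_out, j_out, r_out, start_out

print(conv_output_geometry(n_in=224, j_in=1, r_in=1, start_in=0.5, k=3, s=2, p=1))
# prints (112, 2, 3, 0.5)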

Example

<layer ... type="Convolution" ... >
        <convolution_data stride-x="4" stride-y="4" pad-x="0" pad-y="0" kernel-x="11" kernel-y="11" output="96" group="1" dilation-x="2" dilation-y="2"/>
        <input> ... </input>
        <output> ... </output>
        <weights ... />
        <biases ... />
    </layer>

Pooling Layer

Name: Pooling

Short description: Reference

Detailed description: Reference

Parameters: Specify pooling layer parameters in the pooling_data node, which is a child of the layer node.

NOTE: A subset of pooling parameters, in particular, pad-x, pad-y, kernel-x, kernel-y, stride-x, stride-y, is described in the Convolution layer section.

  • Parameter name: pool-method
    • Description: pool-method is the type of pooling strategy for values
    • Range of values:
      • max - chooses the biggest value in a feature map for each filter position
      • avg - takes the average value in a feature map for each filter position
  • Parameter name: exclude-pad
    • Description: exclude-pad defines how values in the padding area are treated. For example, if exclude-pad is True, zero values in the padding are not used (see the sketch after the Mathematical Formulation section)
    • Range of values: True or False

Mathematical Formulation

  • For max pool-method
    \[ output_{j} = MAX\left \{ x_{0}, ... x_{i} \right \} \]
  • For avg pool-method:
    \[ output_{j} = \frac{\sum_{i = 0}^{n}x_{i}}{n} \]
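
The following NumPy sketch illustrates both pool-method values and the exclude-pad behavior on a single 2D feature map (a simplified illustration, not the Inference Engine implementation):

import numpy as np

def pool2d(x, kernel, stride, pad, pool_method='max', exclude_pad=True):
    padded = np.pad(x, pad, mode='constant')
    mask = np.pad(np.ones_like(x), pad, mode='constant')  # 1 for real values, 0 for padding
    out_h = (x.shape[0] + 2 * pad - kernel) // stride + 1
    out_w = (x.shape[1] + 2 * pad - kernel) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = padded[i * stride:i * stride + kernel, j * stride:j * stride + kernel]
            if pool_method == 'max':
                out[i, j] = window.max()
            else:  # 'avg': with exclude-pad, divide only by the number of non-padding values
                count = mask[i * stride:i * stride + kernel, j * stride:j * stride + kernel].sum()
                out[i, j] = window.sum() / (count if exclude_pad else kernel * kernel)
    return out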

Example

<layer ... type="Pooling" ... >
        <pooling_data kernel-x="3" kernel-y="3" pad-x="0" pad-y="0" stride-x="2" stride-y="2" pool-method="max" exclude-pad="true"/>
        <input> ... </input>
        <output> ... </output>
    </layer>

ROIPooling Layer

Name: ROIPooling

Short description: It is a pooling layer with max pooling strategy (see max option in the Pooling layer parameters description). It is used over feature maps of non-uniform sizes and outputs another feature map of a fixed size.

Detailed description: Reference

Parameters: Specify ROIPooling layer parameters in the data node, which is a child of the layer node.

  • Parameter name: pooled_h (pooled_w)
    • Description: pooled_h (pooled_w) is a height of the ROI output feature map. For example, pooled_h (pooled_w) equal 6 means that the height (width) of the output of ROIpooling is 6
    • Range of values: integer values starting from 0
  • Parameter name:spatial_scale
    • Description: spatial_scale is a ratio of the input feature map over the input image size
    • Range of values: positive floating point value

Mathematical Formulation

\[ output_{j} = MAX\left \{ x_{0}, ... x_{i} \right \} \]

Example

<layer ... type="ROIPooling" ... >
        <data pooled_h="6" pooled_w="6" spatial_scale="0.062500"/>
        <input> ... </input>
        <output> ... </output>
    </layer>

FullyConnected Layer

Name: FullyConnected

Short description: Reference

Detailed description: Reference

Parameters: Specify FullyConnected layer parameters in the fc_data node, which is a child of the layer node.

  • Parameter name: out-size
    • Description: out-size is a length of the output vector. For example, out-size equal 4096 means that the output vector length is 4096
    • Range of values: integer values starting from 0

Mathematical Formulation

  • If previous layer is FullyConnected:
    \[ y_{i} = f\left ( z_{i} \right ) \quad with \quad z_{i} = \sum_{j=1}^{m_{1}^{\left ( l-1 \right )}}w_{i,j}^{\left ( l \right )}y_{i}^{\left ( l -1 \right )} \]
  • Otherwise:
    \[ y_{i} = f\left ( z_{i} \right ) \quad with \quad z_{i}^{\left ( l \right )} = \sum_{j=1}^{m_{1}^{\left ( l-1 \right )}}\sum_{r=1}^{m_{2}^{\left ( l-1 \right )}}\sum_{s=1}^{m_{3}^{\left ( l-1 \right )}}w_{i,j,r,s}^{\left ( l \right )}\left ( Y_{j}^{\left ( l-1 \right )} \right )_{r,s} \]
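
A minimal NumPy sketch of the FullyConnected computation (illustrative shapes; the input is assumed to be already flattened):

import numpy as np

x = np.random.rand(1, 8).astype(np.float32)   # input vector of length 8 (batch of 1)
w = np.random.rand(4, 8).astype(np.float32)   # weights in OI layout: out-size x in-size
b = np.zeros(4, dtype=np.float32)             # biases
y = x @ w.T + b                               # z_i = sum_j w_ij * y_j + b_i
print(y.shape)                                # (1, 4), where 4 corresponds to out-size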

Example

<layer ... type="FullyConnected" ... >
        <fc_data out-size="4096"/>
        <input> ... </input>
        <output> ... </output>
    </layer>

Weights layout: OI, which means that Input is changing the fastest, then Output.


ReLU Layer

Name: ReLU

Short description: Reference

Detailed description: Reference

Parameters: ReLU layer parameters can be (not mandatory) specified in the data node, which is a child of the layer node.

  • Parameter name: negative_slope
    • Description: negative_slope is a multiplier, which is used if the unit is not active (that is negative). For example, negative_slope equal 0.1 means that an inactive unit value would be multiplied by 0.1 and this is the Leaky ReLU. If negative_slope is equal to 0, this is the usual ReLU
    • Range of values: double values starting from 0
  • Parameter name: engine
    • Description: engine is a parameter that specifies computational engine implementation. For example, engine equal caffe.ReLUParameter.CAFFE means that a Caffe* computational engine is used
    • Range of values:
      • caffe.ReLUParameter.DEFAULT
      • caffe.ReLUParameter.CAFFE
      • caffe.ReLUParameter.CUDNN

Mathematical Formulation

\[ Y_{i}^{\left ( l \right )} = max(0, Y_{i}^{\left ( l-1 \right )}) + negative\_slope * min(0, Y_{i}^{\left ( l-1 \right )}) \]

Example

<layer ... type="ReLU" ... >
    <data negative_slope="0.100000"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Activation Layer

Name: Activation

Short description: Activation layer represents an activation function of each neuron in a layer, which is used to add non-linearity to the computational flow.

Detailed description: Reference

Parameters: Activation layer parameters should be specified in the data node, which is a child of the layer node.

  • Parameter name: type
    • Description: type represents particular activation function. For example, type equal sigmoid means that neurons of this layer have a sigmoid activation function
    • Range of values:
      • sigmoid - sigmoid activation function. Learn more from the Detailed description section
      • tanh - tanh activation function. Learn more from the Detailed description section

Mathematical Formulation

  • Sigmoid function:
    \[ f\left ( x \right ) = \frac{1}{1+e^{-x}} \]
  • Tanh function:
    \[ f\left ( x \right ) = \frac{2}{1+e^{-2x}} - 1 = 2sigmoid(2x) - 1 \]
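
A small NumPy sketch of both activation functions (illustrative only):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return 2.0 * sigmoid(2.0 * x) - 1.0   # same as np.tanh(x)

x = np.array([-1.0, 0.0, 1.0])
print(sigmoid(x), tanh(x))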

Example

<layer ... type="Activation" ... >
    <data type="sigmoid" />
    <input> ... </input>
    <output> ... </output>
</layer>

SoftMax layer

Name: SoftMax

Short description: Reference

Detailed description: Reference

Parameters: SoftMax layer parameters can be (not mandatory) specified in the data node, which is a child of the layer node.

  • Parameter name: axis
    • Description: axis represents the axis along which the SoftMax is calculated. axis equal 1 is the default value
    • Range of values: positive integer values

Mathematical Formulation

\[ y_{c} = \frac{e^{Z_{c}}}{\sum_{d=1}^{C}e^{Z_{d}}} \]

where C is the number of classes
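
A NumPy sketch of the formula above along the default axis=1 (illustrative only):

import numpy as np

def softmax(z, axis=1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))  # subtract the max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

print(softmax(np.array([[1.0, 2.0, 3.0]])))          # each row sums to 1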

Example

<layer ... type="SoftMax" ... >
    <data axis="1" />
    <input> ... </input>
    <output> ... </output>
</layer>

Deconvolution Layer

Name: Deconvolution

Short description: Deconvolution layer is applied for upsampling the output to the higher image resolution.

Detailed description: Reference

Parameters: Deconvolution layer parameters should be specified in the deconvolution_data node, which is a child of the layer node.

NOTE:Deconvolution layer has the same way of parameters definition in XML as a Convolution layer.

Weights layout: Weights layout is the following: GOIYX, which means that X is changing the fastest, then Y, then Input, Output, then Group.

Mathematical formulation:
Deconvolution is also called transpose convolution and performs the operation that is the reverse of convolution.

The number of output features for each dimension is calculated as:
\[ S_{o}=stride\left (S_{i} - 1 \right ) + S_{f} - 2pad \]

where $S_{o}$, $S_{i}$, and $S_{f}$ are the sizes of the output, input, and filter, respectively.

Output is calculated in the same way as for convolution layer:
\[ out = \sum_{i = 0}^{n}w_{i}x_{i} + b \]

Example

<layer ... type="Deconvolution" ... >
    <deconvolution_data stride-x="2" stride-y="2" pad-x="1" pad-y="1" kernel-x="4" kernel-y="4" output="19" group="1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Local Response Normalization (LRN) layer

Name: Norm

Short description: Reference

Detailed description: Reference

Parameters: Norm layer parameters should be specified in the norm_data node, which is a child of the layer node.

  • Parameter name: alpha
    • Description: alpha represents the scaling parameter for the normalizing sum. For example, alpha equal 0.0001 means that the normalizing sum is multiplied by 0.0001
    • Range of values: floating point positive number
  • Parameter name: beta
    • Description: beta represents the exponent for the normalizing sum. For example, beta equal 0.75 means that the normalizing sum is raised to the power of 0.75
    • Range of values: floating point positive number
  • Parameter name: region
    • Description: region represents strategy of local regions extension. For example, region equal across means that the normalizing sum is performed over adjacent channels
    • Range of values:
      • across - normalizing sum is performed over adjacent channels
      • same - normalizing sum is performed over nearby spatial locations
  • Parameter name: local-size
    • Description: local-size represents the side length of the region to be used for the normalization sum or number of channels depending on the strategy specified in the region parameter. For example, local-size equal 5 for the across strategy means application of sum across 5 adjacent channels
    • Range of values: positive integer bigger than zero

Mathematical Formulation

\[ o_{i} = \frac{x_{i}}{\left( 1 + \left( \frac{\alpha}{n} \right)\sum_{j}x_{j}^{2} \right)^{\beta}} \]

Where n is the size of each local region.

Example

<layer ... type="Norm" ... >
    <norm_data alpha="9.9999997e-05" beta="0.75" local-size="5" region="across"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Concat Layer

Name: Concat

Short description: Reference

Parameters: Concat layer parameters should be specified in the concat_data node, which is a child of the layer node.

  • Parameter name: axis
    • Description: axis is the index of the axis over which input blobs are concatenated. For example, axis equal 1 means that input blobs are concatenated over the first axis
    • Range of values: positive number greater or equal to 0

Mathematical Formulation

The axis parameter specifies a blob dimension to concatenate values over. For example, for two input blobs B1xC1xH1xW1 and B2xC2xH2xW2, if axis equals 1, the output blob is B1x(C1+C2)xH1xW1. This is possible only if B1=B2, H1=H2, and W1=W2.

Example

<layer ... type="Concat" ... >
    <concat_data axis="1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Split Layer

Name: Split

Short description: Split layer splits the input into several output groups. Group sizes are denoted by the number and the size of output ports.

Detailed description: Reference

Parameters: None

Mathematical Formulation

Split divides the input blob among its children. For example, if the input blob is Bx(C+C)xHxW and there are two children, each output blob is BxCxHxW.

Example

<layer ... type="Split" ... >
    <input> ... </input>
    <output> ... </output>
</layer>

Reshape Layer

Name: Reshape

Short description: Reshape layer changes dimensions of the input blob according to the specified order. Input blob volume is equal to output blob volume, where volume is the product of dimensions.

Detailed description: Reference

Parameters: Reshape layer parameters should be specified in the data node, which is a child of the layer node.

  • Parameter name: axis
    • Description: axis is the number of the starting axis for reshape. For example, axis equal 1 means that Reshape replaces dimensions starting from the next after the first dimension
    • Range of values: positive number greater or equal to 0
  • Parameter name: dim
    • Description: dim is a set of numbers separated with comma, which denote the dimensions of output blob. For example, dim equal 88,1,71 means that output blob gets following dimensions: first dimension equals 88, second dimension equals 1, third dimension equals 71. For more information, refer to the Description block. If dim is equal to two numbers, it performs flattening
    • Range of values: set of positive integer numbers separated with comma
  • Parameter name: num_axes
    • Description: num_axes is the number of dimensions to be replaced with a reshaped blob starting from the dimension number specified in axis property. For example, num_axes equal 2 means that 2 dimensions are replaced with reshaped blob
    • Range of values:
      • -1 - all dimensions are taken starting from the dimension number specified in axis property
      • positive number greater than the value in the axis parameter

Mathematical Formulation

If you want to reshape input blob BxCxHxW into Bx1x(C*H)xW, specify the dim parameter with the corresponding output dimensions. The sketch below shows the equivalent transformation.
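
For illustration, the equivalent transformation in NumPy (not Model Optimizer code):

import numpy as np

B, C, H, W = 2, 3, 4, 5
blob = np.arange(B * C * H * W).reshape(B, C, H, W)
reshaped = blob.reshape(B, 1, C * H, W)   # BxCxHxW -> Bx1x(C*H)xW, the volume is unchanged
print(reshaped.shape)                     # (2, 1, 12, 5)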

Example

<layer ... type="Reshape" ... >
    <data axis="0" dim="1, 1001" num_axes="-1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Eltwise Layer

Name: Eltwise

Short description: Eltwise layer performs element-wise operation, which is specified in parameters, over given inputs.

Parameters: Eltwise layer parameters should be specified in the elementwise_data node, which is placed as a child of the layer node.

  • Parameter name: operation
    • Description: operation is the simple mathematical operation to be performed over inputs. For example, operation equal mul means that input blobs are multiplied
    • Range of values:
      • sum - summation of given values
      • max - select maximum from given values
      • mul - multiplication of given values

Mathematical Formulation

Eltwise accepts two inputs of any number of dimensions, from 1 to 4; both inputs must have exactly the same dimensions. The produced blob has the same dimensions as each of its parents.

Eltwise does the following with the input blobs:

\[ o_{i} = f(b_{i}^{1}, b_{i}^{2}) \]

where $b_{i}^{1}$ is the first blob i-th element, $b_{i}^{2}$ is the second blob i-th element, $o_{i}$ is the output blob i-th element, and $f(a,b)$ is a function that performs an operation over its two arguments $a, b$.

  • For sum operation, $f(a,b)$ is defined as
    \[ f(a,b) = a + b \]
  • For mul operation, $f(a,b)$ is defined as
    \[ f(a,b) = a * b \]
  • For max operation, $f(a,b)$ is defined as
    \[ f(a,b) = \left\{\begin{array}{ll} a \quad \mbox{if } a \geq b \\ b \quad \mbox{if } b > a \end{array}\right. \]
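
As an illustration, the three operations correspond to the following NumPy calls (a sketch, not Model Optimizer code; both inputs must have the same shape):

import numpy as np

a = np.array([[1., 2.], [3., 4.]])
b = np.array([[5., 1.], [2., 6.]])
print(a + b)             # operation="sum"
print(a * b)             # operation="mul"
print(np.maximum(a, b))  # operation="max"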

Example

<layer ... type="Eltwise" ... >
    <elementwise_data operation="sum"/>
    <input> ... </input>
    <output> ... </output>
</layer>

ScaleShift Layer

Name: ScaleShift

Short description: ScaleShift layer performs a linear transformation of the input blobs. Weights denote a scaling parameter, and biases denote a shift.

Parameters: ScaleShift layer does not have additional parameters.

Mathematical Formulation

\[ o_{i} =\gamma b_{i} + \beta \]
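
A minimal NumPy sketch of the per-channel transformation, assuming gamma comes from the weights blob and beta from the biases blob (shapes are illustrative):

import numpy as np

x = np.random.rand(1, 3, 4, 4)     # NCHW input blob with 3 channels
gamma = np.array([1.0, 0.5, 2.0])  # per-channel scale (weights)
beta = np.array([0.0, 0.1, -0.1])  # per-channel shift (biases)
out = x * gamma.reshape(1, 3, 1, 1) + beta.reshape(1, 3, 1, 1)
print(out.shape)                   # (1, 3, 4, 4)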

Example

<layer ... type="ScaleShift" ... >
    <input> ... </input>
    <output> ... </output>
</layer>

Crop Layer

Name: Crop

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters should be specified as child crop nodes of the crop-data node, which is placed as a child of the layer node.

  • Parameter name: axis
    • Description: axis is the number of the dimension to be used for crop. For example, axis equal 1 means that crop is performed over the first dimension
    • Range of values: positive number greater or equal to 0
  • Parameter name: offset
    • Description: offset denotes the starting point for crop in the input blob. For example, offset equal 2 means that crop is starting from the second value in the given axis
    • Range of values: positive integer number
  • Parameter name: dim
    • Description: dim is the result size of the output blob for the given axis. For example, dim equal 88 means that output blob gets the dimension equals 88 for the given axis
    • Range of values: positive integer number

Mathematical Formulation

Crop changes dimensions of the input blob. Only the dimensions of the axes listed in the axis attribute are changed. Dimensions of the output blob are computed based on the offset and dim values.

Example

<layer ... type="Crop" ... >
    <crop-data axis="2,3" offset="0,0" dim="34,34"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Batch Normalization Layer

Name: BatchNormalization

Short description: Reference

Detailed description: Reference

Parameters: BatchNormalization layer parameters should be specified as the batch_norm_data node, which is a child of the layer node.

  • Parameter name: epsilon
    • Description: epsilon is the number to be added to the variance to avoid division by zero when normalizing the value. For example, epsilon equal 0.001 means that 0.001 is added to the variance
    • Range of values: positive floating point number

Mathematical Formulation

BatchNormalization is the normalization of the output in each hidden layer.

Input: Values of x over a mini-batch: $ \beta = \left \{ x_{1...m} \right \} $

Parameters to learn:$ \gamma, \beta$

  • Output:
    $ \left \{ o_{i} = BN_{\gamma, \beta} \left ( b_{i} \right ) \right \} $
  • Mini-batch mean:
    $ \mu_{\beta} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i} $
  • Mini-batch variance:
    $ \sigma_{\beta }^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m}\left ( b_{i} - \mu_{\beta} \right )^{2} $
  • Normalize:
    $ \hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\beta}}{\sqrt{\sigma_{\beta }^{2} + \epsilon }} $
  • Scale and shift:
    $ o_{i} \leftarrow \gamma\hat{b_{i}} + \beta = BN_{\gamma ,\beta }\left ( b_{i} \right ) $
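
A NumPy sketch of the steps above for a one-dimensional mini-batch (illustrative only):

import numpy as np

def batch_norm(b, gamma, beta, eps=9.99e-06):
    mu = b.mean()                          # mini-batch mean
    var = ((b - mu) ** 2).mean()           # mini-batch variance
    b_hat = (b - mu) / np.sqrt(var + eps)  # normalize
    return gamma * b_hat + beta            # scale and shift

print(batch_norm(np.array([1.0, 2.0, 3.0, 4.0]), gamma=1.0, beta=0.0))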

Example

<layer ... type="BatchNormalization" ... >
    <batch_norm_data epsilon="9.99e-06" />
    <input> ... </input>
    <output> ... </output>
</layer>

Normalize Layer

Name: Normalize

Short description: Normalize layer performs l-p normalization of the input blob.

Parameters: Normalize layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: across_spatial
    • Description: across_spatial is a flag that denotes if normalization is performed over CHW or HW. For example, across_spatial equals 0 means that normalization is not shared across channels
    • Range of values:
      • 0
      • 1 - not supported
  • Parameter name: channel_shared
    • Description: channel_shared is a flag that denotes if scale parameters are shared across channels. For example, channel_shared equal 0 means that scale parameters are not shared across channels
    • Range of values:
      • 0 - scale parameters are not shared across channels
      • 1 - not supported
  • Parameter name: eps
    • Description: eps is the epsilon used to avoid division by zero when normalizing the value. For example, eps equals 0.001 means that 0.001 is used if all the values in normalization are equal to zero
    • Range of values: positive floating point number

Mathematical Formulation

\[ o_{i} = \frac{x_{i} * scale}{\sqrt{\sum_{j=0}^{C*H*W} x_{j}^{2}}} \]

Example

<layer ... type="Normalize" ... >
    <data across_spatial="0" channel_shared="0" eps="0.000000"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Tile Layer

Name: Tile

Short description: Tile layer extends input blob with copies of data along specific axis.

Detailed description: Reference

Parameters: Tile layer parameters should be specified as the tile_data node, which is a child of the layer node.

  • Parameter name: axis
    • Description: axis is the index of the axis to tile. For example, axis equals 3 means that fourth axis is used for tiling
    • Range of values: positive integer number
  • Parameter name: tiles
    • Description: tiles is a size of the specified axis in the output blob. For example, tiles equal 88 means that output blob gets 88 copies of data from specified axis
    • Range of values: positive integer number

Mathematical Formulation

Tile extends the input blob and fills the output blob using the following rules:

\[ out_i=input_i[inner\_dim*t] \]

\[ t \in \left ( 0, \quad tiles \right ) \]

Example

<layer ... type="Tile" ... >
    <tile_data axis="3" tiles="88"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Permute Layer

Name: Permute

Short description: Permute layer performs reordering of input blob dimensions.

Detailed description: Reference

Parameters: Permute layer parameters should be specified as the data node, which is a child of the layer node.

NOTE: Model Optimizer (Beta 2) does not use the data node for retrieving parameters and currently supports only the following order for permutation: 0,2,3,1.

  • Parameter name: order
    • Description: order is the set of dimensions indexes for output blob. For example, order equal 0,2,3,1 means that the output blob has following dimensions: first dimension from the input blob, third dimension from the input blob, fourth dimension from the input blob, second dimension from the input blob
    • Range of values: set of positive integer numbers separated by comma

Mathematical Formulation

Permute layer performs reordering of the input blob. Source indexes and destination indexes are bound by the formula:

\[ src\_ind_{offset} = n * ordered[1] * ordered[2] * ordered[3] + (h * ordered[3] + w) \]

\[ n \in \left ( 0, order[0] \right ) \]

\[ w \in \left ( 0, order[3] \right ) \]

Example

<layer ... type="Permute" ... >
    <data order="0,2,3,1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

PriorBox Layer

Name: PriorBox

Short description: PriorBox layer generates prior boxes of specified sizes and aspect ratios across all dimensions.

Parameters: PriorBox layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: min_size (max_size)
    • Description: min_size (max_size) is the minimum (maximum) box size (in pixels). For example, min_size (max_size) equal 15 means that the minimum (maximum) box size is 15
    • Range of values: positive integer number
  • Parameter name: aspect_ratio
    • Description: aspect_ratio is a list of aspect ratios for the generated boxes. Duplicate values are ignored. For example, aspect_ratio equal 2.000000,3.000000 means that for the first box aspect_ratio is equal to 2 and for the second box - 3
    • Range of values: set of positive floating point numbers
  • Parameter name: flip
    • Description: flip is a flag that denotes that each aspect_ratio is duplicated and flipped. For example, flip equals 1 and aspect_ratio equals 3 mean that aspect_ratio is equal to 1/3
    • Range of values:
      • 0 - each aspect_ratio is not flipped
      • 1 - each aspect_ratio is duplicated and flipped
  • Parameter name: clip
    • Description: clip is a flag that denotes if each value in the output blob is within [0,1]. For example, clip equal 1 means that each value in the output blob is within [0,1]
    • Range of values:
      • 0 - clipping is not performed
      • 1 - each value in the output blob is within [0,1]
  • Parameter name: step
    • Description: step is a distance between box centers. For example, step equal 85 means that the distance between neighborhood prior boxes centers is 85
    • Range of values: floating point positive number
  • Parameter name: offset
    • Description: offset is a shift of box respectively to top left corner. For example, offset equal 85 means that the shift of neighborhood prior boxes centers is 85
    • Range of values: floating point positive number
  • Parameter name: variance
    • Description: variance denotes a variance of adjusting bounding boxes
    • Range of values: floating point positive number

Mathematical formulation:
PriorBox computes coordinates of prior boxes by following:

  1. First calculates center_x and center_y of prior box:

    \[ W \equiv Width \quad Of \quad Image \]

    \[ H \equiv Height \quad Of \quad Image \]

    • If step equals 0:
      \[ center_x=(w+0.5) \]
      \[ center_y=(h+0.5) \]
    • else:
      \[ center_x=(w+offset)*step \]
      \[ center_y=(h+offset)*step \]
      \[ w \subset \left( 0, W \right ) \]
      \[ h \subset \left( 0, H \right ) \]
  2. Then, for each $ s \subset \left( 0, min_sizes \right )$ calculates coordinates of priorboxes:
    \[ xmin = \frac{center_x - \frac{s}{2}}{W}; \]
    \[ ymin = \frac{center_y - \frac{s}{2}}{H}; \]
    \[ xmax = \frac{center_x + \frac{s}{2}}{W}; \]
    \[ ymax = \frac{center_y + \frac{s}{2}}{H}; \]

Example

<layer ... type="PriorBox" ... >
    <data step="64.000000" min_size="162.000000" max_size="213.000000" offset="0.500000" flip="1" clip="0" aspect_ratio="2.000000,3.000000" variance="0.100000,0.100000,0.200000,0.200000" />
    <input> ... </input>
    <output> ... </output>
</layer>

SimplerNMS layer

Name: SimplerNMS

Short description: SimplerNMS layer performs filtering of bounding boxes and outputs only those with the highest confidence of prediction.

Parameters: SimplerNMS layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: pre_nms_topn (post_nms_topn)
    • Description: pre_nms_topn (post_nms_topn) is the quantity of bounding boxes before (after) applying the NMS operation. For example, pre_nms_topn (post_nms_topn) equal 15 means that 15 bounding boxes are kept before (after) the NMS step
    • Range of values: positive integer number
  • Parameter name: cls_threshold
    • Description: cls_threshold is the minimum value of the proposal to be taken into consideration. For example, cls_threshold equal 0.5 means that all boxes with prediction probability less than 0.5 are filtered out
    • Range of values: positive floating point number
  • Parameter name: iou_threshold
    • Description: iou_threshold is the minimum ratio of boxes overlapping to be taken into consideration. For example, iou_threshold equal 0.7 means that all boxes with overlapping ratio less than 0.7 are filtered out
    • Range of values: positive floating point number
  • Parameter name: feat_stride
    • Description: feat_stride is the step size to slide over boxes (in pixels). For example, feat_stride equal 16 means that all boxes are analyzed with the slide 16
    • Range of values: positive integer number
  • Parameter name: min_bbox_size
    • Description: min_bbox_size is the minimum size of box to be taken into consideration. For example, min_bbox_size equal 35 means that all boxes with box size less than 35 are filtered out
    • Range of values: positive integer number
  • Parameter name: scale
    • Description: scale is array of scales for anchor boxes generating
    • Range of values: positive integer number

Mathematical Formulation

SimplerNMS accepts three inputs with four dimensions. Produced blob has two dimensions, the first one equals post_nms_topn.

SimplerNMS does the following with the input blob:

  1. Generates initial anchor boxes. Left top corner of all boxes is (0, 0). Width and height of boxes are calculated based on scaled (according to the scale parameter) default widths and heights
  2. For each point in the first input blob:
    • pins anchor boxes to picture according to the second input blob, which contains four deltas for each box: for x and y of center, for width, and for height
    • finds out score in the first input blob
  3. Filters out boxes with size less than min_bbox_size.
  4. Sorts all proposals (box, score) by score from highest to lowest
  5. Takes top pre_nms_topn proposals
  6. Calculates intersections for boxes and filters out all with $intersection/union > iou_threshold$
  7. Takes top post_nms_topn proposals
  8. Returns top proposals

Example

<layer ... type="SimplerNMS" ... >
    <data cls_threshold="0.500000" iou_threshold="0.700000" min_bbox_size="16" feat_stride="16" pre_nms_topn="6000" post_nms_topn="150"/>
    <input> ... </input>
    <output> ... </output>
</layer>

DetectionOutput Layer

Name: DetectionOutput

Short description: DetectionOutput layer performs non-maximum suppression to generate the detection output using information on location and confidence predictions.

Detailed description: Reference

Parameters: DetectionOutput layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: num_classes
    • Description: number of classes to be predicted
    • Range of values: positive integer values
  • Parameter name: background_label_id
    • Description: background label id. If there is no background class, set it to -1
    • Range of values: integer values
  • Parameter name: top_k
    • Description: maximum number of results to be kept on NMS stage
    • Range of values: integer values
  • Parameter name: variance_encoded_in_target
    • Description: if True, variance is encoded in target; otherwise, we need to adjust the predicted offset accordingly
    • Range of values: logical values
  • Parameter name: keep_top_k
    • Description: number of total bboxes to be kept per image after NMS step. -1 means keeping all bboxes after NMS step
    • Range of values: integer values
  • Parameter name: num_orient_classes
    • Range of values: integer values
  • Parameter name: code_type
    • Description: type of coding method for bounding boxes
    • Range of values: caffe.PriorBoxParameter.CENTER_SIZE and others
  • Parameter name: share_location
    • Description: bounding boxes are shared among different classes
    • Range of values: logical values
  • Parameter name: interpolate_orientation
    • Range of values: integer values
  • Parameter name: nms_threshold
    • Description: threshold to be used in NMS stage
    • Range of values: floating point values
  • Parameter name: confidence_threshold
    • Description: only consider detections whose confidences are larger than a threshold. If not provided, consider all boxes
    • Range of values: floating point values

Mathematical Formulation

At each feature map cell, DetectionOutput predicts the offsets relative to the default box shapes in the cell, as well as the per-class scores that indicate the presence of a class instance in each of those boxes. Specifically, for each box out of k at a given location, DetectionOutput computes class scores and the four offsets relative to the original default box shape. This results in a total of $(c + 4)k$ filters that are applied around each location in the feature map, yielding $(c + 4)kmn$ outputs for a m × n feature map.

Example

<layer ... type="DetectionOutput" ... >
    <data num_classes="21" share_location="1" background_label_id="0" nms_threshold="0.450000" top_k="400" eta="1.000000" output_directory="" output_name_prefix="" output_format="" label_map_file="" name_size_file="" num_test_image="0" prob="1.000000" resize_mode="caffe.ResizeParameter.WARP" height="0" width="0" height_scale="0" width_scale="0" pad_mode="caffe.ResizeParameter.CONSTANT" pad_value="#" interp_mode="#" code_type="caffe.PriorBoxParameter.CENTER_SIZE" variance_encoded_in_target="0" keep_top_k="200" confidence_threshold="0.010000" visualize="0" visualize_threshold="0.000000" save_file=""/>
    <input> ... </input>
    <output> ... </output>
</layer>

Memory / Delay Object layer

Name: Memory

Short description: Memory layer represents delay layer in terms of LSTM terminology.

Detailed description: Memory layer saves the state between two infer requests. In the topology, it is a single layer; however, in the Intermediate Representation, it is always represented as a pair of Memory layers. One of these layers does not have outputs and the other does not have inputs (in terms of the Intermediate Representation).

Parameters: Memory layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: id
    • Description: id is the id of the pair of Memory layers. For example, id equals r_27-28 means that layers with id 27 and 28 are in one pair
    • Range of values: positive integer number
  • Parameter name: index
    • Description: index represents if the given layer is input or output. For example, index equal 0 means this layer is output one
    • Range of values:
      • 0 - current layer is output one
      • 1 - current layer is input one
  • Parameter name: size
    • Description: size represents the size of the group. For example, size equals 2 means this group is a pair
    • Range of values: only 2 is supported

Mathematical Formulation
Memory save data from the input blob.

Example

<layer ... type="Memory" ... >
    <data id="r_27-28" index="0" size="2" />
    <input> ... </input>
    <output> ... </output>
</layer>

Clamp Layer

Name: Clamp

Short description: Clamp layer represents clipping activation operation.

Detailed description: Reference

Parameters: Clamp layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: min
    • Description: min is the lower bound of values in the output shape. Any value in the input shape that is smaller than the bound, is replaced by the min value. For example, min equal 10 means that any value in the input shape that is smaller than the bound, is replaced by 10
    • Range of values: positive integer number
  • Parameter name: max
    • Description: max is the upper bound of values in the output shape. Any value in the input shape that is greater than the bound, is replaced by the max value. For example, max equals 50 means that any value in the input shape that is greater than the bound, is replaced by 50
    • Range of values: positive integer number

Mathematical Formulation

Clamp generally does the following with the input blobs:

\[ out_i=\left\{\begin{array}{ll} max\_value \quad if \quad input_i>max\_value, \\ min\_value \quad if \quad input_i<min\_value, \\ input_i \quad otherwise \end{array}\right. \]
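
This is equivalent to clipping the values, for example with NumPy (illustrative sketch matching the XML example below):

import numpy as np

x = np.array([5.0, 25.0, 75.0])
print(np.clip(x, 10, 50))   # min="10", max="50" -> [10. 25. 50.]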

Example

<layer ... type="Clamp" ... >
    <data min="10" max="50" />
    <input> ... </input>
    <output> ... </output>
</layer>

ArgMax Layer

Name: ArgMax

Short description: ArgMax layer computes the indices of the K maximum values for each datum across all dimensions CxHxW.

Detailed description: Intended for use after a classification layer to produce a prediction. If parameter out_max_val is set to True, output is a vector of pairs (max_ind, max_val) for each image. The axis parameter specifies an axis along which to maximize.

Parameters: ArgMax layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: top_k
    • Description: top_k is the number K of maximum items to output
    • Range of values: positive integer number
  • Parameter name:out_max_val
    • Description: if out_max_val equals 1, output is a vector of pairs (max_ind, max_val), unless axis is set. Then output is max_val along the specified axis
    • Range of values: 0 or 1
  • Parameter name: axis
    • Description: if set, maximizes along the specified axis, else maximizes the flattened trailing dimensions for each index of the first / num dimension
    • Range of values: integer values

Mathematical Formulation

ArgMax generally does the following with the input blobs:

\[ argmax_{x} f(x) = \left\{ x \mid \forall y : f(y) \leq f(x) \right\} \]

Example

<layer ... type="ArgMax" ... >
    <data top_k="10" out_max_val="1" axis="-1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

PSROIPooling Layer

Name: PSROIPooling

Short description: PSROIPooling layer computes position-sensitive max pooling on regions of interest specified by the input. It takes as input N position-sensitive score maps and a list of R regions of interest.

Detailed description: Reference

Parameters: PSROIPooling layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: output_dim
    • Description: output_dim is the pooled output channel number
    • Range of values: positive integer number
  • Parameter name: group_size
    • Description: group_size is the number of groups to encode position-sensitive score maps
    • Range of values: positive integer number
  • Parameter name: spatial_scale
    • Description: spatial_scale is multiplicative spatial scale factor to translate ROI coordinates from their input scale to the scale used when pooling
    • Range of values: positive floating point value

Mathematical Formulation

The output value for $(i, j)$-th bin is obtained by summation from one score map $x_{i,j}$ corresponding to that bin. In short, the difference from RoIPooling is that a general feature map x is replaced by a specific positive-sensitive score map $x_{i,j}$

Example

<layer ... type="PSROIPooling" ... >
    <data output_dim="10" out_max_val="1" spatial_scale="0.1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

GRN Layer

Name: GRN

Short description: GRN is Global Response Normalization with L2 norm (across channels only).

Parameters: GRN layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: bias
    • Description: bias is added to the variance
    • Range of values: floating point value

Mathematical Formulation

GRN computes L2 norm by channels for input blob. GRN generally does the following with the input blob:

\[ output_{i} = \frac{input_{i}}{\sqrt{\sum_{j}^{C} input_{j}^{2} + bias}} \]

Example

<layer ... type="GRN" ... >
    <data bias="1.0"/>
    <input> ... </input>
    <output> ... </output>
</layer>

PReLU Layer

Name: PReLU

Short description: PReLU is the Parametric Rectifier Linear Unit. The difference from ReLU is that negative slopes can vary across channels.

Parameters: PReLU layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: channel_shared
    • Description: channel_shared shows whether the negative slope is shared across channels
    • Range of values: 0 or 1
  • Parameter name: filler_type
    • Description: filler_type defines initialization type for negative slope
    • Range of values: string
  • Parameter name: filler_value
    • Description: filler_value defines the value in constant filler
    • Range of values: integer
  • Parameter name: min(max)
    • Description: min(max) defines the minimal(maximal) value in uniform filler
    • Range of values: integer
  • Parameter name: mean
    • Description: mean defines the mean value in Gaussian filler
    • Range of values: integer

Mathematical Formulation

PReLU accepts one input with four dimensions. The produced blob has the same dimensions as input.

PReLU does the following with the input blob:
\[ o_{i} = max(0, x_{i}) + w_{i} * min(0,x_{i}) \]

where $w_{i}$ is from weights blob.
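
A minimal NumPy sketch of the formula above, assuming a single channel-shared negative slope (illustrative values):

import numpy as np

x = np.array([[-2.0, -1.0, 0.5, 3.0]])
w = np.array([0.25])                           # negative slope taken from the weights blob
out = np.maximum(0, x) + w * np.minimum(0, x)
print(out)                                     # [[-0.5 -0.25 0.5 3.]]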

Example

<layer ... type="PReLU" ... >
    <data bias="1.0"/>
    <input> ... </input>
    <output> ... </output>
</layer>

PriorBoxClustered Layer

Name: PriorBoxClustered

Short description: PriorBoxClustered layer generates prior boxes of specified sizes.

Parameters: PriorBoxClustered layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: width (height)
    • Description: width (height) is a parameter that specifies desired boxes widths (heights) in pixels
    • Range of values: floating point positive number
  • Parameter name: clip
    • Description: clip is a flag that denotes if each value in the output blob is within [0,1]. For example, clip equal 1 means that each value in the output blob is within [0,1]
    • Range of values:
    • 0 - clipping is not performed
    • 1 - each value in the output blob is within [0,1]
  • Parameter name: flip
    • Description: flip is a flag that denotes whether the list of boxes is augmented with the flipped ones
    • Range of values:
      • 0 - list of boxes is not augmented with the flipped ones
      • 1 - list of boxes is augmented with the flipped ones
  • Parameter name: step (step_w, step_h)
    • Description: step (step_w, step_h) is a distance between box centers. For example, step equal 85 means that the distance between neighborhood prior boxes centers is 85
    • Range of values: floating point positive number
  • Parameter name: offset
    • Description: offset is a shift of box respectively to top left corner. For example, offset equal 85 means that the shift of neighborhood prior boxes centers is 85
    • Range of values: floating point positive number
  • Parameter name: variance
    • Description: variance denotes a variance of adjusting bounding boxes
    • Range of values: floating point positive number
  • Parameter name: img_h (img_w)
    • Description: img_h (img_w) specifies height (width) of input image. These parameters are calculated unless provided explicitly
    • Range of values: floating point positive number

Mathematical Formulation

PriorBoxClustered computes coordinates of prior boxes by following:

  1. Calculates the center_x and center_y of prior box:
    \[ W \equiv Width \quad Of \quad Image \]
    \[ H \equiv Height \quad Of \quad Image \]
    \[ center_x=(w+offset)*step \]
    \[ center_y=(h+offset)*step \]
    \[ w \subset \left( 0, W \right ) \]
    \[ h \subset \left( 0, H \right ) \]
  2. For each $ s \subset \left( 0, W \right )$ calculates the prior boxes coordinates:
    \[ xmin = \frac{center_x - \frac{width_s}{2}}{W} \]
    \[ ymin = \frac{center_y - \frac{height_s}{2}}{H} \]
    \[ xmax = \frac{center_x + \frac{width_s}{2}}{W} \]
    \[ ymax = \frac{center_y + \frac{height_s}{2}}{H} \]

If clip is defined, the coordinates of prior boxes are recalculated with the formula:
\[ coordinate = \min(\max(coordinate,0), 1) \]

Example

<layer ... type="PriorBoxClustered">
    <data clip="0" flip="0" height="44.0,10.0,30.0,19.0,94.0,32.0,61.0,53.0,17.0" offset="0.5" step="16.0" variance="0.1,0.1,0.2,0.2"
     width="86.0,13.0,57.0,39.0,68.0,34.0,142.0,50.0,23.0"/>
    <input>
        ...
    </input>
    <output>
        ...
    </output>
</layer>

MVN Layer

Name: MVN

Short description: Reference

Parameters: MVN layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: across_channels
    • Description: across_channels is a flag that denotes if mean values are shared across channels. For example, across_channels equal 0 means that mean values are not shared across channels
    • Range of values:
      • 0 - mean values are not shared across channels
      • 1 - mean values are shared across channels
  • Parameter name: normalize_variance
    • Description: normalize_variance is a flag that denotes whether to perform variance normalization
    • Range of values:
      • 0 - variance normalization is not performed
      • 1 - variance normalization is performed
  • Parameter name: eps
    • Description: eps is the number to be added to the variance to avoid division by zero when normalizing the value. For example, epsilon equal 0.001 means that 0.001 is added to the variance
    • Range of values: positive floating point number

Mathematical Formulation

MVN subtracts mean from the input blob:

\[ o_{i}=i_{i} - \frac{\sum {i_{k}}}{C*H*W} \]

If normalize_variance is set to 1, the output blob is divided by variance:

\[ o_{i}=\frac{o_{i}}{\sum \sqrt {o_{k}^2}+\epsilon} \]
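
A hedged NumPy sketch of mean-variance normalization over the whole CHW volume, assuming across_channels=1 and normalize_variance=1 (the exact variance term used by the implementation may differ):

import numpy as np

def mvn(x, eps=9.999999717180685e-10):
    centered = x - x.mean()                                     # subtract the mean over C*H*W
    return centered / (np.sqrt((centered ** 2).mean()) + eps)   # divide by the standard deviation

print(np.allclose(mvn(np.random.rand(3, 4, 4)).mean(), 0.0))    # mean is ~0 after normalization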

Example

<layer ... type="MVN">
    <data across_channels="1" eps="9.999999717180685e-10" normalize_variance="1"/>
    <input>
        ...
    </input>
    <output>
        ...
    </output>
</layer>

CTCGreadyDecoder Layer

Name: CTCGreadyDecoder

Short description: CTCGreadyDecoder performs greedy decoding on the logits given in input (best path).

Detailed description: Reference

Parameters: CTCGreadyDecoder layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: ctc_merge_repeated
    • Description: ctc_merge_repeated is a flag for collapsing the repeated labels during the CTC calculation
    • Range of values: 0 or 1

Mathematical formulation

Given an input sequence X of length T, CTCGreadyDecoder assumes the probability of a length T character sequence C is given by,

\[ p(C|X) = \prod_{t=1}^{T} p(c_{t}|X) \]

Example

<layer ... type="CTCGreadyDecoder" ... >
    <data stride="1"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Proposal Layer

Name: Proposal

Short description: Proposal layer performs filtering of bounding boxes and outputs only those with the highest confidence of prediction.

Parameters: Proposal layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: pre_nms_topn (post_nms_topn)
    • Description: pre_nms_topn (post_nms_topn) is the quantity of bounding boxes before (after) applying the NMS operation. For example, pre_nms_topn (post_nms_topn) equal 15 means that 15 bounding boxes are kept before (after) the NMS step
    • Range of values: positive integer number
  • Parameter name: nms_thresh
    • Description: nms_thresh is the intersection-over-union (IoU) threshold used in the NMS step. For example, nms_thresh equal 0.6 means that all boxes overlapping a higher-scoring box with an IoU greater than 0.6 are filtered out
    • Range of values: positive floating point number
  • Parameter name: feat_stride
    • Description: feat_stride is the step size to slide over boxes (in pixels). For example, feat_stride equal 16 means that all boxes are analyzed with the slide 16
    • Range of values: positive integer number
  • Parameter name: min_size
    • Description: min_size is the minimum size of box to be taken into consideration. For example, min_size equal 35 means that all boxes with box size less than 35 are filtered out
    • Range of values: positive integer number
  • Parameter name: base_size
    • Description: base_size is the base size for anchor generation
    • Range of values: positive integer number
  • Parameter name: ratio
    • Description: ratio is the ratios for anchor generation
    • Range of values: array of float numbers
  • Parameter name: scale
    • Description: scale is the scales for anchor generation
    • Range of values: array of float numbers

Mathematical formulation

Proposal layer accepts three inputs with four dimensions. The produced blob has two dimensions: first one equals batch_size * post_nms_topn.

Proposal does the following with the input blob:

  1. Generates initial anchor boxes. The left top corner of all boxes is (0, 0). Width and height of boxes are calculated from base_size with the scale and ratio parameters
  2. For each point in the first input blob:
    • pins anchor boxes to the image according to the second input blob that contains four deltas for each box: for x and y of center, for width and for height
    • finds out score in the first input blob
  3. Filters out boxes with size less than min_size
  4. Sorts all proposals (box, score) by score from highest to lowest
  5. Takes top pre_nms_topn proposals
  6. Calculates intersections for boxes and filter out all with $intersection/union > nms_thresh$
  7. Takes top post_nms_topn proposals
  8. Returns top proposals

Example

<layer ... type="Proposal" ... >
    <data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.6" post_nms_topn="200" pre_nms_topn="6000" 
     ratio="2.67" scale="4.0,6.0,9.0,16.0,24.0,32.0"/>
    <input> ... </input>
    <output> ... </output>
</layer>

Resample Layer

Name: Resample

Short description: Resample layer scales the input blob by the specified parameters.

Parameters: Resample layer parameters should be specified as the data node, which is a child of the layer node.

  • Parameter name: type
    • Description: type parameter specifies type of blob interpolation
    • Range of values:
      • LINEAR - linear blob interpolation
      • CUBIC - cubic blob interpolation
      • NEAREST - nearest-neighbor blob interpolation
  • Parameter name: antialias
    • Description: antialias is a flag that denotes whether to perform anti-aliasing
    • Range of values:
      • 0 - anti-aliasing is not performed
      • 1 - anti-aliasing is performed

Mathematical formulation

Resample layer scales the input blob. Depending on the type parameter, Resample applies different blob interpolation algorithms and performs anti-aliasing if the antialias parameter is specified.

Example

<layer type="Resample"> 
  <data antialias="0" factor="1.0" height="227" type="caffe.ResampleParameter.LINEAR" width="227"/> 
      <input> 
      ... 
      </input> 
      <output> 
      ... 
      </output> 
​</layer>

Frequently Asked Questions

If your question is not covered by the topics below, use the CV SDK Support page, where you can participate in a free forum.

1. Current caffe.proto does not contain field

Internally, the Model Optimizer uses a protobuf library to parse and load Caffe* models. This library requires a file grammar and a generated parser. For a Caffe fallback, the Model Optimizer uses a Caffe-generated parser for a Caffe-specific .proto file (which is usually located in the src/caffe/proto directory). So, if you have Caffe installed on your machine with Python* interface available, make sure that this is exactly the version of Caffe that was used to create the model.

If you just want to experiment with the Model Optimizer and test a Python extension for working with your custom layers without building Caffe, add the layer description to the caffe.proto file and generate a parser for it.

For example, to add the description of the CustomReshape layer, which is an artificial layer not present in any caffe.proto files:

  1. Add the following lines to the caffe.proto file:
    package mo_caffe; // to avoid conflict with system Caffe* it is highly recommended to specify different package name
    ...
    message LayerParameter {
      // other layers parameters description
      ...
      optional CustomReshapeParameter custom_reshape_param = 546; // 546 - ID is any number not present in caffe.proto
    }
    // these lines to end of the file - describing contents of this parameter
    message CustomReshapeParameter {
      optional BlobShape shape = 1; // we just use the same parameter type as some other Caffe layers
    }
  2. Generate a new parser:
    cd <INSTALL_DIR>/deployment_tools/model_optimizer/mo/front/caffe/proto
    python3 generate_caffe_pb2.py --input_proto <PATH_TO_CUSTOM_CAFFE>/src/caffe/proto/caffe.proto

    where PATH_TO_CUSTOM_CAFFE is the path to the root directory of custom Caffe*.

  3. Now, the Model Optimizer is able to load the model into memory and start working with your extensions if there are any.

However, because your model has custom layers, you must register your custom layers as custom.

2. How do I create a bare caffemodel, if I have only prototxt?

You need the Caffe* Python* interface. In this case, do the following:

python3
import caffe
net = caffe.Net('<PATH_TO_PROTOTXT>/my_net.prototxt', caffe.TEST)
net.save('<PATH_TO_PROTOTXT>/my_net.caffemodel')

3. Unable to create ports for node with id

Most likely, the Model Optimizer does not know how to infer output shapes of some layers in the given topology. To lessen the scope, compile the list of layers that are custom for the Model Optimizer: present in the topology, absent in list of supported layers for the target framework: Caffe*, TensorFlow*, MXNet*. Then refer to available options in the corresponding section: Caffe* Models with Custom Layers, TensorFlow* Models with Custom Layers, MXNet* Models with Custom Layers.

4. Input image of shape is larger than mean image from file

Your model input shapes must be smaller than or equal to the shapes of the mean image file you provide. The idea behind the mean file is to subtract its values from the input image in an element-wise manner. When the mean file is smaller than the input image, there are not enough values to perform element-wise subtraction. Also, make sure that you use the mean file that was used during the network training phase. Note that the mean file is dataset dependent.

5. Mean file is empty

Most likely, the mean file that you specified with the --mean_file flag while launching the Model Optimizer is empty. Make sure that this is exactly the required mean file and try to regenerate it from the given dataset if possible.

6. Probably mean file has incorrect format

The mean file that you provide for the Model Optimizer must be in a .binaryproto format. You can try to check the content using recommendations from the BVLC Caffe* (#290).

7. Invalid .proto file: there is neither "layer" nor "layers" top-level messages

The structure of any Caffe* topology is described in the caffe.proto file of any Caffe version. For example, in the Model Optimizer, you can find the following proto file, used by default: <INSTALL_DIR>/deployment_tools/model_optimizer/mo/front/caffe/proto/my_caffe.proto. There you can find the structure:

message NetParameter {
  // ... some other parameters
  // The layers that make up the net.  Each of their configurations, including
  // connectivity and behavior, is specified as a LayerParameter.
  repeated LayerParameter layer = 100;  // ID 100 so layers are printed last.
  // DEPRECATED: use 'layer' instead.
  repeated V1LayerParameter layers = 2;
}

This means that any topology should contain layers as top-level structures in prototxt. For example, see the LeNet topology.

8. Old-style inputs (via input_dims) are not supported. Please, specify inputs via input_shape

The structure of any Caffe* topology is described in the caffe.proto file for any Caffe version. For example, in the Model Optimizer you can find the following .proto file, used by default: <INSTALL_DIR>/deployment_tools/model_optimizer/mo/front/caffe/proto/my_caffe.proto. There you can find the structure:

message NetParameter {

 optional string name = 1; // consider giving the network a name
  // DEPRECATED. See InputParameter. The input blobs to the network.
  repeated string input = 3;
  // DEPRECATED. See InputParameter. The shape of the input blobs.
  repeated BlobShape input_shape = 8;
  // 4D input dimensions -- deprecated.  Use "input_shape" instead.
  // If specified, for each input blob there should be four
  // values specifying the num, channels, height and width of the input blob.
  // Thus, there should be a total of (4 * #input) numbers.
  repeated int32 input_dim = 4;
  // ... other parameters
}

So, the input layer of the provided model must be specified in one of the following styles:

  • input: "data"
    input_shape
    {
        dim: 1
        dim: 3
        dim: 227
        dim: 227
    }
  • input: "data"
    input_shape
    {
        dim: 1
        dim: 3
        dim: 600
        dim: 1000
    }
    input: "im_info"
    input_shape
    {
         dim: 1
         dim: 3
    }
  • layer
    {
        name: "data"
        type: "Input"
        top: "data"
        input_param {shape: {dim: 1 dim: 3 dim: 600 dim: 1000}}
    }
    layer
    {
        name: "im_info"
        type: "Input"
        top: "im_info"
        input_param {shape: {dim: 1 dim: 3}}
    }
  • input: "data"
    input_dim: 1
    input_dim: 3
    input_dim: 500

However, if your model contains more than one input, the Model Optimizer is able to convert the model only if the inputs are specified in forms 1, 2, or 3 of the list above. The last form is not supported for multi-input topologies.

9. Mean file for topologies with multiple inputs is not supported

Model Optimizer does not support mean file processing for topologies with more than one input. In this case, you need to perform preprocessing of the inputs for a generated Intermediate Representation in the Inference Engine to perform subtraction for every input of your multi-input model.

10. Cannot load or process mean file: value error

There are multiple reasons why the Model Optimizer does not accept the mean file. See FAQ #4, #5, and #6.

11. Invalid prototxt file: value error

There are multiple reasons why the Model Optimizer does not accept a Caffe* topology. See FAQs #7 and #20.

12. Error happened while constructing caffe.Net in the Caffe* fallback function

Model Optimizer tried to infer a specified layer via the Caffe* framework, however it cannot construct a net using the Caffe Python* interface. Make sure that your caffemodel and prototxt files are correct. To prove that the problem is not in the prototxt file, see FAQ #2.

13. Cannot infer shapes due to exception in Caffe*

Model Optimizer tried to infer a custom layer via the Caffe* framework, however, an error occurred, meaning that the model could not be inferred using Caffe. It might happen if you try to convert the model with some noise weights and biases, resulting in problems with layers with dynamic shapes. You should write your own extension for every custom layer your topology might have. For more details, refer to Extending Model Optimizer with New Primitives.

14. Cannot infer shape for node {} because there is no Caffe* available. Please, register Python* inference function for op or use Caffe for shape inference

Your model contains a custom layer and you have correctly registered it with the CustomLayersMapping.xml file. These steps are required to offload shape inference of the custom layer with the help of the system Caffe*. However, the Model Optimizer could not import a Caffe package. Make sure that you have built Caffe with a pycaffe target and added it to the PYTHONPATH environment variable. For more information, refer to Configuring the Model Optimizer. At the same time, it is highly recommended to avoid dependency on Caffe and write your own Model Optimizer extension for your custom layer. For more information, refer to the FAQ #45.

15. Framework name can not be deduced from the given options. Use --framework to choose one of Caffe*, TensorFlow*, MXNet*

You have run the Model Optimizer without a flag --framework caffe|tf|mxnet. Model Optimizer tries to deduce the framework by the input model file extension (.pb for TensorFlow*, .caffemodel for Caffe*, .params for MXNet*). Your input model might have a different extension and you need to explicitly set the source framework. For example, use --framework caffe.

16. Input shape is required to convert MXNet* model. Please, provide it with --input_shape

Input shape was not provided. That is mandatory for converting an MXNet* model to the Intermediate Representation, because MXNet models do not contain information about input shapes. Please, use the --input_shape flag to specify it. For more information about using the --input_shape, refer to the FAQ #57.

17. Both --mean_file and --mean_values are specified. Specify either mean file or mean values

--mean_file and --mean_values are two ways of specifying preprocessing for the input. However, they cannot be used together, as it would mean double subtraction and lead to ambiguity. Choose one of these options and pass it using the corresponding CLI option.

18. Negative value specified for --mean_file_offsets option. Please, specify positive integer values in format '(x,y)'

You might have specified negative values with --mean_file_offsets. Only positive integer values in format '(x,y)' must be used.

19. Both --scale and --scale_values are defined. Specify either scale factor or scale values per input channels

--scale sets a scaling factor for all channels. --scale_values sets a scaling factor per channel. Using both of them simultaneously produces ambiguity, so you must use only one of them. For more information, refer to the Using Framework-Agnostic Conversion Parameters sections: Converting a Caffe* Model, Converting a TensorFlow* Model, Converting an MXNet* Model.
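For example, pass either a single factor for all channels or one factor per channel, but never both (the file name and values below are illustrative):

python3 mo.py --input_model <INPUT_MODEL>.caffemodel --scale 255
python3 mo.py --input_model <INPUT_MODEL>.caffemodel --scale_values [58.4,57.1,57.4]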

20. Cannot find .prototxt file: for Caffe* please specify --input_proto - a protobuf file that stores topology and --input_model that stores pretrained weights

Model Optimizer cannot find a .prototxt file for a specified model. By default, the .prototxt file must be located in the same directory as the input model and have the same name (except for the extension). If either of these conditions is not satisfied, use --input_proto to specify the path to the .prototxt file.
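A minimal sketch for the case when the .prototxt file is stored separately from the weights (both paths are placeholders):

python3 mo.py --input_model <path_to_weights>/<INPUT_MODEL>.caffemodel --input_proto <path_to_topology>/<INPUT_MODEL>.prototxt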

21. Specified input model does not exist

You probably specified an incorrect path to a model. Make sure that the path is correct and the file exists.

22. Failed to create directory .. . Permission denied!

Model Optimizer cannot create a directory specified via --output_dir. Make sure that you have enough permissions to create the specified directory.

23. Discovered data node without inputs and value

One of the layers in the specified topology might not have inputs or values. Please make sure that the provided caffemodel and protobuf files are correct.

24. Part of the nodes was not translated to IR. Stopped

Some of the layers are not supported by the Model Optimizer and cannot be translated to an Intermediate Representation. You can extend the Model Optimizer by adding new primitives. For more information, refer to the Extending the Model Optimizer with New Primitives page.

25. While creating an edge from .. to .. : node name is undefined in the graph. Check correctness of the input model

Model Optimizer cannot build a graph based on a specified model. Most likely, it is incorrect.

26. Node does not exist in the graph

You might have specified an output node via the --output flag that does not exist in a provided model. Make sure that the specified output is correct and this node exists in the current model.

27. --input parameter was provided. Other inputs are needed for output computation. Provide more inputs or choose another place to cut the net

Most likely, the Model Optimizer tried to cut the model at the specified input, but other inputs are also required to compute the specified outputs. Provide the missing inputs or choose another place to cut the network.

28. Placeholder node does not have input port, but input port was provided

You might have specified an input port for a placeholder node, while the placeholder node does not have an input port in the model.

29. Port index is out of number of available input ports for node

This error occurs when an incorrect input port is specified with the --input command line argument. When using --input, you can optionally specify an input port in the form: X:node_name, where X is an integer index of the input port starting from 0 and node_name is the name of a node in the model. This error occurs when the specified input port X is not in the range 0..(n-1), where n is the number of input ports for the node. Please, specify a correct port index, or do not use it if it is not needed.

30. Node has more than 1 input and input shapes were provided. Try not to provide input shapes or specify input port with PORT:NODE notation, where PORT is an integer

This error occurs when an incorrect combination of the --input and --input_shape command line options is used. Using both --input and --input_shape is valid only if --input points to the Placeholder node, a node with one input port or --input has the form PORT:NODE, where PORT is an integer port index of input for node NODE. Otherwise, the combination of --input and --input_shape is incorrect.
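A minimal sketch of the PORT:NODE notation, assuming a hypothetical node named node_name whose input port with index 1 should become the new model input (the model name and shape are illustrative):

python3 mo.py --input_model <INPUT_MODEL> --input 1:node_name --input_shape [1,3,224,224]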

31. Input port > 0 in --input is not supported if --input_shape is not provided. Node: NAME_OF_THE_NODE. Omitted port index and all input ports will be replaced by placeholders. Or provide --input_shape

When using the PORT:NODE notation for the --input command line argument and PORT > 0, you should also specify --input_shape for this input. This is a limitation of the current Model Optimizer implementation.

32. No or multiple placeholders are in the model, but only one shape is provided, cannot set it

You have provided only one shape for the placeholder, but the model has either no inputs or multiple inputs. Please, make sure that you have provided correct data for the placeholder nodes.

33. The amount of input nodes for port is not equal to 1

This error occurs when the SubgraphMatch.single_input_node function is used for an input port that supplies more than one node in a sub-graph. The single_input_node function can be used only for ports that have a single consumer inside the matching sub-graph. When multiple nodes are connected to the port, use the input_nodes or node_by_pattern function instead of single_input_node. Please, refer to Sub-Graph Replacement in the Model Optimizer for more details.

34. Output node for port has already been specified

This error occurs when the SubgraphMatch._add_output_node function is called manually from the user's extension code. This is an internal function, and you should not call it directly.

35. Unsupported match kind .. . Match kinds points or scope are supported only

While using a configuration file to implement a TensorFlow* front replacement extension, an incorrect match kind was used. Only the points or scope match kinds are supported. Please, refer to Sub-Graph Replacement in the Model Optimizer for more details.

36. Cannot write an event file for the TensorBoard to directory

Model Optimizer tried to write an event file in the specified directory but failed to do that. That could happen because the specified directory does not exist or you do not have enough permissions to write in it.

37. There is no registered infer function for node with op = .. . Please, implement this function in the extensions

Most likely, you tried to extend Model Optimizer with a new primitive, but did not specify an infer function. For more information on extensions, see Extending the Model Optimizer with New Primitives.

38. Stopped shape/value propagation at node ..

Model Optimizer cannot infer shapes or values for the specified node. It can happen because of a bug in the custom shape infer function, because the node inputs have incorrect values/shapes, or because the input shapes are incorrect.

39. The input with shape .. does not have the batch dimension

Batch dimension is the first dimension in the shape and it should be equal to 1 or undefined. In your case, it is not equal to either 1 or undefined, which is why the -b shortcut produces undefined and unspecified behavior. To resolve the issue, specify full shapes for each input with the --input_shape option. Run Model Optimizer with the --help option to learn more about the notation for input shapes.
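For example, assuming a hypothetical model whose input is a two-dimensional tensor without a batch dimension, specify the full shape instead of -b (the dimensions are illustrative):

python3 mo.py --input_model <INPUT_MODEL> --input_shape [100,4]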

40. Not all output shapes were inferred or fully defined for node

Most likely, the shape is not defined (partially or fully) for the specified node. You can use --input_shape with positive integers to override model input shapes.

41. Shape for tensor is not defined. Cannot proceed

This error occurs when the --input command line option is used to cut a model and --input_shape is not used to override shapes for a node and a shape for the node cannot be inferred by Model Optimizer. You need to help Model Optimizer and specify shapes with --input_shape for each node that is specified with the --input command line option.

42. Module tensorflow was not found. Please install TensorFlow* 1.2 or higher

To convert TensorFlow* models with Model Optimizer, TensorFlow* 1.2 or newer must be installed. For more information on prerequisites, see Configuring the Model Optimizer.

43. Cannot read the model file: it is incorrect TensorFlow* model file or missing

The model file should contain a frozen TensorFlow* graph in the text or binary format. Make sure that --input_model_is_text is provided for a model in the text format. By default, a model is interpreted as a binary file.
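For example, assuming the frozen graph was saved in the text format with the .pbtxt extension (the file name is a placeholder):

python3 mo.py --input_model <INPUT_MODEL>.pbtxt --input_model_is_text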

44. Cannot preprocess TensorFlow* graph after reading from model file. File is corrupted or has unsupported format

Most likely, there is a problem with the specified model file. The file exists, but it has a bad format or is corrupted.

45. Found custom layer. Model Optimizer does not support this layer. Please, register it in CustomLayersMapping.xml or implement extension

This means that the layer {layer_name} is not supported in the Model Optimizer. You can find a list of all unsupported layers in the corresponding section. You should add this layer to CustomLayersMapping.xml (Legacy Mode for Caffe* Custom Layers) or implement the extensions for this layer (Extending Model Optimizer with New Primitives).

46. Custom replacement configuration file does not exist

Path to the custom replacement configuration file was provided with the --tensorflow_use_custom_operations_config flag, but the file could not be found. Please, make sure that the specified path is correct and the file exists.

47. Extractors collection have case insensitive duplicates

When extending Model Optimizer with new primitives, keep in mind that their names are case-insensitive. Most likely, another operation with the same name is already defined. For more information, see Extending the Model Optimizer with New Primitives.

48. Input model name is not in an expected format, cannot extract iteration number

Model Optimizer cannot load an MXNet* model in the specified file format. Please, use the .json or .params format.

49. Cannot convert type of placeholder because not all of its outputs are Cast to float operations

There are models where the Placeholder has the UINT8 type and the first operation after it is a Cast that converts the input to FP32. Model Optimizer detected that the Placeholder has the UINT8 type, but the next operation is not a Cast to float. Model Optimizer does not support such a case. Please, change the model so that the placeholder has the FP32 data type.

50. Data type is unsupported

Model Optimizer cannot convert the model to the specified data type. Currently, FP16 and FP32 are supported. Please, specify the data type with the --data_type flag. The available values are: FP16, FP32, half, float.
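For example, to produce an FP16 Intermediate Representation (the model name is a placeholder):

python3 mo.py --input_model <INPUT_MODEL> --data_type FP16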

51. No node with name ..

Model Optimizer tried to access a node that does not exist. This could happen if you have incorrectly specified placeholder, input or output node name.

52. Module mxnet was not found. Please, install MXNet* 1.0.0

To convert MXNet* models with Model Optimizer, MXNet 1.0.0 must be installed. For more information about prerequisites, see Configuring the Model Optimizer.

53. The following error happened while loading MXNet* model ..

Most likely, there is a problem with loading the MXNet* model. Please, make sure that the specified path is correct, the model exists and is not corrupted, and you have sufficient permissions to work with it.

54. The following error happened while processing input shapes: ..

Please, make sure that inputs are defined and have correct shapes. You can use --input_shape with positive integers to override model input shapes.

55. Attempt to register of custom name for the second time as class. Note that custom names are case-insensitive

When extending Model Optimizer with new primitives, keep in mind that their names are case-insensitive. Most likely, another operation with the same name is already defined. For more information, see Extending the Model Optimizer with New Primitives.

56. Both --input_shape and --batch were provided. Please, provide only one of them

You cannot specify the batch and the input shape at the same time. You should specify a desired batch as the first value of the input shape.
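For example, instead of combining --batch with --input_shape, encode the desired batch of 8 as the first value of the shape (the remaining dimensions are illustrative):

python3 mo.py --input_model <INPUT_MODEL> --input_shape [8,3,224,224]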

57. Input shape ... cannot be parsed

The specified input shape cannot be parsed. Please, define it in one of the following ways:

  • python3 mo.py --input_model <INPUT_MODEL>.caffemodel --input_shape (1,3,227,227)
  • python3 mo.py --input_model <INPUT_MODEL>.caffemodel --input_shape [1,3,227,227]
  • For a multi-input topology, you should also specify the input names:
    python3 mo.py --input_model /path-to/your-model.caffemodel --input data,rois --input_shape (1,3,227,227),(1,6,1,1)

Keep in mind that there must be no spaces inside or between the bracketed shapes.

58. Please, provide input layer names for input layer shapes

When specifying input shapes for several layers, you must provide the names of the inputs whose shapes will be overwritten. For usage examples, see Converting a Caffe* Model. Additional information for --input_shape is in FAQ #57.

59. Values cannot be parsed

Mean values for the given parameter cannot be parsed. It should be a string with a list of mean values. For example, in '(1,2,3)', 1 stands for the RED channel, 2 for the GREEN channel, 3 for the BLUE channel.

60. .. channels are expected for given values

The number of channels and the number of given values for mean values do not match. The shape should be defined as '(R,G,B)' or '[R,G,B]'. The shape should not contain undefined dimensions (? or -1). The order of values is as follows: (value for a RED channel, value for a GREEN channel, value for a BLUE channel).

61. You should specify input for each mean value

Most likely, you have specified mean values with --mean_values but have not specified the corresponding inputs. Please, specify inputs with the --input flag. For usage examples, please, refer to FAQ #63.

62. You should specify input for each scale value

Most likely, you have specified scale values with --scale_values but have not specified the corresponding inputs. Please, specify inputs with the --input flag. For usage examples, please, refer to FAQ #64.

63. Number of inputs and mean values do not match

The number of specified mean values and the number of inputs must be equal. Please, refer to Converting a Caffe* Model for a usage example.

64. Number of inputs and scale values do not match

The number of specified scale values and the number of inputs must be equal. Please, refer to Converting a Caffe* Model for a usage example.
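A minimal sketch for a hypothetical two-input model with inputs named data and info, where each input gets its own mean and scale values (all names and values are illustrative):

python3 mo.py --input_model <INPUT_MODEL>.caffemodel --input data,info --mean_values data[123.68,116.78,103.94],info[0,0,0] --scale_values data[58.4,57.1,57.4],info[1,1,1]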

65. No class registered for match kind .. . Supported match kinds are ..

A replacement defined in the configuration file for sub-graph replacement using node name patterns or start/end nodes has the match_kind attribute. The attribute may have only one of the values: scope or points. If a different value is provided, this error is displayed.

66. No instance(s) is(are) defined for the custom replacement

A replacement defined in the configuration file for sub-graph replacement using node name patterns or start/end nodes has the instances attribute. This attribute is mandatory; this error occurs if it is missing. Refer to the documentation describing the sub-graph replacement feature.

67. The instance must be a single dictionary for the custom replacement with id ..

A replacement defined in the configuration file for sub-graph replacement using start/end nodes has the instances attribute. For this type of replacement, the instance must be defined with a dictionary with two keys start_points and end_points. Values for these keys are lists with the start and end node names, respectively. Refer to documentation with a description of the sub-graph replacement feature.
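A minimal sketch of such a replacement description with a hypothetical id and hypothetical node names; only the attribute names (id, match_kind, instances, start_points, end_points) come from this guide:

[
    {
        "id": "MyCustomReplacement",
        "match_kind": "points",
        "instances": {
            "start_points": ["start_node_name"],
            "end_points": ["end_node_name"]
        }
    }
]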

68. No instances are defined for replacement with id ..

A replacement for the specified id is not defined in the configuration file. Please, refer to FAQ #66 for more information.

69. Custom replacements configuration file ... does not exist

Path to a custom replacement configuration file was provided with the --tensorflow_use_custom_operations_config flag, but it cannot be found. Please, make sure that the specified path is correct and the file exists.

70. Failed to parse custom replacements configuration file ...

The file for custom replacement configuration provided with the --tensorflow_use_custom_operations_config flag cannot be parsed. In particular, it should have a valid JSON structure. For more details, refer to JSON Schema Reference.

71. One of the custom replacements in the configuration file .. does not contain attribute id

Every custom replacement should declare a set of mandatory attributes and their values. For more details, refer to FAQ #72.

72. File .. validation failed

The file for custom replacement configuration provided with the --tensorflow_use_custom_operations_config flag cannot pass validation. Make sure that you have specified id, instances and match_kind for all the patterns.

73. Cannot update the file .. because it is broken

The custom replacement configuration file provided with the --tensorflow_custom_operations_config_update cannot be parsed. Please, make sure that the file is correct and refer to FAQs #69, #70, #71, and #72.

74. End node .. is not reachable from start nodes: ..

This error occurs when you try to match a sub-graph. Model Optimizer detected that some of the nodes specified as outputs of the sub-graph are not reachable from the specified input (start) nodes. Make sure that the sub-graph you want to match actually contains all the specified output nodes.

75. Sub-graph contains network input node ..

The start or end node for the sub-graph replacement using start/end nodes is specified incorrectly. Model Optimizer finds internal nodes of the sub-graph strictly "between" the start and end nodes. Then it adds all input nodes to the sub-graph (and inputs of their inputs and so on) for these "internal" nodes. The error reports that the Model Optimizer reached a network input node during this phase, which means that the start/end points are specified incorrectly in the configuration file. Refer to the documentation describing the sub-graph replacement feature.

76. ... elements of ... were clipped to infinity while converting a blob for node [...] to ...

This message may appear when the --data_type=FP16 command line option is used. This option implies conversion of all the blobs in the node to FP16. If a value in a blob is out of the range of valid FP16 values, the value is converted to positive or negative infinity. Depending on the model, this may lead to incorrect inference results or may not be a problem at all. The number of such elements and the total number of elements in the blob are printed out together with the name of the node where this blob is used.

77. ... elements of ... were clipped to zero while converting a blob for node [...] to ...

This message may appear when the --data_type=FP16 command line option is used. This option implies conversion of all the blobs in the node to FP16. If a value in a blob is so close to zero that it cannot be represented as a valid FP16 value, it is converted to a true zero FP16 value. Depending on the model, this may lead to incorrect inference results or may not be a problem at all. The number of such elements and the total number of elements in the blob are printed out together with the name of the node where this blob is used.

78. The amount of nodes matched pattern ... is not equal to 1

This error occurs when the SubgraphMatch.node_by_pattern function is used with a pattern that does not uniquely identify a single node in a sub-graph. Try to extend the pattern string to make an unambiguous match to a single sub-graph node. For more details, refer to Sub-Graph Replacement in the Model Optimizer.

79. The topology contains no input layers

Your Caffe* topology .prototxt file is intended for training. Model Optimizer expects a deployment-ready .prototxt file. To fix the problem, prepare a deployment-ready .prototxt file. Usually, preparation of a deploy-ready topology results in removing data layer(s), adding input layer(s), and removing loss layer(s).

80. Warning: please expect that Model Optimizer conversion might be slow

You are using an unsupported Python* version. Use only Python* versions 3.4 - 3.6 with the C++ protobuf implementation that is supplied with the CV SDK. You can still boost conversion speed by building the protobuf library from sources. For complete instructions about building protobuf from sources, see

Known Issues

Old proto Compiler Breaks protobuf Library

An incompatibility is possible with Python protobuf library version 3.5.1. This is a known issue for CentOS 7.4.

Error log report:

File "../lib64/python3.5/site-packages/google/protobuf/descriptor.py", line 829, in _new_ 
return _message.default_pool.AddSerializedFile(serialized_pb) 
TypeError: expected bytes, str found

A possible workaround is to upgrade the default protobuf compiler (libprotoc 2.5.0) to a newer version, such as libprotoc 2.6.1.

 

Legal Information

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at http://www.intel.com/ or from the OEM or retailer.

No computer system can be absolutely secure.

Intel, Arria, Core, Movidius, Pentium, Xeon, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used with permission by Khronos.

*Other names and brands may be claimed as the property of others.

Copyright © 2018, Intel Corporation. All rights reserved.

 

