What is Image Classification?
Image classification is a computer vision problem that aims to classify a subject or an object present in an image into predefined classes. A typical real-world example of image classification is showing an image flash card to a toddler and asking the child to recognize the object printed on the card. Traditional approaches to providing such visual perception to machines have relied on complex computer algorithms that use feature descriptors, like edges, corners, colors, and so on, to identify or recognize objects in the image.
Deep learning takes an interesting, and by far the most effective, approach to solving real-world imaging problems. It uses multiple layers of interconnected neurons, where each layer uses a specific computer algorithm to identify and classify a specific descriptor. For example, if you wanted to classify a traffic stop sign, you would use a deep neural network (DNN) that has one layer to detect edges and borders of the sign, another layer to detect the number of corners, the next layer to detect the color red, the next to detect a white border around red, and so on. The ability of a DNN to break down a task into many layers of simple algorithms allows it to work with a larger set of descriptors, which makes DNN-based image processing much more effective in real-world applications.
NOTE: the above image is a simplified representation of how a DNN would identify different descriptors of an object. It is by no means an accurate representation of a DNN used to classify STOP signs.
Image classification is different from object detection. Classification assumes there is only one object in the entire image, sort of like the ‘image flash card for toddlers’ example I referred to above. Object detection, on the other hand, can process multiple objects within the same image. It can also tell you the location of the object within the image.
Practical learning!
You will build...
A program that reads images from a folder and classifies each of them into the top 5 categories.
You will learn...
- How to use pre-trained networks to do image classification
- How to use Intel® Movidius™ Neural Compute SDK’s API framework to program the Intel Movidius NCS
You will need...
- An Intel Movidius Neural Compute Stick - Where to buy
- An x86_64 laptop/desktop running Ubuntu 16.04
If you haven’t already done so, install NCSDK on your development machine. Refer to the NCS Quick Start Guide for installation instructions.
Fast track…
If you would like to see the final output before diving into programming, download the code from our sample code repository (NC App Zoo) and run it.
Running `make run` downloads and builds all the dependent files, like the pre-trained networks, binary graph file, ILSVRC dataset mean, etc. You only have to run `make run` the first time; after that you can run `python3 image-classifier.py` directly.
You should see an output similar to:
Let’s build!
Thanks to NCSDK’s comprehensive API framework, it only takes a few lines of Python to build an image classifier. Below are some of the user-configurable parameters of image-classifier.py:
- `GRAPH_PATH`: Location of the graph file against which we want to run the inference. By default it is set to `~/workspace/ncappzoo/caffe/GoogLeNet/graph`.
- `IMAGE_PATH`: Location of the image we want to classify. By default it is set to `~/workspace/ncappzoo/data/images/cat.jpg`.
- `IMAGE_DIM`: Dimensions of the image as defined by the chosen neural network; e.g. GoogLeNet uses 224x224 pixels, AlexNet uses 227x227 pixels.
- `IMAGE_STDDEV`: Standard deviation (scaling value) as defined by the chosen neural network; e.g. GoogLeNet uses no scaling factor, InceptionV3 uses 128 (stddev = 1/128).
- `IMAGE_MEAN`: Mean subtraction is a common technique used in deep learning to center the data; for the ILSVRC dataset, the mean is Blue = 102, Green = 117, Red = 123.
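In image-classifier.py these show up as plain Python constants near the top of the script. A sketch using the defaults listed above (the values come from this article, but the exact form in the script may differ):

```python
# User-configurable parameters, with the defaults described above.
GRAPH_PATH = '~/workspace/ncappzoo/caffe/GoogLeNet/graph'
IMAGE_PATH = '~/workspace/ncappzoo/data/images/cat.jpg'
IMAGE_DIM = (224, 224)              # GoogLeNet expects 224x224 pixels
IMAGE_STDDEV = 1                    # GoogLeNet applies no scaling
IMAGE_MEAN = (102, 117, 123)        # ILSVRC mean: Blue, Green, Red
```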
Before using the NCSDK API framework, we have to import the mvncapi module from the mvnc library.
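With NCSDK installed, that import looks like this (aliasing it to `mvnc` is a common convention in the NCSDK samples, not a requirement):

```python
from mvnc import mvncapi as mvnc
```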
Step 1: Open the enumerated device
Just like any other USB device, when you plug the NCS into your application processor’s (Ubuntu laptop/desktop) USB port, it enumerates itself as a USB device. We will call an API to look for the enumerated NCS device.
Did you know that you can connect multiple Neural Compute Sticks to the same application processor to scale inference performance? More about this in a later article, but for now let’s call the APIs to pick just one NCS and open it (get it ready for operation).
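With NCSDK v1’s Python API, enumerating and opening the first stick looks roughly like this. This is a sketch that assumes the mvnc module is installed; it won’t run without an NCS plugged in:

```python
from mvnc import mvncapi as mvnc

# Look for enumerated NCS devices; quit if none are found.
devices = mvnc.EnumerateDevices()
if len(devices) == 0:
    print('No NCS devices found')
    quit()

# Get a handle to the first enumerated device and open it.
device = mvnc.Device(devices[0])
device.OpenDevice()
```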
Step 2: Load a graph file onto the NCS
To keep this project simple, we will use a pre-compiled graph of a pre-trained AlexNet model, which was downloaded and compiled when you ran `make` inside the `ncappzoo` folder. We will learn how to compile a pre-trained network in another blog, but for now let’s figure out how to load the graph onto the NCS.
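Loading the graph is two steps: read the compiled graph file into a byte buffer, then allocate it on the device. A sketch assuming a `GRAPH_PATH` constant and a `device` handle opened in Step 1 (variable names here are illustrative):

```python
# Read the compiled graph file into a buffer.
with open(GRAPH_PATH, mode='rb') as f:
    blob = f.read()

# Load the graph buffer onto the NCS; this returns a Graph handle
# that we will use for LoadTensor/GetResult calls.
graph = device.AllocateGraph(blob)
```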
Step 3: Offload a single image onto the Intel Movidius NCS to run inference
The Intel Movidius NCS is powered by the Intel Movidius visual processing unit (VPU). It is the same chip that provides visual intelligence to millions of smart security cameras, gesture-controlled drones, industrial machine vision equipment, and more. Just like the VPU, the NCS acts as a visual co-processor in the entire system. In our case, we will use the Ubuntu system to simply read images from a folder and offload them to the NCS for inference. All of the neural network processing is done solely by the NCS, thereby freeing up the application processor’s CPU and memory resources to perform other application-level tasks.
In order to load an image onto the NCS, we will have to pre-process the image.
- Resize/crop the image to match the dimensions defined by the pre-trained network.
  - GoogLeNet uses 224x224 pixels, AlexNet uses 227x227 pixels.
- Subtract the per-channel (Blue, Green and Red) dataset mean from the image.
  - This is a common technique used in deep learning to center the data.
- Convert the image into a half-precision floating-point (fp16) array and use the `LoadTensor` function call to load the image onto the NCS.
  - The skimage library can do this in just one line of code.
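The steps above can be sketched in pure NumPy. The actual image-classifier.py uses skimage, which handles the resize in one line; the nearest-neighbor resize and function names below are illustrative, not the script’s own code:

```python
import numpy as np

def resize_nearest(img, dim):
    """Naive nearest-neighbor resize of an HxWxC image to dim x dim."""
    h, w = img.shape[:2]
    rows = (np.arange(dim) * h) // dim
    cols = (np.arange(dim) * w) // dim
    return img[rows[:, None], cols]

def preprocess(img, dim, mean, stddev=1.0):
    """Resize, mean-subtract, scale, and convert to fp16 for LoadTensor."""
    img = resize_nearest(img, dim).astype(np.float32)
    img = (img - mean) * stddev       # per-channel mean subtraction + scaling
    return img.astype(np.float16)     # the NCS expects half-precision floats

# Example: an AlexNet-style 227x227 input with the ILSVRC channel means (B, G, R).
frame = np.zeros((300, 400, 3), dtype=np.uint8)
tensor = preprocess(frame, 227, np.array([102.0, 117.0, 123.0]))
print(tensor.shape, tensor.dtype)     # (227, 227, 3) float16
```

The resulting `tensor` is what you would hand to `graph.LoadTensor()`.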
Step 4: Read and print inference results from the NCS
Depending on how you want to integrate the inference results into your application flow, you can choose to use either a blocking or non-blocking function call to load tensor (previous step) and read inference results. We will learn more about this functionality in a later blog, but for now let’s just use the default, which is a blocking call (no need to call a specific API).
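Once `graph.LoadTensor()` has been called with the preprocessed image, the blocking `graph.GetResult()` call returns the output tensor (along with the user object passed to `LoadTensor`). Extracting the top 5 predictions from that output vector is plain NumPy; here is a small self-contained sketch, where the probability values and label names are made up for illustration:

```python
import numpy as np

def top_k(output, labels, k=5):
    """Return the top-k (label, probability) pairs from a 1-D output vector."""
    order = np.argsort(output)[::-1][:k]   # indices sorted by descending score
    return [(labels[i], float(output[i])) for i in order]

# Illustrative output; on the NCS this vector would come from graph.GetResult().
probs = np.array([0.05, 0.60, 0.10, 0.20, 0.03, 0.02], dtype=np.float32)
labels = ['tabby cat', 'Egyptian cat', 'tiger cat', 'lynx', 'dog', 'fox']
for name, p in top_k(probs, labels, k=5):
    print('%s (%.1f%%)' % (name, p * 100))
```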
Step 5: Unload the graph and close the device
In order to avoid memory leaks and/or segmentation faults, we should close any open files or resources and deallocate any used memory.
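With NCSDK v1, that cleanup is two calls, made in reverse order of allocation. A sketch assuming the `graph` and `device` handles from the earlier steps (names illustrative):

```python
# Release NCS resources in reverse order of allocation.
graph.DeallocateGraph()   # free the graph memory on the device
device.CloseDevice()      # close the USB device handle
```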
Congratulations! You just built a DNN-based image classifier.
Further experiments
- This example script reads only one image; modify the script to read and infer multiple images from a folder
- Use OpenCV to display the image(s) and their inference results on a graphical window
- Replicate this project on an embedded board like RPI3 or MinnowBoard
- You can use the ‘Run NCS apps on RPI’ article as a reference for this experiment
Further reading
- Understand the entire development workflow for Intel Movidius NCS.
- Here’s a good write-up on network configuration, which includes mean subtraction and scaling topics.