Traffic Light Detection Using the TensorFlow* Object Detection API

Abstract

This case study evaluates the ability of the TensorFlow* Object Detection API to solve a real-time problem such as traffic light detection. For transfer learning, the experiment uses the Single Shot Multibox Detector MobileNet model from the TensorFlow Zoo, pre-trained on the Microsoft Common Objects in Context (COCO) dataset. Intel® Xeon® and Intel® Xeon Phi™ processor-based machines were used for the study. The resulting model identified traffic signals with more than 90 percent accuracy.

Introduction

With advances in technology, the development of autonomous or smart cars has accelerated rapidly. Accurate detection and recognition of traffic lights is a crucial part of the development of such cars. The concept involves enabling autonomous cars to detect traffic lights automatically with minimal human interaction. Automating traffic light detection in cars would also help to reduce accidents.

Traditional approaches in machine learning for traffic light detection and classification are being replaced by deep learning methods to provide state-of-the-art results. However, these methods create various challenges. For example, the distortion or variation in images due to orientation, illumination, and speed fluctuation of vehicles could result in false recognition.

The experiment was implemented using transfer learning of the Microsoft Common Objects in Context (COCO) pre-trained model called Single Shot Multibox Detector (SSD) with MobileNet. A subset of the ImageNet* dataset containing traffic lights was used for further training to improve the performance. For this particular experiment, the entire training was done on an Intel® Xeon Phi™ processor and the inferencing was done on an Intel® Xeon® processor. However, the Intel Xeon processor-based machine can be used for both training and inferencing.

Hardware Details

Tables 1 and 2 list the configuration used for the Intel Xeon Phi and Intel Xeon processors:

Table 1. Intel® Xeon Phi™ processor configuration.

Table 2. Intel® Xeon® processor configuration.

Software Configuration

Table 3 lists the software dependencies for this use case.

Library	Version
TensorFlow*	1.4.0 (built from source)
Python*	3 or later
Operating system	CentOS* 7.3.1
Protobuf	2.6
Pillow	1.0
Lxml	4.1.1
Matplotlib	2.1.0
MoviePy	0.2
GCC* (GNU Compiler Collection*)	6+

Table 3. Software configuration

Installation

Building and Installing TensorFlow Optimized for Intel® Architecture

TensorFlow can be installed and used with several combinations of development tools and libraries on a variety of platforms. The following are the steps to build and install TensorFlow optimized for Intel® architecture¹ with the Intel® Math Kernel Library 2017 on Ubuntu*-based systems.

git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout r1.4
wget https://github.com/bazelbuild/bazel/releases/download/0.7.0/bazel-0.7.0-installer-linux-x86_64.sh
wget --no-check-certificate -c --header "Cookie: oraclelicense=accept-securebackupcookie" http://download.oracle.com/otn-pub/java/jdk/8u151-b12/e758a0de34e24606bca991d704f6dcbf/jdk-8u151-linux-x64.tar.gz
export PATH=/opt/intel/intelpython3.5/bin/:${PATH}
conda create -n tensorflow python=3.5
source activate tensorflow
tar -zxvf jdk-8u151-linux-x64.tar.gz
export JAVA_HOME=$WRKDIR/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH
export PATH=$PATH:$JAVA_HOME/bin:/home/intel-user3/bazel/output
chmod 755 bazel-0.7.0-installer-linux-x86_64.sh
./bazel-0.7.0-installer-linux-x86_64.sh --user --prefix=~/bazel
./configure
bazel build --config=mkl -c opt //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip install /tmp/tensorflow_pkg/tensorflow-1.4.0*.whl

Installing LabelImg

Download the latest version of LabelImg, an annotation tool for Microsoft Windows*². Extract the zip file, and then rename the folder as LabelImg.

Solution Design

The solution was implemented with the TensorFlow Object Detection API using Intel architecture. The detection pipeline is given below.

Traffic detection pipeline

Algorithm 1: Detection Pipeline
boxAssigned ← false
while true do
    f ← nextFrame
    while boxAssigned == false do
        InvokeDetection(f)
        if Bounding Box is detected then
            boxAssigned ← true
            class ← identifiedClass
            if class is Trafficlight then
                drawBoundingBox
            end if
        end if
    end while
end while
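
The loop in Algorithm 1 might be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: detect is a hypothetical callable standing in for the trained model, the Detection tuple and the 0.5 score threshold are introduced here for the example, and the sketch simplifies Algorithm 1 by running detection once per frame.

```python
from collections import namedtuple

# Hypothetical container for one detection; box is (ymin, xmin, ymax, xmax).
Detection = namedtuple("Detection", ["label", "score", "box"])


def traffic_light_boxes(frames, detect, min_score=0.5):
    """Yield (frame_index, boxes) with the traffic light boxes found in each frame."""
    for index, frame in enumerate(frames):
        detections = detect(frame)  # stand-in for InvokeDetection(f)
        boxes = [
            d.box
            for d in detections
            if d.label == "trafficlight" and d.score >= min_score
        ]
        yield index, boxes


def fake_detect(frame):
    """Stub detector used only to exercise the loop; a real model goes here."""
    if frame == "light":
        return [Detection("trafficlight", 0.9, (0.1, 0.2, 0.3, 0.4))]
    return []
```

Keeping the detector behind a plain callable lets the loop be exercised with a stub such as fake_detect before a trained model is available.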

Why choose TensorFlow Object Detection API?

TensorFlow’s Object Detection API is a powerful tool that makes it easy to construct, train, and deploy object detection models³. In most cases, training an entire convolutional network from scratch is time consuming and requires large datasets. Transfer learning addresses this problem by starting from a model pre-trained with the TensorFlow API. Before getting into the technical details of implementing the API, let’s discuss the concept of transfer learning.

Transfer learning is a research problem in machine learning that focuses on storing the knowledge gained from solving one problem and applying it to a different but related problem. Transfer learning can be applied in three major ways⁴:

Convolutional neural network (ConvNet) as a fixed feature extractor: In this method the last fully connected layer of a ConvNet is removed, and the rest of the ConvNet is treated as a fixed feature extractor for the new dataset.

Fine-tuning the ConvNet: This method is similar to the previous method, but the difference is that the weights of the pre-trained network are fine-tuned by continuing backpropagation.

Pre-trained models: Since modern ConvNets take weeks to train from scratch, it is common to see people release their final ConvNet checkpoints for the benefit of others who can use the networks for fine-tuning. For example, TensorFlow Zoo⁵ is one such place where people share their trained models/checkpoints.

In this experiment, we used a pre-trained model for the transfer learning. The advantage of using a pre-trained model is that instead of building the model from scratch, a model trained for a similar problem can be used as a starting point for training the network. Many pre-trained models are available. This experiment used the COCO pre-trained model/checkpoints SSD MobileNet from the TensorFlow Zoo. This model was used as an initialization checkpoint for training. The model was further trained with images of traffic lights from ImageNet. This fine-tuned model was used for inference.

Now let’s look at how to implement the solution. The TensorFlow Object Detection API has a series of steps to follow, as shown in Figure 1.

Figure 1. Solution design

1. Dataset download

The dataset for fine-tuning the pre-trained model was prepared using over 600 traffic light images from ImageNet⁶. ImageNet provides over ten million image URLs across various classes. The traffic light images were downloaded from their URLs and saved for annotation.
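
A download step of this kind could be sketched as below. The sketch assumes the image URLs have already been collected into a Python list; download_images, its file-naming scheme, and the optional fetch parameter are hypothetical names introduced for illustration, and failed downloads are simply skipped because individual links may no longer resolve.

```python
import os
import urllib.request


def download_images(urls, out_dir, fetch=None, timeout=10):
    """Save each URL as out_dir/000000.jpg, 000001.jpg, ... and return the saved paths."""
    if fetch is None:
        def fetch(url):
            # Default fetcher: plain HTTP GET with a timeout.
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()

    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        try:
            data = fetch(url)
        except (OSError, ValueError) as error:
            # Network errors and malformed URLs are skipped, not fatal.
            print("skipping", url, error)
            continue
        path = os.path.join(out_dir, "{:06d}.jpg".format(index))
        with open(path, "wb") as handle:
            handle.write(data)
        saved.append(path)
    return saved
```

Passing a custom fetch function makes it possible to try the loop without network access.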

2. Image Annotation

Configuring the LabelImg tool. Before annotating images, the classes for labelling need to be defined in the LabelImg/data/predefined_classes.txt file. In this case, there’s only one class, trafficlight.
Launch labelimg.exe and then select the dataset folder by clicking the OpenDir icon on the left pane.
For each image that appears, draw a rectangular box across each traffic light by clicking the Create RectBox icon. These rectangular boxes are known as bounding boxes. Select the category trafficlight from the drop-down list that appears.
Repeat this process for every traffic light present in the image. Figure 2 shows an example of a completely annotated image.

Figure 2. Annotated image

Once the annotations for an image are completed, save the image to any folder.

The corresponding eXtensible Markup Language (XML) files will be generated for each image in the specified folder. XML files contain the coordinates of the bounding boxes, filename, category, and so on for each object within the image. These annotations are the ground truth boxes for comparison. Figure 3 represents the XML file of the corresponding image in Figure 2.

Figure 3. XML file structure
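
The annotation files described above can be read with Python's standard XML parser. The sketch below assumes the Pascal VOC layout that LabelImg writes (filename, size, and one object element per bounding box); SAMPLE_XML is a made-up annotation with hypothetical values, not the file shown in Figure 3.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in Pascal VOC layout; all values are hypothetical.
SAMPLE_XML = """<annotation>
  <filename>light_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>trafficlight</name>
    <bndbox><xmin>120</xmin><ymin>40</ymin><xmax>160</xmax><ymax>130</ymax></bndbox>
  </object>
</annotation>"""


def read_annotation(xml_text):
    """Return (filename, width, height, boxes) parsed from one annotation document."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        box = {"label": obj.find("name").text}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            box[key] = int(bndbox.find(key).text)  # pixel coordinates
        boxes.append(box)
    return root.find("filename").text, width, height, boxes
```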

3. Label map preparation

Each dataset requires a label map associated with it, which defines a mapping from string class names to integer class IDs. Label maps should always start from ID 1.

As there is only one class, the label map file for this experiment has the following structure:

item {
	id: 1
	name: 'trafficlight'
}
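
For datasets with more classes, a label map in this structure could be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of the Object Detection API; it assigns IDs starting from 1 in the order the class names are given.

```python
def build_label_map(class_names):
    """Return label map text with IDs assigned from 1 in the given order."""
    entries = []
    for class_id, name in enumerate(class_names, start=1):
        entries.append(
            "item {{\n  id: {}\n  name: '{}'\n}}\n".format(class_id, name)
        )
    return "\n".join(entries)
```

Calling build_label_map(["trafficlight"]) produces a single item with id 1, matching the structure used in this experiment.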

4. TensorFlow records (TFRecords) generation

TensorFlow accepts inputs in a standard format called a TFRecord file, which is a simple record-oriented binary format. Eighty percent of the input data is used for training and 20 percent is used for testing. The split sets of images and ground truth boxes are converted to train and test TFRecords: the XML files are first converted to CSV, and then the TFRecords are created from the CSV files. Sample scripts for generation are available here.
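
The split and the intermediate CSV step might look like the sketch below. It is not the sample script referred to above: split_dataset, boxes_to_csv, the fixed random seed, and the CSV column layout are assumptions made for illustration, and writing the final TFRecord files (which requires TensorFlow) is left out.

```python
import csv
import io
import random

# Assumed column layout: one row per bounding box.
CSV_COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def boxes_to_csv(rows):
    """Serialize one dict per bounding box to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Splitting by image rather than by bounding box keeps all boxes of one image in the same set.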

5. Pipeline configuration

This section discusses the configuration of the hyperparameters and the paths to the model checkpoints, TFRecords, and label map. The training process is configured through protobuf files, in which a few major settings need to be modified. A detailed explanation is given in Configuring the Object Detection Training Pipeline. The following are the major settings to be changed for the experiment.

In the model config, the major setting to be changed is the num_classes that specifies the number of classes in the dataset.
The train config is used to provide model parameters such as batch_size, learning_rate, and fine_tune_checkpoint. The fine_tune_checkpoint field provides the path to the pre-existing checkpoint.
The train_input_config and eval_input_config fields are used to provide paths to the TFRecords and the label map for both train as well as test data.
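
The settings listed above can be edited by hand, or patched with a small script such as the sketch below. SAMPLE_CONFIG is a trimmed, hypothetical fragment rather than the configuration used in the study, and set_field is a helper name introduced here; it rewrites the text of the file directly and does not use the API's own protobuf classes.

```python
import re

# Trimmed, hypothetical fragment of a pipeline configuration file.
SAMPLE_CONFIG = """model {
  ssd {
    num_classes: 90
  }
}
train_config {
  batch_size: 24
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
}
"""


def set_field(config_text, field, value):
    """Replace the value on every `field: ...` line; raise KeyError if absent."""
    pattern = re.compile(
        r"^([ \t]*{}:[ \t]*).*$".format(re.escape(field)), re.MULTILINE
    )
    # A function replacement avoids backslash handling in `value`.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, config_text)
    if count == 0:
        raise KeyError(field)
    return new_text
```

For this experiment the calls would be along the lines of set_field(config, "num_classes", "1") followed by set_field(config, "batch_size", "8"), with the checkpoint path set the same way.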

Table 4 depicts the observations of hyperparameter tuning for various trials of batch_size and learning_rate.

Hyperparameter Tuning
LEARNING RATE	BATCH SIZE	LOSS
0.005	16	~7.2 to 3.4
0.001	16	~3.5 to 1.4
0.0001	8	~1.8 to 0.5

Table 4. Hyperparameter tuning

Note: The numbers in Table 4 are indicative. Results may vary depending on hyperparameter tuning.

6. OpenMP* (OMP) parameters configuration

Various optimization parameters can be configured to improve system performance. The experiment was run with OMP_NUM_THREADS equal to 8. However, the experiment could be tried with OMP_NUM_THREADS up to four less than the number of cores.
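
One way to set these parameters from Python is sketched below. The function names are hypothetical, and the KMP_AFFINITY value is an assumed starting point rather than the setting used in the study; the variables generally need to be in the environment before TensorFlow is imported.

```python
import os


def omp_settings(num_cores, reserved=4):
    """Return OpenMP environment variables that leave `reserved` cores free."""
    threads = max(1, num_cores - reserved)
    return {
        "OMP_NUM_THREADS": str(threads),
        # Assumed affinity value; tune for the target machine.
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
    }


def apply_settings(settings, environ=os.environ):
    """Copy the settings into the environment mapping (os.environ by default)."""
    environ.update(settings)
```

With 12 cores, omp_settings(12) yields OMP_NUM_THREADS equal to 8, the value used in the experiment.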

7. Training

The final task is to assemble all that has been configured so far and run the training job (see Figure 4). Once the optimization parameters like OMP_NUM_THREADS, KMP_AFFINITY, and the rest are set, the training file is executed. By default, the training job will continue to run until the user terminates it explicitly. The models will be saved at various checkpoints.

Figure 4. Training pipeline
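
Launching the training job could be scripted as in the sketch below. It assumes the object_detection/train.py script that shipped with the API at the time of TensorFlow 1.4, run from the directory that contains object_detection; training_command and the example paths are hypothetical.

```python
import sys


def training_command(pipeline_config, train_dir, script="object_detection/train.py"):
    """Return the argument list for launching the training script."""
    return [
        sys.executable,  # the Python interpreter of the active environment
        script,
        "--logtostderr",
        "--pipeline_config_path={}".format(pipeline_config),
        "--train_dir={}".format(train_dir),
    ]
```

The list can be passed to subprocess.call, for example subprocess.call(training_command("ssd_mobilenet.config", "training/")). Checkpoints are then written to the training directory until the job is stopped.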

8. Inference

The inferencing video was first converted into frames using MoviePy, a Python* module for video editing. These sets of frames are given to our model trained using transfer learning. After the frames pass through the Object Detection pipeline, the bounding boxes will be drawn on the detected frames. These frames are finally merged to form the inferred video (see Figure 5).

Figure 5. Inference pipeline
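
The frame-by-frame inference described above might be sketched as follows. The sketch assumes a MoviePy release earlier than 2.0 (which provides moviepy.editor and fl_image) and boxes in the normalized (ymin, xmin, ymax, xmax) form that the Object Detection API returns; detect is a hypothetical callable wrapping the trained model that returns (boxes, scores) for one frame, and the 0.5 threshold and green box color are choices made for illustration.

```python
def boxes_to_pixels(boxes, scores, width, height, min_score=0.5):
    """Convert normalized boxes scoring at least min_score to (left, top, right, bottom) pixels."""
    pixels = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score >= min_score:
            pixels.append((
                int(round(xmin * width)), int(round(ymin * height)),
                int(round(xmax * width)), int(round(ymax * height)),
            ))
    return pixels


def annotate_video(input_path, output_path, detect, color=(0, 255, 0)):
    """Run `detect` on every frame and write a copy of the video with boxes drawn."""
    from moviepy.editor import VideoFileClip  # lazy import; assumes MoviePy < 2.0

    def process(frame):
        frame = frame.copy()  # frames arrive as height x width x 3 arrays
        height, width = frame.shape[:2]
        boxes, scores = detect(frame)
        for left, top, right, bottom in boxes_to_pixels(boxes, scores, width, height):
            # Draw a 2-pixel border by overwriting the four edges.
            frame[top:bottom, left:left + 2] = color
            frame[top:bottom, max(right - 2, 0):right] = color
            frame[top:top + 2, left:right] = color
            frame[max(bottom - 2, 0):bottom, left:right] = color
        return frame

    clip = VideoFileClip(input_path)
    clip.fl_image(process).write_videofile(output_path, audio=False)
```

Using fl_image applies the function to every frame and reassembles the result, so the split into frames and the final merge happen in one pass.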

Experimental Results

The following detection (see Figures 6 and 7) was obtained when the inference use case was run on a sample YouTube* video available at: https://www.youtube.com/watch?v=BMYsRd7Qq0I

Figure 6. Raw frame

Figure 7. Inferenced frame

Conclusion and Future Work

From the results, we observed that the traffic lights were detected with a high level of accuracy. Future work involves parallel inferencing across multiple cores.