
Using the Intel® Distribution for Python* to Solve the Scene-Classification Problem Efficiently

Abstract: The objective of this work is to get acquainted with image and scene categorization. First, we extract image features, train a classifier on the training samples, and evaluate it on the test set. We then fine-tune pre-trained AlexNet, ResNet, and GoogLeNet models and apply them to the same dataset.

Technology stack: Intel® Distribution for Python*

Frameworks: Intel® Optimization for Caffe* and Keras

Libraries used: NumPy, scikit-learn*, SciPy Stack

Systems used: Intel® Core™ i7-6500U processor with 16 GB RAM (Model: HP envy17t-s000cto) and Intel® AI DevCloud

Dataset

The scene database provides images from eight classes: coast, mountain, forest, open country, street, inside city, tall buildings, and highway. The dataset is divided into a training set (1,888 images) and a test set (800 images), which are placed in separate folders. The associated labels are stored in "train labels.csv" and "test labels.csv," and the SIFT descriptors are provided in the "train sift features" and "test sift features" directories.

The following are a few of the images from the dataset:

Training set (sample images): mountain view, ocean view, house, building

Testing set (sample images): street view, mountain view, house, ocean view

K-Nearest Neighbor (kNN) Classifier

Bag of visual words

We run the K-means clustering algorithm to build a visual-word dictionary. The feature dimension of each SIFT descriptor is 128. To build the bag of visual words, we use the SIFT descriptors provided in the "train sift features" and "test sift features" directories.
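
Below is a minimal scikit-learn sketch of this step. The function names, the MiniBatchKMeans choice, and the default cluster count are illustrative assumptions, not the exact code used for the reported results.

import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(descriptor_list, num_clusters=100):
    # Stack the 128-D SIFT descriptors from all training images and cluster them;
    # the resulting cluster centers act as the visual words.
    all_descriptors = np.vstack(descriptor_list)
    kmeans = MiniBatchKMeans(n_clusters=num_clusters, random_state=0)
    kmeans.fit(all_descriptors)
    return kmeans

def bag_of_words_histogram(kmeans, descriptors, num_clusters=100):
    # Assign each descriptor of one image to its nearest visual word and
    # return the L1-normalized histogram of word counts.
    words = kmeans.predict(descriptors)
    hist, _ = np.histogram(words, bins=np.arange(num_clusters + 1))
    return hist / max(hist.sum(), 1)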

Classifying the test images

The test images are classified with a k-nearest neighbor (kNN) classifier applied to their bag-of-visual-words histograms.
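
A minimal sketch with scikit-learn's KNeighborsClassifier follows. The variable names are assumptions, and k = 9 matches the best-performing run in the table below.

from sklearn.neighbors import KNeighborsClassifier

# train_histograms/test_histograms are the bag-of-words vectors built above,
# and the label arrays come from the CSV files (assumed NumPy arrays).
knn = KNeighborsClassifier(n_neighbors=9)
knn.fit(train_histograms, train_labels)
predicted = knn.predict(test_histograms)
accuracy = 100.0 * (predicted == test_labels).mean()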

Results

Number of Clusters    k Value    Accuracy (%)
50                    5          49.375
50                    15         52.25
64                    15         53.125
75                    15         52.375
100                   15         54.5
100                   9          55.25
150                   18         53.125

Discriminative Classifier: Support Vector Machines (SVMs)

Bag of visual words

We again run the K-means clustering algorithm to build a visual-word dictionary from the 128-dimensional SIFT descriptors, using the same bag-of-visual-words representation as above.

SVMs

Support Vector Machines (SVMs) are inherently two-class classifiers, so we train one-vs.-all SVMs to obtain the multiclass classifier.
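
A minimal one-vs.-all sketch with scikit-learn follows. LinearSVC already applies a one-vs.-rest scheme internally; the explicit OneVsRestClassifier wrapper just makes the strategy visible. The hyperparameters and variable names are illustrative assumptions.

from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# One binary SVM per scene class, each trained to separate that class from the rest.
ovr_svm = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=10000))
ovr_svm.fit(train_histograms, train_labels)
svm_predicted = ovr_svm.predict(test_histograms)
svm_accuracy = 100.0 * (svm_predicted == test_labels).mean()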

Results

Number of Clusters    Accuracy (%)
50                    40.625
64                    46.375
75                    47.375
100                   52.5
150                   51.875

Transfer Learning and Fine Tuning

Transfer learning

Transfer learning is a popular approach in deep learning in which models pre-trained on one task are used as the starting point for a model on a second task.

Fine tuning

Fine tuning takes a network model that has already been trained for a given task and adapts it to perform a second, similar task.

How to use it

  1. Select source model: A pre-trained source model is chosen from the available models. Many research institutions release models trained on large and challenging datasets, which can form the pool of candidates from which to choose.
  2. Reuse model: The pre-trained model can then be used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the modeling technique used.
  3. Tune model: Optionally, the model may need to be adapted or refined on the input-output pair data available for the task of interest.

When and why to use it

Transfer learning is an optimization; it's a shortcut to save time or get better performance.

In general, it is not obvious that there will be a benefit to using transfer learning in the domain until after the model has been developed and evaluated.

There are three possible benefits to look for when using transfer learning:

  1. Higher start: The initial skill (before refining the model) on the source model is higher than it otherwise would be.
  2. Higher slope: The rate of improvement of skill during training of the source model is steeper than it otherwise would be.
  3. Higher asymptote: The converged skill of the trained model is better than it otherwise would be.

We apply transfer learning with the pre-trained AlexNet model to demonstrate results on the chosen subset of the Places database. We replace only the class-score layer with a new fully connected layer that has eight nodes, one for each of the eight classes.
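
The following is a minimal Keras sketch of the same idea. Keras does not ship an AlexNet model, so a ResNet50 backbone from keras.applications stands in here; the input size, optimizer, and training settings are illustrative assumptions.

from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# Load a backbone pre-trained on ImageNet and drop its original class-score layer.
base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze pre-trained weights; unfreeze selected layers to fine-tune

# Attach a new fully connected layer with eight nodes, one per scene class.
x = GlobalAveragePooling2D()(base.output)
scores = Dense(8, activation='softmax')(x)
model = Model(inputs=base.input, outputs=scores)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(train_images, train_onehot_labels, epochs=10, batch_size=32)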

Results

Architecture Used    Top-1 Accuracy (%)    Top-3 Accuracy (%)    Top-5 Accuracy (%)
AlexNet              51.25                 68.65                 81.35
ResNet               53.45                 74                    87.25
GoogLeNet            52.33                 71.36                 82.84

Top-1 Accuracy: Accuracies obtained while considering the top-1 prediction.

Top-3 Accuracy: Accuracies obtained while considering the top-3 predictions.

Top-5 Accuracy: Accuracies obtained while considering the top-5 predictions.
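
As a reference for how these numbers are computed, the NumPy sketch below counts a prediction as correct when the true label appears among the k highest-scoring classes; the array names are illustrative assumptions.

import numpy as np

def top_k_accuracy(class_scores, true_labels, k):
    # class_scores: (num_samples, num_classes) array of per-class scores or probabilities.
    top_k = np.argsort(class_scores, axis=1)[:, -k:]
    hits = [label in row for label, row in zip(true_labels, top_k)]
    return 100.0 * np.mean(hits)

# Example: top_k_accuracy(scores, labels, 1), top_k_accuracy(scores, labels, 3), top_k_accuracy(scores, labels, 5)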

Training Times (for Fine Tuning)

Architecture Used    System                  Training Time
AlexNet              Intel® AI DevCloud      ~23 min
AlexNet              HP envy17t-s000cto      ~95 min
ResNet               Intel® AI DevCloud      ~27 min
ResNet               HP envy17t-s000cto      ~135 min
GoogLeNet            Intel® AI DevCloud      ~23 min
GoogLeNet            HP envy17t-s000cto      ~105 min

Note: We experimented with smaller datasets to test the speeds and accuracies that can be achieved using the Intel Distribution for Python.

Conclusion

From the above experiments, it is quite clear that deep-learning methods perform much better on the scene-classification problem than extracting features with traditional methods and applying classical machine-learning techniques.

In the future, I want to design a new deep neural network by making changes to the architectures considered here so that accuracy can be further increased. I would also like to deploy the model on AWS* DeepLens and run it in real time.

Click GitHub for source code.

Please visit Places for more advanced techniques and datasets.

