
Using the Intel® Distribution for Python* to Solve the Scene-Classification Problem Efficiently

Abstract: The objective of this work is to get acquainted with image and scene categorization. First, we extract image features, train a classifier on the training samples, and evaluate it on the test set. We then fine-tune pre-trained AlexNet, ResNet, and GoogLeNet models and apply them to the same dataset.

Technology stack: Intel® Distribution for Python*

Frameworks: Intel® Optimization for Caffe* and Keras

Libraries used: NumPy, scikit-learn*, SciPy Stack

Systems used: Intel® Core™ i7-6500U processor with 16 GB RAM (Model: HP envy17t-s000cto) and Intel® AI DevCloud

Dataset

The scene database provides images from eight classes: coast, mountain, forest, open country, street, inside city, tall buildings, and highway. The dataset is divided into a training set (1,888 images) and a test set (800 images), which are placed in separate folders. The associated labels are stored in "train labels.csv" and "test labels.csv," and the SIFT descriptors are provided in the "train sift features" and "test sift features" directories.

The following are a few of the images from the dataset:

Training set (sample images): mountain view, ocean view, house, building

Testing set (sample images): street view, mountain view, house, ocean view

K-Nearest Neighbor (kNN) Classifier

Bag of visual words

We run the K-means clustering algorithm to build a visual-word dictionary. The feature dimension of each SIFT descriptor is 128. To build the bag of visual words, we use the SIFT descriptors provided in the "train sift features" and "test sift features" directories.
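
Below is a minimal scikit-learn sketch of this step. The function names, the MiniBatchKMeans choice, and the default cluster count are illustrative assumptions, not the exact code used for the reported results.

import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(descriptor_list, num_clusters=100):
    # Stack the 128-D SIFT descriptors from all training images and cluster them;
    # the resulting cluster centers act as the visual words.
    all_descriptors = np.vstack(descriptor_list)
    kmeans = MiniBatchKMeans(n_clusters=num_clusters, random_state=0)
    kmeans.fit(all_descriptors)
    return kmeans

def bag_of_words_histogram(kmeans, descriptors, num_clusters=100):
    # Assign each descriptor of one image to its nearest visual word and
    # return the L1-normalized histogram of word counts.
    words = kmeans.predict(descriptors)
    hist, _ = np.histogram(words, bins=np.arange(num_clusters + 1))
    return hist / max(hist.sum(), 1)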

Classifying the test images

The test images are classified with a k-nearest neighbor (kNN) classifier applied to their bag-of-visual-words histograms.
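
A minimal sketch with scikit-learn's KNeighborsClassifier follows. The variable names are assumptions, and k = 9 matches the best-performing run in the table below.

from sklearn.neighbors import KNeighborsClassifier

# train_histograms/test_histograms are the bag-of-words vectors built above,
# and the label arrays come from the CSV files (assumed NumPy arrays).
knn = KNeighborsClassifier(n_neighbors=9)
knn.fit(train_histograms, train_labels)
predicted = knn.predict(test_histograms)
accuracy = 100.0 * (predicted == test_labels).mean()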

Results

Number of Clusters    k Value    Accuracy (%)
50                    5          49.375
50                    15         52.25
64                    15         53.125
75                    15         52.375
100                   15         54.5
100                   9          55.25
150                   18         53.125

Discriminative Classifier: Support Vector Machines (SVMs)

Bag of visual words

We again run the K-means clustering algorithm to build a visual-word dictionary from the 128-dimensional SIFT descriptors, using the same bag-of-visual-words representation as above.

SVMs

Support Vector Machines (SVMs) are inherently two-class classifiers, so we train one-vs.-all SVMs to obtain the multiclass classifier.
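
A minimal one-vs.-all sketch with scikit-learn follows. LinearSVC already applies a one-vs.-rest scheme internally; the explicit OneVsRestClassifier wrapper just makes the strategy visible. The hyperparameters and variable names are illustrative assumptions.

from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# One binary SVM per scene class, each trained to separate that class from the rest.
ovr_svm = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=10000))
ovr_svm.fit(train_histograms, train_labels)
svm_predicted = ovr_svm.predict(test_histograms)
svm_accuracy = 100.0 * (svm_predicted == test_labels).mean()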

Results

Number of Clusters    Accuracy (%)
50                    40.625
64                    46.375
75                    47.375
100                   52.5
150                   51.875

Transfer Learning and Fine Tuning

Transfer learning

Transfer learning is a popular approach in deep learning in which models pre-trained on one task are used as the starting point for a model on a second task.

Fine tuning

Fine tuning takes a network model that has already been trained for a given task and adapts it to perform a second, similar task.

How to use it

  1. Select source model: A pre-trained source model is chosen from the available models. Many research institutions release models trained on large and challenging datasets, which can form the pool of candidates from which to choose.
  2. Reuse model: The pre-trained model can then be used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the modeling technique used.
  3. Tune model: Optionally, the model may need to be adapted or refined on the input-output pair data available for the task of interest.

When and why to use it

Transfer learning is an optimization; it's a shortcut to save time or get better performance.

In general, it is not obvious that there will be a benefit to using transfer learning in the domain until after the model has been developed and evaluated.

There are three possible benefits to look for when using transfer learning:

  1. Higher start: The initial skill (before refining the model) on the source model is higher than it otherwise would be.
  2. Higher slope: The rate of improvement of skill during training of the source model is steeper than it otherwise would be.
  3. Higher asymptote: The converged skill of the trained model is better than it otherwise would be.

We apply transfer learning with the pre-trained AlexNet model to demonstrate results on the chosen subset of the Places database. We replace only the class-score layer with a new fully connected layer that has eight nodes, one for each of the eight classes.
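
The following is a minimal Keras sketch of the same idea. Keras does not ship an AlexNet model, so a ResNet50 backbone from keras.applications stands in here; the input size, optimizer, and training settings are illustrative assumptions.

from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# Load a backbone pre-trained on ImageNet and drop its original class-score layer.
base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze pre-trained weights; unfreeze selected layers to fine-tune

# Attach a new fully connected layer with eight nodes, one per scene class.
x = GlobalAveragePooling2D()(base.output)
scores = Dense(8, activation='softmax')(x)
model = Model(inputs=base.input, outputs=scores)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(train_images, train_onehot_labels, epochs=10, batch_size=32)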

Results

Architecture Used    Top-1 Accuracy (%)    Top-3 Accuracy (%)    Top-5 Accuracy (%)
AlexNet              51.25                 68.65                 81.35
ResNet               53.45                 74                    87.25
GoogLeNet            52.33                 71.36                 82.84

Top-1 Accuracy: Accuracies obtained while considering the top-1 prediction.

Top-3 Accuracy: Accuracies obtained while considering the top-3 predictions.

Top-5 Accuracy: Accuracies obtained while considering the top-5 predictions.
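
As a reference for how these numbers are computed, the NumPy sketch below counts a prediction as correct when the true label appears among the k highest-scoring classes; the array names are illustrative assumptions.

import numpy as np

def top_k_accuracy(class_scores, true_labels, k):
    # class_scores: (num_samples, num_classes) array of per-class scores or probabilities.
    top_k = np.argsort(class_scores, axis=1)[:, -k:]
    hits = [label in row for label, row in zip(true_labels, top_k)]
    return 100.0 * np.mean(hits)

# Example: top_k_accuracy(scores, labels, 1), top_k_accuracy(scores, labels, 3), top_k_accuracy(scores, labels, 5)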

Training Times (for Fine Tuning)

Architecture Used    System                  Training Time
AlexNet              Intel® AI DevCloud      ~23 min
AlexNet              HP envy17t-s000cto      ~95 min
ResNet               Intel® AI DevCloud      ~27 min
ResNet               HP envy17t-s000cto      ~135 min
GoogLeNet            Intel® AI DevCloud      ~23 min
GoogLeNet            HP envy17t-s000cto      ~105 min

Note: We experimented with smaller datasets to test the speeds and accuracies that can be achieved using the Intel Distribution for Python.

Conclusion

From the above experiments, it is quite clear that deep-learning methods perform much better on the scene-classification problem than extracting features with traditional methods and applying classical machine-learning techniques.

In the future, I want to design a new deep neural network by making changes to the architectures considered here so that accuracy can be further increased. I would also like to deploy the model on AWS* DeepLens and run it in real time.

Click GitHub for source code.

Please visit Places for more advanced techniques and datasets.

