
Face It – The Artificially Intelligent Hairstylist


ABSTRACT

Face It is a mobile application that uses computer vision to acquire data about a user’s facial structure and machine learning to determine the user’s face shape. This information is then combined with manually inputted preferences to give the user a personalized set of hair and beard styles intended to make the user look his best. A personalized list of tips is also generated for the user to take into account when getting a haircut.

1. INTRODUCTION

To create this application, various procedures, tools and coding languages were utilized.
The procedures that were used include:
(1) Computer vision with Haar cascade files to detect a person’s face
(2) Machine learning, specifically using a convolutional neural network and transfer learning to identify a person’s face shape
(3) A preference sorting algorithm to determine which styles look best on a person based on the collected data

The programs/tools that were used include:
(1) Ubuntu 17.04
(2) Android Studio
(3) Intel Optimized TensorFlow
(4) Intel’s OpenCV

The coding languages that were used include:
(1) Java
(2) Python

2. Computer Vision

For this application we used Intel’s OpenCV library along with Haar cascade files to detect a person’s face.

Haar-like features are digital features used in object recognition. They owe their name to their intuitive similarity with Haar wavelets and were used in the first real-time face detector. [1] A large number of these Haar-like features are combined to detect an object with sufficient accuracy, and the resulting files are called Haar cascade classifier files. These methods were developed and tested in the Viola-Jones object detection framework. [2]

In particular, the frontal face detection file is used to detect the user’s face. This file, along with various other Haar cascade files, can be found here: http://alereimondo.no-ip.org/OpenCV/34.

This library and file were incorporated into our application to ensure that the user’s face is detected, since the main objective is to determine the user’s face shape.
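For reference, here is a minimal sketch of this detection step using OpenCV’s Python bindings (the application itself runs the same classifier inside the Android camera pipeline; the file paths below are assumptions):

```python
# Minimal sketch: frontal face detection with a Haar cascade classifier.
# Paths are hypothetical; the classifier file is the stock one shipped
# with OpenCV.
import cv2

cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

image = cv2.imread("selfie.jpg")                 # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # the detector expects grayscale

# Returns a list of (x, y, w, h) rectangles, one per detected face.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```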

Figure 1: Testing the OpenCV library and the frontal face Haar cascade classifier file in real time.

OpenCV was integrated with Android’s camera2 API in order for this real-time processing to occur. An Android device with an API level of 21 or higher is required to run tests and use the application, because the camera2 API is only available on devices of that version or greater.

3. Machine Learning

3.1 Convolutional Neural Networks

For the face shape recognition aspect of our application, we used machine learning with a convolutional neural network (CNN).

CNNs are very commonly associated with image recognition, and they can be trained with little difficulty. The accuracy of a trained CNN is very high when it comes to classifying images correctly.

CNN architectures are inspired by biological processes and consist of variations of multilayer perceptrons designed to require minimal amounts of preprocessing. [3] In a CNN there are multiple layers that each have a distinct function in recognizing an image: a convolutional layer, a pooling layer, a rectified linear unit (ReLU) layer, a fully connected layer and a loss layer.

Figure 2: A diagram of a convolutional neural network in action[4]

- The convolutional layer acts as the core of any CNN. It develops a 2-dimensional activation map that records the response of each learned feature at every spatial position, as set by the layer’s parameters.

- The pooling layer acts as a form of downsampling. Max pooling is the most common implementation, and it is well suited to smaller datasets, which is why we chose to use it.

- The ReLU layer is a layer of neurons that applies an activation function to increase the nonlinear properties of the decision function and of the overall network, without affecting the receptive fields of the convolutional layer itself.

- The fully connected layer, which comes after several convolutional and max pooling layers, does the high-level reasoning in the neural network. Neurons in this layer are connected to all the activations in the previous layer, so its activations can be computed with a matrix multiplication followed by a bias offset.

- The loss layer specifies how training penalizes the deviation between the predicted and true labels. Softmax loss is the best fit for this application, as it is ideal for predicting a single class out of a set of mutually exclusive classes. (A minimal sketch of these layer types follows this list.)
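To make the roles of these layers concrete, here is a minimal sketch of a small CNN in TensorFlow’s Keras API. This is an illustration only, not the Inception v3 model the application actually uses; the layer sizes are assumptions:

```python
# Illustrative CNN showing the layer types described above.
import tensorflow as tf

model = tf.keras.Sequential([
    # Convolutional layer: learns 2-D activation maps; ReLU adds nonlinearity.
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                           input_shape=(128, 128, 3)),
    # Pooling layer: max pooling downsamples the activation maps.
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Fully connected layer: high-level reasoning over the flattened features.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    # Softmax output: one score per mutually exclusive face-shape class.
    tf.keras.layers.Dense(6, activation="softmax"),
])

# The loss layer corresponds to the training loss: softmax cross-entropy.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])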

3.2 Transfer Learning with TensorFlow

The layers of a CNN can be connected in many different orders and combinations. The order depends on the type of data you are using and the kind of results you are trying to get back.

Various well-known CNN models have been created and released to the public for research and use. These include AlexNet [5], which was trained across two GPUs and uses a mix of separate and combined layers; it won the ImageNet Large Scale Visual Recognition Challenge [6] in 2012. Another example is VGGNet [7], a very deep network that uses many convolutional layers in its architecture.

A very popular CNN architecture for image classification is Google’s Inception family: the original GoogLeNet model won the ImageNet Large Scale Visual Recognition Challenge in 2014, and Inception v3 is its refined successor.

Figure 3: A diagram of Google’s Inception v3 convolutional neural network model[8]

As the diagram shows, various convolutional, pooling, ReLU, fully connected and loss layers are used in a specific order, which helps the network output highly accurate results when classifying an image.

This model is so well put together that many developers use a method called transfer learning with the Inception v3 model. Transfer learning is a technique that shortens the process of training a model from scratch: a model fully trained on a set of categories such as ImageNet is re-trained, keeping its existing weights, for new classes.

Figure 4: Diagram showing the difference between Traditional Machine Learning and Transfer Learning[9]

To use transfer learning for the application, TensorFlow was used along with a Docker image that had all the repositories needed for the process. The Inception v3 retraining model was then loaded into TensorFlow, where we were able to re-train it on the dataset our application needs to recognize face shapes.

Figure 5: How the Inception v3 model looks during the process of transfer learning[10]

During transfer learning, only the last layer of the pre-trained model is removed and replaced. This is where the dataset for our application was fed in to be trained. The model uses all the knowledge it acquired from the previous data to train on the new data as accurately as possible.

This is the beauty of transfer learning, and it is why using this technique can save so much time while remaining extremely accurate. Through a retraining script, the images in the dataset were passed through the last layer of the model, and the model was accurately re-trained.
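As a rough sketch of the same idea in code: one can freeze a pre-trained Inception v3 base and train only a new final layer on top. The application used TensorFlow’s retraining script rather than the Keras API shown here, so treat this as an illustration under those assumptions:

```python
# Sketch of transfer learning: keep the pre-trained Inception v3 weights
# fixed and train only a new classification layer for the six face shapes.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(include_top=False,
                                         pooling="avg",
                                         weights="imagenet")
base.trainable = False  # keep all previously learned weights fixed

model = tf.keras.Sequential([
    base,
    # Only this new final layer is trained on the face-shape dataset.
    tf.keras.layers.Dense(6, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```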

3.3 Dataset

There are many popular datasets that were created and collected by many individuals to help further the advancement and research of convolutional neural networks. One common dataset used is the MNIST dataset for recognizing handwritten digits.

Figure 6: Example of the MNIST dataset that is used for training and recognizing hand written digits. [11]

This dataset consists of thousands of images of handwritten digits, and people can use it to train and test the accuracy of their own convolutional neural networks. Another popular dataset is CIFAR-10 [12], which consists of thousands of images across 10 classes of objects and animals: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck.

Having large amounts of data is valuable, but collecting it is hard, which is why many ready-made collections exist for practice and training.

The objective of our CNN model was to recognize a user’s face shape and in order for it to do so, it was fed various images of people with different face shapes.

The face shapes were categorized into six different shapes: square, round, oval, oblong, diamond and triangular. A folder was created for each face shape, and each folder contained various images of people with that face shape.

Figure 7: Example of the contents inside the folder for the square face shape
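A folder-per-class layout like this can be loaded directly, with labels inferred from the directory names. Here is a sketch of that step in Python; the "face_shapes" path is hypothetical, and this Keras utility is shown for illustration rather than being the exact loader the retraining script uses:

```python
# Sketch: load a folder-per-class dataset, inferring labels from
# the directory names ("face_shapes" is a hypothetical path).
import tensorflow as tf

dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "face_shapes",           # face_shapes/square/, face_shapes/round/, ...
    image_size=(299, 299),   # Inception v3's expected input size
    batch_size=32,
)
print(dataset.class_names)   # e.g. ['diamond', 'oblong', 'oval', ...]
```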

These images were gathered from various reliable articles about face shapes and hairstyles. We made sure to collect data as accurate as possible to get the best results. In total, we had approximately 100 images per face shape in each folder.

Figure 8: Example of a single image saved in the square face shape folder.[13]

These images were fed through the model and trained for 4,000 iterations (steps) to get maximum accuracy.

While these images were being trained, various bottleneck files were created. A bottleneck is the cached output of the layer just before the final classification layer for a given image; caching it means it does not have to be recomputed each time the image passes through the model.

Figure 9: Various bottlenecks being created while re-training the Inception v3 CNN
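For reference, a typical invocation of TensorFlow’s example retraining script looks like the following. The flags are those of the standard retrain.py script; the directory and file names are hypothetical:

```
# Typical retrain.py invocation (paths hypothetical).
python retrain.py \
  --image_dir face_shapes \
  --how_many_training_steps 4000 \
  --bottleneck_dir bottlenecks \
  --output_graph retrained_graph.pb \
  --output_labels retrained_labels.txt
```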

A few other files are also created, including a retrained graph that holds all the new information needed to recognize the images the model has just been trained on.

This file is fine to use as-is for recognizing images on a computer, but to use it on a mobile device we have to compress it while keeping all the information necessary for it to remain accurate.

In order to do this we optimize the file down to the size that we need by modifying the following features of the graph (see the sketch after this list):

(1) We remove all nodes that aren't needed for a given set of input and output nodes

(2) We fold explicit batch normalization operations into the adjacent convolution weights
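One way to apply these two optimizations is TensorFlow’s optimize_for_inference tool. The input and output node names shown ("Mul" and "final_result") are the usual ones for a retrained Inception v3 graph and are an assumption here:

```
# Strip unused nodes and fold batch norms for mobile deployment.
python -m tensorflow.python.tools.optimize_for_inference \
  --input=retrained_graph.pb \
  --output=optimized_graph.pb \
  --input_names=Mul \
  --output_names=final_result
```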

After this we are left with two main files that we will load into Android Studio to use with our application.

Figure 10: Files that need to be imported into Android Studio

These files consist of the information needed to identify an image that the model has been trained to recognize once it is seen through a camera.

3.4 Accuracy

The accuracy of the retrained model is very important since the face shape being determined should be as accurate as possible for the user.

To reach a high level of accuracy we had to make sure that several factors were taken into account: that we had a sufficient number of images for the model to train on, and that the model trained on those images for a sufficient number of iterations.

For the first few trials we were getting mixed results, and the accuracy for a predicted face shape was all over the place: for one image we would get 82% confidence, while for another we would get 62%. This was obviously not good, and we wanted much more accurate and precise results.

Figure 11: An example of a low accuracy level that we were receiving with our initial dataset.

At first we were using approximately 50 images of each face shape, but to improve our low accuracy we increased this number to approximately 100 images per face shape. These images were carefully hand-picked to fit the needs of our application and its face shape recognition. We aimed for a benchmark average accuracy of approximately 90%.

Figure 12: An example of a high accuracy level we were receiving after the changes we made with the dataset.

After these adjustments we saw a huge difference in our accuracy level and reached the benchmark we were aiming for. When it came time to compress the files necessary for the face shape detection to work, we made sure that the accuracy level was not affected.

For ease of use, after testing the accuracy levels of our application, we adjusted the code to output only the highest-scoring face shape in a simple, easy-to-read sentence rather than displaying various percentages on the screen.
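The core of that final step is just reducing the per-class softmax scores to the single most likely face shape. A minimal sketch, with illustrative label names and scores:

```python
# Sketch: pick the top-scoring face shape and print a readable sentence.
# The scores below are illustrative, not real model output.
import numpy as np

labels = ["square", "round", "oval", "oblong", "diamond", "triangular"]
scores = np.array([0.91, 0.03, 0.02, 0.02, 0.01, 0.01])  # example softmax output

best = int(np.argmax(scores))
print(f"Your face shape is {labels[best]}.")
```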

4. Application Functionality

4.1 User Interface

The user interface of the application consists of three main screens:

(1) The face detection screen with the front-facing camera. This camera screen appears first so that the user can find his face shape right away. After the face shape detector has determined the user’s face shape, the user can tap the “Preferences” button to go to the next screen.

(2) The next screen is the preferences screen, where the user inputs information about himself. It asks the user to select certain characteristics, including the face shape he just discovered through the first screen (square, round, oval, oblong, diamond or triangular), his hair texture (straight, wavy or coiled), his hair thickness (thick or thin), whether he has facial hair (yes or no), his acne level (none, moderate, excessive or prefer not to answer), and his lifestyle (business, athlete or student). After the user has selected all of his preferences he can tap the “Get Hairstyles!” button to go to the final screen.

(3) The final output screen presents a list of recommended hair/beard styles along with tips the user can use when getting a haircut. The preferences that the user selects go through a sorting algorithm created for this application. Afterwards, the user can swipe through the final recommendation screen and choose from various hair/beard styles. An image accompanies each style so the user has a better idea of how it looks. A list of tips is also generated so that the user knows what to say to his barber when getting a haircut.

Figure 13: This is a display of all the screens of the application. From left to right: Face shape detection screen, preferences screen, final recommendation screen with tips that the user can swipe through.

The application was meant to have a very simplistic design, so we chose basic complementary colors and a simple logo that got the point of the application across. To bring our ideas of how the application should look into Android Studio, we created a .png file of our logo and noted the hex color codes of the colors we wanted to use. Once we had those, we used Android Studio’s easy-to-use interface editor and added a layer for the toolbar and a layer for the logo.

4.2 Preference Sorting Algorithm

The preference screen was organized with six different spinners, one for every preference. Each option for each preference was linked to a specific array full of various different hair/beard styles that fit that one preference.

Figure 14: Snippet of the code used to assign each option of every preference an array of hairstyles.

These styles were categorized by doing extensive research on which styles fit every option within each preference. These arrays were then sorted to find the hairstyles common to every option the user chose.

For example, say the user has a square face shape and straight hair. The hairstyles that look good with a square face shape might be a fade, a combover and a crew cut; these three hairstyles would be put into an array for the square face shape option. The hairstyles that look good with straight hair might be a combover, a crew cut and a side part; these would be put into an array for the straight hair option. These two arrays are then compared, and whatever hairstyles they have in common are placed into a new, final array of personalized hairstyles that look good on the user based on both preferences. In this case, the final array would consist of a combover and a crew cut, since those are the two hairstyles both preferences had in common. These hairstyles would then be output and recommended to the user.
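The application implements this comparison in Java; the core logic is a running intersection of the per-preference style lists, sketched here in Python with illustrative style names:

```python
# Sketch of the preference sorting idea: intersect the style lists
# attached to each selected preference. The app implements this in
# Java; the style names here are illustrative.
square_face = ["fade", "combover", "crew cut"]
straight_hair = ["combover", "crew cut", "side part"]

# Keep only the styles every preference allows, preserving order.
recommended = [style for style in square_face if style in straight_hair]
print(recommended)  # ['combover', 'crew cut']
```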

Figure 15: Snippet of the code used to compare the six different preference arrays so that one final personalized array of hairstyles can be formed.

Once the final list of hairstyles is created, a matching array of images is built for those same hairstyles, and this array is used to create a gallery of personalized hairstyles that the user can swipe through to see what he likes and what he doesn’t.

In addition, a list of tips is output for the user to view and take into consideration. These tips are based on the preferences the user selected; for example, if the user selected excessive acne, a tip might be to go for a longer hairstyle to keep the acne slightly hidden. The tips are generated by various if-statements and shown on the final screen. Since this application cannot control every aspect of a user’s haircut, we hope these tips will be taken into consideration and used when describing to the barber what type of haircut the user is looking for.

Figure 16: An example of how the outputted tips would look for the user once he selects his preferences.

5. Programs and Hardware

5.1 Intel Optimized TensorFlow

TensorFlow was a key framework that made it possible for us to train our model and have our application actually detect a user’s face shape.

TensorFlow was installed on the Ubuntu Linux operating system by following this tutorial:

https://www.tensorflow.org/install/install_linux

Intel’s TensorFlow optimizations were installed by following this tutorial:

https://software.intel.com/en-us/articles/tensorflow-optimizations-on-modern-intel-architecture

Intel has optimized the TensorFlow framework in various ways to improve the results of training a neural network and of using TensorFlow in general. Many of these modifications help people use CPUs for this process, through operations optimized with Intel’s MKL (Math Kernel Library). Intel has also developed a custom pool allocator and a faster way to perform backpropagation, further improving results.

After all this had been installed, Python was used to write the commands that drive the transfer learning process and re-train the convolutional neural network.

5.2 Android Studio

Android Studio is the main development kit used to create the application and make it come to life. Since both TensorFlow and Android are Google projects, there are various detailed tutorials explaining how to take the trained data from TensorFlow and integrate it with Android Studio. [14] This made the process very simple as long as the instructions were followed.

Figure 17: Snippet of code that shows how the viewPager is used for sliding through various images

Android Studio also made it simple to create the basic .xml files for the application. These .xml files were very customizable and allowed the original mock-ups of the application to take form. When creating these .xml files we made sure to use the “infer constraints” option; without it, displays such as the text view or the spinners would end up in random positions when the application is fully built. Tutorials on how to connect two activities together [15] and how to create a swipeable image gallery [16] were used to help make the application smooth and easy to use.

Figure 18: An example of inferring constraints to make sure everything appears properly during the full build.

5.3 Mobile Device

A countless number of tests were required to make sure certain parts of the code were working whenever a new feature was added to the application. These tests were done on an actual Android smartphone that was given to us by Intel.

The camera2 API used by this application requires an Android phone with an API level of 21 or higher (Android 5.0+), so we used a phone with an API level of 23. Though the camera was slow at times, the overall functionality of the device was great.

Whenever a slight modification was done to the code for this application, a full build and test was always done on this smartphone to ensure that the application was still running smoothly.

Figure 19: The Android phone we used with an API level of 23. You can see the Face It application logo in the center of the screen.

6. Summary and Future Work

Using various procedures, programs, tools and languages, we were able to build an application that uses computer vision to acquire data about a user’s facial structure and machine learning, specifically transfer learning, to detect a person’s face shape. We then put this information, along with user-inputted information, through a preference sorting algorithm to output a personalized gallery of hairstyles for the user to view and choose from, as well as personalized tips the user can tell his barber when getting a haircut or take into consideration when styling or growing out his hair.

There is always room for improvement, and we plan to improve many aspects of this application, including even more accurate face shape detection results, an even cleaner user interface, and many more hair and beard styles for the user to select from.

ACKNOWLEDGEMENTS

I would like to personally thank the Intel Student Ambassador Program for AI for supporting us through the creation of this application and for the motivation to keep on adding to it. I would also like to thank Intel for providing us with the proper hardware and software that was necessary for us to create and test the application.

ONLINE REFERENCES

[1] https://en.wikipedia.org/wiki/Haar-like_features

[2] https://www.cs.ubc.ca/~lowe/425/slides/13-ViolaJones.pdf

[3] https://en.wikipedia.org/wiki/Convolutional_neural_network

[4] https://www.mathworks.com/help/nnet/convolutional-neural-networks.html

[5] https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks

[6] http://www.image-net.org/

[7] http://www.robots.ox.ac.uk/~vgg/research/very_deep/

[8] https://research.googleblog.com/2016/03/train-your-own-image-classifier-with.html

[9] https://www.slideshare.net/hunkim/transfer-defectlearningnew-completed

[10] https://medium.com/@vinayakvarrier/significance-of-transfer-learning-in-the-domain-space-of-artificial-intelligence-1ebd7a1298f2

[11] http://yann.lecun.com/exdb/mnist/

[12] https://www.cs.toronto.edu/~kriz/cifar.html

[13] http://shaverguru.com/finding-a-great-beard-style-for-your-face/

[14] https://www.tensorflow.org/deploy/distributed

[15] https://developer.android.com/training/basics/firstapp/starting-activity.html

[16] https://developer.android.com/training/animation/screen-slide.html

