Use TensorFlow* for Deep Learning Training & Testing on a Single-Node Intel® Xeon® Scalable Processor

Introduction

This document provides step-by-step instructions for training and testing a deep learning model on a single-node Intel® Xeon® Scalable processor platform, using the TensorFlow* framework with the CIFAR-10 image recognition dataset. The instructions are beginner-level, and both training and inference happen on the same system. The steps have been verified on Intel Xeon Scalable processors, but should work on any recent Intel Xeon processor-based system. None of the software used in this document was performance optimized. This document is a follow-on to the article Deep Learning Training and Testing on a Single-Node Intel® Xeon® Scalable Processor System Using Intel® Optimized Caffe*.

This document is aimed at beginners who want to learn how to train and test a deep learning model using the TensorFlow framework once they have Intel Xeon CPU-based hardware. It assumes that the reader has basic Linux* knowledge and is familiar with the concepts of deep learning training. The instructions can be used as they are, or can serve as the foundation for enhancements and modifications.

There are various ways to install TensorFlow. You can install it using binary packages or from GitHub* sources. This document describes one way that was successfully deployed and tested on a single Intel Xeon Scalable processor system running CentOS* 7.3. Other installation methods are described elsewhere²,¹⁸. The goal of this document is not to give an elaborate description of how to reach state-of-the-art performance; rather, it’s to dip a toe into TensorFlow and run a simple train and test using the CIFAR-10 dataset on a single-node Intel Xeon Scalable processor system.

This document is divided into six major sections, including the introduction. Section II details the hardware and software bill of materials used to implement and verify the training. Section III covers installing CentOS Linux as the base operating system. Section IV covers installing and deploying TensorFlow using one of the many available methods. Sections V and VI list the steps needed to train and test the model with the CIFAR-10 dataset.

The hardware and software bill of materials used for the verified implementation is listed in Section II. Users can try a different configuration, but the configuration in Section II is recommended. Intel® Parallel Studio XE Cluster Edition is an optional installation for a single-node implementation. It provides most of the basic tools and libraries in one package. Starting with Intel Parallel Studio XE Cluster Edition shortens the learning curve for a multi-node implementation of the same training and testing, because this software plays a significant role in multi-node deep learning implementations.

Hardware and Software Bill of Materials

Item	Manufacturer	Model/Version
Hardware
Intel® Server Chassis	Intel	R1208WT
Intel® Server Board	Intel	S2600WT
(2x) Intel® Xeon® Scalable processor	Intel	Intel Xeon® Gold 6148 processor
(6x) 32 GB LRDIMM DDR4	Crucial*	CT32G4LFD4266
(1x) Intel® SSD 1.2 TB	Intel	S3520
Software
CentOS Linux* Installation DVD		7.3.1611
Intel® Parallel Studio XE Cluster Edition		2017.4
TensorFlow*		setuptools-36.7.2-py2.py3-none-any.whl

Installing the Linux* Operating System

This section requires the following software component: CentOS-7-x86_64-*1611.iso. The software can be downloaded from the CentOS website.

The DVD ISO was used to implement and verify the steps in this document, but the reader can use the Everything ISO or the Minimal ISO, if preferred.

Insert the CentOS 7.3.1611 install disc/USB. Boot from the drive and select Install CentOS 7.
Select Date and Time.
If necessary, select Installation Destination.
1. Select the automatic partitioning option.
2. Click Done to return home. Accept all defaults for the partitioning wizard if prompted.
Select Network and host name.
1. Enter “<hostname>” as the hostname.
  1. Click Apply for the hostname to take effect.
2. Select Ethernet enp3s0f3 and click Configure to set up the external interface.
  1. From the General section, check Automatically connect to this network when it’s available.
  2. Configure the external interface as necessary. Save and Exit.
3. Select the toggle to ON for the interface.
4. Click Done to return home.
Select Software Selection.
1. In the box labeled Base Environment on the left side, select Infrastructure server.
2. Click Done to return home.
Wait until the Begin Installation button is available, which may take several minutes. Then click it to continue.
While waiting for the installation to finish, set the root password.
Click Reboot when the installation is complete.
Boot from the primary device.
Log in as root.

Configure YUM*

If the public network implements a proxy server for Internet access, Yellowdog Updater Modified* (YUM*) must be configured in order to use it.

Open the /etc/yum.conf file for editing.
Under the main section, append the following line:
proxy=http://<address>:<port>
Where <address> is the address of the proxy server and <port> is the HTTP port.
Save the file and Exit.
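The edit above can also be scripted. The following is a minimal sketch in Python 3 (an assumption; the system Python on CentOS 7.3 is 2.7) that inserts a proxy line at the end of the [main] section of yum.conf-style text. The function name add_yum_proxy, the host proxy.example.com, and port 8080 are hypothetical placeholders, and the sketch works on a string rather than writing to /etc/yum.conf.

```python
def add_yum_proxy(conf_text, address, port):
    """Return conf_text with a proxy line added at the end of the [main] section."""
    proxy_line = "proxy=http://{}:{}".format(address, port)
    out = []
    in_main = False
    inserted = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section header closes [main]; add the proxy line first.
            if in_main and not inserted:
                out.append(proxy_line)
                inserted = True
            in_main = (stripped == "[main]")
        out.append(line)
    # [main] was the last (or only) section in the file.
    if in_main and not inserted:
        out.append(proxy_line)
        inserted = True
    if not inserted:
        raise ValueError("no [main] section found")
    return "\n".join(out) + "\n"


# Hypothetical example values; replace with your proxy address and port.
example = "[main]\ncachedir=/var/cache/yum\nkeepcache=0\n"
print(add_yum_proxy(example, "proxy.example.com", 8080))
```

Working on text keeps the sketch safe to run anywhere; to apply it, read /etc/yum.conf as root, pass its content through the function, and write the result back.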

Disable updates and extras. Certain procedures in this document require packages to be built against the kernel. A future kernel update may break the compatibility of these built packages with the new kernel, so we disable repository updates and extras to provide further longevity to this document.

This document may not be used as is when CentOS updates to the next version. To use this document after such an update, it is necessary to redefine repository paths to point to CentOS 7.3 in the CentOS vault. To disable repository updates and extras:

yum-config-manager --disable updates --disable extras

Install EPEL

Extra Packages for Enterprise Linux (EPEL) provides high-quality add-on software packages for Enterprise Linux distributions such as CentOS. To install EPEL (the latest version of all packages is required):

yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Install GNU* C Compiler

Check whether the GNU Compiler Collection* is installed. It should be part of the development tools install. You can verify the installation by typing:

gcc --version or whereis gcc
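The same check can be done from a script. This is a minimal sketch, assuming Python 3.3 or later (shutil.which is not available in Python 2.7); the helper name tool_version is hypothetical.

```python
import shutil
import subprocess


def tool_version(name):
    """Return the first line of `<name> --version`, or None if not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    output = subprocess.check_output([path, "--version"], universal_newlines=True)
    lines = output.splitlines()
    return lines[0] if lines else ""


print("gcc:", tool_version("gcc") or "not found")
```

A result of "not found" suggests the development tools still need to be installed before continuing.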

Install TensorFlow* Using virtualenv¹⁸

Update to the latest distribution of EPEL:
yum -y install epel-release
To install TensorFlow, you must have the following dependencies installed¹⁰:
1. NumPy*: a numerical processing package that TensorFlow requires.
2. Devel*: this enables adding extensions to Python*.
3. Pip*: this enables installing and managing certain Python packages.
4. Wheel*: this enables managing Python compressed packages in wheel format (.whl).
5. Atlas*: Automatically Tuned Linear Algebra Software.
6. Libffi*: Library provides Foreign Function Interface (FFI) that allows code written in one language to call code written in another language. It provides a portable, high-level programming interface to various calling conventions¹¹.

Install dependencies:

sudo yum -y install gcc gcc-c++ python-pip python-devel atlas atlas-devel gcc-gfortran openssl-devel libffi-devel python-numpy

Install virtualenv.
There are various ways to install TensorFlow¹⁸. In this document we will use virtualenv. Virtualenv is a tool to create isolated Python environments¹⁶:
pip install --upgrade virtualenv
Create a virtualenv in your target directory:
virtualenv --system-site-packages <targetDirectory>
Example: virtualenv --system-site-packages tensorflow
Activate your virtualenv¹⁸:
source ~/<targetDirectory>/bin/activate
Example: source ~/tensorflow/bin/activate
Upgrade your packages, if needed:
pip install --upgrade numpy scipy wheel cryptography
Install the Python compressed TensorFlow package. Either install a specific wheel by URL (this was the version that worked during verification; none of the other versions tried worked):
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl
Or install the latest released version:
pip install --upgrade tensorflow
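After the installation steps, a quick check from inside the activated environment can confirm that the interpreter is running in the virtualenv and that the expected packages are importable. This is a minimal sketch, assuming Python 3 (importlib.util.find_spec does not exist in Python 2.7); the helper names in_virtualenv and missing_modules are hypothetical.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside a virtualenv or venv."""
    # Legacy virtualenv sets sys.real_prefix; newer tools make sys.prefix
    # differ from sys.base_prefix.
    return hasattr(sys, "real_prefix") or (
        getattr(sys, "base_prefix", sys.prefix) != sys.prefix
    )


def missing_modules(names):
    """Return the names from the list that cannot be located for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


print("virtualenv active:", in_virtualenv())
print("missing:", missing_modules(["numpy", "wheel", "tensorflow"]))
```

An empty "missing" list means the three packages can be found; it does not prove that TensorFlow imports without errors on this CPU, so importing tensorflow in an interactive session is still a sensible final check.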

Train a Convolutional Neural Network (CNN) using a CIFAR-10 dataset³

Download the CIFAR-10 training data into the /tmp/ directory:
Download the CIFAR-10 Python version⁴,⁸ from https://www.cs.toronto.edu/~kriz/cifar.html
Extract the tar file into /tmp/, because the Python script (cifar10_train.py) looks for data in this directory:
tar -zxf <dir>/cifar-10-python.tar.gz -C /tmp/
Change directory to tensorflow:
cd tensorflow
Make a new directory:
mkdir git_tensorflow
Change directory to the one created in last step:
cd git_tensorflow
Get a clone of the tensorflow repository from GitHub⁹:
git clone https://github.com/tensorflow/tensorflow.git
If the models folder is missing from the tensorflow/tensorflow directory, you can clone the models repository⁹ (https://github.com/tensorflow/models.git):
cd tensorflow/tensorflow
git clone https://github.com/tensorflow/models.git
Upgrade TensorFlow to the latest version or you might see errors when training your model:
pip install --upgrade tensorflow
Change directory to CIFAR-10 dir to get the training and evaluation Python scripts¹⁴:
cd models/tutorials/image/cifar10
Before running the training code, you are advised to check the cifar10_train.py code and change steps from 100K to 60K if needed, as well as logging frequency from 10 to whatever you prefer.
For this document, tests were done for both 100K steps and 60K steps, for a batch size of 128, and logging frequency of 10.
Now, run the training Python script to train your network:
python cifar10_train.py

This will take a few minutes, and you will see something like the image below:

Screenshot of code or command prompt
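To inspect the downloaded Python-version CIFAR-10 data, a batch file can be opened with pickle. This is a minimal sketch, assuming Python 3 and that the archive was extracted into /tmp/ (which produces a cifar-10-batches-py directory); the function name load_cifar_batch is hypothetical. Reading the real batch files also requires NumPy to be installed, because the image data is stored as a NumPy array.

```python
import os
import pickle


def load_cifar_batch(path):
    """Load one CIFAR-10 'python version' batch file.

    Returns (data, labels). In the real files, data is a 10000 x 3072
    uint8 array and labels is a list of 10000 integers from 0 to 9.
    """
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]


# Assumed location after extracting the archive into /tmp/.
batch_path = "/tmp/cifar-10-batches-py/data_batch_1"
if os.path.exists(batch_path):
    data, labels = load_cifar_batch(batch_path)
    print(len(labels), "labels; first label:", labels[0])
else:
    print("batch file not found at", batch_path)
```

This only inspects the data; it is not part of the training scripts themselves.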

Testing Script and Dataset Terminology:

In the neural network terminology:

One epoch = one forward pass and one backward pass of all the training examples.
Batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need. TensorFlow (TF) pushes all of those through one forward pass (in parallel) and follows with a back-propagation on the same set. This is one iteration, or step.
Number of iterations = number of passes, each pass using [batch size] number of examples. To be clear, one pass equals one forward pass plus one backward pass (we do not count the forward pass and backward pass as two different passes).
The steps parameter tells TF to run X of these iterations to train the model.

Example: if you have 1000 training examples, and your batch size is 500, then it will take two iterations to complete one epoch.
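The relationship between examples, batch size, and iterations can be written as a one-line calculation. This is a minimal sketch in Python 3; the function name iterations_per_epoch is hypothetical, and rounding up assumes that a final partial batch still counts as one iteration.

```python
import math


def iterations_per_epoch(num_examples, batch_size):
    """Iterations needed to pass over every example once (last batch may be partial)."""
    return math.ceil(num_examples / batch_size)


print(iterations_per_epoch(1000, 500))  # 2, matching the example above
print(iterations_per_epoch(1000, 300))  # 4, because the last batch holds only 100 examples
```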

To learn more about the difference between epoch versus batch size versus iterations, read the article¹⁵.

In the cifar10_train.py script:

Batch size is set to 128. It is the number of images to process in a batch.
max_steps is set to 100000. It is the total number of iterations across all epochs. In the code on GitHub there is a typo; instead of 100K, the number shows 1000K. Please update it before running.
The CIFAR-10 binary dataset in⁴ has 60000 images: 50000 images to train and 10000 images to test. With a batch size of 128, the number of batches needed for one epoch is 50000/128 ≈ 391.
cifar10_train.py uses 256 epochs, so the total number of iterations (steps) across all epochs is 391 × 256 ≈ 100K.
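The step counts quoted above can be reproduced, and converted back into epochs, with a few lines of arithmetic. This is a minimal sketch in Python 3 using only the numbers stated in this section; the names steps_for_epochs and epochs_for_steps are hypothetical.

```python
import math

TRAIN_IMAGES = 50000  # training images in CIFAR-10
BATCH_SIZE = 128      # batch size used by the training script

# Round up so that the final partial batch counts as one step.
batches_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)


def steps_for_epochs(epochs):
    """Total training steps needed to run the given number of epochs."""
    return epochs * batches_per_epoch


def epochs_for_steps(steps):
    """Approximate number of epochs covered by the given number of steps."""
    return steps / batches_per_epoch


print(batches_per_epoch)               # 391
print(steps_for_epochs(256))           # 100096, roughly 100K
print(round(epochs_for_steps(60000)))  # 153
```

By the same arithmetic, the shorter 60K-step run used in this document corresponds to roughly 153 epochs.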

Evaluate the Model

To evaluate how well the trained model performs on a hold-out data set, we will be using the cifar10_eval.py script⁸:

python cifar10_eval.py

Once the expected accuracy is reached, running the above command prints precision@1 = 0.862 on your screen. You can run the evaluation while the training script from the steps above is still running and nearing its final step count, or after the training script has finished.
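The precision@1 figure is commonly read as top-1 accuracy: the fraction of test images for which the highest-scoring class equals the true label. The following is a minimal pure-Python sketch of that calculation under this assumption; it is not the code used by cifar10_eval.py, and the function name precision_at_1 is hypothetical.

```python
def precision_at_1(scores, labels):
    """Fraction of examples whose highest-scoring class equals the true label."""
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Three examples with two classes each; the third is misclassified.
scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [1, 0, 0]
print(precision_at_1(scores, labels))
```

Under this reading, a value of 0.862 on the 10000 test images means that about 8620 of them were classified correctly.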

Sample Results

Notice that the cifar10_train.py script shows the following results:

Screenshot of code or command prompt

I added a similar-looking result below that was achieved with the system described in Section II of this document. Please be advised that these numbers are only for educational purposes and no specific CPU optimizations were done.

System	Step Time (sec/batch)	Accuracy
2S Intel® Xeon® Gold processors	~0.105	85.8% at 60K steps (~2 hrs)
2S Intel® Xeon® Gold processors	~0.109	86.2% at 100K steps (~3 hrs)
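The step times and total run times in the table can be cross-checked against each other. This is a minimal sketch in Python 3 that multiplies the number of steps by the per-step time; it assumes a constant step time and ignores startup and checkpointing overhead. The function name training_hours is hypothetical.

```python
def training_hours(steps, sec_per_step):
    """Estimated wall-clock training time in hours at a constant step time."""
    return steps * sec_per_step / 3600.0


print(round(training_hours(60000, 0.105), 2))   # 1.75
print(round(training_hours(100000, 0.109), 2))  # 3.03
```

Both estimates are consistent with the approximate 2-hour and 3-hour figures in the table.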

Once you have finished training and testing for your CIFAR-10 dataset, the same Models directory has images for MNIST* and AlexNet* benchmarks. It could be educational to go into MNIST and AlexNet directories and try running the Python scripts there to see the results.

References:

1. Install TensorFlow on CentOS7, https://gist.github.com/thoolihan/28679cd8156744a62f88
2. Installing TensorFlow on Ubuntu*, https://www.tensorflow.org/install/install_linux
3. Install TensorFlow on CentOS7, http://www.cnblogs.com/ahauzyy/p/4957520.html
4. The CIFAR-10 dataset, https://www.cs.toronto.edu/~kriz/cifar.html
5. Tensorflow, MNIST and your own handwritten digits, http://opensourc.es/blog/tensorflow-mnist
6. TensorFlow Tutorial, https://github.com/Hvass-Labs/TensorFlow-Tutorials
7. Tutorial on CNN on TensorFlow, https://www.tensorflow.org/tutorials/deep_cnn
8. CIFAR-10 Details, https://www.tensorflow.org/tutorials/deep_cnn
9. TensorFlow Models, https://github.com/tensorflow/models.git
10. Installing TensorFlow from Sources, https://www.tensorflow.org/install/install_sources
11. Libffi, https://sourceware.org/libffi/
12. Performance Guide for TensorFlow, https://www.tensorflow.org/performance/performance_guide#optimizing_for_cpu
13. What is batch size in neural network?, https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network
14. Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009
15. Epoch vs Batch Size vs Iterations, https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9
16. Virtualenv, https://virtualenv.pypa.io/en/stable/
17. CPU Optimizations, https://software.intel.com/en-us/articles/tensorflow-optimizations-on-modern-intel-architecture
18. Download and Setup, https://www.tensorflow.org/versions/r0.12/get_started/os_setup
