Caffe* Scoring Optimization for Intel® Xeon® Processor E5 Series

In our continued efforts to optimize Deep Learning workloads on Intel® architecture, our engineers explore various paths to maximum performance. Not long ago, a technical preview of optimized AlexNet training on Caffe was published. Now we are sharing another preview of our work, focused entirely on the classification path, which brings it to performance levels not previously demonstrated on an Intel CPU.

Once again, we used the Caffe deep learning framework, developed by the Berkeley Vision and Learning Center (BVLC), as the vehicle for demonstrating our results.

We have focused solely on optimizing the classification path of the AlexNet* and CaffeNet* topologies. Neither the training path nor other topologies are part of this preview, and they are not intended to be run with this package.

In the future, highly optimized routines for both scoring and training will become available in Intel® Math Kernel Library (Intel® MKL) and Intel® Data Analytics Acceleration Library (Intel® DAAL).

The performance achieved by this package was made possible by an innovative approach to code generation and by very low-level optimizations applied to critical routines such as the convolution function. The code is parallel at the CPU level (multithreaded), and Caffe has been modified to execute on both sockets of a dual-socket system, but it does not use OpenMP* for this purpose.

The package supports classification with the AlexNet and CaffeNet topologies in batches of 96 and uses the Python* interface provided by Caffe because of its ease of use. This interface has been tweaked to provide more accurate metrics of actual network classification performance. The original version included the overhead of the Python code in the reported metrics. We believe this is a non-optimal way to perform these tests, because in practice network input data can be loaded into memory continuously and asynchronously by different devices, without interrupting classification.
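To illustrate the difference between timing only the classification call and timing the whole Python loop, here is a minimal sketch. It does not use the package's actual interface: load_batch and classify_batch are hypothetical stand-ins for Python-side input preparation and for the network's forward pass, and only the batch size of 96 comes from the text above.

```python
import time

BATCH_SIZE = 96  # batch size supported by the package, per the text


def load_batch(batch_size):
    """Hypothetical stand-in for Python-side input preparation."""
    return [[0.0] * 8 for _ in range(batch_size)]


def classify_batch(batch):
    """Hypothetical stand-in for the network's forward pass."""
    return [sum(image) for image in batch]


def timed_run(num_batches):
    """Return (forward_seconds, total_seconds) for num_batches batches."""
    forward_seconds = 0.0
    run_start = time.perf_counter()
    for _ in range(num_batches):
        batch = load_batch(BATCH_SIZE)      # counted only in the total
        start = time.perf_counter()
        classify_batch(batch)               # counted in both timers
        forward_seconds += time.perf_counter() - start
    total_seconds = time.perf_counter() - run_start
    return forward_seconds, total_seconds


forward_s, total_s = timed_run(num_batches=10)
print(f"forward only: {forward_s:.6f} s, including input prep: {total_s:.6f} s")
```

A throughput figure derived from the forward-only timer reflects the network itself, while one derived from the total also charges the network for work that could, in a real deployment, happen asynchronously.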

Getting started

First of all, you will need all the dependencies required by Caffe and its Python scripts as of the June version (commit SHA: b051ce474425d4e991cf674107ee5f6999a2be38). Then unpack the package, which contains step-by-step instructions on how to run our demo.

For your convenience, we are showing these instructions here:

Let DEMO_ROOT be the main directory of the unpacked archive.
Get the reference ilsvrc12 data.
Go to the DEMO_ROOT/data/ilsvrc12/ subdir and run:
get_ilsvrc.sh
Get the reference networks' parameters.
Go to the DEMO_ROOT/models/bvlc_reference_caffenet/ subdir and run:
get_reference_model.sh
Go to the DEMO_ROOT/models/bvlc_alexnet/ subdir and run:
get_reference_model.sh

Go to the DEMO_ROOT/python subdir.
Make sure you have sudo access.

sudo is required to change the default thread scheduler from round-robin to FIFO and to change the threads' priority. This gives better performance in our case.
If you don't want to run with sudo, you will need to modify the execute_classification.sh script.

Get some images in PNG format.
Once per dataset, run:

sudo ./split_data.sh <path/to/image/dir>

Run:
sudo ./execute_classification.sh <path/to/image/dir> <iterations> <topology>

<iterations> is the number of iterations of the internal time-averaging loop:
use a value greater than 1 when measuring the throughput of the internal implementation (see the *) note below)
use 1 otherwise

<topology> can be "caffenet" or "alexnet" (without quotes)

After the several seconds that the Python script needs to initialize the input (the exact time depends on the number of images to classify), metrics from all sockets in the system are printed to the screen.
Classification results:

both binary and text output files are provided
each node has its own result file
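For readers wondering why the steps above ask for sudo, the following sketch shows how a Linux process can request the FIFO real-time scheduling policy from Python. This is only an illustration under our own assumptions: it is not taken from execute_classification.sh, whose internals are not shown here, and try_set_fifo is a hypothetical helper name. The os.sched_* calls are standard library functions available on Linux.

```python
import os


def try_set_fifo(priority=None):
    """Try to switch the calling process to the SCHED_FIFO policy.

    Returns True on success, and False when the platform lacks the call
    or the process lacks the privilege (typically: not run via sudo).
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # scheduling API not available on this platform
    if priority is None:
        # Assumption for this sketch: use the lowest real-time priority.
        priority = os.sched_get_priority_min(os.SCHED_FIFO)
    try:
        # A pid of 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:  # includes PermissionError when not privileged
        return False
    return True


if try_set_fifo():
    print("running under SCHED_FIFO")
else:
    print("could not switch scheduler; continuing with the default policy")
```

Falling back gracefully when the call is refused mirrors the option, mentioned above, of running without sudo at the cost of some performance.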

*) Because of heavy interference from Python after each batch, performance metrics can be underestimated when only a single iteration is used. More iterations give more accurate results that approach the maximum performance.
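The effect described in this note can be sketched with simple arithmetic. The model below is our own simplifying assumption, not the package's measurement code: the timer spans a number of internal iterations plus one fixed amount of Python-side overhead, and both timing constants are hypothetical values chosen only to make the trend visible.

```python
BATCH_SIZE = 96  # batch size supported by the package, per the text


def measured_throughput(iterations, batch_seconds, overhead_seconds):
    """Images per second seen by a timer that spans `iterations` batches
    plus one fixed Python-side overhead (a simplifying assumption)."""
    images = iterations * BATCH_SIZE
    elapsed = iterations * batch_seconds + overhead_seconds
    return images / elapsed


# Hypothetical timings, not measurements from the package.
BATCH_SECONDS = 0.1
OVERHEAD_SECONDS = 0.05

for n in (1, 10, 100):
    rate = measured_throughput(n, BATCH_SECONDS, OVERHEAD_SECONDS)
    print(f"iterations={n:>3}: {rate:.1f} images/s")
```

Under this model the reported rate rises with the iteration count and approaches BATCH_SIZE / BATCH_SECONDS, because the fixed overhead is spread over more batches.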

A special thank you to Krzysztof Badziak, Jacek Czaja, Jaroslaw Dukat, Bartosz Kalinczuk, Piotr Majcher, Piotr Majchrzak, Jacek Reniecki and Maciej Urbanski from Intel’s Visual Cloud Computing team and Vadim Pirogov from Intel’s Software Services Group. They were the driving force behind the performance optimizations shown and the Caffe work illustrated in this blog post.

Caffe* is a third-party trademark owned by the Berkeley Vision and Learning Center (BVLC). Other names and brands may be claimed as the property of others.
