
Introducing DNN primitives in Intel(R) MKL


    Deep Neural Networks (DNNs) are on the cutting edge of the Machine Learning domain. These algorithms gained wide industry adoption in the late 1990s, when they were first applied to tasks such as handwriting recognition on bank checks, a task at which they matched and even exceeded human capabilities. Today DNNs are used for image recognition, video and natural language processing, and complex visual understanding problems such as autonomous driving. DNNs are very demanding in terms of compute resources and the volume of data they must process. To put this into perspective: AlexNet, a modern image recognition topology, takes a few days to train on modern compute systems using a dataset of slightly over 14 million images. Tackling this complexity requires well-optimized building blocks that cut training time enough to meet the needs of industrial applications.

    Intel MKL 2017 introduces the DNN domain, which includes functions necessary to accelerate the most popular image recognition topologies, including AlexNet, VGG, GoogleNet and ResNet.

    These DNN topologies rely on a number of standard building blocks, or primitives, that operate on data in the form of multidimensional arrays called tensors. These primitives include convolution, normalization, activation, and inner product functions, along with functions necessary to manipulate tensors. Performing these computations effectively on Intel architectures requires taking advantage of SIMD instructions via vectorization and of multiple compute cores via threading. Vectorization is extremely important, as modern processors operate on vectors of data up to 512 bits long (16 single-precision numbers) and can perform up to two multiply-and-add (Fused Multiply Add, or FMA) operations per cycle. Taking advantage of vectorization requires data to be laid out consecutively in memory. Since the typical dimensions of a tensor are relatively small, changing the data layout introduces significant overhead; we therefore strive to perform all the operations in a topology without changing the data layout from primitive to primitive.
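
    As a concrete illustration, the sketch below describes a plain 4-dimensional tensor with dnnLayoutCreate_F32 from the Intel MKL 2017 DNN C API. The sizes and strides are illustrative assumptions only; by MKL convention, size[0] is the dimension with unit stride:

    #include <stdio.h>
    #include "mkl_dnn.h"  /* Intel MKL 2017 DNN C API */

    int main(void) {
        /* A 4-D activation tensor; all sizes are assumed for illustration.
           size[0] is the innermost (unit-stride) dimension. */
        size_t size[4]    = {227, 227, 3, 16};  /* W, H, C, N */
        size_t strides[4] = {1, 227, 227 * 227, 227 * 227 * 3};

        dnnLayout_t layout = NULL;
        if (dnnLayoutCreate_F32(&layout, 4, size, strides) != E_SUCCESS) {
            fprintf(stderr, "layout creation failed\n");
            return 1;
        }
        /* Query how many bytes a buffer in this layout occupies. */
        printf("buffer size: %zu bytes\n", dnnLayoutGetMemorySize_F32(layout));
        dnnLayoutDelete_F32(layout);
        return 0;
    }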

Intel MKL provides primitives for the most widely used operations, implemented for a vectorization-friendly data layout (a sketch of creating one such primitive follows the list):

  • Direct batched convolution
  • Inner product
  • Pooling: maximum, minimum, average
  • Normalization: local response normalization across channels (LRN), batch normalization
  • Activation: rectified linear unit (ReLU)
  • Data manipulation: multi-dimensional transposition (conversion), split, concat, sum and scale.
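
    For example, creating an activation primitive requires only a layout describing its input. This is a minimal sketch assuming the layout created in the previous sketch; a negative slope of 0.0f gives the standard ReLU:

    /* Create a forward ReLU primitive over an existing layout. */
    dnnPrimitive_t relu = NULL;
    dnnError_t err = dnnReLUCreateForward_F32(&relu, NULL /* attributes */,
                                              layout, 0.0f /* negative slope */);
    if (err != E_SUCCESS) { /* handle the error */ }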

Programming model

    Execution flow for a neural network topology includes two phases: setup and execution. During the setup phase the application creates descriptions of all the DNN operations necessary to implement scoring, training, or other application-specific computations. To pass data from one DNN operation to the next, some applications create intermediate conversions and allocate temporary arrays when the output and input data layouts do not match. This phase is performed once in a typical application and is followed by multiple execution phases where the actual computations happen.

    During the execution phase, data is fed to the network in a plain layout like BCWH (batch, channel, width, height) and is converted to a SIMD-friendly layout. As data propagates between layers the data layout is preserved, and conversions are made only when it is necessary to perform an operation that the existing implementation does not support for that layout.
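
    The conversion logic sketched below assumes a convolution primitive conv created during setup and a user buffer user_data described by the plain layout plain; both names are assumptions for illustration, and error handling is omitted for brevity:

    /* Ask the primitive which layout it prefers for its source resource. */
    dnnLayout_t internal = NULL;
    dnnLayoutCreateFromPrimitive_F32(&internal, conv, dnnResourceSrc);

    /* Convert only if the user layout differs from the preferred one. */
    void *src_buf = user_data;
    if (!dnnLayoutCompare_F32(plain, internal)) {  /* 0 means "different" */
        dnnPrimitive_t cv = NULL;
        dnnConversionCreate_F32(&cv, plain, internal);
        dnnAllocateBuffer_F32(&src_buf, internal);
        dnnConversionExecute_F32(cv, user_data, src_buf);
        dnnDelete_F32(cv);
    }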


    Intel MKL DNN primitives implement a plain C application programming interface (API) that can be used in existing C/C++ DNN frameworks. An application that calls Intel MKL DNN functions typically involves the following stages:

    Setup stage: for a given DNN topology, the application creates all the DNN operations necessary to implement scoring, training, or other application-specific computations. To pass data from one DNN operation to the next, the application also creates any intermediate conversions and allocates temporary arrays needed when the output and input data layouts of consecutive operations do not match.
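
    As an example, the setup of a single direct batched convolution might look like the sketch below; the tensor shapes are those of the first AlexNet convolution layer and are assumed here only for illustration:

    size_t src[4]    = {227, 227, 3, 16};  /* input:  W, H, C, N        */
    size_t dst[4]    = {55, 55, 96, 16};   /* output: W, H, C, N        */
    size_t filter[4] = {11, 11, 3, 96};    /* kernel: W, H, in-C, out-C */
    size_t stride[2] = {4, 4};
    int    offset[2] = {0, 0};             /* no input padding */

    dnnPrimitive_t conv = NULL;
    dnnError_t err = dnnConvolutionCreateForward_F32(
        &conv, NULL /* attributes */, dnnAlgorithmConvolutionDirect,
        4, src, dst, filter, stride, offset, dnnBorderZeros);
    if (err != E_SUCCESS) { /* handle the error */ }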

    Execution stage: at this stage, the application calls the DNN primitives that apply the DNN operations, including the necessary conversions, to the input, output, and temporary arrays.
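
    In code, executing the convolution created above amounts to filling a resource array and calling dnnExecute_F32; the buffer names are assumptions carried over from the setup sketch:

    /* Bind buffers to the primitive's resource slots, then execute. */
    void *resources[dnnResourceNumber] = {0};
    resources[dnnResourceSrc]    = src_buf;     /* input activations   */
    resources[dnnResourceFilter] = filter_buf;  /* convolution weights */
    resources[dnnResourceDst]    = dst_buf;     /* output activations  */

    if (dnnExecute_F32(conv, resources) != E_SUCCESS) {
        /* handle the error */
    }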

    Complete examples of training and scoring computations can be found in the Intel MKL package directory: <mklroot>\examples\dnnc\source

Performance

Caffe, a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC), is one of the most popular community frameworks for image recognition. Together with AlexNet, a neural network topology for image recognition, and ImageNet, a database of labeled images, Caffe is often used as a benchmark. The chart below compares the performance of the original Caffe implementation with that of the Intel-optimized version, which takes advantage of optimized matrix-matrix multiplication and the new Intel MKL 2017 DNN primitives, on the Intel Xeon E5-2699 v4 (codename Broadwell) and the Intel Xeon Phi 7250 (codename Knights Landing).

Summary

The DNN primitives available in Intel MKL 2017 can be used to accelerate Deep Learning workloads on Intel Architecture. Please refer to the Intel MKL Developer Reference Manual and the bundled examples for detailed information.
