
Solving Latency Challenges in End-to-End Deep Learning Applications


Intel® Student Ambassador David Ojika Uses Intel® Movidius™ Myriad™ 2 Technology for Specialized Vision Processing at the Edge


Abstract

The Intel® Student Ambassador Program for Artificial Intelligence, part of the Intel® AI Academy, collaborates with universities around the globe. The program offers key resources to artificial intelligence (AI) students, data scientists and developers, including education, access to newly optimized frameworks and technologies, hands-on training, and workshops. This paper details the decoupling of cloud-based deep learning training from accelerated inference at the edge.

While the compute-intensive process of training convolutional neural networks (CNNs) can be greatly accelerated in the cloud, cloud communication introduces latency, which can degrade inference performance in edge devices and mission-critical applications.

Intel fellowship recipient David Ojika and graduate research assistant Vahid Daneshmand set out to resolve the problem using specialized vision processors and a distributed computing architecture. Their technique, conclusions, and future work in exploring end-to-end image analytics with the Intel® Movidius™ Myriad™ 2 vision processing unit (VPU) are examined here.

The compute-intensive process of training machine learning models is being accelerated by cloud computing. Cloud communications, however, introduce the problem of latency during model inference, leading to lagging performance for edge applications.

Solve Deep Learning Challenges with Intel® Technology

Ojika, a recipient of an Intel fellowship for code modernization, is a recent doctoral graduate in computer engineering at the University of Florida. He has completed several internships at Intel, where he worked on near-memory accelerators and heterogeneous platforms, including Intel® Xeon® processors and FPGAs. Ojika’s research interests span systems research, with a focus on machine learning platforms and architectures for large-scale, distributed data analytics.

Ojika’s Intel internship exposed him to a broad range of hardware and software systems from the company that enabled him to advance his Ph.D. studies. That exposure prompted him to continue his collaboration with Intel as an Intel® Student Ambassador, helping build an AI community at the University of Florida.

The training of CNNs is highly computationally expensive, often requiring hours or days of training on moderate hardware. Deploying the trained model for inference can present unique challenges depending on specific application requirements, for example real-time response, low power consumption, reduced form factor, and ease of updating and managing trained models. Intel Movidius Myriad 2 technology was chosen as the development platform to address some of these challenges.

Accelerating CNN Architectures with VPUs at the Edge

Much research has gone into using GPUs to train CNNs, which are commonly used in image recognition. But researchers have paid less attention to the real-time performance of CNNs in resource-constrained environments where low latency or low power is of utmost importance.

This project leveraged a specialized, low-power VPU at the edge to accelerate CNN inference. The researchers presented a method that simplifies integration between CNNs and end applications through a microservices approach, yielding a loosely coupled architecture that allows CNN “services” to scale elastically with request load. These services, which process inference requests, feature a lightweight front end (for request admission) and a load-sensitive back end (for request processing), exposing simplified web interfaces and language-independent APIs that serve CNN models to end applications.


Figure 1. Software architecture
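
To make the loosely coupled design concrete, here is a minimal sketch of a request-processing back end. It assumes a Flask-based HTTP service; the /classify endpoint, the load threshold, and the run_inference() helper are illustrative placeholders, not the authors’ implementation.

```python
# Minimal sketch of a load-sensitive back-end microservice (illustrative only).
import io

import numpy as np
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)
MAX_PENDING = 8        # illustrative load threshold for the back end
pending = 0


def run_inference(image_rgb):
    """Placeholder for the CNN service running on the VPU (see the NCS sketch below)."""
    return {"label": "unknown", "confidence": 0.0}


@app.route("/classify", methods=["POST"])
def classify():
    global pending
    if pending >= MAX_PENDING:
        # Report overload so the front end can retry or route to another service instance.
        return jsonify(error="busy"), 503
    pending += 1
    try:
        image = Image.open(io.BytesIO(request.data)).convert("RGB")
        return jsonify(run_inference(np.asarray(image)))
    finally:
        pending -= 1


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

In this arrangement, the front end only needs to POST raw image bytes to the endpoint and relay the JSON response, keeping the CNN details hidden behind a language-independent web API.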

Key to the success of their research was the Intel Movidius Myriad 2 VPU, the industry’s first always-on vision processor. Offering high performance at low power, this family of vision processors gives developers direct access to the vision processing core, enabling them to differentiate their designs with proprietary capabilities. The Intel Movidius Myriad 2 VPU also delivers its dedicated vision processing platform in a small footprint.


Figure 2. System overview

Intel® Movidius™ Myriad™ 2 VPU

The first step in their development was to integrate trained CNN models into the Intel Movidius technology tool chain. For demonstration purposes, the team obtained publicly available, pre-trained models, including GoogleNet* and ResNet-50, trained on the ImageNet dataset with Caffe* and TensorFlow*. Next, they compiled each of the Caffe and TensorFlow models into Movidius-specific file formats using the provided Intel® Movidius™ Neural Compute Stick (NCS) toolkit. This toolkit also supports other advanced features such as checking and profiling of compiled models.
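
As a rough illustration of that workflow, the snippet below drives the toolkit’s command-line tools from Python. The file names, the SHAVE-core count passed to -s, and the exact flags are assumptions for illustration; the NCS toolkit documentation remains the authoritative reference.

```python
# Hedged sketch: compiling, checking, and profiling a Caffe model with the
# NCS toolkit's command-line tools. File names and flags are illustrative.
import subprocess


def compile_model(prototxt, caffemodel, output_graph="graph"):
    # mvNCCompile converts a trained Caffe model into the Movidius graph format.
    subprocess.run(
        ["mvNCCompile", prototxt, "-w", caffemodel, "-s", "12", "-o", output_graph],
        check=True,
    )


def check_and_profile(prototxt, caffemodel):
    # mvNCCheck compares device results against the host framework;
    # mvNCProfile reports per-layer timing on the device.
    subprocess.run(["mvNCCheck", prototxt, "-w", caffemodel], check=True)
    subprocess.run(["mvNCProfile", prototxt, "-w", caffemodel], check=True)


if __name__ == "__main__":
    compile_model("deploy.prototxt", "googlenet.caffemodel")
```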

Next, the team designed and implemented two microservices, a Java-based front end and a Python*-based back end (figure 1), which were then deployed on an Intel Atom® processor-based platform, as shown in figure 2. Requests were received by the Intel Atom processor-based platform on behalf of the Intel Movidius Myriad 2 VPU, which then processed them accordingly.
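
The back end can then hand each admitted request to the VPU through the toolkit’s Python API. The sketch below assumes the version 1 NCSDK mvnc interface and an already preprocessed RGB tensor; the graph file name is a placeholder.

```python
# Sketch of running inference on the Myriad 2 VPU via the NCSDK v1 Python API.
import numpy as np
from mvnc import mvncapi as mvnc


def classify_on_vpu(image_rgb, graph_path="graph"):
    devices = mvnc.EnumerateDevices()
    if not devices:
        raise RuntimeError("No Intel Movidius NCS device found")
    device = mvnc.Device(devices[0])
    device.OpenDevice()
    try:
        with open(graph_path, "rb") as f:
            graph = device.AllocateGraph(f.read())
        try:
            # The VPU expects half-precision input tensors.
            graph.LoadTensor(np.asarray(image_rgb, dtype=np.float16), "user object")
            output, _ = graph.GetResult()
            return int(np.argmax(output)), float(output.max())
        finally:
            graph.DeallocateGraph()
    finally:
        device.CloseDevice()
```

In a long-running service, the device would be opened and the graph allocated once at startup rather than per request, so that only LoadTensor and GetResult fall on the request path.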

Finding Workarounds for Virtualization Support

A major issue Ojika encountered involved virtualization support for the Intel Movidius NCS. Although his team managed to find a workaround, they have alerted the Intel Movidius NCS team to the challenge and hope to integrate a solution in their future development efforts.

The Intel Movidius NCS toolkit, it should be noted, provides an important tool for dealing with trained CNNs in end-to-end deployment scenarios such as Ojika’s use case. The toolkit is Python based, with intuitive APIs that allowed the team to easily integrate the Intel Movidius NCS tool chain into custom applications.

A Simpler Way to Deploy Deep Neural Networks

Ojika’s solution will significantly reduce the management complexity of deploying CNNs at scale in resource-constrained environments. It will also help maximize resource utilization, including energy and network bandwidth, as well as return on hardware investment. Currently, the solution is useful for real-time video analytics, such as in drones, surveillance, and facial recognition.

At present, the number of clients and back-end components limits performance. In the future, the researchers plan to implement an automated, elastic scaling mechanism for handling requests within a set of defined service-level agreements, and to design an efficient resource utilization scheme based on network traffic and power constraints. They also plan to explore the use of overlay networks for a larger-scale deployment of the proposed architecture.

The Intel Movidius Myriad 2 VPU was found to achieve real-time performance for CNN inference on embedded devices. Ojika and Daneshmand proposed a software architecture that presents inference as a web service, enabling a shared platform for image analytics on embedded devices and latency-sensitive applications.

Check out David's Intel® Developer Mesh project for more details and updates.

Join the Intel® AI Academy

Sign up for the Intel® AI Academy and access essential learning materials, community, tools and technology to boost your AI development. Apply to become an Intel AI Student Ambassador and share your expertise with other student data scientists and developers.


