
Privacy-Preserving Face Features Detection


Introduction

Advances in computer vision (CV) aim at automating the process of detecting features in human faces for a myriad of applications. CV enables facial recognition for biometric security of devices [5], detection of changes in facial characteristics caused by developing disease, gauging a driver’s level of alertness [6], predicting a pedestrian’s intent to enter the road as interpreted by an autonomous vehicle [7], and many other critical applications. These advances, however, come with a price: the users’ privacy is at risk. In this new age, where data is extremely valuable and potentially very dangerous in the wrong hands, it is paramount that privacy concerns be addressed fully. Privacy is almost always an afterthought in a CV solution design, addressed only at the end to close security gaps that should have been handled earlier, in the design phase. Between 1999 and 2014, the U.S. government fined companies more than USD 130 million for privacy violations [1], a symptom suggesting that companies still lack the measures and processes needed to comply with privacy regulations.

We advocate that privacy needs to be considered early on as part of a system design requirement. This strategy enables:

  1. better integration of privacy measures to guide the development process and design decisions,
  2. adherence to the data minimization principle; that is, using only as much data as an application requires and nothing more,
  3. elimination of the need for computationally expensive privacy-by-encryption techniques by employing privacy-preserving methods customized to each application.

In this article, we discuss how to design for privacy preservation in a face detection framework. The design approach enables the extraction of facial features without compromising the user’s identity.

Privacy Preservation Approach for Features Detection

To preserve the privacy of users, some applications detect and blur faces in pictures and videos, for example those captured by an autonomous vehicle. Obviously, blurring a face makes learning/training from it much more difficult, though not impossible, depending on the application. Our approach is to couple the photo data with the features learned by a face detection neural network. These features can be attached to a photo, much as geotags are embedded within photo metadata.
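
As a concrete illustration, the following minimal sketch writes detected bounding boxes and landmarks next to a photo as a JSON sidecar. The sidecar naming convention, key names, and sample values are illustrative assumptions, not part of the approach described above.

import json
from pathlib import Path

def attach_feature_metadata(photo_path, bounding_boxes, landmarks):
    """Store detected face features alongside a photo, geotag-style.

    The ".features.json" sidecar convention and key names are illustrative.
    """
    sidecar = Path(str(photo_path) + ".features.json")
    payload = {
        "source_photo": Path(photo_path).name,
        "faces": [
            {"box": box, "landmarks": points}  # box: [x1, y1, x2, y2]
            for box, points in zip(bounding_boxes, landmarks)
        ],
    }
    sidecar.write_text(json.dumps(payload, indent=2))
    return sidecar

# Example usage with made-up detections for one (possibly blurred) photo:
attach_feature_metadata(
    "frame_0042.jpg",
    bounding_boxes=[[120, 80, 220, 210]],
    landmarks=[{"left_eye": [150, 120], "right_eye": [190, 118],
                "nose": [170, 150], "mouth_left": [155, 180],
                "mouth_right": [185, 178]}],
)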

Figure 1. The architecture of the O-Net stage of the MTCNN.

For our discussion, we use a multitask cascaded convolutional neural network (MTCNN) [3] to detect features of a human face. The MTCNN generates intermediary feature maps that provide an interesting perspective on what the architecture sees and eventually learns. For example, after the first two convolution and pooling layers, the third cascade learns what closely resembles an eigenface [2] (an average image of what a face should look like).
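
If the MTCNN is implemented in PyTorch, intermediate feature maps such as those shown in Figure 2 can be captured with forward hooks, as in the sketch below. The layer names are placeholders; the actual module paths depend on the particular implementation.

# A minimal sketch, assuming a PyTorch implementation of the O-Net, for
# capturing intermediate activations (feature maps) with forward hooks.
feature_maps = {}

def save_activation(name):
    # Returns a hook that stores a detached copy of a layer's output.
    def hook(module, inputs, output):
        feature_maps[name] = output.detach().cpu()
    return hook

def register_feature_hooks(onet, layer_names):
    """Attach hooks to the named submodules of an O-Net torch.nn.Module.

    `layer_names` (e.g., ["conv3"]) is a placeholder; inspect the model with
    print(onet) to find the real module names in your implementation.
    """
    handles = []
    for name, module in onet.named_modules():
        if name in layer_names:
            handles.append(module.register_forward_hook(save_activation(name)))
    return handles

# After a forward pass such as onet(face_crop_tensor), feature_maps["conv3"]
# holds activations that can be visualized or packaged as metadata.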

Figure 2 (a, b, c). Examples of learned feature maps from the O-Net stage.

The MTCNN consists of three cascaded convolutional neural networks (CNNs). The first CNN, P-Net, proposes candidate facial regions; the second CNN, R-Net, refines those proposals. The last CNN is an output network, O-Net, that determines the best bounding box regressions for faces and five facial attributes per face. As shown in Figure 1, these filters/feature maps can be passed along to scenarios where a binary face classification is needed. Not only does the MTCNN learn general faces; feature activations pulled from later layers of the O-Net can also be used to glean how it targets specific facial attributes such as the nose and the corners of the eyes and mouth.
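
For readers who want to try the cascade end to end, here is a short sketch that uses the facenet-pytorch MTCNN wrapper, an assumed third-party implementation that is not necessarily the one used in this article, to obtain bounding boxes and the five facial landmarks produced by the O-Net stage.

from facenet_pytorch import MTCNN   # assumed dependency: pip install facenet-pytorch
from PIL import Image

detector = MTCNN(keep_all=True)     # runs the P-Net -> R-Net -> O-Net cascade
image = Image.open("frame_0042.jpg")

boxes, probs, landmarks = detector.detect(image, landmarks=True)
if boxes is not None:
    for box, prob, points in zip(boxes, probs, landmarks):
        # `points` holds five (x, y) pairs: eyes, nose, and mouth corners.
        print(f"face p={prob:.2f} box={box.tolist()} landmarks={points.tolist()}")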

Figure 3. Estimated facial attributes pulled from last convolution in O-Net.

Unfortunately, because CNNs typically generate a hierarchy of features, the more complex the features become, the harder they are to recognize intuitively from their feature maps. Each subfigure in Figure 2 is reminiscent of the Haar-like features used in the Viola-Jones face detector [4]. Figure 2a resembles a feature map, or Haar-like feature, that could be read as the bridge of a nose, given the relatively bright activations in the middle and the starkly dark non-activations on each side. It is also worth noting that the activations gradually grow darker, suggesting some kind of illumination or special coloration of the nose. Figure 2b resembles a feature map that could easily be associated with the corner of a key facial attribute such as the eye or the mouth. Packaging all of these feature maps as metadata allows consumers to leverage what has already been learned from a dataset, saving them precious compute time.

Conclusions

In this article, we advocate the need for embedding privacy measures early in designing CV methodologies. We show, with examples, a process to detect features in human faces while preserving their identities and hence privacy. We believe that similar techniques can be applied to different machine learning algorithms, and we will pursue that path in our future research.

About the Authors

Cory Ilo is a computer vision engineer in the Automotive Solutions group at Intel. He helps prototype and research the feasibility of various computer vision solutions in relation to privacy, ethics, deep learning, and autonomous vehicles. In his spare time, Cory focuses on his passion for fitness, video games, and wanderlust, in addition to finding ways they tie into computer vision.

Iman Saleh is a research scientist with the Automotive Solutions group at Intel. She holds a Ph.D. from the Computer Science department at Virginia Tech, a master’s degree in Computer Science from Alexandria University, Egypt, and a master’s degree in Software Engineering from Virginia Tech. Dr. Saleh has 30+ technical publications in the areas of big data, formal data specification, service-oriented computing, and privacy-preserving data mining. Her research interests include machine learning, privacy-preserving solutions, software engineering, data modeling, web services, formal methods, and cryptography.

References

  1. Top 20 Government-imposed Data Privacy Fines Worldwide, 1999-2014.
  2. M. Turk and A. Pentland. Face Recognition Using Eigenfaces. Proc. IEEE Conference on Computer Vision and Pattern Recognition, 1991, pp. 586–591.
  3. K. Zhang, Z. Zhang, Z. Li and Y. Qiao. Joint Face Detection and Alignment Using Multi-Task Cascaded Convolutional Networks. IEEE Winter Conference on Applications of Computer Vision, 2014.
  4. P. Viola and M. Jones. Robust Real-time Object Detection. IJCV, 2001.
  5. iPhone X. Apple*.
  6. W. Zhang, et al. Driver Drowsiness Recognition Based on Computer Vision Technology. Tsinghua Science and Technology, vol. 17, no. 3, 2012, pp. 354–362.
  7. M. Garzon, D. G. Ramos, A. Barrientos and J. Del Cerro. Pedestrian Trajectory Prediction in Large Infrastructures: A Long-term Approach Based on Path Planning, 2016, pp. 381–389.

Developer Success Stories Library


Intel® Parallel Studio XE | Intel® System Studio | Intel® Media Server Studio

Intel® Advisor | OpenVINO™ Toolkit | Intel® Data Analytics Acceleration Library

Intel® Distribution for Python* | Intel® Inspector XE | Intel® Integrated Performance Primitives

Intel® Math Kernel Library | Intel® Media SDK | Intel® MPI Library | Intel® Threading Building Blocks

Intel® VTune™ Amplifier

 


Intel® Parallel Studio XE


Altair Creates a New Standard in Virtual Crash Testing

Altair advances frontal crash simulation with help from Intel® Software Development products.


CADEX Resolves the Challenges of CAD Format Conversion

Parallelism Brings CAD Exchanger* software dramatic gains in performance and user satisfaction, plus a competitive advantage.


Envivio Helps Ensure the Best Video Quality and Performance

Intel® Parallel Studio XE helps Envivio create safe and secured code.


ESI Group Designs Quiet Products Faster

ESI Group achieves up to 450 percent faster performance on quad-core processors with help from Intel® Parallel Studio.


F5 Networks Profiles for Success

F5 Networks amps up its BIG-IP DNS* solution for developers with help from Intel® Parallel Studio and Intel® VTune™ Amplifier.


Fixstars Uses Intel® Parallel Studio XE for High-speed Renderer

As a developer of services that use multi-core processors, Fixstars has selected Intel® Parallel Studio XE as the development platform for its lucille* high-speed renderer.


Golaem Drives Virtual Population Growth

Crowd simulation is one of the most challenging tasks in computer animation―made easier with Intel® Parallel Studio XE.


Lab7 Systems Helps Manage an Ocean of Information

Lab7 Systems optimizes BioBuilds™ tools for superior performance using Intel® Parallel Studio XE and Intel® C++ Compiler.


Mentor Graphics Speeds Design Cycles

Thermal simulations with Intel® Software Development Tools deliver a performance boost for faster time to market.


Massachusetts General Hospital Achieves 20X Faster Colonoscopy Screening

Intel® Parallel Studio helps optimize key image processing libraries, reducing compute-intensive colon screening processing time from 60 minutes to 3 minutes.


Moscow Institute of Physics and Technology Rockets the Development of Hypersonic Vehicles

Moscow Institute of Physics and Technology creates faster and more accurate computational fluid dynamics software with help from Intel® Math Kernel Library and Intel® C++ Compiler.


NERSC Optimizes Application Performance with Roofline Analysis

NERSC boosts the performance of its scientific applications on Intel® Xeon Phi™ processors up to 35% using Intel® Advisor.


Nik Software Increases Rendering Speed of HDR by 1.3x

By optimizing its software for Advanced Vector Extensions (AVX), Nik Software used Intel® Parallel Studio XE to identify hotspots 10x faster and enabled end users to render high dynamic range (HDR) imagery 1.3x faster.


Novosibirsk State University Gets More Efficient Numerical Simulation

Novosibirsk State University boosts a simulation tool’s performance by 3X with Intel® Parallel Studio, Intel® Advisor, and Intel® Trace Analyzer and Collector.


Pexip Speeds Enterprise-Grade Videoconferencing

Intel® analysis tools enable a 2.5x improvement in video encoding performance for videoconferencing technology company Pexip.


Schlumberger Parallelizes Oil and Gas Software

Schlumberger increases performance for its PIPESIM* software by up to 10 times while streamlining the development process.


Ural Federal University Boosts High-Performance Computing Education and Research

Intel® Developer Tools and online courseware enrich the high-performance computing curriculum at Ural Federal University.


Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


Intel® System Studio


CID Wireless Shanghai Boosts Long-Term Evolution (LTE) Application Performance

CID Wireless boosts performance for its LTE reference design code by 6x compared to the plain C code implementation.


GeoVision Gets a 24x Deep Learning Algorithm Performance Boost

GeoVision turbo-charges its deep learning facial recognition solution using Intel® System Studio and the OpenVINO™ toolkit.


NERSC Optimizes Application Performance with Roofline Analysis

NERSC boosts the performance of its scientific applications on Intel® Xeon Phi™ processors up to 35% using Intel® Advisor.


Daresbury Laboratory Speeds Computational Chemistry Software 

Scientists get a speedup to their computational chemistry algorithm from Intel® Advisor’s vectorization advisor.


Novosibirsk State University Gets More Efficient Numerical Simulation

Novosibirsk State University boosts a simulation tool’s performance by 3X with Intel® Parallel Studio, Intel® Advisor, and Intel® Trace Analyzer and Collector.


Pexip Speeds Enterprise-Grade Videoconferencing

Intel® analysis tools enable a 2.5x improvement in video encoding performance for videoconferencing technology company Pexip.


Schlumberger Parallelizes Oil and Gas Software

Schlumberger increases performance for its PIPESIM* software by up to 10 times while streamlining the development process.


OpenVINO™ Toolkit


GE Healthcare and Intel Optimize Deep Learning Performance for Healthcare Imaging

Intel® Math Kernel Library and OpenVINO™ toolkit help bring the power of AI to clinical diagnostic scanning and other healthcare workflows.


GeoVision Gets a 24x Deep Learning Algorithm Performance Boost

GeoVision turbo-charges its deep learning facial recognition solution using Intel® System Studio and the OpenVINO™ toolkit.


Intel® Data Analytics Acceleration Library


MeritData Speeds Up a Big Data Platform

MeritData Inc. improves performance—and the potential for big data algorithms and visualization.


Intel® Distribution for Python*


DATADVANCE Gets Optimal Design with 5x Performance Boost

DATADVANCE discovers that Intel® Distribution for Python* outpaces standard Python.
 


Intel® Inspector XE


CADEX Resolves the Challenges of CAD Format Conversion

Parallelism Brings CAD Exchanger* software dramatic gains in performance and user satisfaction, plus a competitive advantage.


Envivio Helps Ensure the Best Video Quality and Performance

Intel® Parallel Studio XE helps Envivio create safe and secured code.


ESI Group Designs Quiet Products Faster

ESI Group achieves up to 450 percent faster performance on quad-core processors with help from Intel® Parallel Studio.


Fixstars Uses Intel® Parallel Studio XE for High-speed Renderer

As a developer of services that use multi-core processors, Fixstars has selected Intel® Parallel Studio XE as the development platform for its lucille* high-speed renderer.


Golaem Drives Virtual Population Growth

Crowd simulation is one of the most challenging tasks in computer animation―made easier with Intel® Parallel Studio XE.


Schlumberger Parallelizes Oil and Gas Software

Schlumberger increases performance for its PIPESIM* software by up to 10 times while streamlining the development process.


Intel® Integrated Performance Primitives


JD.com Optimizes Image Processing

JD.com Speeds Image Processing 17x, handling 300,000 images in 162 seconds instead of 2,800 seconds, with Intel® C++ Compiler and Intel® Integrated Performance Primitives.


Tencent Optimizes an Illegal Image Filtering System

Tencent doubles the speed of its illegal image filtering system using SIMD Instruction Set and Intel® Integrated Performance Primitives.


Tencent Speeds MD5 Image Identification by 2x

Intel worked with Tencent engineers to optimize the way the company processes millions of images each day, using Intel® Integrated Performance Primitives to achieve a 2x performance improvement.


Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


Intel® Math Kernel Library


Aier Eye Hospital Brings Deep Learning to Blindness Prevention

Intel® Math Kernel Library helps speed deep learning throughput for efficient, high-quality, and cost-effective eye health screening.

 

DreamWorks Puts the Special in Special Effects

DreamWorks Animation’s Puss in Boots uses Intel® Math Kernel Library to help create dazzling special effects.


GE Healthcare and Intel Optimize Deep Learning Performance for Healthcare Imaging

Intel® Math Kernel Library and OpenVINO™ toolkit help bring the power of AI to clinical diagnostic scanning and other healthcare workflows.


GeoVision Gets a 24x Deep Learning Algorithm Performance Boost

GeoVision turbo-charges its deep learning facial recognition solution using Intel® System Studio and the OpenVINO™ toolkit.

 


MeritData Speeds Up a Big Data Platform

MeritData Inc. improves performance―and the potential for big data algorithms and visualization.


Qihoo360 Technology Co. Ltd. Optimizes Speech Recognition

Qihoo360 optimizes the speech recognition module of the Euler platform using Intel® Math Kernel Library (Intel® MKL), speeding up performance by 5x.


Intel® Media SDK


NetUP Gets Blazing Fast Media Transcoding

NetUP uses Intel® Media SDK to help bring the Rio Olympic Games to a worldwide audience of millions.


Intel® Media Server Studio


ActiveVideo Enhances Efficiency

ActiveVideo boosts the scalability and efficiency of its cloud-based virtual set-top box solutions for TV guides, online video, and interactive TV advertising using Intel® Media Server Studio.


Kraftway: Video Analytics at the Edge of the Network

Today’s sensing, processing, storage, and connectivity technologies enable the next step in distributed video analytics, where each camera itself is a server. With Kraftway*, video software platforms can encode up to three 1080p60 streams at different bit rates with close to zero CPU load.


Slomo.tv Delivers Game-Changing Video

Slomo.tv's new video replay solutions, built with the latest Intel® technologies, can help resolve challenging game calls.


SoftLab-NSK Builds a Universal, Ultra HD Broadcast Solution

SoftLab-NSK combines the functionality of a 4K HEVC video encoder and a playout server in one box using technologies from Intel.


Vantrix Delivers on Media Transcoding Performance

HP Moonshot* with HP ProLiant* m710p server cartridges and Vantrix Media Platform software, with help from Intel® Media Server Studio, deliver a cost-effective solution that delivers more streams per rack unit while consuming less power and space.


Intel® MPI Library


Moscow Institute of Physics and Technology Rockets the Development of Hypersonic Vehicles

Moscow Institute of Physics and Technology creates faster and more accurate computational fluid dynamics software with help from Intel® Math Kernel Library and Intel® C++ Compiler.


Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


Intel® Threading Building Blocks


CADEX Resolves the Challenges of CAD Format Conversion

Parallelism Brings CAD Exchanger* software dramatic gains in performance and user satisfaction, plus a competitive advantage.


Johns Hopkins University Prepares for a Many-Core Future

Johns Hopkins University increases the performance of its open-source Bowtie 2* application by adding multi-core parallelism.


Mentor Graphics Speeds Design Cycles

Thermal simulations with Intel® Software Development Tools deliver a performance boost for faster time to market.

 


Pexip Speeds Enterprise-Grade Videoconferencing

Intel® analysis tools enable a 2.5x improvement in video encoding performance for videoconferencing technology company Pexip.


Quasardb Streamlines Development for a Real-Time Analytics Database

To deliver first-class performance for its distributed, transactional database, Quasardb uses Intel® Threading Building Blocks (Intel® TBB), Intel’s C++ threading library for creating high-performance, scalable parallel applications.


University of Bristol Accelerates Rational Drug Design

Using Intel® Threading Building Blocks, the University of Bristol helps slash calculation time for drug development—enabling a calculation that once took 25 days to complete to run in just one day.


Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


Intel® VTune™ Amplifier


CADEX Resolves the Challenges of CAD Format Conversion

Parallelism brings CAD Exchanger* software dramatic gains in performance and user satisfaction, plus a competitive advantage.


F5 Networks Profiles for Success

F5 Networks amps up its BIG-IP DNS* solution for developers with help from Intel® Parallel Studio and Intel® VTune™ Amplifier.


GeoVision Gets a 24x Deep Learning Algorithm Performance Boost

GeoVision turbo-charges its deep learning facial recognition solution using Intel® System Studio and the OpenVINO™ toolkit.


Mentor Graphics Speeds Design Cycles

Thermal simulations with Intel® Software Development Tools deliver a performance boost for faster time to market.

 


Nik Software Increases Rendering Speed of HDR by 1.3x

By optimizing its software for Advanced Vector Extensions (AVX), Nik Software used Intel® Parallel Studio XE to identify hotspots 10x faster and enabled end users to render high dynamic range (HDR) imagery 1.3x faster.


Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


 

Using Sentiment Analysis to Gauge Cryptocurrency Value


Deep learning provides a way to analyze sentiments about cryptocurrencies by scanning and evaluating comments across the web, including news headlines, Twitter posts, and Reddit posts.

"I have learned that there is correlation between sentiment and cryptocurrency prices. This is something that may be helpful to other developers: to explore new industries and see how current AI technologies can be applied to create a solution or insight within a new area."

—Teju Tadi, Artificial Intelligence Ambassador Program, Intel Corporation

Challenge

Determining the valuation of cryptocurrencies is difficult because the worth does not tightly correspond to factors such as cash flow or available assets, as it does in the conventional stock market. With dozens of currencies available in the market, a system is needed to methodically evaluate their worth.

Solution

Emerging neural network models, including the recursive neural tensor network (RNTN), provide a promising mechanism for gauging sentiment about any given currency by scanning and parsing comments in social media. Favorable sentiment about a currency can be shown to correspond with an uptick in the currency’s value across digital coin exchanges.

Background and Project History

Teju Tadi, an Intel AI Ambassador with a strong interest in blockchain and cryptocurrencies, launched a project in May 2017, applying deep learning techniques to investigate the correlation between trader sentiments for cryptocurrencies and their market value. Sentiment analysis as a way of helping gauge trading trends has already gained adherents in the traditional stock market. Because the valuation data available for cryptocurrencies is more nebulous, Teju is refining techniques to combine trader sentiments with other factors to create better ways to anticipate trends.

"Many firms in the equities space have been employing techniques," he said, "which make key investment decisions based on social media data and news headlines. There are algorithms that make decisions to instantaneously buy a certain equity as soon as positive news is released—faster than any one person could. The same methodology, I thought, could be applied to the cryptocurrency space, because a lot of these currencies are sentiment driven. Social media and news inevitably affect the prices of various currencies greatly."

"I first got interested in blockchain technology and cryptocurrencies back when I was high school," Teju said. "At that time there were fewer than 100 cryptocurrencies and the industry was still in its infancy. The market evolved once the Ethereum blockchain protocol launched. Ethereum and other platform tokens enabled developers to deploy software for blockchain protocols much faster. As of today, there are over 1600 cryptocurrencies and blockchain protocols and many projects still launching."

Teju’s interest in machine learning and deep learning grew as he began working on projects outside of school. A project using machine-learning strategies, ProvidR*, won first place in the Google* Community Leaders Program (CLP) Case Competition. ProvidR offered a scalable solution to food insecurity, aggregating requests by people who visited a particular food bank. He then began a project working with Intel, Face It*, which employed deep learning techniques to map an individual’s facial structure and then recommend a hairstyle suited to their appearance. This early work led to his current endeavor, Deep Learning for Cryptocurrency Trading, which capitalizes on the expertise he has gained in finance, blockchain architecture, cryptocurrency, and AI.

Figure 1. Teju Tadi presenting his project at the Intel® AI DevCon 2018 (Intel® AIDC) in San Francisco.

"Working with Intel’s AI Student Ambassador program, Teju said, "has opened so many opportunities for me. It’s allowed me to connect with peers working in AI space across the world. It gives me access to Intel engineers who are more than willing to help me on my projects. Intel also provides me with access to hardware which helps me test and deploy any applications that I am working on. Most of all it gives me a great environment, which fosters and enables me to pursue my AI-related interests and projects."

Watch a video showing Teju talking about his project.

The evolution of cryptocurrencies and blockchain

Cryptocurrencies serve as a medium for exchanging digital assets; transactions within this medium are recorded using blockchain techniques, which operate as an encrypted, electronic ledger providing a permanent history of all activities. Because cryptocurrencies are issued based on a finite supply, investors anticipate that rising demand will generate increasing value in the long term. The tamper-resistant nature of the blockchain’s historical entries, logically linked together in a continuous chain, provides a mechanism to securely authorize and log transfers of cryptocurrencies from one party to another.
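
To make the "logically linked" property concrete, here is a minimal, illustrative sketch of hash-chained ledger entries; real blockchain protocols add consensus, digital signatures, and much more, so this is only a toy model.

import hashlib
import json

def make_block(entry, previous_hash):
    """Link a ledger entry to its predecessor by hashing both together."""
    body = json.dumps({"entry": entry, "prev": previous_hash}, sort_keys=True)
    return {"entry": entry, "prev": previous_hash,
            "hash": hashlib.sha256(body.encode()).hexdigest()}

genesis = make_block({"transfer": "coinbase", "amount": 50}, previous_hash="0" * 64)
block_1 = make_block({"transfer": "alice->bob", "amount": 2},
                     previous_hash=genesis["hash"])

# Tampering with the genesis entry changes its hash, which then no longer
# matches block_1["prev"], so the alteration is detectable downstream.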

Even as the potential of blockchain as a secure digital alternative to traditional financial processes is being realized, the underlying blockchain architecture—enabling distributed database capabilities—presents opportunities in other applications. Enterprises are exploring ways in which blockchain could be used as a part of Internet of Things (IoT) solutions to manage and gain insights into supply chain activities and global operations. Blockchain architecture could also be used in healthcare information systems to distribute information from medical devices and consolidate and distribute an individual's healthcare records.

"Because blockchain is based on a distributed, peer-to-peer topology where data can be stored globally on thousands of servers—and anyone on the network can see everyone else’s entries in real-time—it’s virtually impossible for one entity to gain control of or game the network."1

—Lucas Mearian, Senior Reporter, Computerworld

Commercial opportunities

In an article for the Intel® Developer Zone (Intel® DZ), Teju stated, "The long-term vision of this project is to be able to develop an AI cryptocurrency trading bot that can not only consider trader sentiment to make trading decisions but also take advantage of other opportunities such as arbitrage, which is the purchase and sale of an asset to profit from a difference in the price."

Teju took the insights gained from his research as he helped establish a business, Mycointrac*, focused on providing cryptocurrency market intelligence. "Once the product is fully developed," he said, "I plan to utilize the data provided by it as one of the factors to make key investment decisions for my new cryptocurrency hedge fund, Sentience Investments L.P., which has been operational since January first. The plan is to develop trading strategies based on a number of high-frequency, machine-learning techniques, as well as deep learning and sentiment analysis."

Each individual exchange, Teju explained, has its own supply and demand and its own set of buyers and sellers. The market as it stands is very inefficient. A cryptocurrency on one exchange might be trading at USD 3 while, on another, the same currency trades at USD 5. Traders use these differential valuations to their advantage by executing arbitrage: buying on the exchange where the currency trades for USD 3 and selling it on the exchange where it trades for USD 5, for a riskless profit of USD 2.
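
The arithmetic is simple to automate; the toy sketch below finds the widest gross spread across a set of quotes. The exchange names and prices are made up, and real arbitrage must also account for fees, transfer time, and slippage.

def arbitrage_spread(prices_by_exchange):
    """Return the best buy/sell exchanges and the gross spread between them."""
    buy_exchange = min(prices_by_exchange, key=prices_by_exchange.get)
    sell_exchange = max(prices_by_exchange, key=prices_by_exchange.get)
    spread = prices_by_exchange[sell_exchange] - prices_by_exchange[buy_exchange]
    return buy_exchange, sell_exchange, spread

# Using the USD 3 versus USD 5 example from the text (exchange names are fictional):
print(arbitrage_spread({"ExchangeA": 3.0, "ExchangeB": 5.0, "ExchangeC": 4.1}))
# -> ('ExchangeA', 'ExchangeB', 2.0)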

"In Mycointrac, if you click on one of the coins on the market and scroll," said Teju, "you'll see the gap price versus the index. That, basically, is arbitrage. That is showing you how much it is trading up or down at a certain exchange. Sometimes it is like a few percent. Sometimes it could be 50 percent or 100 percent. You will easily see 5 to 10 percent differences on most coins." Figure 2 shows live sentiment tracking displayed in Mycointrac. Working with a team, Teju is taking insights from this project to launch a new site, Mycoinrisk, focusing on fraud prevention and risk mitigation in the cryptocurrency space.

Figure 2. Live sentiment mapping on Mycointrac gauges cryptocurrency chatter on the web.

"As machine learning and artificial intelligence (AI), applications continue to increase and impact accounting and finance responsibilities, the human professionals have an opportunity as well. Not only will they be more productive and proficient, but they will be able to handle more clients and deliver more value because they can determine actionable insight rather than just crunch numbers. Machines will be able to propel innovation in the industry."2

—Bernard Marr, Author and Keynote Speaker on Business and Technology

As is the case with all AI projects, models are continually refined over time, with iterative training to strengthen the results and improve precision of the output. Teju explored several neural network models before deciding on an RNTN as the most effective way to perform natural language processing of social media feeds and news items.

"Many of these cryptocurrency price movements," Teju said, "could be determined by herd instinct. Herd Instinct, according to behavioral finance, is a mentality characterized by lack of individual decision making, causing people to think and act in the same way as the majority of those around them. The price movements tend to be based on market sentiment and the opinions of the communities surrounding the cryptocurrency. Based on these reasons, I believe that sentiment analysis of news headlines, Reddit posts, and Twitter* posts should be the best indicator of the direction of cryptocurrency price movements."

Recurrent neural networks (RNNs) have been a prominent technique for sentiment analysis, Teju noted. RNNs parse a string of text and tokenize the words, determining the frequency of the words used and creating what is called a bag-of-words model, which is often used in document classification, with word frequency used to train a classifier. The subjectivity of each word is looked up in a lexicon in which emotional values have been recorded by researchers. From this data, the overall sentiment is gauged.
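
A minimal, illustrative sketch of the lexicon-plus-frequency scoring described above appears below. The tiny lexicon and its values are invented for the example; production systems rely on sentiment lexicons compiled by researchers.

import re
from collections import Counter

# A tiny, made-up lexicon of prerecorded emotional values.
LEXICON = {"surge": 1.0, "rally": 0.8, "adoption": 0.6,
           "hack": -1.0, "ban": -0.9, "crash": -1.0}

def bag_of_words_sentiment(text):
    """Score text by summing lexicon values weighted by word frequency."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    return sum(LEXICON.get(word, 0.0) * count for word, count in words.items())

print(bag_of_words_sentiment("Bitcoin rally continues despite exchange hack"))
# 0.8 + (-1.0) = -0.2; note that word order and negation are ignored,
# which is exactly the shortcoming discussed next.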

"RNNs work well for longer texts," Teju said, "but are ineffective at analyzing sentiment in shorter texts, such as news headlines, Reddit posts, and Twitter posts. RNNs fail to consider all the semantics of linguistics by failing to consider compositionality—the order of words in a string. Because of this, RNNs are ineffective at identifying change in sentiment and understanding the scope of negation."

Recursive neural tensor network

After considering the alternatives, Teju decided that an RNTN would be the best option for his project because of its ability to assess the semantic compositionality of text. For shorter pieces of text, such as a tweet, compositionality is vital to accurately determining sentiment from a sparse set of information.

"RNTNs," Teju said, "are great at considering syntactical order. RNTNs are made up of multiple parts including the parent group known as the root, the child groups known as the leaves, and the scores. Leaf groups receive input and the root group uses a classifier to determine the class and score."

Recursive neural tensor networks (RNTNs) are neural nets useful for natural-language processing. They have a tree structure with a neural net at each node. You can use recursive neural tensor networks for boundary segmentation, to determine which word groups are positive and which are negative. The same applies to sentences as a whole.

Word vectors are used as features and serve as the basis of sequential classification. They are then grouped into subphrases, and the subphrases are combined into a sentence that can be classified by sentiment and other metrics.

—Excerpt from Recursive Neural Tensor Network, Deeplearning4J

Data ingested by the sentiment analyzer is parsed into a binary tree. Vector representations are formed for all the words and are represented as leaves. From the bottom up, these vectors become the parameters to optimize and serve as feature inputs to a softmax classifier. Vectors are classified into five classes and assigned a score.

"The next step," Teju said, "is where recursion occurs. When similarities are encoded between two words, the two vectors move across to the next root. A score and class are outputted. A score represents the positivity or negativity of a parse while the class encodes the structure in current parses. The first leaf group receives the parse and then the second leaf receives the next word. The score of the parse with all three words are outputted and it moves on to the next root group."

"The recursion process continues until all inputs are used up, with every single word included. In practical applications RNTN’s end up being more complex than this. Rather than using the immediate next word in a sentence for the next leaf group, an RNTN would try all the next words and eventually checks vectors that represent entire sub-parses. Performing this at every step of the recursive process, the RNTN can analyze every possible score of the syntactic parse."

Figure 3 shows an example of how a sentence is parsed and analyzed using an RNTN approach.

More details of Teju’s techniques in this project can be found on Intel® Developer Zone (Intel® DZ) in the paper Deep Learning for Cryptocurrency Trading.

Figure 3. Example of scoring from the Stanford Treebank.

Enabling Technologies

Teju gave a nod to the benefits of using Intel® technologies in his projects. "My solution utilizes Intel® AI DevCloud, Intel® Distribution for Python*, and Intel® Optimization for Caffe*," he said.

Intel AI DevCloud served as the development platform during the early stages of the project. "At the start of the project, for the sandbox version, I used Intel AI DevCloud to run the recurrent neural networks and experiment with Twitter data to see how the models were working. For my initial project with Intel and for my current project with Intel, it is completely using Intel AI DevCloud and its supporting technologies."

Intel AI DevCloud, a server cluster featuring Intel® Xeon® Scalable processors, available to Intel® AI Academy members free of charge, is preloaded with frameworks and tools to quickly launch machine learning and deep learning projects. Pre-installed components include neon™ framework, Intel® Optimization for Theano*, Intel® Optimization for TensorFlow*, Intel® Optimization for Caffe*, and the Keras* library. Connections, once approved, take only about 10 minutes to set up, by means of a Linux® terminal or graphical user interface client, such as PuTTY*. Access through Microsoft Windows* is also supported. At this point, you're ready to begin training models or running Python code. For a thorough introduction to the process, read Getting started with the Intel AI DevCloud. Deep learning can be a challenge for those just getting started, but fortunately there are many resources for gaining an understanding of models and initiating training.

Connections with libraries, code, and examples also proved invaluable during the project development. "Intel® Developer Zone (Intel® DZ) has been a great resource to learn about how I could utilize Intel technologies to better build my product," Teju said. "There are also a great number of projects out there built by peers in the AI industry from which I got both motivation and valuable insight into the various ways AI and ML were being used in a wide range of industries and use cases."

"I recommend checking out the Stanford Sentiment Treebank to learn about recursive neural tensor networks. I also recommend taking a look at the Intel Developer Zone as there are many libraries, tutorials, technical articles, and a plethora of digital content that you could learn from."

Other valuable resources for those getting started with AI include:

Video: What is Intel® Optimized Caffe*

Article: Manage Deep Learning Networks with Caffe* Optimized for Intel® Architecture

Article: Get Started with Intel® Distribution for Python*

AI is Providing New Opportunities in the Financial Sector

Through the design and development of specialized chips, sponsored research, educational outreach, and industry partnerships, Intel is firmly committed to advancing the state of AI to solve difficult challenges in medicine, manufacturing, agriculture, scientific research, and other industry sectors. Intel works closely with government organizations, non-government organizations, educational institutions, and corporations to uncover and advance solutions that address major challenges in the sciences.

For example, consistently gaining high returns on stock market investments has been phenomenally difficult, even for very experienced investors. Bringing AI techniques to bear on this challenge, an international team devised algorithms using past market data to simulate real-time investment. These techniques demonstrated a 73 percent return on investment compared with the 9 percent typical of real market scenarios. The algorithms proved particularly effective during times of extreme market volatility, suggesting that AI can detect and respond to patterns that human investment managers fail to recognize. The lead author of the study was Dr. Christopher Kraus, chair for Statistics and Econometrics at the School of Business and Economics at Germany’s Friedrich-Alexander-Universität Erlangen-Nürnberg [4].

"AI's role in finance may not attract as much attention in Hollywood, but it is likely to have a far greater economic impact than consumer tech. From extending investment opportunities to the underbanked to thwarting fraud to mitigating investment risks, AI has the potential to not only revolutionize the industry, but also to improve the financial health of millions of people in the US and across the world."5

—Kevin Dinino, President of KCD PR

The Intel AI technologies used in this implementation included:


Intel® Xeon® Scalable processor: Tackle AI challenges with a compute architecture optimized for a broad range of AI workloads, including deep learning.


Framework Optimization: Achieve faster training of deep neural networks on a robust scalable infrastructure.


For Intel® AI Academy members, the Intel® AI DevCloud provides a cloud platform and framework for machine learning and deep learning training. Powered by Intel Xeon Scalable processors, the Intel AI DevCloud is available for up to 30 days of free remote access to support projects by academy members.

Join today at: software.intel.com/ai/sign-up

For a complete look at our AI portfolio, visit ai.intel.com/technology.


"At Intel, we’re encouraged by the impact that AI is having, driven by its rich community of developers. AI is mapping the brain in real time, discovering ancient lost cities, identifying resources for lunar exploration, helping to protect Earth's oceans, and fighting fraud that costs the world  billions of dollars per year, to name just a few projects. It is our privilege to support this community as it delivers world-changing AI across verticals, use cases, and geographies."6

—Naveen Rao, Vice President and General Manager, Artificial Intelligence Products Group, Intel

 

Resources

Intel® AI Academy

Inside Artificial Intelligence: Next-level computing powered by Intel AI

Deep Learning for Cryptocurrency Trading. Details the sentiment analysis techniques for evaluating cryptocurrency trading

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (PDF)

Intel® AI DevCloud. Free cloud compute for Intel AI Academy members

Intel® Developer Zone. Provides libraries, tutorials, technical articles, and digital content for developers

Intel® Software Innovator Program. Supports innovative, independent developers

Intel® Optimization for Caffe*

Intel® Math Kernel Library (Intel® MKL)

References

1. Mearian, Lucas. What is blockchain? The most disruptive tech in decades. Computerworld, May 2018.

2. Marr, Bernard. The Digital Transformation of Accounting and Finance – Artificial Intelligence, Robots, and Chatbots. Forbes, June 2018.

3. Barsolai, Chris. The Complete Developer's Guide to Intel AI Resources and Tools. Medium. October 2017.

4. Walters, Greg. Artificially Intelligent Investors Rack Up Massive Returns in Stock Market Study. Seeker. March 2017.

5. Dinino, Kevin. Five ways AI is disrupting financial services. FinTech Futures, April 2018.

6. Rao, Naveen. Helping Developers Make AI Real. Intel, May 16, 2018.

Case Study - Authentication and Authorization for the Autonomous Vehicle Data Center Platform


Abstract

The Autonomous Vehicle Data Center (AVDC) platform supports research and development of technology for the fully autonomous car. Platform security involves controlling access to data, both user data and the petabytes (PB) of sensor data, to processes such as machine learning and simulation tasks, and to the physical compute, storage, and networking resources. The whole system must be protected from accidental and/or malicious tampering and hijacking. This case study discusses the factors that influenced our choice of an authentication and authorization solution.

AVDC Platform

The AVDC platform handles data ingestion, storage, machine-learning training and inference, simulation, and web access. The web portal provides the ability to launch tasks, including manual data labeling tools, and to monitor the progress of various tasks. Vehicles drive into an ingestion garage, where data recorded during a drive is uploaded into the data center. Depending on the variety and number of sensors contained in a vehicle, up to 20 terabytes (TB) of data may be recorded per vehicle per hour.

The platform employs an Apache Hadoop* ecosystem to provide big data support and a microservice architecture to support data ingestion, machine learning, and simulation tasks. To meet its security needs, we needed an authentication and authorization mechanism that works across both Hadoop and microservice solutions.

Figure 1. Autonomous Vehicles Data Center platform - components and tasks

Existing Authentication and Authorization Solutions

JSON Web Token

JSON Web Token (JWT) is an open standard that defines a compact and self-contained way to securely transmit information between parties as a JSON object [1, 2]. A JWT concatenates header, payload, and signature information as "header.payload.signature".

The digital signature can be computed with a shared secret using HMAC [3] or with an RSA public-private key pair. The signature can be used to verify both the integrity of the claims and their authorship. Additionally, the token can be encrypted to provide privacy for the claims granted.
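
A minimal sketch, assuming the PyJWT and cryptography packages, of issuing a "header.payload.signature" token signed with an RSA private key follows. The claim names used here are illustrative, not the AVDC platform's actual schema.

import time
import jwt  # pip install pyjwt[crypto]
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
private_pem = key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

claims = {
    "sub": "user-123",                          # who the token was issued to
    "scope": ["data:read", "training:launch"],  # application-defined access
    "iat": int(time.time()),
    "exp": int(time.time()) + 3600,             # one-hour validity window
}
token = jwt.encode(claims, private_pem, algorithm="RS256")
print(token.count("."))  # 2 -> three dot-separated parts: header.payload.signature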

Next, we explore authentication and authorization solutions that use JWT; in each case, the token-issuing service itself is deployed in high availability (HA) mode.

OAuth 2

OAuth 2 is an open authorization framework, accessible via HTTP, that follows a delegation model: an authorization service issues tokens that capture access scope, type, and validity time interval, among other attributes. OAuth 2 provides authorization flows for web, desktop, and mobile applications [4, 5].

The single authentication service can be shared by multiple applications, such as Twitter*, email, and so on. With OAuth 2, users are registered with the Authorization Service (or with a backend identity service that the Authorization Service uses), as opposed to being registered with each of the applications, which in turn reduces the number of personal data exposure points.

OAuth 2 supports multiple grant types, where a grant determines the access type. Examples of grant types are Authorization Code, Implicit, Password, Client Credentials, Device Code, and Refresh Token [6]. For example, web applications, which typically want user awareness implemented by challenging the user for username and password input, employ Authorization Code grants. Longer running microservices that support user-initiated web services can be transparent to the user, requiring only an Implicit grant.
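
As a rough illustration of the Authorization Code grant, the sketch below exchanges the code returned to a redirect URI for an access token. The token endpoint URL, client ID, client secret, and redirect URI are placeholders, not AVDC values.

import requests

def exchange_code_for_token(code):
    """Second leg of the Authorization Code grant: code -> access token."""
    response = requests.post(
        "https://authz.example.com/oauth2/token",
        data={
            "grant_type": "authorization_code",
            "code": code,
            "redirect_uri": "https://portal.example.com/callback",
            "client_id": "avdc-web-portal",
            "client_secret": "REDACTED",
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()   # typically contains access_token, token_type, expires_in

# tokens = exchange_code_for_token(code_from_redirect)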

Apache Knox*

The Apache Knox Gateway* [7] is a system that provides centralized authentication, access, and auditing for Apache Hadoop* services. The single entry point means that only a single firewall port needs to be opened to access the entire Hadoop cluster; in addition, it hides the Hadoop cluster topology from attackers. Knox can be configured to work with an LDAP/AD [8] backend for identity and access credentials.

Comparison of OAuth 2 and Apache Knox*

PROS

OAuth 2

  • Industry standard, very flexible.
  • Works with different resource types and protocols.
  • Supports application-determined granular access control based on user roles.
  • Supports multilayer authorization control.
  • Authorization delegation model reduces user data exposure.

Apache Knox*

  • Protects and controls access to Hadoop services possibly across multiple clusters.
  • Clients need to interact with a single service across their Hadoop ecosystem.

CONS

OAuth 2

  • Authorization Service single point of failure; could bring down access to multiple applications. Typically deployed in high availability (HA) mode with load balancing.

Apache Knox*

  • Knox Gateway single point of failure; could deny access to all dependent Hadoop services. Deployed in HA mode.
  • Focused on Hadoop ecosystem.
  • Requires development of custom plugins for non-Hadoop applications.
  • Microservices can be accessed directly if their endpoints are known.
  • Low granularity access control.
  • Difficult to provide multilayered authorization control.

AVDC-Adopted Authentication and Authorization Solution

Before we introduced microservices on our platform, we adopted Knox for authentication and authorization. However, the custom plugins that we needed to develop for our microservices soon became a development burden, which prompted the decision to replace Knox with OAuth 2. OAuth's uniform token support across HTTP and other resources was generally a more applicable solution. Further, it supports the ability for individual applications to define fine-grained access control. For example, a single user might hold an admin role with respect to one application, only read access with respect to another, and perhaps no access to a third. OAuth essentially provides the ability to define multiple roles and what those roles translate to with respect to each application.

Figure 2. AVDC authentication and authorization data flow

The following steps illustrate control flow on the AVDC platform.

  1. User provides credentials to the web portal. These are sent to the Authorization Service.
  2. The Authorization Service contacts the Active Directory instance to establish whether the credentials are valid.
  3. The Authorization Service returns a JWT signed using its private key if the credentials are valid.
  4. The return token is saved in the web portal for the duration of the session.
  5. When any service on the AVDC platform (for example, a data search request, launching a machine-learning training task, or other) is accessed through the web portal, the token saved in the session is forwarded along with the request.
  6. The AVDC Authentication and Authorization service retrieves the token and establishes whether the signature is valid using the Authorization Service's public key. If it is valid, the access scope-related information in the token payload is used by the respective applications to determine the accessible functionality, which is then provided (step 6a in Figure 2). For Cloudera*-based services, the request is passed on with the token stripped and replaced with the Kerberos* [9] principal and the keytab associated with the service (step 6b in Figure 2).
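
A rough sketch of step 6 follows: a service verifies the forwarded token with the Authorization Service's public key before honoring a request. Flask and PyJWT are assumed here purely for illustration, and the header, claim, and scope names are placeholders rather than the platform's actual schema.

import jwt                     # pip install pyjwt[crypto]
from flask import Flask, request, abort, g

app = Flask(__name__)
PUBLIC_KEY = open("authz_public.pem", "rb").read()   # distributed out of band

@app.before_request
def check_token():
    auth = request.headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        abort(401)
    try:
        claims = jwt.decode(auth.split(" ", 1)[1], PUBLIC_KEY,
                            algorithms=["RS256"])
    except jwt.InvalidTokenError:
        abort(401)
    g.scopes = claims.get("scope", [])    # scope drives per-application access

@app.route("/search")
def search():
    if "data:read" not in g.scopes:
        abort(403)                        # authenticated but not authorized
    return {"results": []}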

The AVDC platform uses an LDAP/AD identity backend to maintain user information and access capabilities. The identity service returns Kerberos tokens in JWT format. The token signature is based on an RSA public-private key pair, to avoid having to deal with shared secrets and protect them on each of the registered applications. Instead, the Authorization Service's public key is distributed to each of the registered applications, enabling them to verify the token signature.

Summary

The AVDC platform uses the OAuth 2 authentication and authorization framework in conjunction with JWTs signed with RSA public-private keys because this combination is easy to use across both Hadoop resources and microservices. OAuth 2 also facilitates fine-grained access control, enabling our platform to provide a rich set of access capabilities across applications that span data ingestion, machine learning, and simulation.

References

1. Introduction to JSON Web Tokens

2. 5 Easy Steps to Understanding JSON Web Tokens

3. Hash-based message authentication codes (HMAC)

4. Introduction to OAuth 2

5. OAuth 2 Authorization Framework

6. A Guide to OAuth 2 Grants

7. Announcing Apache Knox 1.0.0!

8. LDAP user authentication using Microsoft Active Directory - IBM

9. Kerberos Principals and Keytabs - Cloudera
