Channel: Intel Developer Zone Articles

Intel® IoT Gateway Developer Hub and Software Suite/Pro Software Suite Release Notes


These are the latest release notes for the Intel® IoT Gateway Developer Hub, Intel® IoT Gateway Software Suite, and Intel® IoT Gateway Pro Software Suite.


Intel® IoT Gateway Developer Hub and Software Suite/Pro Software Suite Release Notes ARCHIVE


Use this ZIP file to access each available version of the release notes for the Intel® IoT Gateway Developer Hub, Intel® IoT Gateway Software Suite, and Intel® IoT Gateway Pro Software Suite, beginning with production version 3.1.0.17 through the currently released version. The release notes include information about the products, new and updated features, compatibility, known issues, and bug fixes.

Saffron Technology™ Cognitive API FAQ

What are synchronous and asynchronous APIs?

Saffron’s Thought Processes (THOPs) are user-defined functions that allow you to tie together various Saffron MemoryBase (SMB) and other capabilities using a scripting language. Thought Processes can help perform a wide variety of actions such as: simplifying the execution of complex sequential queries, calling outside applications to be used in conjunction with Saffron reasoning functions, or converting query results into data suitable for UI widgets.

A key feature of Saffron's thought processes is that they can run synchronously or asynchronously.

By default, Saffron APIs run synchronous thought processes. A synchronous process typically runs via an HTTP GET call from the calling client, which then waits for the result. Use synchronous THOPs when you use the default (single-threaded) WebService engine, known as THOPs 1.0. Because this process returns results one at a time, it is slower than asynchronous processing, but it is better suited to developers who need to troubleshoot or debug issues. Typically, synchronous thought processes (THOPs 1.0) are used for the following operations:

  • simple single queries
  • fast operations
  • operations that do not need asynchronization
  • troubleshooting and debugging in a development environment

Saffron APIs can also run asynchronous thought processes. These processes communicate with calling clients through messages in real time and can operate as long-running operations. Asynchronous APIs are only available with the latest Saffron WebService engine, known as THOPs 2.0. This process is much faster than synchronous processing. Typically, asynchronous thought processes (THOPs 2.0) are used for the following operations:

  • complex queries that cannot be expressed in a single query
  • business logic when writing apps based on SUIT and THOPs
  • integrating Saffron APIs and third-party APIs
  • using a stored procedure in a relational database
  • deploying code at runtime

Learn more about thought processes.

What are Batch APIs?

Batch APIs allow you to run the same API (or set of APIs) repeatedly over a large number of items. The Batch API collects a large number of items such as records, rows, ids, and attributes. For each item, a Batch API calls one of the core APIs (such as similarity, classification, recommendation) to complete the process.

A key component of batch APIs is the thought process under which they run. Thought processes (THOPs) are stored procedures that can run synchronously or asynchronously.

By default, Saffron APIs run synchronous thought processes. A synchronous process typically runs via an HTTP GET call from the calling client, which then waits for the result. Use synchronous THOPs when you use the default (single-threaded) WebService engine, known as THOPs 1.0. Because this process returns results one at a time, it is slower than asynchronous processing, but it is better suited to developers who need to troubleshoot or debug issues.

Example Synchronous APIs:

Saffron APIs can also run asynchronous thought processes. These processes communicate with calling clients through messages in real time and can operate as long-running operations. Asynchronous APIs are only available with the latest Saffron WebService engine, known as THOPs 2.0. This process is much faster than synchronous processing.

Example Asynchronous APIs:

What is a signature?
A signature is a list of attributes (category:value) that best characterizes a query item. It represents the most informative and relevant attributes for that item. Once the signature is found, it can be used to provide useful and relevant comparison data.
What is the difference between the Classify Item API and the Nearest Neighbor API?

Both APIs can find the classification of a query item. For example, assume that we want to find out the classification (type) of animal:bear. The way to find the answer differs between the two APIs.

The Classify Item API gathers a list of attributes (signature) that best represents the animal:bear. Next, it finds classifications (or types) that are similar to the bear by comparing the attributes of the classifications against the signature of the bear. It then returns the top classification values based on these similar items.

The Nearest Neighbor API also gathers a list of attributes (signature) that best represents the animal:bear. It is different in that it uses the similarity feature to find similar animals (as opposed to finding similar classifications). From the top list of animals that are the most similar to the bear, the API initiates a voting method to return the top classification values.

When should the Classify Item API be used versus the Nearest Neighbor API?

The decision to use the Classify Item API or the Nearest Neighbor API depends on the available ingested data. Datasets that contain a high percentage of one particular classification skew both the algorithm and the resulting probabilities if the Classify Item API is used. Because the data is swayed toward that dominant type, the query item could be incorrectly labeled. In this situation, the Nearest Neighbor API can cut through the skewed weighting by finding neighbors that are similar to the query item. Even if it finds only one neighbor, that could be enough to get a correct label.

For example, assume that a dataset contains 100 animals. Of these, 60% are classified as invertebrates and 20% are classified as mammals. In spite of the weighted list, we can use the Nearest Neighbor API to find the classification of animal:bear by finding another animal that shares the attribute produces:milk. Since mammals are the only animals that produce milk, we can accurately conclude that the bear is a mammal.

What does "confidence" measure?

Confidence is a measuring tool in the Classification API suite that answers how confident the algorithm is with a classification decision (I am 99% confident that the bear can be classified as a mammal). It is the algorithm's self-assessment (or rating) of a prediction based on the amount of evidence it has. Typically, low confidence indicates a small amount of evidence in the dataset. Examples of evidence might include similarity strength, homogeneity of the neighborhood, information strength, and/or disambiguation level between classes.

The Classification APIs use confidence to:

  • automatically remove low-confidence records or flag them for human intervention
  • correct human mistakes
  • detect anomalies
  • better extrapolate overall accuracy from the "truth" set to a "training" set

Note: Do not confuse confidence with real accuracy or with Statistical Confidence.

How do "percent" and "similarity" influence "confidence" when using the Nearest Neighbor API to classify an item?

Confidence is the ultimate metric in that it indicates how confident we are that a query item is properly classified. Percent and similarity are used as evidence to compute confidence. Similarity indicates how similar a query item is to its nearest neighbors and percent shows how many of the neighbors have the same classification (or type). So, in a case where a query item has lots of nearest neighbors and those neighbors are the same type, we can conclude with a high level of confidence that the query item shares the same classification as its nearest neighbors.

Confidence levels decrease as the percent and/or similarity values decrease. A lower percentage indicates that not all of the nearest neighbors share the same classification. A lower similarity score indicates that some of the attributes of the nearest neighbors do not closely match the query item. It also indicates that some of the attributes have low "score" values, which means that they are not as relevant to selecting a classification.
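
To make the relationship between these signals concrete, here is a minimal, illustrative C++ sketch of a nearest-neighbor vote that derives percent, average similarity, and a toy confidence value. This is not Saffron's actual algorithm; the neighbor structure and the way the two signals are combined into a confidence value are assumptions made purely for illustration.

#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical neighbor record: a classification label and a similarity score in [0, 1].
struct Neighbor {
    std::string label;
    double similarity;
};

// Vote over the nearest neighbors. "percent" is the share of neighbors with the
// winning label; "avgSimilarity" is the mean similarity of those neighbors; the
// toy "confidence" simply requires both signals to be high.
struct Vote {
    std::string label;
    double percent = 0.0;
    double avgSimilarity = 0.0;
    double confidence = 0.0;
};

Vote classifyByNeighbors(const std::vector<Neighbor>& neighbors) {
    std::map<std::string, std::pair<int, double>> tally;  // label -> (count, summed similarity)
    for (const auto& n : neighbors) {
        tally[n.label].first += 1;
        tally[n.label].second += n.similarity;
    }
    Vote best;
    for (const auto& entry : tally) {
        Vote v;
        v.label = entry.first;
        v.percent = static_cast<double>(entry.second.first) / neighbors.size();
        v.avgSimilarity = entry.second.second / entry.second.first;
        v.confidence = v.percent * v.avgSimilarity;  // both agreement and similarity must be high
        if (v.confidence > best.confidence) best = v;
    }
    return best;
}

int main() {
    // animal:bear compared against its four nearest neighbors.
    std::vector<Neighbor> neighbors = {
        {"mammal", 0.92}, {"mammal", 0.88}, {"mammal", 0.81}, {"reptile", 0.40}};
    Vote v = classifyByNeighbors(neighbors);
    std::cout << v.label << " percent=" << v.percent
              << " avgSimilarity=" << v.avgSimilarity
              << " confidence=" << v.confidence << "\n";
    return 0;
}

In this sketch, dropping one "mammal" neighbor lowers percent, and lowering the neighbors' similarity scores lowers avgSimilarity; either change reduces the resulting confidence, which mirrors the behavior described above.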

What is the metric score in a signature? Why is it important?

For classification APIs, the metric score measures the relevance of an attribute (in a signature) for predicting the classification of a query item. A higher metric score (1) means an attribute has a higher predictive value against the label of the query item.

For example, assume that we are attempting to classify animal:bear. The classification API returns a list of attributes (signature) that characterizes the bear in hopes that we can find similar attributes that will help us classify it. The attribute behaves:breathes has a lower metric score (.5) because it does not help us narrow down the classification of the bear (mammals, reptiles, amphibians, and other types have the same attribute). The attribute produces:milk has a higher metric score (1) because it provides very useful and accurate information that can help us properly classify the bear. Since our data indicates that all animals with the produces:milk attribute are mammals, we can also label the bear as a mammal.

The higher a metric score is for attributes in a signature, the greater the chances of making an accurate classification. For similarity, a higher score means a better chance of finding similar items.

How can I learn more about APIs?
Refer to the API section of the SMB documentation.
How can I learn more about thought processes (THOPs)?

Saffron’s Thought Processes (THOPs) are user-defined functions that allow you to tie together various Saffron MemoryBase (SMB) and other capabilities using a scripting language. Thought Processes can help perform a wide variety of actions such as: simplifying the execution of complex sequential queries, calling outside applications to be used in conjunction with Saffron reasoning functions, or converting query results into data suitable for UI widgets. THOPs are written in one of the supported script languages (in v10 of SMB, only JavaScript is available).

If you are familiar with other database products, think of Thought Processes as stored procedures. THOPs can be created via the REST API or by using the developer tools in Saffron Admin. Once a Thought Process is defined, it becomes callable through the standard REST API.

Learn more about thought processes.

Saffron Technology™ Elements


Saffron Technology™ Elements is a robust developer solution that includes APIs, Widgets, and other items that enable you to take advantage of Saffron's many offerings.

APIs

Saffron Technology APIs enable you to include our offerings in your environment.

View our APIs.

Widgets

Saffron Technology widgets include items such as bar charts, interactive tables, and heat maps that provide visual analytics you can embed in your application.

View our Widgets.

Visual Analytics

Saffron Technology uses the Tableau® web data visualization tool to visualize output from our APIs.

View information about Visual Analytics.

MIT License


Copyright (c) 2016 Intel

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Close Calls, Hard Choices, and a Dog Named Waffles: Developing Overland*


Download [PDF 0.98 MB]

Close calls and hard choices punctuate the gameplay of Overland, a post-apocalyptic, turn-based survival strategy game from Finji. Overland’s interface is surprisingly intuitive, and takes little time to learn. Each random level is a procedurally generated tile populated by strange creatures and an evolving cast of survivors, encountered during a cross-country road-trip that leads straight into trouble. Just how much trouble depends on the choices players make. Some can be deadly. Some help you survive. Some involve a dog. No apocalypse would be complete without a dog named Waffles, right?

In other words, Overland is great fun, and has the awards to prove it, including Best Art Design in the 2016 Intel® Level Up Contest.

The Developers’ Journey

Finji was founded in 2006 by Rebekah and Adam Saltsman—two veteran indie game-developers, and parents to two young boys—who run the thriving game studio out of their Michigan home. The Saltsmans had a lot to say about how they turned a 2D iPad*-based prototype into a 3D cross-platform, touch-enabled PC game for the Intel Level Up Contest. They also shared what it’s like balancing parenthood with game development, the role alt-games storefront itch.io played in helping them test and refine gameplay, and the importance of building a game engine that supports fast, iterative prototyping.


Figure 1: UI elements give players easy-to-understand choices through overlays of icons inspired by US National Park Service signs.

Overland Origins

“The original whiteboard doodle that inadvertently spawned Overland was a mashup of 868-HACK*, by Michael Brough, and XCOM: Enemy Unknown*, by Firaxis Games,” Adam told us. Like many game developers, the Saltsmans are students of gaming. They draw inspiration and take away lessons from every game they’ve ever played.

As freelancers, they have more than a decade of experience applying those lessons to game design, art, and code for other studios. They’ve also released titles of their own, starting with a Flash*-based game called Gravity Hook*, followed by an iOS* game called Wurdle*. Between 2009 and 2013, they created six more iOS titles. The award-winning Hundreds*, a popular puzzle game, relied on the multi-touch interaction capabilities of the iPad.

“When we did Hundreds, there wasn’t much hardware support for multi-touch interaction outside of an iPad,” Adam said. Mobile gaming quickly became a very crowded space to compete in, so Bekah and Adam knew they would need to diversify by creating cross-platform PC games. “We’d spent 18 months collaborating with four other developers to make Hundreds. Financially, it did fine, but we couldn’t port it to PC (in 2013) because it was a multi-touch game.”

If they were going to plunge into the world of PC gaming, they knew they needed more resources. So, the Saltsmans focused on contract work. “We built up a war-chest of money,” Bekah said. “The question was: how far could it get us?”

The Saltsmans knew that what they were about to do was part-and-parcel of being indie developers. They’d seen their friends go through periods of having no health insurance and limited income before finding success. “We had kids and a mortgage. The prospect of investing everything we’d made in a cross-platform title was terrifying,” Bekah said.

The Prototype

Overland started as a 2D iPad game. “We prototyped most of the core gameplay in the current version of Overland in about two weeks, for a few thousand dollars,” Adam explained. Then they sat on it for six months. “We didn’t make any significant investments in it during that time. Instead, we kept refining the rules and adding new features. We wanted to get to a point where we could hand it off to a stranger, or a journalist, and have them get excited about experiencing it, even though it was a hot mess of missing elements, UI … the usual stuff.”


Figure 2: Gameplay takes place on board-game-like tiles where everything is easy to comprehend.

The couple also knew that to succeed as a PC game, the gameplay had to have, as Adam put it, “real strategy legs.” Art direction, sound, and story would be crucial, because many elements typically used in strategy game designs—RPG elements and tech trees, for example—were out of bounds for this project, for a variety of reasons.

“We were founding members of the Austin indie game collective,” Bekah said. “So we would take the Overland prototype—which was this horribly ugly 2D grid—to meetups where game-developers, journalists, and enthusiasts could give us feedback. That was invaluable."

Rules to Design By

“Weird things happen when you reduce a strategy game down to board-game-like spaces,” Adam said. “It ends up having a lot in common with puzzle games. This is actually reinforced by research that uses CAT scans and MRI technology to look at different parts of the brain during action or casual gameplay.”

According to Adam, however, it was one year into development before he realized that Overland’s level generator had a lot in common with a puzzle generator. That discovery led to three core design-principles that drive level creation. “As a post-apocalyptic road-trip game, Overland is a big design space—as soon as you tell someone about it, they have five cool ideas to add to the game. We used three design principles to vet ideas, and decide ­which ones were worth implementing.”

They call the first principle “the Minotaur in a china shop,” after a game in which a Minotaur enters a china shop, knocks something over, and then goes into a rage, destroying everything in the store. In Overland, this idea is used to determine whether a design idea will lead to a place where a sloppy move by a player can start a chain reaction that produces interesting consequences.

“It’s a principle that’s more interesting than a design in which you come to a level with three turns. On the third turn, you die. That would be like a poison-gas level,” Adam explained. “That’s not very Overland-y. Whereas a level in which an enemy chases you, you injure it, and then get put in a position where it’s a poison-gas level, that’s something you’d see in Overland. Because it’s the result of something the player did.”

The other principles go hand-in-hand. Randomness is fine, as long as it’s the player’s fault; the player gets a proportional warning about incoming random events. Each level is created on the fly by the game engine, which randomly combines ingredients to produce a fun and exciting experience for the player based on where they are in the country, and other factors.

“For example, one of the core mechanics of the game is that when you make noise, it attracts more creatures that will try to chase you down,” Adam said. “When that happens, you get a two-turn warning about where a new creature is going to appear. That’s because new creatures can be really bad. We want players to have some windup.”

Another example is that on a windy day, fire will spread, even if there’s nothing flammable for it to spread to, so players get a one-turn warning: this tile is heating up. Such “random” events aren’t random at all. “They are unforeseen, or very hard to foresee, non-random consequences of player decisions. For example, there’s a monster here. It’s too close to kill with the weapon in hand, so I’m going to kill it by setting it on fire. Except now there’s a fire that can spread throughout the tile if weather conditions permit.”

All of this creates a lot of opt-in complexity. “Players get to decide how much trouble they want to participate in,” Adam said. “Our team was too small to build a game with two-layers of difficulty, one easy, the other hard,” Bekah added. "The way people can experience more difficulty in their Overland runs, is by choosing to venture further from the road.”


Figure 3: Opt-in difficulty is based on whether a player chooses to drive into more, or less, danger.

Building complexity into the core gameplay ratcheted up the tension. “I love that a slow-paced game can give people adrenaline jitters,” Bekah said. “Even when a player dies in Overland, they’re laughing about it.”

Team-Based Collaboration, Fast-Paced Iteration

Overland’s art, coding, sound, and gameplay are the collaborative effort of a core team of four—Bekah, Adam, art director Heather Penn, and sound designer Jocelyn Reyes. “I think of our design process as old-school game design,” Adam said. “We all wear multiple hats, no one works in a silo.” It’s an approach that encourages cross-discipline collaboration. For example, Penn’s art influences Reyes’ sound design, and vice versa. Everyone contributes gameplay ideas.

“If someone has an idea, we prototype it to see if it works,” Bekah said. Pitching solutions instead of ideas is encouraged. “We try to craft solutions to nagging issues—for example, a graphics problem that will also solve a gameplay issue.” A value is assigned to how long it might take, and if it’s within reason, it gets developed. “We all contribute to this very iterative, prototype-intensive process,” Adam said.

The Overland team isn’t afraid to spend development cycles making pipeline course corrections. “I’d rather spend a week fixing the system, than two days building a system Band-Aid,” Adam said. “Having a game engine that allows us to quickly prototype in this really cool iterative way, with a team of people, is invaluable to how we’re building Overland.”

Tools

Overland is being built in Unity*, which Adam estimated would save them two years of 3D engineering work. “The tradeoff for using a closed-source tool was worth it.” They’re running Unity under Mac OS* on their development system, a late 2012 iMac* with Intel inside. Unity also gives them easy cross-platform portability.

They use Slack* for team collaboration. Or as Adam put it, “Overland would not exist without Slack, period.” They’re using SourceTree* and GitHub* with the Git LFS (Large File Storage) extension for audio and graphics source file control; while mainstay art tools such as Adobe* Photoshop* and Autodesk* Maya* are being used to create the assets that Unity’s game engine pulls into its procedurally generated levels. Wwise* from Audiokinetic is Overland’s interactive sound engine.

Early Access Play Testing

Another crucial element in honing Overland’s gameplay came in the form of itch.io, an alternative games platform that provided Bekah and Adam the ability to dole out limited early-access to builds, and get feedback from users. One of itch.io’s benefits was its automatable command-line utility for uploading code patches. “Itch.io uses code assembled from open-source components like rsync that can generate a patch and upload it for you,” Adam explained. “The whole build script to generate a build for Windows* 64, Windows 32, Linux* Universal, and Mac OS Universal, and then upload it to itch.io, took an hour or two. And half of that time was spent figuring out how to print a picture in ASCII.”


Figure 4: Heather Penn’s award-winning art design drew on a variety of influences, including American artist Edward Hopper. Scenes were crafted to take advantage of shaders that would look great across a variety of systems.

A Level Up

The Saltsmans learned of the Intel Level Up Contest through friends who happened to be former winners. Those friends reported having great experiences with the contest, and working with Intel. As a result, the Saltsmans didn’t hesitate to enter, even though Overland was a work-in-progress that still used a lot of placeholder art. That art was so gorgeous it earned Overland top honors in the Art Design category, in a year that saw more entries than ever before.

The Intel Level Up Contest required entries to be playable on a touch-enabled Razer* Blade Stealth, which has a 4K-resolution display. Unity 5.3.6 was instrumental in enabling Overland’s 4K shadows, which on some systems were blowing out video memory at that resolution. Overland makes use of Intel® HD Graphics, because, as Adam put it, “we want our work to be accessible to as wide an audience as possible. Part of that is game design, but part of it is supporting as wide a range of hardware as we can.”


Figure 5: Adam and Bekah Saltsman demo Overland in the Intel Level Up booth at PAX West, in Seattle.

As part of that philosophy, Adam wants his games to look equally great whether they’re played on a state-of-the-art VR rig, or on an aging desktop. “Ninety-five percent of Overland runs at something like 500 fps on a five-year-old Intel® Core™ i3 processor, which I know, because that’s what’s in my dev system.” As they get closer to release, Adam plans on optimizing his code to spread the workload across cores.

Another key requirement of the contest was that games needed to be touch-enabled. Overland was touch-enabled from the start. “It was a mobile game, with mobile game controls,” Bekah said, before admitting that the current builds are no longer touch-screen friendly. “Touch was a fundamental part of the game’s design for the first 18 months,” Adam explained. “I’m a touch-screen interaction perfectionist, and there were things about our focused state and information previewing that needed attention. I’m looking forward to bringing it back.”

Balancing Game Development and Kids

With two young kids at home, Bekah and Adam built Finji with raising a family in mind. “When we both had ‘real’ jobs,” Bekah said, “each of us wanted to be the stay-at-home parent. It took a really long time before we could introduce children to our chaos.” Bekah describes balancing work and kids as being “different all the time. They’re five and three. The youngest is about to start pre-school, so this will be the first year both kids won’t be home during the day.”

The studio where Adam works is downstairs in their home, facing the back yard. Bekah’s office faces the front yard. “If the kids are outside, one of us can keep an eye on them while we’re working. There are always times when one of us has to jump up mid-whatever we’re doing, and stop them from whatever mischief they’re getting into. In that way, we need to be flexible.”

Conclusion

Overland is a work-in-progress that started life as a 2D tablet-based prototype. Winning Best Art Design in the 2016 Intel Level Up Contest has not only raised Overland’s profile among the game community, but also opened the door for access to Intel software tools and optimization expertise, particularly in multithreading code. Although no release date has been set for Overland, Finji has big plans for Q4 2016, when they will begin implementing new levels and features. The game has garnered plenty of awards in its pre-release state—who knows what accolades might follow?

Simple, Powerful HPC Clusters Drive High-Speed Design Innovation


Up to 17x Faster Simulations through Optimized Cluster Computing

Scientists and engineers across a wide range of disciplines are facing a common challenge. To be effective, they need to study more complex systems with more variables and greater resolution. Yet they also need timely results to keep their research and design efforts on track.

A key criterion for most of these groups is the ability to complete their simulations overnight, so they can be fully productive during the day. Altair and Intel help customers meet this requirement using Altair HyperWorks* running on high performance computing (HPC) appliances based on the Intel® Xeon® processor E5-2600 v4 product family.


 

Download Complete Solution Brief (PDF)


OpenCL™ Drivers and Runtimes for Intel® Architecture


What to Download

By downloading a package from this page, you accept the End User License Agreement.

Installation has two parts:

  1. Intel® SDK for OpenCL™ Applications Package
  2. Driver and library (runtime) packages

The SDK includes components to develop applications.  Usually on a development machine the driver/runtime package is also installed for testing.  For deployment you can pick the package that best matches the target environment.

The illustration below shows some example install configurations. 

 

SDK Packages

Please note: A GPU/CPU driver package or CPU-only runtime package is required in addition to the SDK to execute applications

Standalone:

Suite: (also includes driver and Intel® Media SDK)

 

 

Driver/Runtime Packages Available

GPU/CPU Driver Packages

CPU-only Runtime Packages  

Deprecated 

 


Intel® SDK for OpenCL™ Applications 2016 R2 for Linux (64-bit)

This is a standalone release for customers who do not need integration with the Intel® Media Server Studio. It provides components to develop OpenCL applications for Intel processors. 

Visit https://software.intel.com/en-us/intel-opencl to download the version for your platform. For details check out the Release Notes.

Intel® SDK for OpenCL™ Applications 2016 R2 for Windows* (64-bit)

This is a standalone release for customers who do not need integration with the Intel® Media Server Studio.  The Windows* graphics driver contains the driver and runtime library components necessary to run OpenCL applications. This package provides components for OpenCL development. 

Visit https://software.intel.com/en-us/intel-opencl to download the version for your platform. For details check out Release Notes.


OpenCL™ 2.0 GPU/CPU driver package for Linux* (64-bit)

The Intel intel-opencl-r3.1 (SRB3.1) Linux driver package  provides access to the GPU and CPU components of these processors:

  • Intel® 5th, 6th, or 7th generation Intel® Core™ processors
  • Intel® Celeron® J4000 and Intel® Celeron® J3000
  • Intel® Xeon® processor v4 or v5 with Intel® Graphics Technology (if enabled by OEM in BIOS and motherboard)

Installation instructions

Intel has validated this package on CentOS 7.2 for the following 64-bit kernels.

  • Linux 4.7 kernel patched for OpenCL 2.0

Supported OpenCL devices:

  • Intel® graphics (GPU)
  • CPU

For detailed information please see the driver package Release Notes.
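
After installing a driver or runtime package, one quick sanity check is to enumerate the OpenCL platforms and devices it exposes. The sketch below is illustrative only (it is not part of any Intel package) and uses the standard OpenCL C API; compile it with, for example, g++ listcl.cpp -lOpenCL.

// List every OpenCL platform and device visible to the installed driver/runtime.
#include <cstdio>
#include <CL/cl.h>

int main() {
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    if (clGetPlatformIDs(8, platforms, &num_platforms) != CL_SUCCESS || num_platforms == 0) {
        std::printf("No OpenCL platforms found. Is a driver or runtime package installed?\n");
        return 1;
    }
    cl_uint plat_count = num_platforms < 8 ? num_platforms : 8;
    for (cl_uint p = 0; p < plat_count; ++p) {
        char name[256] = {0};
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(name), name, NULL);
        std::printf("Platform: %s\n", name);

        cl_device_id devices[8];
        cl_uint num_devices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &num_devices) != CL_SUCCESS)
            continue;                                    // platform has no usable devices
        cl_uint dev_count = num_devices < 8 ? num_devices : 8;
        for (cl_uint d = 0; d < dev_count; ++d) {
            char dev_name[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(dev_name), dev_name, NULL);
            std::printf("  Device: %s\n", dev_name);     // expect GPU and/or CPU entries
        }
    }
    return 0;
}

If the GPU/CPU driver package is installed you should see both a GPU and a CPU device listed; with a CPU-only runtime package, only the CPU device appears.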

 

 

For Linux drivers covering earlier platforms such as 4th generation Intel Core processor please see the versions of Media Server Studio in the Driver Support Matrix.


OpenCL™ Driver for Iris™ graphics and Intel® HD Graphics for Windows* OS (64-bit and 32-bit)

The Intel graphics driver includes components needed to run OpenCL* and Intel® Media SDK applications on processors with Intel® Iris™ Graphics or Intel® HD Graphics on Windows* OS.

You can use the Intel Driver Update Utility to automatically detect and update your drivers and software.  Using the latest available graphics driver for your processor is usually recommended.


See also Identifying your Intel Graphics Controller.

Supported OpenCL devices:

  • Intel graphics (GPU)
  • CPU

For the full list of Intel® Architecture processors with OpenCL support on Intel Graphics under Windows*, refer to the Release Notes.

 


OpenCL™ Runtime for Intel® Core™ and Intel® Xeon® Processors

This runtime software package adds OpenCL CPU device support on systems with Intel Core and Intel Xeon processors.

Supported OpenCL devices:

  • CPU

Latest release (16.1.1)

Previous Runtimes (16.1)

Previous Runtimes (15.1):

For the full list of supported Intel® architecture processors, refer to the OpenCL™ Runtime Release Notes.

 


 Deprecated Releases

Note: These releases are no longer maintained or supported by Intel

OpenCL™ Runtime 14.2 for Intel® CPU and Intel® Xeon Phi™ Coprocessors

This runtime software package adds OpenCL support to Intel Core and Xeon processors and Intel Xeon Phi coprocessors.

Supported OpenCL devices:

  • Intel Xeon Phi coprocessor
  • CPU

Available Runtimes

For the full list of supported Intel architecture processors, refer to the OpenCL™ Runtime Release Notes.

Use Case: Intel® Edison Board to Microsoft Azure* Part 1


Once you've moved past the prototype development stage, you might find yourself in the position to deploy an actual IoT solution for your business product.

Let’s say you own a goods transportation company that has to deliver food and other temperature-sensitive products to shops throughout the country. Storage and transportation conditions such as temperature and moisture contribute greatly to the loss of food, because they provide favorable conditions for pests or mold growth. One very efficient solution to this problem is to use IoT devices such as the Intel® Edison board to capture the temperatures in these storage units, a gateway to gather the information and route it appropriately, and Microsoft Azure* to store and analyze the information so you can get valuable feedback.

The following use case shows how to implement an IoT solution that gains value from the interconnectivity between the board, the gateway, and the cloud. With this use case in mind, we will walk through the implementation of a prototype of our solution.

Using Microsoft Azure*, the Intel® Edison board, and Intel® IoT Gateway Technology

To create a high-value solution for the temperature problem described, we need to set up the board, the gateway, and Azure*. The following sections address setup for the Intel® Edison board, Microsoft Azure, and the Wyse* 3000 Series x86-Embedded Desktop Thin Client.

Intel® Edison board

The Intel® Edison board runs a simple Yocto* Linux distribution and can be programmed using Node.js*, Python*, Arduino*, C, or C++. For this use case we used the Intel® Edison board and Arduino breakout board, a Seeed* Studio Grove* Starter Kit Plus (Gen 2), a base shield, and many sensors to get started with.

The first time you use your Intel® Edison board you have to configure it, flash it to the latest firmware, and test it. In order to do so you can access this simple Getting Started Guide.

Working with the IDE and Wi-Fi

Now that you have set up your Intel® Edison board, you can start programming it. You can choose the programming language and IDE that you want and you have to load up one of the blink examples in order to test if everything is set:

In order to finish the setup you need to follow a few more steps:

  1. Open the Device Manager again, find out which COM port the Intel® Edison Virtual COM Port is on, and set that port in the IDE.

  2. Load up the blink example and if everything went well so far, the on-board LED should start blinking.

  3. To connect to Wi-Fi, open PuTTY, and once you log in, type configure_edison --wifi.

  4. Work through the setup process and use the guide to connect to your Wi-Fi network (See Get Started with Intel® Edison on Windows).
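
Since the goal of this use case is temperature capture, the sketch below shows one way to read a Grove temperature sensor from C++ on the board, using the libmraa library included in the Yocto image. The analog pin (A0), the sensor model (Grove temperature sensor v1.2 and its published thermistor constants), and the five-second polling interval are assumptions for illustration; adapt them to your own wiring before forwarding the readings to the gateway.

// Read a Grove temperature sensor on analog pin A0 and print degrees Celsius.
// Build on the board with: g++ temp.cpp -lmraa -o temp
#include <cmath>
#include <cstdio>
#include <unistd.h>
#include <mraa.h>

int main() {
    mraa_aio_context aio = mraa_aio_init(0);           // A0 on the Arduino breakout (assumption)
    if (aio == NULL) {
        std::fprintf(stderr, "Failed to initialize analog pin A0\n");
        return 1;
    }
    const double B  = 4275.0;                          // thermistor constant (Grove sensor v1.2)
    const double R0 = 100000.0;                        // nominal resistance at 25 degrees C
    for (int i = 0; i < 10; ++i) {                     // take ten sample readings
        int raw = mraa_aio_read(aio);                  // 10-bit reading, 0..1023
        if (raw <= 0) continue;                        // skip invalid readings
        double r = (1023.0 / raw - 1.0) * R0;          // thermistor resistance
        double celsius = 1.0 / (std::log(r / R0) / B + 1.0 / 298.15) - 273.15;
        std::printf("temperature: %.2f C\n", celsius); // later: send this value to the gateway
        sleep(5);                                      // sample every 5 seconds (assumption)
    }
    mraa_aio_close(aio);
    return 0;
}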

Microsoft Azure

Setting up an Azure Event Hub

For this setup, the free trial version will be used.

  1. You need to sign in with a Microsoft account. After signing in, click on Sign up for a free trial. In the next tab, fill in information about yourself, including credit card data. Do not worry; you will not be charged unless you explicitly choose a paid plan. After clicking the Sign up button at the bottom of the page, the Azure Portal homepage will appear. Click on New in the top left corner - Internet of Things – Event Hub.

  2. First you need to create a namespace: click on Service Bus on the left and then CREATE at the bottom

  3. Fill in the required information and then click the check sign

  4. Now that our namespace has been created, we need to create the Event Hub. Make sure you have selected the Service Bus tab on the left and then click on New on the bottom-left

  5. Next, select App Services – Service Bus – Event Hub

  6. Next, click on Quick create and fill in the required information

  7. Now you should be able to see something like this in your portal, under the Service Bus tab:

  8. You will need to further configure your Event Hub. Click on your namespace, and you’ll be prompted with the following window:

  9. Click on the Event Hubs tab, and you will see your Event Hub:

  10. Click on the event hub and then click on Configure

  11. You will now need to create a new shared access policy for your Hub; make sure you select Manage from the permissions tab

  12. Now you will have to create a consumer group. At your created Hub, click on the Consumer Groups tab and then click on Create.

  13. Name your group and click the checkmark

  14. Your namespace must also have a key defined. Return to the namespace (the Service Bus tab on the left), click on Configure and then, under Shared access policies, type in a key name and make sure you check Manage

  15. Click on the Service bus tab again, then click on Connection information at the bottom (make sure you have selected your namespace)

  16. Here you can see your namespaces and their keys:

  17. If you return to your Event Hub’s dashboard after you begin sending messages to it from the gateway, you will see something like this:

The Event Hub dashboard is just a way to check whether the data was received correctly, see the bandwidth of those messages, and check whether you are getting errors. To work with the data you will need to read it from the Event Hub through an application using one of the many SDKs that Microsoft offers (for an example, see the section below on How to use features of Azure services - Notifications). A full explanation of how to read data from the Intel® Edison board and send it to the newly created Event Hub using an IoT gateway can be found in the How to take the developed solution from the board to the gateway and to the cloud section.

SDKs for Microsoft Azure IoT

The most popular language for developing apps for Azure is C#, but if you want to use other languages and platforms, visit GitHub.

Setting up the gateway

Wyse* 3000 Series x86-Embedded Desktop Thin Client

Regulatory model number: N03D

The gateway connects legacy and new systems, and enables seamless and secure data flow between edge devices and the cloud. We take data from the Intel® Edison board and send it to the gateway, and the gateway sends the data to the cloud as an event.

In the next section we detail the setup process for the gateway.

The new Wyse* 3000 Series x86-embedded thin client delivers powerful performance at an entry-level price. It has a 1.6 GHz dual-core Intel processor, an integrated graphics engine, and multiple connectivity choices. Its various configuration options support a wide variety of peripherals and interfaces, along with unified communications platforms such as Lync 2010, Lync 2013, and the Skype for Business client for Lync 2015 (UI mode), plus high-fidelity protocols such as RemoteFX and Citrix* HDX.

In order to get started, see the quick start guide. To get your thin client up and running, you must do the following:

  1. Make sure that the thin client and the monitor are turned off and disconnected from AC power. Place the thin client on the desk after you attach the feet for vertical or horizontal positioning, or assemble the VESA mount with user-supplied screws and insert the thin client, routing the cables down or to the side.

  2. Make all desired connections. To connect to a network, you can use a Base-T Ethernet network cable. If your device is equipped with an SFP slot, use an SFP module, or use an optional Wi-Fi network adapter for wireless networks.

  3. Connect the power adapter to the thin client power input before connecting it to 100-240 V AC power. Wait until the power button light turns off, and then press the power button to turn on the thin client. After the initialization sequence is complete, the activity light changes to green.

This article is continued in Industrial Use Case Part 2.

Additional Links:

From Intel® Edison to Microsoft Azure*

What is Azure*?

Getting Started with Microsoft Azure*

Use Case: From Intel® Edison to Microsoft Azure* Part 1

Use Case: From Intel® Edison to Microsoft Azure* Part 2

NetUP Uses Intel® Media SDK to Help Bring the Rio Olympic Games to a Worldwide Audience of Millions


In August of 2016, half a million fans came to Rio de Janeiro to witness 17 days and nights of the Summer Olympics. At the same time, millions more people all over the world were enjoying the competition live in front of their TV screens.

Arranging a live TV broadcast to another continent is a daunting task that demands reliable equipment and agile technical support. That was the challenge for Thomson Reuters, the world’s largest multimedia news agency.

To help it meet the challenge, Thomson Reuters chose NetUP as its technical partner, using NetUP equipment for delivering live broadcasts from Rio de Janeiro to its New York and London offices. In developing the NetUP Transcoder, NetUP worked with Intel, using Intel® Media SDK, a cross-platform API for developing media applications on Windows*.

“This project was very important for us,” explained Abylay Ospan, founder of NetUP. “It demonstrates the quality and reliability of our solutions, which can be used for broadcasting global events such as the Olympics. Intel Media SDK gave us the fast transcoding we needed to help deliver the Olympics to a worldwide audience.”

Get the whole story in our new case study.

Intel® HPC Developer Conference 2016 - Session Presentations


The 2016 Intel® HPC Developer Conference brought together developers from around the world to discuss code modernization in high-performance computing. If you missed the event, or want to catch sessions you didn’t get to see, we have posted the Top Tech Sessions of 2016 to the HPC Developer Conference webpage. The sessions are split out by track, including Artificial Intelligence/Machine Learning, Systems, Software Visualization, Parallel Programming, and others.

Artificial Intelligence/Machine Learning Track

Systems Track


High Productivity Languages Track

Software Visualization Track


Parallel Programming Track

 

Intel® Deep Learning SDK Tutorial: Installation Guide


Download PDF [792 KB]

Training Tool Installation Guide

Contents

1. Introduction

2. Prerequisites

3. Installing the Intel® Deep Learning SDK Training Tool from a Microsoft Windows* or Apple macOS* Machine

4. Installing the Intel® Deep Learning SDK Training Tool on a Linux* Machine

1. Introduction

The Intel® Deep Learning SDK Training Tool can be installed and run on Linux* Ubuntu 14.04 or higher and CentOS* 7 operating systems.

The Training Tool is a web application that supports both local and remote installation options. You can install it on a Linux server remotely from a Microsoft Windows* or Apple macOS* machine using the installation .exe or .app file, respectively. Alternatively, you can install it locally on a Linux machine by running the installation script.

You don’t need to install any additional software manually, because the installation package consists of a Docker* container that includes all necessary components, including the Intel® Distribution of Caffe* framework with its prerequisites, and provides the environment for running the Training Tool.

2. Prerequisites

Make sure you comply with the following system requirements before beginning the installation process of the Intel® Deep Learning SDK Training Tool.

  • A Linux Ubuntu* 14.04 (or higher) or CentOS* 7 machine accessible through an SSH connection.
  • Root privileges to run the installation script on the Linux machine.
  • Google Chrome* browser version 50 or higher installed on the computer which will be used to access the Training Tool web user interface.

The system requirements are also available in the Release Notes document that can be found online and in your installation package.

3. Installing the Intel® Deep Learning SDK Training Tool from a Microsoft Windows* or Apple macOS* Machine

To install the Intel® Deep Learning SDK Training Tool from a Microsoft Windows* or Apple macOS* machine, download the installation package from https://software.intel.com/deep-learning-sdk, unpack and launch the TrainingToolInstaller executable file to start the wizard.

The macOS and Windows installation wizards look similar and contain exactly the same steps.

The wizard guides you through the installation process, advancing as you click the Next button. The installation includes the following steps:


  1. Welcome and License Agreement. Read carefully and accept the License Agreement to continue with the installation.
  2. Defining Settings. This panel configures the installation parameters, including network and security settings. Specify all required field values and modify the default ones if needed:
    • Training Tool password – Password to access the Training Tool web interface
    • Server name or IP – The address of the Linux machine on which the Training Tool will be installed
    • User name (with root access) – User name with root privileges on the Linux server
    • User password – The password of the above user account. These credentials are needed for user authentication during the installation process.
    • Private key file for user authentication – The private key used for user authentication when password authentication is not allowed on the Linux server
    • Proxy server for http – The proxy server IP for HTTP if the connection between the Windows/Mac machine and the Linux server goes through a proxy
    • Proxy server for https – The proxy server IP for HTTPS if the connection between the Windows/Mac machine and the Linux server goes through a proxy
    • Mount file system path – Linux file system path which is to be mounted as a volume inside the Docker container
    • Web application port – Network port to access the Training Tool web interface

    Once you define all the settings, you can check the connection to the server by pressing the Test connection button. If the server is accessible, the test will result in the Connection successful status:

  3. Installing. Click the Install button to begin the installation. The Installing panel comes up to show the progress of the installation with a progress bar that depicts the status of the current step and the overall completeness:

    When the indicator becomes 100%, click the activated Next button to complete the installation.

  4. Complete. Congratulations! You have installed the Intel® Deep Learning SDK Training Tool on your Linux server.

    Click the Open now button to open the Training Tool web interface in your browser, or the Download link to download the latest version of the Google* Chrome browser, or the Close button to close the window.

4. Installing the Intel® Deep Learning SDK Training Tool on a Linux* Machine

You can install the Intel® Deep Learning SDK Training Tool on a Linux* operating system using the installation script. Download the script from https://software.intel.com/deep-learning-sdk and run it with the following options:

1. volume <path> – Linux file system path which will be mounted as a volume inside the Docker* container

2. toolpassword <password> – Admin password to access the Training Tool web interface

3. toolport <port> – Network port to access the Training Tool web interface

4. httpproxy <proxy> – Proxy server for HTTP

5. httpsproxy <proxy> – Proxy server for HTTPS

6. help – Print the help message

NOTE: The mandatory parameter must be set to continue the installation; the other parameters are all auxiliary and can be omitted.

Intel® Software Guard Extensions Tutorial Series: Part 7, Refining the Enclave


Part 7 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series revisits the enclave interface and adds a small refinement to make it simpler and more efficient. We’ll discuss how the proxy functions marshal data between unprotected memory space and the enclave, and we’ll also discuss one of the advanced features of the Enclave Definition Language (EDL) syntax.

You can find a list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

Source code is provided with this installment of the series. With this release we have migrated the application to the 1.7 release of the Intel SGX SDK and also moved our development environment to Microsoft Visual Studio* Professional 2015.

The Proxy Functions

When building an enclave using the Intel SGX SDK you define the interface to the enclave in the EDL. The EDL specifies which functions are ECALLs (“enclave calls,” the functions that enter the enclave) and which ones are OCALLs (“outside calls,” the calls to untrusted functions from within the enclave).

When the project is built, the Edger8r tool that is included with the Intel SGX SDK parses the EDL file and generates a series of proxy functions. These proxy functions are essentially wrappers around the real functions that are prototyped in the EDL. Each ECALL and OCALL gets a pair of proxy functions: a trusted half and an untrusted half. The trusted functions go into EnclaveProject_t.h and EnclaveProject_t.c and are included in the Autogenerated Files folder of your enclave project. The untrusted proxies go into EnclaveProject_u.h and EnclaveProject_u.c and are placed in the Autogenerated Files folder of the project that will be interfacing with your enclave.

Your program does not call the ECALL and OCALL functions directly; it calls the proxy functions. When you make an ECALL, you call the untrusted proxy function for the ECALL, which in turn calls the trusted proxy function inside the enclave. That proxy then calls the “real” ECALL and the return value propagates back to the untrusted function. This sequence is shown in Figure 1. When you make an OCALL, the sequence is reversed: you call the trusted proxy function for the OCALL, which calls an untrusted proxy function outside the enclave that, in turn, invokes the “real” OCALL.


Figure 1. Proxy functions for an ECALL.

The proxy functions are responsible for:

  • Marshaling data into and out of the enclave
  • Placing the return value of the real ECALL or OCALL in an address referenced by a pointer parameter
  • Returning the success or failure of the ECALL or OCALL itself as an sgx_status_t value

Note that this means that each ECALL or OCALL has potentially two return values. There’s the success of the ECALL or OCALL itself, meaning, were we able to successfully enter or exit the enclave, and then the return value of the function being called in the ECALL or OCALL.

The EDL syntax for the ECALL functions ve_lock() and ve_unlock() in our Tutorial Password Manager’s enclave is shown below:

enclave {
   trusted {
      public void ve_lock ();
      public int ve_unlock ([in, string] char *password);
   };
};

And here are the untrusted proxy function prototypes that are generated by the Edger8r tool:

sgx_status_t ve_lock(sgx_enclave_id_t eid);
sgx_status_t ve_unlock(sgx_enclave_id_t eid, int* retval, char* password);

Note the additional arguments that have been added to the parameter list for each function and that the functions now return a type of sgx_status_t.

Both proxy functions need the enclave identifier, which is passed in the first parameter, eid. The ve_lock() function has no parameters and does not return a value so no further changes are necessary. The ve_unlock() function, however, does both. The second argument to the proxy function is a pointer to an address that will store the return value from the real ve_unlock() function in the enclave, in this case a return value of type int. The actual function parameter, char *password, is included after that.
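
To see how those two layers of return values fit together, here is a minimal, hypothetical calling sequence from the untrusted application, based on the proxy prototypes shown above. The enclave ID is assumed to come from an earlier, successful sgx_create_enclave() call, and the meaning of ve_unlock()'s integer return value (0 for success) is an assumption for illustration.

#include <cstdio>
#include <sgx_urts.h>
#include "EnclaveProject_u.h"   // untrusted proxy prototypes generated by the Edger8r tool

// Attempt to unlock the vault. Returns true only if the ECALL itself succeeded
// AND the real ve_unlock() inside the enclave reported success.
bool unlock_vault(sgx_enclave_id_t eid, char *password)
{
    int retval = 0;  // receives the return value of the real ve_unlock() (assumed: 0 = success)
    sgx_status_t status = ve_unlock(eid, &retval, password);

    if (status != SGX_SUCCESS) {
        // The ECALL failed: we never entered (or cleanly exited) the enclave.
        std::fprintf(stderr, "ECALL ve_unlock failed: 0x%x\n", status);
        return false;
    }
    // The ECALL succeeded; now interpret the enclave function's own result.
    return retval == 0;
}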

Data Marshaling

The untrusted portion of an application does not have access to enclave memory. It cannot read from or write to these protected memory pages. This presents some difficulties when the function parameters include pointers. OCALLs are especially problematic, because memory allocated inside the enclave is not accessible to the OCALL, but even ECALLs can have issues. Enclave memory is mapped into the application’s memory space, so enclave pages can be adjacent to unprotected memory pages. If you pass a pointer to untrusted memory into an enclave, and then fail to do appropriate bounds checking in your enclave, you may inadvertently cross the enclave boundary when reading or writing to that memory in your ECALL.

The Intel SGX SDK’s solution to this problem is to copy the contents of data buffers into and out of enclaves, and have the ECALLs and OCALLs operate on these copies of the original memory buffer. When you pass a pointer into an enclave, you specify in the EDL whether the buffer referenced by the pointer is being passed into the call, out of the call, or in both directions, and then you specify the size of the buffer. The proxy functions generated by the Edger8r tool use this information to check that the address range does not cross the enclave boundary, copy the data into or out of the enclave as indicated, and then substitute a pointer to the copy of the buffer in place of the original pointer.

This is the slow-and-safe approach to marshaling data and pointers between unprotected memory and enclave memory. However, this approach has drawbacks that may make it undesirable in some cases:

  • It’s slow, since each memory buffer is checked and copied.
  • It requires additional heap space in your enclave to store the copies of the data buffers.
  • The EDL syntax is a little verbose.

There are also cases where you just need to pass a raw pointer into an ECALL and out to an OCALL without it ever being used inside the enclave, such as when passing a function pointer for a callback function straight through to an OCALL. In this case, there is no data buffer per se, just the pointer address itself, and the marshaling functions generated by Edger8r actually get in the way.

The Solution: user_check

Fortunately, the EDL language does support passing a raw pointer address into an ECALL or an OCALL, skipping both the boundary checks and the data buffer copy. The user_check parameter tells the Edger8r tool to pass a pointer as it is and assume that the developer has done the proper bounds checking on the address. When you specify user_check you are essentially trading safety for performance.

A pointer marked with user_check does not have a direction (in or out) associated with it, because there is no buffer copy taking place. Mixing user_check with in or out will result in an error at compile time. Similarly, you don’t supply a count or size parameter.

In the Tutorial Password Manager, the most appropriate place to use the user_check parameter is in the ECALLs that load and store the encrypted password vault. While our design constraints put a practical limit on the size of the vault itself, generally speaking these sorts of bulk reads and writes benefit from allowing the enclave to directly operate on untrusted memory.

The original EDL for ve_load_vault() and ve_get_vault() looks like this:

public int ve_load_vault ([in, count=len] unsigned char *edata, uint32_t len);

public int ve_get_vault ([out, count=len] unsigned char *edata, uint32_t len);

Rewriting these to specify user_check results in the following:

public int ve_load_vault ([user_check] unsigned char *edata);

public int ve_get_vault ([user_check] unsigned char *edata, uint32_t len);

Notice that we were able to drop the len parameter from ve_load_vault(). As you might recall from Part 4, the issue we had with this function was that although the length of the vault is stored as a variable in the enclave, the proxy functions don’t have access to it. In order for the ECALL’s proxy functions to copy the incoming data buffer, we had to supply the length in the EDL so that the Edger8r tool would know the size of the buffer. With the user_check option, there is no buffer copy operation, so this problem goes away. The enclave can read directly from untrusted memory, and it can use its internal variable to determine how many bytes to read.

However, we still send the length as a parameter to ve_get_vault(). This is a safety check to ensure that we don’t accidentally overflow a buffer when fetching the encrypted vault from the enclave.
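To make the trade-off concrete, here is a minimal sketch of what the enclave-side handlers could look like once the proxies no longer copy the buffers. The internal variable names (e_vault_data, e_vault_size) and the specific checks are illustrative assumptions, not the Tutorial Password Manager's actual implementation; sgx_is_outside_enclave() from the Intel SGX SDK is used to confirm that the raw pointer really refers to untrusted memory.

/* Hypothetical enclave-side handlers; the vault buffer and length variables
   are illustrative, not the tutorial's real names. */

#include <cstring>
#include <cstdint>
#include "sgx_trts.h"   // for sgx_is_outside_enclave()

static uint8_t  e_vault_data[8192];  // encrypted vault stored inside the enclave
static uint32_t e_vault_size = 0;    // length tracked by the enclave itself

// [user_check] pointer: no proxy copy, so the enclave validates the address
// range itself and reads directly from untrusted memory.
int ve_load_vault(unsigned char *edata)
{
    if (edata == NULL || e_vault_size > sizeof(e_vault_data)) return -1;
    if (!sgx_is_outside_enclave(edata, e_vault_size)) return -1;
    memcpy(e_vault_data, edata, e_vault_size);
    return 0;
}

// len is still passed in so the enclave never writes past the caller's buffer.
int ve_get_vault(unsigned char *edata, uint32_t len)
{
    if (edata == NULL || len < e_vault_size) return -1;
    if (!sgx_is_outside_enclave(edata, len)) return -1;
    memcpy(edata, e_vault_data, e_vault_size);
    return 0;
}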

Summary

The EDL provides three options for passing pointers into an ECALL or an OCALL: in, out, and user_check. These options are summarized in Table 1.

Specifier/Direction | ECALL | OCALL
in | The buffer is copied from the application into the enclave. Changes will only affect the buffer inside the enclave. | The buffer is copied from the enclave to the application. Changes will only affect the buffer outside the enclave.
out | A buffer will be allocated inside the enclave and initialized with zeros. It will be copied to the original buffer when the ECALL exits. | A buffer will be allocated outside the enclave and initialized with zeros. This untrusted buffer will be copied to the original buffer in the enclave when the OCALL exits.
in, out | Data is copied back and forth. | Data is copied back and forth.
user_check | The pointer is not checked. The raw address is passed. | The pointer is not checked. The raw address is passed.

Table 1. Pointer specifiers and their meanings in ECALLs and OCALLs.

If you use the direction indicators, the data buffer referenced by your pointer gets copied and you must supply a count so that the Edger8r can determine how many bytes are in the buffer. If you specify user_check, the raw pointer is passed to the ECALL or OCALL unaltered.

Sample Code

The code sample for this part of the series has been updated to build against the Intel SGX SDK version 1.7 using Microsoft Visual Studio 2015. It should still work with the Intel SGX SDK version 1.6 and Visual Studio 2013, but we encourage you to update to the newer release of the Intel SGX SDK.

Coming Up Next

In Part 8 of the series, we’ll add support for power events. Stay tuned!

Intel® Deep Learning SDK Tutorial: Getting Started with Intel® Deep Learning SDK Training Tool


Download PDF [PDF 1.39 MB]

Introduction

Release Notes

Please find Release Notes at https://software.intel.com/en-us/articles/deep-learning-sdk-release-notes

Installing Intel® Deep Learning SDK Training Tool

For installation steps please see the Intel® Deep Learning SDK Training Tool Installation Guide.

Introducing the Intel® Deep Learning SDK Training Tool

The Intel® Deep Learning SDK Training Tool is a feature of the Intel Deep Learning SDK, which is a free set of tools for data scientists, researchers and software developers to develop, train, and deploy deep learning solutions. The Deployment Tool is currently unavailable.

With the Intel Deep Learning SDK Training Tool, you can:

  • Easily prepare training data, design models, and train models with automated experiments and advanced visualizations
  • Simplify the installation and usage of popular deep learning frameworks optimized for Intel platforms

The Training Tool is a web application running on a Linux* server and provides a user-friendly, intuitive interface for building and training deep learning models.

When you start the Training Tool and log in, you are presented with a workspace displaying the home page and a series of main tabs in the blue panel on the left side. These tabs provide access to a set of features that enable you to upload source images, create training datasets, and build and train your deep learning models.

Uploads Tab

Before you build and train your model, use this tab to upload an archive of images that will form a dataset for training the model. For details see the Uploading Images topic.

Datasets Tab

Create datasets from previously uploaded images using the Datasets panel. A dataset is not just a collection of images; it is a database in a specific format that holds all images arranged in categories.

While creating a dataset you can change the image color mode (grayscale or RGB color), change the encoding format, or multiply the initial image set through data augmentation, that is, applying different transformations to the original images to create modified versions. All changes are stored in the database and do not physically affect the original files.

You can use the entire image set for training and validation (by assigning a percentage for each subset) or use separate folders for each procedure.

For more information see the Creating a Training Dataset section.

Models Tab

Use this tab to create and train your model with an existing dataset. Three pre-defined topologies are available for your model.

While configuring the model, you can apply transformations to images from the selected dataset without changing the original image files.

While each of the pre-defined models already has a set of default parameter values that are optimal for general use, the Models tab enables you to efficiently configure the learning process for specific use cases.

For more information see the Creating a Model section.

Uploading Images

Before creating training datasets, you need to upload images you intend to use for training your model.

Use the Uploads tab to upload input images as a RAR archive with a strictly predefined structure.

All images inside the uploaded archive must be divided into separate subfolders named after the desired training labels/categories. For example, the structure of a sample archive for the digits 0-9, which could be used for training a LeNet model, may look like the following:

digits.rar/

    0/

        0_01.png

        0_02.png

         …

    1/

        1_01.png

        1_02.png

         …

       …



    9/

        9_01.png

        9_02.png

         …

Choose the archive located on your computer or on the web and specify the root directory that will hold all extracted images.

The directory path is relative to the Docker installation directory you specified while installing the Training Tool. For the installation steps, see Installing Intel® Deep Learning SDK Training Tool.

The table under the Upload button provides information about current uploads and the upload history.

Creating a Training Dataset

You can easily create training datasets using the Datasets tab. Once you click the tab, the panel comes up with the New Dataset icon and a list of previously saved datasets.

You can look up saved datasets by name, and edit, rename, or delete them, or complete their generation process. For more information see Saving, Editing and Reviewing a Dataset.

To start creating a dataset, click New Dataset to launch the wizard. A wizard screen contains the following elements:

  1. Dataset Name field – Sets the name of the dataset
  2. Dataset Description field – Sets the description for the dataset
  3. Dataset Manage panel – Enables saving, running or deleting the current dataset at any step
  4. Navigation panel – Indicates the current step and switches between dataset creation steps

The wizard divides the workflow of creating training image dataset into three separate steps indicated as tabs on the navigation bar in the wizard screen:

  1. Define the folder that contains source images, number of files to use for training and validation and other settings in the Data folder tab.
  2. Configure preprocessing settings for the input images in the Image Preprocessing tab.
  3. Choose the image database options in the Database option tab.

Whenever you need to modify the settings you can switch over the steps using the Next and Back buttons or by clicking a tab on the navigation bar directly.

To abort creating the dataset, click the Delete icon in the toolbar in the upper right corner.

Adding Source Images to a Dataset

Start creating a dataset by setting its name and the source folder.

Set the dataset name using the Dataset Name field to identify the dataset in the dataset collection. Using meaningful names can help you find the dataset in the list when you are creating a model.

You can add annotations for the dataset if needed using the Description field.

Use the Source folder field to specify the path to the root folder that holds contents of the extracted RAR archive that you previously uploaded to the system. If you have not completed this step, see Uploading Images to learn about image archives used for datasets.

From the entire set of training images you can define the image groups for each phase of the model training process:

  • Training – trains the model using a set of image samples.
  • Validation – could be used for model selection and hyper-parameter fine-tuning.

To define the validation subset, choose a percentage of images for validation in the Validation percentage field. The default value is 10%.

Alternatively, you can use a separate folder for validation. You can specify this folder once you select the Use other folder option.

NOTE: If you are using another folder for validation, the respective percentage field resets to zero.

Data augmentation

You can extend your dataset using the Training Tool augmentation feature. It enables you to enlarge the set by creating copies of existing images and applying a number of transformations such as rotating, shifting, zooming and reflecting.

You can simply specify the maximum number of transformations to be applied to each image in the dataset using the Max number of transformations per image field.

Alternatively, you can use the Advanced section to additionally define which types of transformations to apply, along with their parameters and weights. Weight here is the percentage of the total number of performed augmentations that use the selected augmentation type. The higher the specified weight, the more augmentations of that type are performed. The weights of all selected augmentation types must total 100%.

Sometimes transformations result in exposing undefined parts of the image. For example, after zooming out, an image might have blank areas in the border area. Choose a method to fill those blank areas in augmented images using the Fill Method section:

  • Constant - Fills the missing pixels with a certain hexadecimal color code value in RGB format
  • Nearest - Fills the missing pixels with the values of neighboring pixels
  • Wrap - Fills the missing pixels by tiling the image
  • Reflect – Fills the missing pixels by reflecting a region of the image.

Preprocessing Input Image Data

You can pre-process images included in a dataset using the Image Preprocessing tab.

Selecting Image Chroma type

The tool enables you to use color or grayscale modes for images. Choose the desired option in the Type group.

  • If you select the grayscale option when creating a dataset with color images in RGB format, the tool automatically performs a pixel-wise RGB-to-grayscale conversion according to the formula (see the sketch after this list):
    Y = 0.299*Red + 0.587*Green + 0.114*Blue,
    where Y is the intensity/grayscale value.
  • If you use the Color option for grayscale images, the algorithm uses the intensity value as the values of the red, green, and blue channels: R = Y, G = Y, B = Y.
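For illustration, a pixel-wise conversion following the formula above might look like the sketch below. This is just the arithmetic spelled out in code, not the Training Tool's internal implementation.

#include <cstdint>

// Illustrative RGB-to-grayscale conversion using the weights from the formula.
uint8_t rgb_to_gray(uint8_t r, uint8_t g, uint8_t b)
{
    return static_cast<uint8_t>(0.299 * r + 0.587 * g + 0.114 * b);
}

// Grayscale-to-color simply replicates the intensity into all three channels.
void gray_to_rgb(uint8_t y, uint8_t &r, uint8_t &g, uint8_t &b)
{
    r = y; g = y; b = y;
}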

Resizing Images

By default, the resize dimensions are set to 28x28 pixels, but you can resize the images to an arbitrary size with one of the available resize options:

  • Squash
  • Crop
  • Fill
  • Half crop, half fill.

The table below demonstrates the results of resizing an example image of original size 128x174 pixels, to a 100x100 pixels square image using each resizing method.

Original image, 128x174 pixels

Squash transforms the original image by upsampling or downsampling pixels using bi-cubic interpolation to fill the new width and height without keeping the aspect ratio.

Crop option resizes the image while maintaining the aspect ratio. The image is first resized so that the smaller image dimension fits the corresponding target dimension. Then the larger dimension is cropped by equal amounts from both sides to fit the corresponding target.

Fill option resizes the image while maintaining the aspect ratio. The image is first resized so that the larger image dimension fits the corresponding target dimension. Then the resulting image is centered in the smaller dimension and white-noise strips of equal width are inserted on both sides to make that dimension equal to the target.

Half crop, half fill option resizes the image while maintaining the aspect ratio. The image is first resized halfway using the Fill option, and then the Crop option is applied. For an original image of width w and height h and a target image of width W and height H, this means the original image is first resized to W+(w-W)/2 by H+(h-H)/2 with the Fill transformation, and then the resulting image is resized to the target dimensions W by H using the Crop transformation, as illustrated in the sketch below.
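The intermediate dimensions described above can be computed directly. The following short sketch only illustrates the arithmetic for the two-step resize, using the example sizes from this section; the variable names are illustrative.

#include <iostream>

// Illustrative computation of the intermediate size used by the
// "Half crop, half fill" method: first Fill to the halfway size,
// then Crop to the final target.
int main()
{
    int w = 128, h = 174;   // original image (example from the text)
    int W = 100, H = 100;   // target image

    int midW = W + (w - W) / 2;   // 100 + (128 - 100) / 2 = 114
    int midH = H + (h - H) / 2;   // 100 + (174 - 100) / 2 = 137

    std::cout << "Fill to " << midW << "x" << midH
              << ", then Crop to " << W << "x" << H << std::endl;
    return 0;
}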

Setting Database options

The Training Tool stores all input images in a database file. At this step you can set desired database options such as database type and image encoding.

You can choose a certain type of the database using the DB backend drop-down list:

  • LMDB
  • LevelDB

To find more information on both types, please see LMDB and LevelDB home pages.

Choose an image encoding format from the Image Encoding drop-down list if needed. Using PNG or JPEG formats can save disk space, but may increase the training time.

Generating a Dataset

After you complete configuring the dataset, you can launch the dataset generation process by clicking the Run icon in the dataset manage panel. The Training Tool starts the process and shows a progress bar and the details of the dataset generation process.

Once generation completes, the dataset's status in the list changes to Completed.

For more information about saving datasets and dataset statuses, see Saving, Editing and Reviewing a Dataset.

Saving, Editing and Reviewing a Dataset

When you click the Datasets tab, the Datasets panel comes up with the New Dataset icon and the list of previously saved datasets. Datasets in the list can be in one of three states: Draft, Ready, or Completed.

You can save a dataset as a Draft at any moment before all of its mandatory fields are set. The Ready status indicates that you have set all mandatory fields for the saved dataset and it is ready to be generated. The Completed status identifies already generated datasets.

To find a dataset by its unique name, use the Search field.

You can rename, edit, or delete a dataset in the Draft state using the toolbar in the upper right corner.

For a dataset in the Ready state, the Run operation is additionally available.

To view or edit a dataset in the Draft or Ready state, select it from the list or click the Edit icon in the toolbar.

To view the details of a Completed dataset, select it from the list.

Creating a Model

After images have been uploaded and a dataset has been generated from them, you are ready to create and train a deep learning model. To begin the model creation process, choose the Models tab in the vertical blue panel. The panel comes up with the New Model icon and the list of previously trained and drafted models displayed under the Search text field.

You can look up existing models by name and rename, edit, build, or delete them. For more information see Saving, Editing and Reviewing a Model.

To create a new model, use the New Model icon. Once you click it, you are presented with a wizard screen.

The wizard screen contains the following elements:

  1. Model Name field – A mandatory field that sets the name of the model
  2. Model description field – Adds an optional description about the model
  3. Model manage panel – Enables saving, running training or deleting the current model at any step
  4. Navigation panel – Indicates the current step and switches between model creation steps

The new model creation process consists of four stages as illustrated in the navigation pane.

  1. Select a dataset from the list of generated datasets in the Dataset Selection tab.
  2. Choose and tune a pre-defined model topology in the Topology tab.
  3. Transform images from the dataset if needed in the Data Transformation tab.
  4. Configure default parameters to tune the learning process in the Parameters tab.

Assigning a Dataset

The first stage of creating a model is the dataset selection stage.

As the first step in this stage, enter a unique name for the model in the Model Name text field. Using meaningful names can help you find the model in the model list.

To help you quickly recognize the model among other existing models in the future, the Training Tool provides an option to enter descriptive text about the model in the Description field.

Every model must be linked to a particular dataset. The next step is choosing a dataset that provides the model with training and validation images. Select an existing dataset from the list or search for one by name and select it. Press Next to move on to the second stage of the process.

Configuring Model Topology

In the second stage you need to configure the topology of the model.

The first step is to select a specific model topology from the three pre-loaded topologies listed in the Topology name list. These pre-loaded topologies come configured with the optimal training/validation settings for that topology under general use conditions. However, you can customize a topology to match specific requirements by checking the Fine tune topology check box. There are two levels of fine tuning available - light and medium - and you may pick one as desired. Configuration options in the following stages will change depending on whether the fine tuning option is selected and on the level of fine tuning chosen. The Back/Next buttons in the bottom blue pane allow you to move between the four stages as needed.

Transforming Input Images

The third stage allows you to add pre-processing to the images before they are fed to the model for training or validation.

You may add three optional pre-processing operations to the training data. Two of them, cropping and horizontal mirroring, add some degree of randomness to the training process by applying those operations to randomly chosen training images. In image classification tasks with large datasets, these types of random pre-processing are used to enhance the performance of the learned model by making it robust to deviations in input images that may not be covered in the training set.

Mean subtraction, if selected, is applied to each and every image; there are two options: subtract the mean image or the mean pixel.

Configuring Training Parameters

In the fourth and final stage, training parameters (i.e., hyper-parameters) are configured to tune the training process. Pre-loaded models in the Training Tool come with a set of default values for each of the parameter fields. These values are the optimal parameter values for the given model in its general use case.

Typical training of a deep learning model involves hundreds of thousands of parameters (also known as weights), so a model is trained repeatedly over a given training set. One complete pass over the entire training dataset is called an epoch. At the end of one epoch, every image in the training dataset has passed through the model exactly once. You can adjust the number of epochs using the Training epochs field. This number depends on the model topology, the parameter estimation algorithm (solver type), the initial learning rate and the learning rate decay curve, the required final accuracy, and the size of the training dataset.

Within an epoch, images in the training dataset are partitioned into batches and the model is trained with one batch at a time. Once a batch of images passes through the model, its parameters are updated and the next batch is used. One such pass is called an iteration. In general, a larger batch size reduces the variance in the parameter update process and may lead to faster convergence. However, the larger the batch size, the higher the memory usage during training (see the example below).
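As a concrete illustration of how epochs, batch size, and iterations relate, consider the small computation below. The numbers are made up for the example and are not defaults of the Training Tool.

#include <iostream>

int main()
{
    int training_images = 60000;  // e.g., an MNIST-sized training set (illustrative)
    int batch_size      = 100;    // images processed per iteration
    int epochs          = 30;     // complete passes over the dataset

    int iterations_per_epoch = training_images / batch_size;   // 600
    int total_iterations     = iterations_per_epoch * epochs;  // 18000

    std::cout << iterations_per_epoch << " iterations per epoch, "
              << total_iterations << " iterations in total" << std::endl;
    return 0;
}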

By specifying the Validation interval value, you can define how often validation takes place, measured in epochs. For example, setting the value to 1 causes validation to take place at the end of each epoch. Use the Validation batch size value to define the size of the batch of validation images.

Training a deep learning model is a lengthy and complex task; therefore, the tool regularly takes snapshots to back up the state of the model being trained and the state of the solver. To set the frequency of backups, use the Snapshot intervals field.

Parameter (or weight) estimation is not only about optimizing the loss/error function for the training dataset; the estimated weights should also generalize the model to new, unseen data. Using the Weight decay setting, you can adjust the regularization term of the model to avoid overfitting.

The learning rate determines the degree to which an update step influences the current values of the weights of the model. Larger learning rates cause drastic changes at each update and can lead either to oscillations around the minimum or to missing the minimum altogether, while an unreasonably small rate leads to very slow convergence. The Base learning rate is the initial learning rate at the start of the learning process.

Momentum captures the direction of the last weight update and helps to reduce oscillations and the possibility of getting stuck in a local minimum. Momentum ranges from 0 to 1, and typically a higher value such as 0.9 is used. However, it is important to use a lower learning rate when using a higher momentum to avoid drastic weight updates.
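A common way to write the momentum update is sketched below. This is the standard SGD-with-momentum rule for a single parameter; the Training Tool's exact solver implementation may differ.

// Illustrative SGD-with-momentum update for a single weight.
// v is the velocity (persists across iterations), grad is the current gradient.
void sgd_momentum_step(float &weight, float &v, float grad,
                       float learning_rate, float momentum)
{
    v = momentum * v - learning_rate * grad;  // blend previous direction with new gradient
    weight += v;                              // apply the update
}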

You may choose a solver type from a list of available types; the default is stochastic gradient descent.

Use Advanced learning rate options to further specify how the learning rate changes during the training.

There are several learning rate update policies (or curves) to choose from.

Step size determines how often the learning rate should be adjusted (in number of iterations).

Gamma controls the amount of change in the learning rate (determines the learning rate function shape) at every adjustment step.
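For example, with the commonly used "step" policy, the learning rate is multiplied by Gamma every Step size iterations. The sketch below assumes Caffe*-style semantics for the step policy and uses illustrative names; it is not the Training Tool's internal code.

#include <cmath>

// Illustrative "step" learning rate policy (Caffe-style semantics assumed):
// the rate is multiplied by gamma every step_size iterations.
float step_policy_lr(float base_lr, float gamma, int step_size, int iteration)
{
    return base_lr * std::pow(gamma, static_cast<float>(iteration / step_size));
}

// e.g., base_lr = 0.01, gamma = 0.1, step_size = 1000:
// iterations 0-999 use 0.01, 1000-1999 use 0.001, and so on.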

By checking the Visualize LR box you can visualize the learning rate change as a curve.

Running the Training Process

After you complete configuring the model, you can launch the model training process by clicking the Run icon in the model manage panel. The Training Tool starts the process and shows a progress bar and the status of the model as it is being trained.

Once training completes, the model's status in the list changes to TrainingCompleted.

For more information about saving models and model statuses see Saving, Editing and Reviewing a Model.

Saving, Editing and Reviewing a Model

When you click the Models tab, the Models panel comes up with the New Model icon and the list of previously saved models. A model in the list can be in one of three states: Draft, Ready, or TrainingCompleted.

You can save the model as a Draft at any moment before all mandatory fields are set. The Ready status indicates that you have set all mandatory fields for the saved model and it is ready to be trained. The TrainingCompleted status is reached when the model has completed training with the associated dataset.

To find a model by its unique name, use the Search field.

You can Rename, Edit, or Delete a model in the Draft state using the toolbar in the upper right corner for that model.

For a model in the Ready state, the Run operation is additionally available.

To view or edit a model in either Draft or Ready state, select it from the list or click the Edit icon in the toolbar.

For a model in the TrainingCompleted state, the Rename, Duplicate, and Delete operations are available.

To view the details of a completed model, select it from the list.

Additional Resources

To ask questions and share information with other users of the Intel® Deep Learning SDK Training Tool, visit Intel® Deep Learning SDK forum.


Long license checkout on remote workstations


Problem:

License checkout for 2016 and newer product versions is very slow compared to the 2015 version on machines with remote access to the floating license server.

Environment:

Windows*

Root Cause:

Due to issues with license caching in the 2015 product versions, it was disabled in the 2016 version. The caching provided a temporary local copy of the license for frequent checkouts, but it would only allow features available in the cached license to be checked out, invalidating other licenses. Without caching, checkout requests over the network can be very slow.

Workaround:

There is no workaround to re-enable the caching in current versions. Try minimizing license checkouts by grouping files in the same compile command line.

 

Slow floating license checkout


The following issues can cause slow license checkouts:

  • Old license server information or license files.  Check the following places for invalid licenses and delete any found:
    • INTEL_LICENSE_FILE environment variable - make sure port@host is correct and any folders specified do not contain invalid license files.
    • For Linux: /opt/intel/licenses
    • For Windows: [Program Files]\Common Files\Intel\Licenses
  • A bug introduced with RHEL/CentOS 7.2. This adds a 25-second delay to the floating license checkout when IPv6 is disabled. More information here.
  • Running 2016 or newer versions of the compiler on a remote workstation.  A license caching feature available in the 2015 product version was disabled in the 2016 version.  More information here.

Intel® Threading Building Blocks flow graph: using streaming_node


Introduction

The Intel® Threading Building Blocks (Intel® TBB) library provides a set of algorithms that enable parallelism in C++ applications. Since Intel® TBB 4.0, unstructured parallelism, dependency graphs and data flow algorithms can be expressed with flow graph classes and functions. The flow graph interface makes Intel® TBB useful for cases that are not covered by its generic parallel algorithms, while keeping users away from lower-level peculiarities of its tasking API.

Increasingly, systems are becoming heterogeneous and are starting to incorporate not only the power of CPUs but also different kinds of accelerators that are suitable for particular sets of tasks.

In an effort to better support heterogeneous solutions, async_node was added to the flow graph API to support parallel activities, external working threads (threads that are not in TBB thread pool), etc. The main limitation of async_node is that the result is returned to the graph at the same point. You cannot start an async_node activity in one place and return the async_node result at another point of the graph.

The problem described above can be resolved with another new Intel® TBB feature: async_msg. This concept is quite similar to the standard C++ future/promise concept, and it allows the result to be returned to the graph at any point. You just need to pass the async message from the node where the async activity was started to the node where the async result is needed.

Moreover, Intel® TBB provides a special node with OpenCL support in it: opencl_node. The details can be found here: https://software.intel.com/en-us/blogs/2015/12/09/opencl-node-overview.

During the implementation of the node, we found that some concepts are quite generic and can be used for any heterogeneous APIs. For example, async_msg was developed as an implementation of the postponed asynchronous result concept for the Intel TBB flow graph. Another generic heterogeneous concept was implemented in the streaming_node class, which is described below.

streaming_node main ideas & workflow

As we look at the usual asynchronous and/or heterogeneous usage model, we can find that the model usually includes the following steps:

  • Receive input data.
  • Select a device for the kernel execution later.
  • Send the kernel arguments to the device.
  • Enqueue the kernel for execution on the device.
  • Get future result handlers from the device and store them somehow.
  • Send a future result object (async_msgs in fact) to the next graph node.

The workflow looks quite generic and independent of the particular device API. In Intel® TBB, the schema was implemented in the streaming_node class. However, the schema is quite abstract, so to make it workable we need to select a particular device API. In Intel® TBB, we refer to device APIs as Factories. We tried to make the Factory concept as simple as possible.

Let us look at the steps above from a responsibility point of view. This means that some steps can be implemented by streaming_node itself, some through user-defined functionality, and some through the Factory concept (an abstraction of a device API).

  • Receive input data.

 Responsibility of streaming_node

  • Select a device for the kernel execution later.

End-user’s responsibility (implemented via a special user functor)

  • Send the kernel arguments to the device.

streaming_node calls Factory::send_data and gets dependency handlers back

  • Enqueue the kernel for execution on the device + Get future result handlers from the device and store them somehow.

streaming_node calls Factory::send_kernel and gets dependency handlers back for the future result

  • Send a future result object to the next graph node.

streaming_node creates an async_msg object with the saved dependency handlers in it

The main streaming_node workflow becomes clear from the text above.

Please note that dependency handlers are device API-specific, so only the Factory can know the particular dependency type. In the current implementation, async_msg class cannot store any additional dependencies, so the Factory must provide a dependency_msg class derived from async_msg. As a result, an additional requirement for the Factory concept is that it provides the Factory::async_msg_type type. In addition, the main Factory interfaces must be able to get and update (to store dependencies) Factory::async_msg_type objects:

Factory::send_data(device_type device, Factory::async_msg_type& dependencies[ ])
Factory::send_kernel(device_type device, kernel_type kernel, Factory::async_msg_type& dependencies[ ])

Hello, World!

Let us try to implement asynchronous “Hello World” printing with the streaming_node.

We will use a C++ thread in place of a programmable device.

The following classes and functions are needed to implement it:

  1. A special asynchronous message class tailored for this case (derived from async_msg).
  2. A thread with parallel printing in it (our “device”).
  3. A Factory that can work with the “device”.
  4. A simple device_selector.
  5. A main() function with 2 nodes.

Let us implement the components one by one:

hello_world.cpp:  part 1: user_async_msg class
#include <iostream>
#include <thread>
#include <mutex>
#include <cassert>
#include <tuple>

#define TBB_PREVIEW_FLOW_GRAPH_NODES 1
#define TBB_PREVIEW_FLOW_GRAPH_FEATURES 1

#include "tbb/tbb_config.h"
#include "tbb/concurrent_queue.h"
#include "tbb/flow_graph.h"

template<typename T>
class user_async_msg : public tbb::flow::async_msg<T>
{
public:
    typedef tbb::flow::async_msg<T> base;
    user_async_msg() : base() {}
    user_async_msg(const T& input) : base(), mInputData(input) {}
    const T& getInput() const { return mInputData; }

private:
    T mInputData;
};        

 

In the listing there are a few standard includes as well as several Intel TBB flow graph includes and definitions that enable async_msg and streaming_node classes in the Intel TBB headers.

 

The user_async_msg class is quite trivial: it just adds the mInputData field to store the original input value for processing in the asynchronous thread.

hello_world.cpp:  part 2: user_async_activity class
class user_async_activity { // Async activity singleton
public:
    static user_async_activity* instance() {
        if (s_Activity == NULL) {
            s_Activity = new user_async_activity();
        }
        return s_Activity;
    }

    static void destroy() {
        assert(s_Activity != NULL && "destroyed twice");
        s_Activity->myQueue.push(my_task()); // Finishing queue
        s_Activity->myThread.join();
        delete s_Activity;
        s_Activity = NULL;
    }

    void addWork(const user_async_msg<std::string>& msg) {
        myQueue.push(my_task(msg));
    }

private:
    struct my_task {
        my_task(bool finish = true)
            : myFinishFlag(finish) {}

        my_task(const user_async_msg<std::string>& msg)
            : myMsg(msg), myFinishFlag(false) {}

        user_async_msg<std::string> myMsg;
        bool                        myFinishFlag;
    };

    static void threadFunc(user_async_activity* activity) {
        my_task work;
        for(;;) {
            activity->myQueue.pop(work);
            if (work.myFinishFlag)
                break;
            else {
                std::cout << work.myMsg.getInput() << ' ';
                work.myMsg.set("printed: " + work.myMsg.getInput());
            }
        }
    }

    user_async_activity() : myThread(&user_async_activity::threadFunc, this) {}
private:
    tbb::concurrent_bounded_queue<my_task>  myQueue;
    std::thread                             myThread;
    static user_async_activity*             s_Activity;
};

user_async_activity* user_async_activity::s_Activity = NULL;

The user_async_activity class is a typical singleton with two common static interfaces: instance() and destroy().

The class wraps a standard thread (we used the std::thread class), which processes tasks from a task queue (implemented via the tbb::concurrent_bounded_queue class).

Any thread can add a new task to the queue via the addWork() method, while the worker thread processes the tasks one by one. For every incoming task, it just prints the original input string to the console and uses the async_msg::set interface to return the result back to the graph. The following pseudocode shows the format of the result: Result = ‘printed: ’ | original string, where “|” represents string concatenation.

hello_world.cpp:  part 3: device_factory class
class device_factory {
public:
    typedef int device_type;
    typedef int kernel_type;

    template<typename T> using async_msg_type = user_async_msg<T>;
    template <typename ...Args>
    void send_data(device_type /*device*/, Args&... /*args*/) {}

    template <typename ...Args>
    void send_kernel(device_type /*device*/, const kernel_type& /*kernel*/, Args&... args) {
        process_arg_list(args...);
    }

    template <typename FinalizeFn, typename ...Args>
    void finalize(device_type /*device*/, FinalizeFn /*fn*/, Args&... /*args*/) {}

private:
    template <typename T, typename ...Rest>
    void process_arg_list(T& arg, Rest&... args) {
        process_one_arg(arg);
        process_arg_list(args...);
    }

    void process_arg_list() {}

    // Retrieve values from async_msg objects

    template <typename T>
    void process_one_arg(async_msg_type<T>& msg) {
        user_async_activity::instance()->addWork(msg);
    }

    template <typename ...Args>
    void process_one_arg(Args&... /*args*/) {}
};

In this example, the implementation of an asynchronous device factory is simple; in fact, it implements only one real factory method:  send_kernel. The method gets incoming async messages as a C++ variadic template. As a result, in the implementation we just need to get all messages from the list and put them into the addWork() interface of our asynchronous activity.

Moreover, the Factory provides the correct async_msg_type for streaming_node, trivial (unused here) types for the device and the kernel, and empty implementations for the expected (but unused here) methods send_data and finalize. In your implementation, you can implement send_data to upload data to the device before the kernel run. Additionally, if the next node in the graph can reject incoming messages from streaming_node, the Factory must implement the finalize() method, which calls the provided finalization functor from a finish callback on the device.

With all of the above in mind, the Factory concept can be implemented in a few dozen lines of code in simple cases.

hello_world.cpp:  part 4: device_selector class
template<typename Factory>
class device_selector {
public:
    typename Factory::device_type operator()(Factory&) { return 0; }
};

In this simple example we have just one device, so the device selector functor is trivial.

hello_world.cpp:  part 5: main()
int main() {
    using namespace tbb::flow;
    typedef streaming_node< tuple<std::string>, queueing, device_factory > streaming_node_type;

    graph g;
    device_factory factory;
    device_selector<device_factory> device_selector;
    streaming_node_type node(g, 0 /*kernel*/, device_selector, factory);
    std::string final;
    std::mutex final_mutex;

    function_node< std::string > destination(g, unlimited, [&g, &final, &final_mutex](const std::string& result) {
        std::lock_guard<std::mutex> lock(final_mutex);
        final += result + "; "; // Parallel access
        g.decrement_wait_count();
    });

    make_edge(output_port<0>(node), destination);
    g.increment_wait_count(); // Wait for result processing in 'destination' node
    input_port<0>(node).try_put("hello");
    g.increment_wait_count(); // Wait for result processing in 'destination' node
    input_port<0>(node).try_put("world");

    g.wait_for_all();
    user_async_activity::destroy();

    std::cout << std::endl << "done"<< std::endl << final << std::endl;
    return 0;
}

In the main() function we create all the required components: a graph object, a factory object, a device selector, and 2 nodes: one streaming_node and one destination function_node, which processes the asynchronous results. make_edge() is used to connect these 2 nodes together. By default, the flow graph knows nothing about our async activity and will not wait for its results. That is why manual synchronization (via increment_wait_count() / decrement_wait_count()) was implemented. After the graph execution ends, the worker thread can be stopped and the final log string is printed.

The application output:

$ g++ -std=c++11 -I$TBB_INCLUDE -L$TBB_LIB -ltbb -o hello ./hello_world.cpp
$ ./hello
hello world
done
printed: hello; printed: world;

Note: the code needs C++11 support, so the -std=c++11 flag (or -std=c++0x on older compilers) must be used for compilation.

Conclusion

The article demonstrates how to implement a simple Factory that works with streaming_node – a new flow graph node in the Intel TBB library. The detailed description of streaming_node can be found in the Intel TBB documentation (see Intel® Threading Building Blocks Developer Reference -> Appendices -> Preview Features -> Flow Graph -> streaming_node Template Class).

Note that this functionality is provided for preview and is subject to change, including incompatible modifications in the API and behavior.

 

If you have any remarks and suggestions about the article, feel free to leave comments. 

 

What's New? Intel® Threading Building Blocks 2017 Update 3


Changes (w.r.t. Intel TBB 2017 Update 2):

- Added support for Android* 7.0 and Android* NDK r13, r13b.

Preview Features:

- Added template class gfx_factory to the flow graph API. It implements
    the Factory concept for streaming_node to offload computations to
    Intel(R) processor graphics.

Bugs fixed:

- Fixed a possible deadlock caused by missed wakeup signals in
    task_arena::execute().

Heterogeneous TBB (flow graph promotion): 

TBB flow graph: using streaming_node

Unreal Engine* 4: Setting Up Destructive Meshes


Download the Document [PDF 436 KB]

Download the Code Sample

Contents

Destructive Meshes

The following is a quick guide on getting a PhysX* Destructible Mesh (DM) working setup in an Unreal Engine* 4 (UE4*) project.

This guide is primarily based on personal trial and error; other methods may exist that work better for your project. See official documentation for tutorials on fracturing and troubleshooting if you would like to go more in depth with Destructive Mesh capabilities.

PhysX* Lab

To get started, download and install PhysX Lab. Version 1.3.2 was used for this paper.

FBX* Files

Whether made in Blender*, Maya*, or other modeling software, set the modeling units to meters, or scale up the model so it is the correct size before exporting. If the model comes into UE4 too small, it will need to be scaled up in the project, which can lead to errors in mesh collision. In general, avoid changing the scale of a DM in UE4, but if needed, scaling down works better than scaling up.

Fracturing

For the purposes of this paper, once the FBX* file is imported into the lab, go ahead and click the Fracture! button in the bottom right-hand corner. To learn more about this feature, see the tutorials on fracturing.

You can go back and play with the fracture settings after getting the more important parts set up, so don’t feel like you have to get the perfect fracture just yet.

Graphics

For a DM to have two different textures (outside and inside), follow these steps in the Control Panel (Figure 1):

  1. Under the Graphics tab, and in the Material Library tab, find the green/white lambert texture.
  2. Right-click the lambert and load the texture as a material.
  3. Select a BMP or Targa file for the mesh.
  4. Select the new texture in the Material Library tab, then under the Mesh Materials tab, click the Apply (black) arrow.
  5. Now, under the Select Interior Material tab, select the lambert and then click the Set Interior Material of Selected button. (You may see this result after applying the mesh material; this step is recommended to make sure it takes effect on export.)
  6. Set the U Scale and V Scale to 100.

Figure 1. Graphics tab in the Control Panel.

Settings

Now, for the DM, it’s time to play with some settings (Figure 2). As with the textures, these settings can be played with after the DM has been imported into UE4. It was found that turning settings on in the lab increases the chances of them working as intended when exported. These settings are:

  • Debris Depth
  • Use Lifetime Range
  • Use Debris Depth
  • Destruction Probability (This cannot be changed in UE; chance of chunk being destroyed when hit)
  • Support Depth
  • Asset Defined Support
  • World Overlap

Figure 2. Assets tab in the Control Panel.

Once finished with the settings and the fracture has been set, use Export Asset to export and move the DM to UE4.

Unreal Engine

Bringing a DM into UE4 is as easy as any other asset; use the Import button in the Content Browser. If the FBX file was set up correctly, the DM can be dragged into the scene.

DM Physics

Depending on the mechanics of your game, a few physics settings should be considered (Figure 3):

  • Simulate Physics
    • A False setting is for things like walls and stationary objects.
    • A True setting will cause the mesh to fall (unless gravity is off).
  • Enable Gravity
    • When Simulate Physics is False (the default setting for most DMs), a True setting keeps the DM from falling due to gravity, but broken chunks will still be affected by gravity.
    • A False setting will cause the DM and its chunks to float around in space.
  • Use Async Scene
    • If True, the DM will not collide with any other physics actor.
    • If False, the DM can collide with other physics actors.

Figure 3. Physics panel.

If the DM is intended to break by falling to the ground or being run into, check the Enable Impact Damage option window (Figure 4). Changing the Impact Resistance changes the amount of force pushed back into the actor that the DM collides with.

Figure 4. Destructible Setting tab.
