Channel: Intel Developer Zone Articles

vHost User NUMA Awareness in Open vSwitch* with DPDK


This article describes the concept of vHost User non-uniform memory access (NUMA) awareness, how it can be tested, and the benefits the feature brings to Open vSwitch (OVS) with the Data Plane Development Kit (DPDK). It is written for OVS users who want to learn more about the feature. It may also benefit users configuring a multi-socket virtualized OVS-DPDK setup that uses vHost User ports as the guest access method for virtual machines (VMs) and who want to configure and verify the optimal setup.

Note: At the time of writing, vHost User NUMA awareness in OVS with DPDK is only available on the OVS master branch. Users can download the OVS master branch as a zip here. Installation steps for OVS with DPDK are available here.

vHost User NUMA Awareness

vHost User NUMA awareness was introduced in DPDK v2.2 to address a limitation in the DPDK: the inefficient allocation of vHost memory in setups with multiple NUMA nodes. To understand the limitation this feature addresses, one must first understand the three types of memory that make up a vHost User device (see Figure 1).

#   Memory managed by   Description
1   DPDK                Device tracking memory
2   OVS                 Back-end buffers (mbufs)
3   QEMU                Guest memory (device and memory buffers)

Figure 1: Table describing the different types of vHost User memory in Open vSwitch* with the Data Plane Development Kit.

For an optimized data path, all three memory types should be allocated on the same node. However, this wasn't possible before DPDK v2.2, because the device-tracking structures for each device (managed by the DPDK) all had to come from the same node, even if the devices themselves were attached to VMs on different nodes. This created scenarios where device tracking memory and guest memory resided on different nodes, introducing additional Intel® QuickPath Interconnect (QPI) traffic and a potential performance issue (see Figure 2).


Figure 2: Dual-node Open vSwitch* with the Data Plane Development Kit configuration before the vHost NUMA awareness capability.

In DPDK v2.2 and later, vHost structures are dynamically associated with guest memory. This means that when the device memory is first allocated, it resides in a temporary memory structure. It stays there until information about the guest memory is communicated from QEMU* to the DPDK. The DPDK uses this information to derive the NUMA node ID that the guest memory of the vHost User device resides on. The DPDK can then allocate a permanent memory structure on this correct node, allowing for the guest memory and device tracking memory to be located on the same node.

One last type of memory needs to be correctly allocated, which is the back-end buffers, or ‘mbufs’. These are allocated by OVS and in order to ensure an efficient data path, they must also be allocated from the same node as the guest memory and device tracking memory. This is now achieved by the DPDK sending the NUMA node information of the guest to OVS, and then OVS allocating memory for these buffers on the correct node. Before the addition of this feature, these buffers were always allocated on the node of the DPDK master lcore, which wasn’t always the same node that the vHost User device was on.

The final piece of the puzzle involves the placement of OVS poll mode driver (PMD) threads. PMD threads are the threads that do the heavy lifting in OVS and perform tasks such as continuous polling of input ports for packets, classifying packets once received, and executing actions on the packets once they are classified. Before this feature was introduced in OVS, the PMD threads servicing vHost User ports had to all be pinned to cores on the same NUMA node, that node being that of the DPDK master lcore. However, now PMD threads can be placed on the same node as the device’s memory buffers, guest memory, and device tracking memory. Figure 3 depicts this optimal memory profile for vHost User devices in OVS with the DPDK in a multiple NUMA node setup. 


Figure 3: Dual-node Open vSwitch* with the Data Plane Development Kit configuration with the vHost NUMA awareness capability.

Test Environment

The test environment requires a host platform with at least two NUMA nodes. The host runs an instance of OVS with DPDK and has two vHost User devices configured on the switch, ‘vhost0’ and ‘vhost1’. Two VMs, ‘VM0’ and ‘VM1’, run on separate NUMA nodes. ‘vhost0’ is attached to ‘VM0’ and ‘vhost1’ is attached to ‘VM1’. Figure 2 shows this configuration.

The setup used in this article consists of the following hardware and software components:

Processor: Intel® Xeon® processor E5-2695 v3 @ 2.30 GHz
Kernel: 4.2.8-200
OS: Fedora* 22
QEMU*: v2.6.0
Data Plane Development Kit: v16.04
Open vSwitch*: commit 914403294be2

Configuration Steps

Before installing DPDK and OVS, ensure that the NUMA libraries are installed on the system. For example, to install these on a Fedora OS, use:

sudo yum install numactl-libs
sudo yum install numactl-devel

Ensure the DPDK is built with the following configuration option enabled:

CONFIG_RTE_LIBRTE_VHOST_NUMA=y

Now OVS can be built and linked with the DPDK.

Configure the switch as described in the “Test Environment” section, with two vHost User ports. Configure the ‘pmd-cpu-mask’ to enable PMD threads to be pinned to cores on both NUMA nodes. For example, in a 28-core system where cores 0–13 are located on NUMA node 0 and cores 14–27 are located on NUMA node 1, set the following mask to enable one core on each node:

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=10001
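The mask is a bitmap with one bit per CPU core. As a quick sketch, the value above can be derived in the shell from the chosen core IDs (cores 0 and 16 in this example):

```shell
# Build the pmd-cpu-mask bitmap: set bit N for each core N to be used.
# Cores 0 (NUMA node 0) and 16 (NUMA node 1) match the 28-core example above.
mask=$(( (1 << 0) | (1 << 16) ))
printf '%x\n' "$mask"    # prints 10001
```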

Before launching the VMs, check the PMD distribution with the following command:

ovs-appctl dpif-netdev/pmd-rxq-show

Because the VMs are not yet launched and information about the guest memory is not yet known, the PMD threads associated with the vHost User ports will be located on the same NUMA node:

pmd thread numa_id 0 core_id 0:
        port: dpdkvhostuser1    queue-id: 0
        port: dpdkvhostuser0    queue-id: 0

Now launch two VMs, VM0 on node 0 and VM1 on node 1. To ensure the intended placement of the VM cores, use the ‘taskset’ command. For example:

sudo taskset 0x2 qemu-system-x86_64 -name VM0 -cpu …
sudo taskset 0x4000 qemu-system-x86_64 -name VM1 -cpu …
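As a fuller, hypothetical sketch, a vhost-user guest pinned to node 1 (core 14, mask 0x4000 in the example topology) might be launched as follows. The socket path, hugepage mount, and memory sizes are assumptions that must match your OVS and host configuration:

```shell
# Hypothetical launch line for VM1 on NUMA node 1 (core 14 => mask 0x4000).
# The vhost-user socket path and hugepage mount below are assumptions;
# vhost-user guests must back their memory with shared hugepages so that
# OVS can map the guest memory.
sudo taskset 0x4000 qemu-system-x86_64 -name VM1 -cpu host -enable-kvm -m 4096M \
    -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem -mem-prealloc \
    -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost1 \
    -netdev type=vhost-user,id=net1,chardev=char1,vhostforce \
    -device virtio-net-pci,netdev=net1
```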

Check the OVS logs. When VM1 connects, the vHost library will print messages similar to the following:

VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: reallocate vq from 0 to 1 node
VHOST_CONFIG: reallocate dev from 0 to 1 node

This means that the device tracking memory has been moved from the temporary memory structure on the original node (0) to a permanent structure on the correct node (1).

Another way to verify successful relocation is to check the PMD distribution again using the ‘pmd-rxq-show’ utility:

pmd thread numa_id 1 core_id 20:
        port: dpdkvhostuser1    queue-id: 0
pmd thread numa_id 0 core_id 0:
        port: dpdkvhostuser0    queue-id: 0

‘dpdkvhostuser1’ is now serviced by a thread on NUMA node 1, which is the node on which the VM it is attached to is running.

Conclusion

In this article we described how OVS with DPDK dynamically reallocates memory and relocates threads according to the NUMA topology of the setup, and demonstrated several ways to verify correct operation of the vHost User NUMA awareness feature.

Additional Information

For more details on the DPDK vHost library, refer to the DPDK documentation.

For more information on configuring vHost User in Open vSwitch, refer to INSTALL.DPDK.

Have a question? Feel free to follow up on the Open vSwitch discussion mailing list.

To learn more about OVS with DPDK, check out the following videos and articles on Intel® Developer Zone and Intel® Network Builders University.

QoS Configuration and usage for Open vSwitch* with DPDK

Open vSwitch with DPDK Architectural Deep Dive

DPDK Open vSwitch: Accelerating the Path to the Guest

About the Author

Ciara Loftus is a network software engineer with Intel. Her work is primarily focused on accelerated software switching solutions in user space running on Intel® architecture. Her contributions to OVS with DPDK include the addition of vHost Cuse and vHost User ports and NUMA-aware vHost User.


Moon Hunters Hero Design


Hey game devs! I’m Tanya, the lead of Kitfox Games, a six-person studio in Montreal, Canada. We released Moon Hunters to desktop in March and PS4 in July. Moon Hunters has six player classes, and we’re often asked how we designed their gameplay. The short answer is that we iterate. Many times.

Moon Hunters

I am happy to share our process for that iteration with you, based on what we learned over the last two and a half years of Moon Hunters development.

The truth is that there are many elements to consider when creating a player character in any action-combat game. In order from most to least obvious to the player, the key aspects of a character are:

  • Visuals: their appearance and animations
  • Efficacy: their abilities and all-around usefulness in combat
  • Uniqueness: their distinctive “role” or flavor, compared to other characters
  • Depth: how many layers of dynamic gameplay arise from using abilities in different ways
  • Multiplayer depth: how many layers of dynamic gameplay arise from playing with other characters
  • Accessibility: how easy/intuitive/forgiving their abilities are to use

For certain well-worn archetypes (such as a Warrior character) you could side-step the creation process and just jump into making a melee character based on Link or Kratos. After all, you can reverse-engineer each of those aspects from the start and save time, right? You can always add a twist later! The danger is that simply “injecting uniqueness” into an established character design is harder than it seems, because everything from enemy abilities to stun timing to invulnerability frames is related in ways that are difficult to figure out until you build it from the ground up.

So if you don’t know why the wheel was made the way it was, you should take the time to figure it out. I won’t pretend that all six player classes in Moon Hunters are perfect, but I can defend every single character ability on the grounds of how it achieves the goals we set for it. Every decision was deliberate, and we learned more from each mistake along the way.

Our internal process for creating a character examines each aspect, in the following order:

  1. Uniqueness, or “The Why”
  2. Efficacy, or “The How”
  3. Accessibility, or “The Who”
  4. Visuals, or “The What”
  5. Depth
  6. Repeat steps 2-4

Let’s pretend we have a relatively blank slate. We’re making an action combat game. How do we design a good player character?

Step 1: Uniqueness, “The Why”

Even in a single-player game, the most important element of a character’s design is that the player can recognize the character in the game. Whether you’re playing as a dragon, a robot, or a ball of sludge, the player should look at a screenshot and not only see where the character is on the screen, but also think it is cool (or if your desired aesthetic isn’t cool, then cute or sad or whatever else). Why does anyone want to play this character? Why does it exist?

Character Select Palettes

Personally, I like to choose a single element to base the whole character idea around. For the Ritualist, this was “shadows.” Based on this, I wrote a small flavor statement and the artist drew a piece of concept art.

“The Ritualist uses mathematical magic to manipulate dark matter in the universe. Ranged/control?”

Bit Bazaar Occultist

So now the team has an idea of why the Ritualist exists and what goal all future development of the character should achieve. Dark, intellectual. If you’re making a multi-character game, I would highly recommend finishing this step for as many player characters as possible, to make sure that each character still feels unique when they sit next to each other in the player’s mind. The “ranged/control?” bit of the pitch was added after all four base classes had been defined, as the other three seemed more melee-based.

Step 2: Efficacy, “The How”

Now it’s time to prototype! Using placeholder art, a gameplay designer plays around with a few different possible ability types and tests them against different enemy types, always keeping in mind the unique purpose and flavor of the character. This can take a while. Take your time.

A few example elements to play with: range of attack, speed of attack & movement, aiming, durability, hit pause, turning, charging up, temporary buffs/debuffs, transformation, size, equipment, recovery time, stunning, etc.

For the Ritualist, her base abilities became firing ranged shadow-orbs, teleport, and spawning a black hole to draw enemies in towards its center. These abilities give her the most intellectual-feeling attacks. She also briefly had a Shadow Wave type attack to focus her damage, but all of the characters later underwent streamlining, removing an ability to differentiate them from each other (for increased co-operative play depth). 

Ritualist FX

We didn’t take any screenshots of the programmer art during prototyping, so the above animated gif is after an artist treatment (step 4), even though the mechanics themselves were set during Step 2.

It’s worth noting that efficacy and depth are dependent on the types of enemies the player will encounter; if you’re making a bullet-hell type of game, your character’s mobility will be much more closely tied to its efficacy than in your average dungeon-crawler. Area of effects won’t be very effective if you mostly fight single targets. There is no mathematical formula to tell you how to implement the vision of a character and achieve its atmospheric goal. Start prototyping, playtesting, and be honest with yourself.

Step 3: Accessibility, “the Who”

If accessibility hasn’t been defined by now, this is where the designer decides who is playing the character. How challenging or risky is it to play this character? Challenge is a popular design goal, but defining what that means within your existing combat framework will help place this character more firmly in its role.

For example, we discovered during prototyping that the Ritualist would probably work best as a moderately-difficult character. She’s available from the start and appealing to those with experience in twin-stick shooters. As an “intellectual” type character, even in prototyping, she felt more light and fragile than the melee characters.

Step 4: Visuals, “the What”

Here, the gameplay designer/programmer hands control to the artist(s). The artist defines the key animation frames and effects for the character, as well as any visual mechanics, to express that uniqueness within the gameplay constraints defined in steps 2 and 3. Resist the temptation to add too much detail now; wait until the gameplay is completely nailed down. It’s wasteful to get too deep into perfect, polished animations and effects.

For the Ritualist, this took the form of the shadow orb, which follows her around everywhere she goes, and her pants were swapped for a wider skirt to make her feel more “floaty” and mysterious as she teleports around. However, the distortion filter for the black hole ability wasn’t added yet, because we weren’t sure what exactly the radius would be.

Bit Bazaar Occultist

After Step 4, since we now know who is supposed to enjoy this character, we can also start playtesting with this character. Does our intended player actually think it’s cool?

Step 5: Depth (Solo & Multiplayer)

Depth means different things to different people, but I use it to mean the “dynamics” of a character: interesting player behaviours, based on combining abilities. If players all use the abilities you’ve given them in exactly one way – a certain timing or sequence – it probably means there is not enough depth there.

Chances are that when you initially prototyped the character, you had at least one dynamic in mind between the different abilities, but often these can only be verified after extensive playtesting and adjusting.  I would wait until all player characters reached at minimum step 4 before attempting depth testing, because there might be unanticipated dynamics (for better or worse) across the characters already.

A perfectly designed action-combat character would not only allow experts to discover advanced techniques to maximize their efficacy, but would also support several different uses, which empowers even expert players to adapt to different challenges.

Initially, the Ritualist’s black hole ability went in the direction the character was facing, and went out a set distance, because we thought it would be easier for newer players to deal with fewer variables. By adding an optional charge-and-release functionality, expert players could experiment with aiming and range, which resulted in more varied positioning strategies and timings.

Ritualist FX

Depth is also a standard excuse for feature creep, so watch out! You could get stuck in an endless loop of trying to increase depth. As we’ll discuss shortly, there is no way to know when it is “deep enough” and you are done.

It’s highly recommended to go back to steps 2, 3, and 4 after each and every depth improvement to verify that each element is still functioning and serving the character vision from step 1. Efficacy and (the intended) accessibility are essential. If a character is ineffective or unusable, there’s no point in adding further depth because nobody will enjoy the character enough to discover the love you’re pouring into it.

Step 6: Repeat Until You Die

Steps 2-5 should be iterated, again and again. You’ll never really be done. Sorry. Much like any other art form, action-combat design is all about deciding when you’ve reached the finish line. Nobody will give you a diploma and congratulate you on achieving your Complete a Character goals. It could always be a little bit more X, or a little bit less Y, or try out a hint of Z.

The fuzziness of the end goal is a side-effect of how developed this particular genre of gameplay is. There are so many excellent examples of combat design, and more are released all the time. Your game will struggle to find its place among them. Every tweak to the art can have a gameplay impact; every adjustment of the abilities can require another art pass.

Even if you try to avoid every possible risk and create a relentlessly detail-oriented clone of another game’s player character, many elements that seem like minor considerations (screen resolution, controller style, art style, framerate, character height) become problems that influence every element of the design.

Solo Fights

The good news is that if you find yourself on an iteration loop, you’re in good company. Every great character had many iterations before it found its sweet spot, from Nathan Drake to the archers of Towerfall. Keep at it and your players will appreciate the hours you put in.

Summary

While I wrote this description, the programming and the hardest work of system design described here was mostly done by Jongwoo Kim, Henk Boom, and Mike Ditchburn. And of course, Moon Hunters would be nothing without the beautiful character designs of Xin Ran Liu and G.P. Lackey, or the lovely animation talent of Mike Horowitz. Most games are a team effort, and Kitfox Games are no exception.

Creating the six heroes for Moon Hunters probably took more time than any other single feature in the game – procedurally generating the world, the metagame, the dialogue system etc. We could have easily spent another 1000 hours on them.  Luckily, thanks to enthusiastic player support, we are in a position where we can keep improving the game, through patches and hopefully a DLC in the future. Maybe we’ll even introduce another player class someday. Maybe. Wish us luck.

Meanwhile, if you use our hero design process, drop me a note at tanya@kitfoxgames.com so I can cheer you on, or tweet us @kitfoxgames!

Industrial Use Case and Tutorial: Intel® and the IBM Watson* IoT Platform


This guide describes the implementation of an industrial use case using Intel® IoT Gateway and the IBM Watson* IoT Platform running on IBM Bluemix*. The building blocks for implementing this use case are described in Connecting to the IBM Watson* IoT Platform with Intel® IoT Gateway Software Suites. That guide covers setting up an Intel IoT Gateway, connecting sensors, setting up the IBM Watson IoT Platform running on Bluemix, and connecting the gateway to the Watson IoT Platform so that you can send real-time sensor data to the cloud for storage and processing.

Use Case and Business Imperative

The use case is a facility for storing bulk sugar used in industrial baking. The ideal storage conditions for sugar are at a temperature of 20–25°C with a relative humidity (RH) of 55–60 percent. When stored under proper conditions, sugar has a shelf life measured in years. When conditions are outside ideal ranges, several issues arise:

  • Temperature greater than 25°C promotes caking because water vapor is released.
  • An RH less than 50 percent promotes caking, hardening, and loss of flowability.
  • An RH greater than 70 percent promotes syrup formation, tackiness, and mold and yeast growth.
  • Temperature variations up and down cause the release of water vapor and recrystallization, which results in agglomeration (sticking together). It also leads to crust formation on storage container walls.
The storage facility has traditionally used manual techniques to take periodic temperature and humidity measurements at the beginning and end of the work day, evaluate the readings to detect issues and take corrective actions, and maintain records for historical purposes. This process is labor intensive, doesn’t measure conditions 24 hours a day every day, and can fail to detect temperature up/down cycles that degrade the sugar. It’s also difficult to evaluate the storage conditions over long periods of time, which is important, as the amount of sugar stored fluctuates throughout the year due to variations in baking cycles.

The storage facility wants to expand its business but realizes that its current approach to managing storage conditions won’t scale. After conducting a series of business strategy sessions involving the marketing, sales, IT, and operations teams, the company identified several operational objectives to help grow and scale the business:
  • Automate the process of continuously measuring temperature and humidity in the storage silo.
  • Automatically detect temperature or humidity issues and alert the operations team so that it can take corrective actions.
  • Have all the measurement data in one place so that up/down variations in temperature can be detected and evaluated over time (including as the seasons change).
  • Be able to view current and historical measurement data.
  • Have the data in a computerized format so it can be used to demonstrate to prospective customers how the company ensures a high-quality product through continuous monitoring of storage conditions.

Implementation Approach

To implement this use case, you’ll use the following components. Use Connecting to the IBM Watson* IoT Platform with Intel® IoT Gateway Software Suites to get these components up and running.

Intel® IoT Gateway and Temperature/Humidity Sensor

An Intel® NUC Kit DE3815TYKE gateway running an Intel® Atom™ processor will reside onsite at the storage facility. The gateway will be used to control an industrial temperature and humidity sensor (OMEGA* RH-USB) mounted in each storage silo. It has a stainless steel housing and is suitable for either wall or duct mounting. The sensor measures temperature from -17 to 49°C (±1°C accuracy). It measures RH from 2 percent to 98 percent (±3 percent accuracy). The sensor provides standard USB output, which you use to read temperature and humidity data using the gateway.

You will write the software on the gateway in Node-RED*, which is a graphical tool for wiring Internet of Things (IoT) applications. The gateway software will periodically read the sensors, format and combine readings into a measurement data set, and securely transmit the data to the IBM Watson* IoT Platform for further processing.

IBM Bluemix

Bluemix is a cloud platform for hosting scalable services, applications, and data. You will use it to run the IBM Watson IoT Platform and IBM Watson IoT Platform Analytics Real-Time Insights services.

IBM Watson IoT Platform

The IBM Watson IoT Platform allows you to connect devices and gateways securely to the Bluemix cloud so that you can exchange data with other services and applications. It also provides tools that allow you to view device status and monitor real-time device data feeds.

IBM Watson IoT Platform Analytics Real-Time Insights

This service allows you to monitor and contextualize data from your devices and gateways, visualize what’s happening in your operations, and respond through automated actions. You’ll use this service to create rules and automated actions driven by the temperature and humidity measurements coming from the gateway.

Prerequisites and Overview of Changes

Connecting to the IBM Watson* IoT Platform with Intel® IoT Gateway Software Suites provides step-by-step details for setting up your Intel® IoT Gateway with an RH-USB sensor, establishing a Bluemix account, and creating instances of the IBM Watson IoT Platform and IoT Real-Time Insights services on Bluemix. Complete those steps, and then verify that real-time temperature and humidity readings appear on the gateway’s Intel® IoT Gateway Developer Hub portal and real-time temperature appears on the Watson IoT Platform dashboard. That configuration will be the basis for changes and additions to implement this use case.

Gateway Changes

On the Intel® IoT Gateway, you’ll make the following changes to the base configuration described in Connecting to the IBM Watson* IoT Platform with Intel® IoT Gateway Software Suites:

  • Modify the Node-RED flow to treat the RH-USB sensor as a separate device attached to the gateway.
  • Report temperature and humidity in a combined data message sent from the gateway to the IBM Watson IoT Platform.

IBM Watson IoT Platform Changes

On the IBM Watson IoT Platform, you’ll make the following changes to the base configuration described in Connecting to the IBM Watson* IoT Platform with Intel® IoT Gateway Software Suites:

  • Create a new device type for the OMEGA RH-USB sensor.
  • Enable historical data storage in the IBM Watson IoT Platform.
  • Send sensor data to IoT Real-Time Insights using a data schema.
  • Create rules that monitor temperature and humidity thresholds and trigger actions when they are out of bounds.
  • Create actions to send a text message and email when temperature or humidity readings are out of bounds.

Implementation Details

To implement the use case, you must make some changes and additions to both the gateway and the IBM Watson IoT Platform using what was covered in Connecting to the IBM Watson* IoT Platform with Intel® IoT Gateway Software Suites as the starting point. Log in to the Intel® IoT Gateway Developer Hub and your IBM Watson IoT Platform account using two separate browser sessions. You want to be able to access both systems quickly during these steps.

Make changes marked Gateway on the Intel IoT Gateway, changes marked IBM Watson IoT Platform on the IBM Watson IoT Platform service, and changes marked IoT Real-Time Insights on the IoT Real-Time Insights service.

Gateway: Your starting Node-RED flow in the Intel IoT Gateway Developer Hub looks like Figure 1.


This flow reads the RH-USB temperature and humidity sensor once per second, displays real-time temperature and humidity values in the Intel IoT Gateway Developer Hub portal, and sends the temperature readings to the IBM Watson IoT Platform, where they’re displayed on the IBM Watson IoT Platform dashboard.

Figure 1. The starting Node-RED flow in the Intel® IoT Gateway Developer Hub portal.

IBM Watson IoT Platform:

Complete the following steps:

  1. On the Devices page, click the device types tab, and then click Create Type.
  2. Click Create device type, and then set the name to RH-USB and the description to Temperature and humidity sensor. Click Next.
  3. On the template page, select the Manufacturer and Model check boxes, and then click Next.
  4. Set the manufacturer to Omega and the model to RH-USB, and then click Next.
  5. Leave the optional metadata blank, and then click Create.

You should now see two device types in the IBM Watson IoT Platform dashboard (Figure 2).

Figure 2. Your two device types in the IBM Watson IoT Platform dashboard.

  1. Click the browse tab, and then click Add Device.
  2. Choose the RH-USB device type, and then click Next.
  3. For Device ID, type RH-USB-1, and then click Next.
  4. Leave the optional metadata blank, and then click Next.
  5. For Security, you’ll use Auto-generated authentication token, so click Next to use that option.
  6. Review the summary page, and then click Add.

After adding the device, you see the device credentials. Copy the Organization ID and Authentication Token strings to a text file and save it for later use (Figure 3). After you close this page you won’t be able to see the auto-generated authentication token again, and you would have to re-create the device to get a new one.

Figure 3. Credentials for your RH-USB device.

You should now see two devices in your device list in the IBM Watson IoT Platform dashboard (Figure 4).

Figure 4. You have now added two devices to the IBM Watson IoT Platform.

Gateway:

On the Node-RED canvas, double-click the RH-USB node, and then click Edit flow. The details of the RH-USB node open in a new tab (Figure 5).

Figure 5. The details of your RH-USB node open in a new tab.

Click the plus sign (+) shown in callout 1 in Figure 5 to increase the number of outputs from 2 to 3. Doing so adds a third output that you can reposition to be below output 1 on the flow. Double-click the function node (see callout 2) to edit it. Increase the number of outputs to 3, add the extra code shown in Figure 6, and then click Ok.

Figure 6. Extra code for the new function node.
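Figure 6 isn’t reproduced here, so as a hedged sketch only, a function node that parses an RH-USB reading and fans it out to three outputs might look like the following. The sensor’s line format and the payload field names are assumptions; adapt them to your device and to the actual Figure 6 code:

```javascript
// Hypothetical Node-RED function-node logic (not the exact code from Figure 6).
// Assumes the RH-USB reports lines like ">058.4,+023.1C" (humidity, then
// temperature). In Node-RED, returning an array sends one message to each
// of the node's outputs in order.
function buildOutputs(msg) {
    var raw = String(msg.payload).replace(/[>+C\r\n]/g, "");
    var parts = raw.split(",");
    var humidity = parseFloat(parts[0]);
    var temperature = parseFloat(parts[1]);
    return [
        { payload: temperature },  // output 1: real-time temperature display
        { payload: humidity },     // output 2: real-time humidity display
        { payload: { d: { temperature: temperature, humidity: humidity } } }  // output 3: combined message
    ];
}
```

Inside an actual function node you would place the body directly and end with the `return` statement; the wrapper function above just keeps the sketch self-contained.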

This change causes the third output of the RH-USB node to emit a combined temperature and humidity object that you’ll use to send temperature and humidity measurements to the IBM Watson IoT Platform. Wire the third output of the function to output 3, and then add a comment node to describe the third output. The resulting flow should look like Figure 7.

Figure 7. The flow after wiring the third output of the function node to output 3 and adding a comment node.

Deploy your flow changes to the gateway to make them active.

On Sheet 1 of your Node-RED flow, drag another IBM Watson IoT Platform output node onto the canvas and set its properties as follows (Figure 8):

  • Connect as: Gateway (Registered)
  • Credentials: Watson IoT Gateway Credential
  • Device Type: RH-USB
  • Device Id: RH-USB-1
  • Event type: event
  • Format: json
  • Name: Send to Watson IoT Registered w/Device ID RH-USB-1

Figure 8. Properties for the new output node.

Click Ok. Note that you’re specifying the RH-USB device type and RH-USB-1 device instance that you created in the IBM Watson IoT Platform. You set up the Watson IoT Gateway Credential in Connecting to the IBM Watson IoT Platform with Intel® IoT Gateway Software Suites; it defines the name and device ID of the gateway itself.

Delete the Node-RED wires going into the Send to Watson IoT Quickstart and Send to Watson IoT Registered nodes, and connect a wire from the third output of RH-USB to the input of Send to Watson IoT Registered w/Device ID RH-USB-1. This path sends the temperature and humidity data to the IBM Watson IoT Platform. The updated flow should look like Figure 9.

Figure 9. The updated flow after connecting a wire from the third output of the RH-USB node.

Deploy the updated flow to make it active on the gateway.

IBM Watson IoT Platform

Navigate to the Devices page and click the RH-USB-1 device. In the Recent Events section, you should see a series of events; these are the temperature and humidity readings coming from the gateway. Click an event to see the reported data.

At this point, you have live temperature and humidity data flowing from the RH-USB sensor to the gateway. The gateway combines and preprocesses the data, and then generates data packets that it sends to the IBM Watson IoT Platform. Next, you add functionality to the IBM Watson IoT Platform to process the data packets.

IoT Real-Time Insights

Go to your Bluemix dashboard, and then click IoT Real-Time Insights to access the IoT Real-Time Insights dashboard. You must define a schema that describes the data packet coming from the sensor. To do so, complete the following steps:

  1. In the navigation links toward the top, click Devices, and then click Manage Schemas.
  2. Click Add new message schema.
  3. On the Message Schema page, in Name, type Temperature_Humidity.
  4. Click Link data source, and then choose Intel IoT Gateway for Data source. For Device type, choose RH-USB. Leave Event set to + (All events), and then click OK.
  5. Click Add data points, and then choose Add from connected device. Select d, temperature, and humidity, and then click OK.
  6. Verify that your schema looks like Figure 10, then click OK to add it.
    Figure 10. Your new message schema.
  7. Navigate to Devices, then click Browse Devices.
  8. Click RH-USB-1.

You should see live sensor data, which should match the readings on the gateway’s Intel® IoT Gateway Developer Hub portal (Figure 11).

Figure 11. The sensor data in your gateway’s Intel® IoT Gateway Developer Hub portal.

Next, add business logic to monitor temperature and humidity conditions and send email and text alerts when they are out of range. To do so, complete the following steps:

  1. Navigate to Analytics, and then to Rules.
  2. Click Add new rule.
  3. Set the following fields (Figure 12):
  • Name – Temperature problem
  • Description – Temperature is outside desired range
  • Message schema – Temperature_Humidity

Figure 12. Adding rules for your sensor data.

  4. Click Next.

Now you see the rule editor, which is organized as a series of “If This Then That” (IFTTT) rules. If conditions use AND and OR logic to define trigger conditions. You define one or more actions in the THEN section to perform actions when the trigger conditions occur.

To monitor the storage temperature range, define an IF condition that detects temperature readings too low or too high. Complete the following steps:

  1. Click New condition.
  2. Complete the data fields as follows:
  • Compare to: Select Data Point.
  • Data point: Click Select data point, and then choose temperature in the d data set.
  • Operator: Choose <.
  • Compare with: Choose Static value.
  • Value: Type 20.
  3. Click OK.
  4. Click the plus sign under OR, and then add another condition for temperature > 25:
  • Compare to: Select Data Point.
  • Data point: Click Select data point, and then choose temperature in the d data set.
  • Operator: Choose >.
  • Compare with: Choose Static value.
  • Value: Type 25.

Figure 13 shows the complete dialog boxes.

Figure 13. Completed properties for your IF…THEN rules.

  5. Click Trigger every time conditions are met, and then set the frequency requirement to trigger if conditions are met 10 times in 15 minutes (Figure 14).

Figure 14. Set the frequency requirement for your IF…THEN condition.
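Conceptually, the rule configured so far behaves like a sliding-window check. The following is a sketch of that logic only, not the platform's implementation:

```javascript
// Sketch of the configured rule: fire when temperature leaves the
// 20-25 °C band at least 10 times within a 15-minute sliding window.
// IoT Real-Time Insights implements its own version of this logic.
function makeTemperatureRule() {
  var hits = [];                      // timestamps (ms) of out-of-range readings
  var WINDOW_MS = 15 * 60 * 1000;
  return function (temperatureC, nowMs) {
    if (temperatureC < 20 || temperatureC > 25) {
      hits.push(nowMs);
    }
    // Keep only the hits inside the sliding window.
    hits = hits.filter(function (t) { return nowMs - t <= WINDOW_MS; });
    return hits.length >= 10;         // true => run the THEN actions
  };
}
```
<test>
var rule = makeTemperatureRule();
var fired = false;
for (var i = 0; i < 10; i++) fired = rule(30, i * 1000);
if (!fired) throw new Error('should fire after 10 out-of-range readings in the window');
if (rule(30, 20 * 60 * 1000)) throw new Error('hits outside the window should not count');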

 

  6. Click Save.
  7. In the THEN section, click New action, and then click Add action. You’ll create an email action that sends a notification email to the operations team.
  8. Fill out the fields as follows, and then click OK:
  • Type: Send email
  • Name: Send email to ops team
  • To: <Email address to receive notifications>
  • CC: <Additional email address, if desired>
  • Subject: Temperature problem in sugar storage
  • Select Prepend with “IoT Real-Time Insights alert”

Figure 15 shows the completed dialog box.

Figure 15. The completed Add New Action dialog box.

  9. Click the newly created action in the Set the action dialog box, and then click OK to add the action to the THEN section.
  10. In the THEN section, click the plus sign next to the email action to add a second action.
  11. Complete the fields as follows:
  • Type: Send email
  • Name: Send text to ops team
  • To: <Mobile phone’s email-to-text address>
  • CC: <Additional email address, if desired>
  • Subject: Temperature problem in sugar storage
  • Select "Do not include device data in the email message"

Figure 16 shows the completed dialog box.

Figure 16. The completed Edit Action dialog box.

Your IF…THEN rule set should look like Figure 17.

Figure 17. The completed IF…THEN rule set.

  12. Change the Alert priority to High, and then click Save.

You can view and update all the actions you define by clicking Analytics, and then clicking the Actions tab in IoT Real-Time Insights. Actions include sending email, calling a webhook, invoking a Node-RED flow running in Bluemix, and triggering an IFTTT action.

Your new rule should show on the Analytics > Rules page and currently be in the Deactivated state (Figure 18). Click the gear icon on the Temperature rule, and choose Activate. Keep in mind that for the IoT Real-Time Insights free plan, rules can only be active for 1 hour, after which they are automatically deactivated.

Figure 18. Your new rule on the Analytics > Rules page.

Test the rule by raising the temperature of the RH-USB sensor above 25°C for at least 15 seconds. You should receive an email message and a text message from IoT Real-Time Insights similar to Figure 19.

Figure 19. Sample alert from IoT Real-Time Insights for an out-of-bounds temperature reading.

Next, add a rule to monitor the humidity range. Complete the following steps:

  1. Add a new rule named Humidity problem.
  2. Use the d.humidity data point.
  3. Add two condition checks using the OR operator: one for humidity < 55 and one for humidity > 60.
  4. Set the trigger frequency to 10 times in 15 minutes.
  5. Create and add two new actions named Send humidity email to ops team and Send humidity text to ops team.
  6. Save the rule, and then activate it.

The completed rule should look like Figure 20.

Figure 20. New rule to monitor the humidity range of your sensor.

When humidity is out of range, IoT Real-Time Insights generates an email and text alert, similar to the temperature case.

Bringing It All Together

At this point, a real-time monitoring solution is in place that will notify the operations team when storage conditions are outside the desired range. To take it further, you can use the power of the Intel® IoT Gateway Technology combined with the IBM Watson* IoT Platform and Bluemix* to implement a variety of additional functionality. Here are a few ideas.

With respect to the gateway, you could connect more temperature and humidity sensors, or combine data from multiple sensors and send it to the IBM Watson IoT Platform. You could drive a live temperature and humidity display and control temperature or humidity remediation devices. In addition, consider using the built-in security features to harden your gateway.

For the IBM Watson IoT Platform and other services on Bluemix, you could send temperature and humidity data to a database for long-term retention. Dashboard displays could be added to show sensor data and monitoring alerts or display sensor locations on a map. By adding more complex data analytics and machine learning, you could spot trends and detect undesirable temperature cycling patterns. You can even create and host a mobile app that lets you view storage conditions on a mobile device. The possibilities are virtually limitless.

Connecting to the IBM Watson* IoT Platform with Intel® IoT Gateway Software Suites


This guide shows you how to set up and program three key components for designing and developing real-world Internet of Things (IoT) apps:

  • Arduino 101*, with the Seeed Studio Grove* Starter Kit Plus – IoT Edition
  • Intel® IoT Gateway
  • IBM Watson* IoT Platform hosted in the IBM Bluemix* cloud

The Arduino 101* (Figure 1) is a low-power processor board for building a variety of sensor applications. It features a 32-bit Intel® Quark microcontroller, an Intel® Curie module with flash and SRAM memory, a Bluetooth* Low Energy radio, and a 6-axis motion sensor with accelerometers and gyroscopes. It has 14 digital input/output pins (of which 4 can be used as Pulse Width Modulation outputs), 6 analog inputs, SPI and I2C communications interfaces, and a USB interface for serial communications. Its Arduino header allows you to add a variety of Arduino shields for additional I/O and communication capabilities.

Figure 1. The Arduino 101*

The Arduino 101* can be programmed using the Arduino IDE with standard Arduino programming commands. It can also be controlled remotely using the Firmata protocol and associated Firmata sketch that runs on the Arduino 101*.

Seeed Studio Grove* Starter Kit Plus

The Grove* Starter Kit Plus provides a way to add mix-and-match sensors and actuators to the Arduino 101* to prototype IoT applications (Figure 2). It consists of an Arduino-compatible shield (which plugs into the Arduino 101* base board), with a set of connectors for different types of Grove sensors and actuators. More than 200 Grove sensors and actuators are available, and the Grove Starter Kit Plus contains components such as:

  • Temperature sensor
  • Buzzer
  • Button
  • LED indicators
  • Sound sensor
  • Three-axis digital accelerometer
  • Touch sensor
  • Light sensor
  • LCD character display
  • Rotary angle sensor
  • Piezo vibration sensor

Figure 2. The Seeed Studio Grove* Starter Kit Plus

Intel® IoT Gateways

Intel® IoT Gateways (Figure 3) are edge-processing nodes that combine an Intel processor; a robust operating system; security software; various types of I/O interfaces; and several networking options, including Ethernet, Wi-Fi, and wireless WANs. Gateways tie together groups of individual devices — for example, a set of sensors within a building. They allow hierarchically organized IoT systems and provide critical functions such as local device I/O management (for example, combining data from groups of localized sensors), secure access to upstream systems (for example, cloud platforms), network security and device isolation, remote software updates, and the ability to continue performing local processing in the event of WAN issues.

Figure 3. The Intel® NUC Kit DE3815TYKHE IoT Gateway

 

You can choose from a variety of Intel® IoT Gateways, each with different combinations of capacity, form factor, operating temperature range, and I/O interfaces. Which gateway you use generally depends on the needs of your application and the environment in which it will be deployed (for example, an office environment versus an industrial environment). This guide uses an Intel® NUC Kit DE3815TYKHE gateway.

You can use a variety of languages to program Intel® IoT Gateways, including Python*, Node.js*, C/C++, Java*, and graphical environments such as Node-RED.

IBM Watson* IoT Platform

The IBM Watson* IoT Platform running on IBM Bluemix* is a cloud-based platform for connecting devices and gateways in the physical world with scalable processing and storage services running in the cloud (Figure 4). It provides a highly scalable environment for building real-world applications in domains such as asset management, facilities management, manufacturing, supply chain, transportation, energy, and smart cities.

The IBM Watson IoT Platform lets you register and manage devices, connect devices and applications in real time using the industry-standard Message Queuing Telemetry Transport (MQTT) protocol, securely exchange device commands and data, store and access device data, view operations in a powerful web portal, and allow devices to interact with a wide range of other applications and services available in the Bluemix cloud. All these capabilities are critically important when designing IoT applications that will scale to thousands or millions of devices.

Figure 4. The IBM Watson* IoT Platform (l) running on IBM* Bluemix* (r)

The IoT Development Process

Development of IoT applications typically follows these steps:

  • Create an initial idea and candidate use cases.
  • Test and refine the idea through iterative prototyping.
  • Implement and evaluate scalability and product/market fit through pilots and trials.
  • Productize the solution and design for manufacturing, and design for scale.
  • Bring to market and deploy to production.
  • Operate and maintain over time.

The Arduino 101*, Intel® IoT Gateways, and the IBM Watson* IoT Platform provide a rich and scalable set of tools to support each step of such a process, shortening overall time to market while providing the scalability, security, and operational features needed for real-world IoT applications.

How This Guide Is Organized

This guide shows you how to get started with each component and create running applications.

Part 1 explains how to set up the Arduino 101*, Grove* Starter Kit Plus, and the Intel® IoT Gateway. The focus of Part 1 is getting sensors to run in a localized environment and generating data.

Part 2 explains how to connect the components from Part 1 to the IBM Watson* IoT Platform running in Bluemix*, where data can be stored, viewed, and acted upon to create scalable, sensor-driven business applications.

Part 1: Setting Up the Arduino 101*, Seeed Studio Grove* Starter Kit Plus, and Intel® IoT Gateway

The Arduino 101* and Grove* Starter Kit Plus are often used during the iterative prototyping and pilot phases of an IoT project. Their combination of powerful computing, wireless networking, variety of I/O interfaces, Arduino-compatible connectors, and mix-and-match Grove sensors and actuators allow you to turn ideas into running prototypes quickly.

Fun with the Arduino 101 provides a step-by-step guide on how to set up the Arduino 101*, install and use an integrated development environment, connect the Grove Starter Kit Plus, and run example programs that interact with digital and analog sensors.

With the Intel® IoT Gateway, you can build sophisticated IoT applications that require a combination of functionality, security, manageability, and performance. You can connect sensors and actuators directly to a gateway and process and view their data locally. You can also connect groups of sensors (for example, Grove sensors connected to an Arduino 101*) to a gateway for localized processing and to form sensor clusters. Gateways can be securely connected over the Internet to the IBM Watson* IoT Platform so that you can send data to the cloud, where it can be stored, analyzed, and acted upon.

Set Up the Gateway

These instructions apply to an Intel® IoT Gateway model Intel® NUC Kit DE3815TYKHE with an Intel® Atom™ processor. The gateway is preloaded with an operating system image and configured with default system settings. The operating system is the Intel® IoT Gateway Software Suite, a robust Wind River* operating system based on Linux*.

Set up the gateway hardware by making the following connections:

  1. Connect the power supply and plug it into wall power.
  2. Connect the gateway to a wired Ethernet LAN network that has connectivity to the Internet.
  3. Skip connecting a VGA monitor and USB keyboard; they’re needed only when you log in to the gateway system console for administrative purposes.
  4. Connect another computer to the same LAN to allow users to log in to the gateway over the wired network.

Note: The VGA monitor and USB keyboard connections are useful for debugging gateway startup issues or logging in to the gateway system console during development.

For more detailed information about the gateway and setup options, see the Getting Started with Intel® IoT Gateways with Intel® IoT Developer Kit 3.5.

Connect a Sensor to the Gateway

There are several ways to connect sensors to the gateway; the options depend on the specific gateway model you’re using. For the Intel® NUC Kit DE3815TYKHE gateway, you’ll connect an OMEGA* RH-USB Temperature and Humidity sensor to one of the gateway’s USB ports, as shown in Figure 5. The RH-USB sensor measures temperature ranging from 1°F to 120°F, and relative humidity ranging from 2 percent to 98 percent. It works as a serial TTY device through the gateway’s USB port and reports temperature and humidity as an ASCII string.

Figure 5. OMEGA RH-USB sensor connected to the gateway’s USB port

Power up the gateway by pressing its power button until the light comes on. The gateway will use Dynamic Host Configuration Protocol (DHCP) to obtain an IP address from the wired Ethernet LAN.

Log In to the Intel® IoT Gateway Developer Hub

The Intel® IoT Gateway Software Suite contains a built-in application called the Intel® IoT Gateway Developer Hub, which you use to view gateway status, configure gateway settings, and create sensor applications. The default settings allow you to access the hub over an Ethernet LAN connection at the following address:

  • Wired: http://<gateway IP assigned by Ethernet LAN DHCP server>

For example, if the gateway were assigned wired IP address 192.168.22.108, you would access the IoT Gateway Developer Hub at http://192.168.22.108.

Open a web browser, and enter the Intel® IoT Gateway Developer Hub URL. If your gateway is operating in demonstration mode, you may see a Privacy Statement that you need to acknowledge. At the login prompt, use the default user name (gwuser) and default password (gwuser) (Figure 6).

Figure 6. Log in to the Intel® IoT Gateway Developer Hub

If you see a License Agreement page, review the license agreement, and click Agree if you accept the terms.

Now you’re logged in to the Intel® IoT Gateway Developer Hub, as shown in Figure 7. From here, you can monitor the gateway and its attached sensors, see real-time sensor data, adjust gateway system administration settings, access documentation, and load new software packages.

Figure 7. From the Intel® IoT Gateway Developer Hub, you can monitor and adjust your gateway.

The status area (1) shows information about the gateway. The sensor widget area (2) shows real-time data from attached sensors. Here, the RH-USB temperature is displayed. In the command area (3), you can navigate among different functions in the gateway by clicking their icons:

  • Sensors – display the sensor dashboard.
  • Packages – install or update software packages on the gateway.
  • Administration – access operating system commands and additional configuration tools.
  • Documentation – access tutorials, guides, and best practices.

Note: The RH-USB temperature sensor will appear as Linux TTY device /dev/ttyRH-USB on the gateway, which is a symbolic link to the physical USB TTY port into which the sensor is plugged—for example, /dev/ttyUSB0. If you plug in other USB devices, the physical TTY port number might shift, so this is the first thing to check if you don’t see the sensor in the Intel® IoT Gateway Developer Hub sensor list.

When you first log in to the gateway, updated software packages may be available. To check for updates, click the Packages icon and look at the Install Updates button. If updates are available, click Install Updates to download and install them. The process may take some time, depending on how many updates are available. After installing the updates, reboot the gateway by clicking Administration, and then clicking Restart OS.

Develop a Sensor Application

To program the gateway, you can write code in Python*, Node.js*, and other languages and run it directly on the gateway’s Linux* operating system. However, the gateway also contains a powerful software environment called Node-RED*, which provides a graphical environment for creating, testing, and deploying IoT applications. Node-RED is an open-source, visual tool for wiring together hardware devices, application programming interfaces (APIs), and online services in new and interesting ways. It is written in Node.js, and automatically runs when the gateway starts. For detailed documentation about Node-RED, see the Node-RED Documentation.

To access Node-RED, navigate to the Intel® IoT Gateway Developer Hub’s main portal by clicking the Sensors icon, and then clicking Manage Sensors. Doing so takes you to the Node-RED canvas, where you’ll see a default starter flow for the RH-USB sensor.

A Node-RED application consists of nodes that perform specific operations and wires that connect nodes together to form end-to-end flows. In the web-based user interface (UI), you place nodes and wires on the canvas using drag-and-drop actions. There are three main types of nodes:

  • Input nodes – read data and generate events.
  • Output nodes – write data and allow debugging.
  • Function nodes – process or transform data and events.

Other types of nodes are used for special purposes, such as email and Twitter integration and disk file I/O. There’s also a large ecosystem of community-developed nodes that you can add to the Node-RED environment to support other types of sensors, send and receive data to cloud services, and perform various processing algorithms. See the Node-RED Library for examples of community-developed nodes.

The gateway also contains built-in support for MQTT, which is an industry-standard machine-to-machine IoT connectivity protocol that follows the Publish–Subscribe architectural pattern. As part of that support, there are built-in sensor data display widgets that you can wire into your gateway Node-RED flow to display data in the Intel® IoT Gateway Developer Hub portal. They display data published to the gateway’s /sensors MQTT topic. To learn how to set up a connection to IBM Bluemix* using MQTT, see Connecting to IBM Bluemix* Internet of Things Using MQTT.
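As an illustration of the pattern (the built-in widgets define their own exact payload schema, so the sensor/value/units field names here are assumptions), a function node feeding an MQTT publish node might shape its message like this:

```javascript
// Illustrative only: the gateway's chart widgets expect their own
// schema; the sensor/value/units field names are assumptions. The
// /sensors topic is the one the built-in display widgets subscribe to.
function toSensorMessage(sensorName, value, units) {
  return {
    topic: '/sensors',
    payload: JSON.stringify({ sensor: sensorName, value: value, units: units })
  };
}

var msg = toSensorMessage('temperature', 23.4, 'C');
```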

Much has been written about Node-RED, and the Intel® IoT Developer Zone has a step-by-step video to help you get the RH-USB temperature and humidity sensor running on the gateway. Follow the steps described in the video. After completing it, you should see both temperature and humidity displayed on the Intel® IoT Gateway Developer Hub portal (Figure 8).

Figure 8. Temperature and humidity displayed on the Intel® IoT Gateway Developer Hub portal.

If you heat or cool the RH-USB temperature probe, you should see the temperature increase or decrease in real time based on the default 1-second sampling rate.

Note: The default RH-USB sampling rate is controlled by the Inject input node labeled Interval in the Node-RED flow. Double-click that node to see its parameters and optionally change the rate. If you make any changes, be sure to click Deploy to deploy your modified flow to the gateway.

Here’s a brief explanation of how the flow works:

  1. The Interval node generates a trigger message once per second that flows into the RH-USB node.
  2. The RH-USB node is a subflow that transmits an ASCII command string PA to the RH-USB sensor that’s connected to the /dev/ttyRH-USB serial port on the gateway. The RH-USB sensor returns an ASCII response string over the serial port that’s parsed into separate temperature and humidity values. The separate values flow out of two different output ports of the RH-USB node.
  3. The temperature value is converted from Fahrenheit to Celsius through a function node. The function node contains a JavaScript* code snippet that performs the conversion calculation.
  4. The temperature and humidity values flow into chart formatting nodes that set display ranges and attributes for the gateway’s charting widgets.
  5. The Chart node publishes the formatted sensor data along with the chart display attributes to the gateway’s MQTT /sensors topic, which causes the values to appear on the Intel® IoT Gateway Developer Hub portal.
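Two of these steps can be sketched in code. The response format in the parser below is an assumption (inspect the RH-USB subflow for the sensor's real response layout); the Fahrenheit-to-Celsius formula in step 3 is standard:

```javascript
// ASSUMPTION: the reply is a comma-separated "humidity,temperatureF"
// ASCII string such as "041.2,+074.3". Check the RH-USB subflow for
// the actual layout before relying on this.
function parseRhUsb(line) {
  var parts = line.trim().split(',');
  return {
    humidity: parseFloat(parts[0]),      // relative humidity, percent
    temperatureF: parseFloat(parts[1])   // temperature, °F
  };
}

// Step 3: the conversion performed by the F to C function node.
function fahrenheitToCelsius(f) {
  return (f - 32) * 5 / 9;
}
```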

Notes:

  • To change the temperature display from a gauge-based to a time-based line graph, double-click the Temperature node, and change the Type to Line instead of Gauge. Save the change, and deploy the flow; then, you’ll see a line graph for the temperature.
  • Function nodes allow you to write arbitrary processing functions in JavaScript. For details about writing your own functions, see the Node-RED Documentation.

Debug Node-RED Flows

An output node named debug is particularly useful for debugging flows (Figure 9).

Figure 9. The debug Node-RED output node.

When you drag the debug node onto the canvas, anything wired into that node will have each data value displayed on the debug tab of the Node-RED canvas. If the debug tab isn’t visible, you can toggle it on or off by turning the Sidebar on or off (Figure 10).

Figure 10. Toggle the Node-RED sidebar on or off.

Notes:

  • Turn debugging output on or off by clicking the area on the right side of each debug node. Each time you click it, the debug output is toggled on or off.
  • Clear existing debug messages by clicking the trash can icon on the debug tab.
  • You can use multiple debug nodes in the same flow, each with different connections.
  • Use the Comment node to add comments to your flows. Comment nodes aren’t wired to anything; they only contain text. Use them to describe how your flow works.

Role of the Gateway

At this point, the gateway is measuring live sensor data from the RH-USB temperature and humidity probe once per second. The values are processed and formatted to the desired units, and the values are published to the gateway’s MQTT broker, where they’re displayed in real-time on the Intel® IoT Gateway Developer Hub portal.

If you needed to take additional temperature readings—for example, in a different location—you could add another RH-USB probe and expand the Node-RED flow to measure it, as well. The power of using a gateway is that the gateway can connect groups of sensors, process and combine the raw data locally, and then send the data somewhere else for further processing or long-term storage, which is the focus of the next section.

Part 2: Connecting to the IBM Watson* IoT Platform

In this section, you connect the Intel® IoT Gateway to the IBM Watson* IoT Platform and send sensor data to the platform for processing in the cloud. The IBM Watson IoT Platform is a cloud-based environment that lets you design, build, scale, and operate production IoT solutions. It consists of a set of services that you configure and connect in building-block fashion. The Watson IoT Platform services are hosted in the Bluemix* cloud, which means that they can also take advantage of the multitude of other cloud services available in Bluemix.

You can use the IBM Watson IoT Platform in two basic modes: Quickstart and Registered. For details about these modes, see the IBM Watson IoT Platform Documentation.

Connect to the IBM Watson* IoT Platform in Quickstart Mode

Quickstart is an open sandbox that allows you to quickly connect a device and view live sensor data. It’s free, and you don’t have to sign up for anything. You can use either real or simulated devices as data sources: No data is retained over time. Quickstart is useful for getting an initial device connected to the IBM Watson* IoT Platform as you’re ramping up your knowledge of it.

There are several options for connecting the gateway to the IBM Watson IoT Platform, including writing programs in Python* or Node.js* and creating flows in Node-RED. In this section, you build on your Node-RED implementation from the previous section and use that method to connect the gateway to the IBM Watson IoT Platform.

Start by connecting the gateway to the IBM Watson IoT Platform Quickstart service. To do that, you need a new type of node that knows how to connect to the IBM Watson IoT Platform. An open source Node-RED package named node-red-contrib-ibm-watson-iot does exactly that.

To install the Node-RED components for IBM Watson IoT, add a new package repository to the gateway and install a package from that repository by following these steps:

  1. Log in to the gateway either from another computer on the Ethernet LAN or by using a keyboard and VGA monitor connected directly to the gateway. For the Ethernet LAN method, if the gateway’s IP address were 192.168.22.100, the login command would be ssh root@192.168.22.100. Use the correct IP address for your gateway.
  2. Log in with the user name root and the default password root.
  3. Run the following command: rpm --import http://iotdk.intel.com/misc/iot_pub.key
  4. When the command completes, log out of the gateway command-line shell.

This command installs the public key for the Intel software package repository used in the next step.

In the IoT Gateway Developer Hub web interface, perform the following steps:

  1. Click on the Packages icon and then click the Add Repo button.
  2. Under Add New Repository, set the Name field to IoT_Cloud and the URL field to http://iotdk.intel.com/repos/iot-cloud/wrlinux7/rcpl13/. Leave the Authentication information blank.
  3. Click the Add Repository button to add the new repository.

After you add the IoT_Cloud repository, it appears in the Existing Repositories list (Figure 11). Click the X in the upper-right corner to close the Manage Repositories dialog box.

Figure 11. IoT_Cloud repository added to the gateway.

Next, install the Node-RED Watson IoT packages by performing these steps:

  1. Click the Packages icon, and then click the Add Packages button.
  2. In the search field, type packagegroup-cloud-ibm, which should display the package with that name.
  3. Click Install to install the packagegroup-cloud-ibm package. After the installation completes, click X to exit the package installation dialog box.
  4. Scroll through the package list to confirm that node-red-contrib-ibm-watson-iot appears in the list of installed packages.

Notes:

  • The package installation will fail if the repository public key has not been installed.

Next, find the node-red-experience package in the list and click Stop; then click Start to start it again. The Stop/Start cycle is needed to pick up the newly installed Node-RED package.

Click the Sensors icon, and then click Manage Sensors. Along the left side, you should now see two new types of nodes called Watson IoT: one in the Input section and one in the Output section (use the scrollbars to scroll through the list of available nodes). These nodes allow you to connect to the IBM Watson IoT Platform. Clicking a node in Node-RED displays its documentation on the info tab.

The output node can send data to the IBM Watson IoT Platform, and the input node can receive data from the IBM Watson IoT Platform. You’ll use the output node for your Quickstart connection.

Use your mouse to drag the IBM Watson IoT Platform output node onto the Node-RED canvas. Its name will automatically change to IBM IoT Device. Double-click the node to display its properties. The default values will look like the left panel in Figure 12, with the Quickstart ID being randomly generated. Change the values to look like the right panel (set Name to Send to Watson IoT Quickstart) and then click Ok.

Figure 12. Customize the input and output node values.

Now, add a wire from the F to C node output to the Send to Watson IoT Quickstart node input. The result should look like Figure 13.

Figure 13. Add a wire from the F to C node to the Send to Watson IoT Quickstart node.

Now, click Deploy to deploy your modified flow to the gateway. Then, double-click the Send to Watson IoT Quickstart node. To the right of the Quickstart Id field, you’ll see a small square button (Figure 14). Click the button to open a new browser window for the Quickstart dashboard, with your randomly generated device ID automatically filled in.

Figure 14. Click the button beside Quickstart Id to view live temperature data in your web browser.

You should see live temperature data in the dashboard (Figure 15), with a time history graph that updates itself once per second. Your gateway has successfully made its first connection to the IBM Watson IoT Platform.

In the Quickstart dashboard in Figure 15, the upper area (1) shows the time history of the received data, and the lower area (2) shows the most recent event data the platform received.

Figure 15. Viewing time history and most recent event data in the Quickstart dashboard.

Connect to the IBM Watson* IoT Platform in Registered Mode

Registered mode is the next step up from Quickstart. You register for an account on Bluemix* and set up an instance of the IBM Watson* IoT Platform running on Bluemix. There are free and paid tiers; which you choose generally depends on the data and processing volumes and the number of devices in your application. For this guide you’ll use the free tier, because the number of devices and amount of data will be relatively small.

Registered mode is what you use to create production IoT applications. Create an instance of the IBM Watson IoT Platform running on Bluemix where you can define your device and accumulate data. You’ll also create an instance of IBM Watson IoT Real-Time Insights, which allows you to enrich and monitor data from your devices, visualize what’s happening, and respond to conditions through automated actions.

If you don’t already have a Bluemix account, creating one is the first step. On the IBM Watson IoT Platform Quickstart page, scroll to the bottom and look for the Sign Up button.

Bluemix currently has both a “Classic” UI and a “New” UI. This guide uses the New UI, and you can switch to it by clicking Try the new Bluemix. Keep in mind that Bluemix is a dynamic environment, and the UI will evolve over time.

To create a new Bluemix account, click Sign Up, enter the required information, and then click Create Account. Additional information and how-to guides are available on the Bluemix site. The signup process creates an IBM* ID that you use to log in to Bluemix.

Log in to Bluemix by using your newly created account. You should see a welcome page that lists a variety of service categories, including Internet of Things. Click Internet of Things: You should see several available IoT services, including the IBM Watson IoT Platform and IoT Real-Time Insights.

Click the plus sign (+) to add a new service to your Bluemix account, then click Internet of Things Platform. Select the free plan, and then click Create. Bluemix provisions your Internet of Things Platform service; when it’s ready, it appears as a tile in your Bluemix dashboard.

Now, create an instance of the IoT Real-Time Insights service, and select the free plan. When it’s provisioned, your Bluemix dashboard should show both service tiles (Figure 16).

Figure 16. The IBM Watson* IoT Platform Analytics Real-Time Insights Bluemix dashboard.

Note: In Bluemix, there’s a main account-level console and individual service dashboards for the various services you add to your account. The process of instantiating a service and adding to your account is called provisioning. After a service is provisioned, it appears as a new tile in your main Bluemix console. The Bluemix console and individual service dashboards will have different URLs. The top-level Bluemix console URL may look something like “https://new-console.ng.bluemix.net” and the Watson IoT Platform console may look like “https://xxxxxxx.internetofthings.ibmcloud.com/dashboard/”. You can bookmark the console and dashboards to make them easier to access.

Now, connect the two services so the IoT Real-Time Insights service can get data from the IBM Watson IoT Platform service. The process is described in the IoT Real-Time Insights documentation and is summarized below:

  1. In the IBM Watson IoT Platform dashboard, generate a new API key. Write down the API key and authentication token. (You can only view the authentication token when it’s first created.)
  2. In the IoT Real-Time Insights dashboard, add a new data source, and provide the API key information from step 1.
    The first time you complete this step, it may take some time during which the browser will be spinning or appear to hang. Give it plenty of time to finish. When setting up the new data source, you enter a name (for this example, name it Intel IoT Gateway), your IBM Watson IoT Platform Organization ID (shown in the Configuration Settings screen of the IBM Watson IoT Platform), and the API key and authentication token created in step 1.
Next, connect the Intel® IoT Gateway to the IBM Watson IoT Platform service as a registered device. Navigate to the Internet of Things Platform service in Bluemix, and launch the dashboard. Along the left side, you see the Boards, Devices, Access, Usage, and Settings icons. Click Devices, and then click Add Device.

Because this is the first time you’re adding a new type of device, you first need to create a device type for the Intel® IoT Gateway. If you click the Choose Device Type list, you’ll see that it’s empty because you haven’t created any device types yet.

Click Create device type, and then choose Create gateway type. In the IBM Watson IoT Platform, there are two categories of connected devices:

  • Device – This is an individual device in the IBM Watson IoT Platform, such as a temperature sensor. A device may or may not have an IP address and the ability to communicate over the Internet.
  • Gateway – This type of device has additional devices connected to it in a hierarchical or cluster fashion; for example, an Intel® IoT Gateway with two sensors connected to it. Gateways are generally able to communicate over the Internet using IP protocols.

The difference between the device and gateway device types is that for a device, you perform operations directly between the cloud and the device, whereas for a gateway, you perform operations between the cloud, the gateway itself, and the devices connected to the gateway. In essence, the gateway can act both as a pass-through to the devices and as an intermediate device that can process device data before sending it to the cloud.

For details about devices and gateways, see IBM Watson IoT Platform Concepts.

The Create Gateway Type function is a set of step-by-step dialog boxes. Enter the following settings as you step through:
  • Name: Intel-IoT-Gateway
  • Description: The Intel gateway
    <Next>
  • Manufacturer: Checkmarked
    <Next>
  • Manufacturer: Intel
    <Next>
  • Leave the Metadata (optional) field blank
    <Next>
    <Create>

Now that you’ve created a new Gateway type for the Intel® IoT Gateway, it will appear in the list under Add Device.

Continuing in the Add Device dialog box, choose Intel-IoT-Gateway from the Device Type list, and then click Next. In the Device Info dialog box, create a unique Device ID for the gateway by using the gateway’s Ethernet MAC address: enter the MAC address (including the leading zeros) in the Device ID field, and then click Next. Leave the optional metadata blank, and then click Next. In the Security dialog box, click Next to use an auto-generated authentication token.

On the confirmation page, review all the information. If everything is correct, click Add. If you need to change anything, click Back.

When the gateway device has been added, you get an information screen that contains identifiers and authentication credentials. Be sure to save these credentials somewhere safe, because they cannot be recovered later. If you misplace them, you’ll have to delete the device and recreate it. Keep in mind that these credentials are sensitive information, so protect them against unauthorized use.

When you’re finished looking at the new gateway information and printing or saving it, exit the dashboard by clicking the small circled X in the upper right corner.

Now, the gateway appears in the Devices list in the IBM Watson IoT Platform dashboard (Figure 17). Note that your Device ID will be different based on the MAC address of your gateway.

Figure 17. Your device in the IBM Watson* IoT Platform dashboard.

Now that the gateway has been defined in the IBM Watson IoT Platform, you can connect to it from the Node-RED flow on the gateway. Going back to the previous Node-RED flow, drag another instance of the Watson IoT output node onto the canvas, and set its properties as shown in Figure 18, being sure to use the correct Device ID for your device. For the Credentials settings, you need the authentication token that you saved when you added the gateway as a new device in the IBM Watson IoT Platform. You also need your Organization ID, which you can find in the IBM Watson IoT Platform Configuration Settings page.

Figure 18. Adding a second instance of the Watson IoT output node to your Node-RED canvas.

The Watson IoT node expects data in a certain format, so next you add a Function node to put the temperature readings into that format. Drag a Function node onto the canvas, double-click it, set the name to Format Temperature, and add the JavaScript* code shown in Figure 19. Click Ok when you’re done adding the code.

Figure 19. Configure a Function node.

Wire the output of F to C to the input of Format Temperature and the output of Format Temperature to the input of Send to Watson IoT Registered. Then, deploy the changed flow. The resulting flow should look like Figure 20.

Figure 20. The changed flow in Node-RED.

At this point, live temperature data is being sent from the gateway to both the IBM Watson IoT Platform Quickstart and Registered destinations.

In the IBM Watson IoT Platform dashboard, go to the Devices list and click on your gateway device. You should see a set of Recent Events (changing once per second) and a Temperature reading under Sensor Information. These are the real-time readings that the gateway sends and the IBM Watson IoT Platform receives. Click one of the events under Recent Events: The event data should look something like Figure 21.

Figure 21. Viewing your event data under Recent Events.

This is IBM Watson IoT Platform event JavaScript Object Notation (JSON) that contains a temperature key and corresponding numeric value. It corresponds to the JavaScript code in the Format Temperature node on the gateway:

return {
    payload: { temperature: msg.payload }
};
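The same transformation can be exercised outside Node-RED under plain Node.js (an assumption made here purely for illustration; inside Node-RED the body lives in the Format Temperature Function node):

```javascript
// The Format Temperature node body, wrapped in a function so the
// transformation can be run and checked under plain Node.js.
function formatTemperature(msg) {
    return { payload: { temperature: msg.payload } };
}

// A raw Celsius reading on msg.payload becomes the event JSON seen in Figure 21.
const event = formatTemperature({ payload: 23.4 });
console.log(JSON.stringify(event.payload)); // {"temperature":23.4}
```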

On the IBM Watson IoT Platform dashboard, navigate to the Boards > All Boards page. The IBM Watson IoT Platform lets you create one or more dashboards (called Boards for short) that display real-time information from your devices. Click on the Usage Overview board, which is a built-in default dashboard. The Intel-IoT-Gateway device should show along with basic data transfer statistics.

Next, add a live display of the temperature value to the Usage Overview board. Navigate to the Settings page, which displays a set of Configuration Settings. Under the General section, turn on Experimental Features, then click Confirm all changes. This allows you to add new information cards to your boards. Cards are configurable display widgets that show different types of information on IBM Watson IoT Platform dashboards. Navigate back to the Usage Overview board, and you should now see an Add New Card button at the top right of the page.

IBM Watson IoT Platform Experimental Features are described in the IBM Watson IoT Platform blog post User Interface Experimental Features.

Click Add New Card. In the dialog box that opens, scroll down and click Show more to see all card types. In the Devices section, click Gauge. Choose Intel-IoT-Gateway as the card data source, and then click Next. Click Connect new data set, and then set the following data set fields:

Name: Gateway Temp
Event: event (pick from menu)
Property: temperature (pick from menu)
Type: Float (pick from menu, scroll down to see all options)
Unit: °C (pick from menu)
Precision: 1
Min: 0
Max: 40

Click Next, set size to M (medium), and then click Next. Set Title to Temperature, and then click Submit.

You should now see a new card on your IBM Watson IoT Platform dashboard that shows a live display of the temperature reported by the RH-USB sensor connected to the gateway. If you warm or cool the sensor, the temperature value should increase or decrease in real time on both the IBM Watson IoT Platform dashboard and the gateway’s IBM IoT Gateway Developer Hub display. Figure 22 shows the IBM Watson IoT Platform dashboard with the temperature card added.

Figure 22. The IBM Watson* IoT Platform dashboard with a temperature card added.

Summary and Next Steps

At this point, you have an end-to-end running system that collects real-time sensor measurements, processes and displays the data locally on the Intel® IoT Gateway, and sends the data to the IBM Watson* IoT Platform running on Bluemix. The gateway is programmed using Node-RED, which provides an easy-to-use tool for wiring together processing flows and adding sensors or processing algorithms.

You could add the Arduino 101* and Grove* Starter Kit Plus – IoT Edition to this environment to prototype additional sensors and actuators by connecting the Arduino 101* to the gateway using USB and adding additional types of Node-RED processing nodes for the Grove components.

You’ve also added IoT Real-Time Insights to your Bluemix* environment and connected it to the IBM Watson* IoT Platform service. IoT Real-Time Insights allows you to create if-then logic to monitor sensor data and trigger actions when certain conditions are met—for example, monitoring temperature readings and generating an alert if they exceed a defined range.

Industrial Use Case and Tutorial: Intel and the IBM Watson* IoT Platform builds on and extends the foundation established in this guide. The Industrial Use Case develops a sensing application that continuously measures temperature and humidity, and generates email and text message alerts when parameters fall outside acceptable ranges.

Having the gateway integrated with the IBM Watson IoT Platform on Bluemix opens a range of other possibilities for storing, processing, and visualizing data using a variety of database and application services available on Bluemix. The Intel IoT Gateway coupled with the IBM Watson IoT Platform on Bluemix provides a robust and powerful environment for building and operating scalable IoT applications.

Introducing the Intel® Joule™ Module


Yesterday our CEO, Brian Krzanich, announced the newest hardware in our lineup of products for developers and entrepreneurs, the Intel® Joule™ platform. As the General Manager of the team that spent the past year conceptualizing, designing, and delivering the product, I’d like to tell you a little about why I feel this platform is so significant, and what impact I think it could have. At a time when small players (start-ups and IoT entrepreneurs) are making huge impacts in all sorts of industries, there are still significant barriers to hardware innovation.

Our goal with the Intel Joule platform is to enable these tech developers with the tools they need to quickly and easily prototype and productize concepts. With that in mind, we are bringing to market a platform that opens the doors of possibilities for applications using computer vision, machine learning, and robotics.

Years ago, Andy Grove forecast an era of ubiquitous computing. The Intel Joule platform, with its tiny form factor, minuscule mass, and powerful performance, is the next big leap in putting performance in places it didn’t easily fit before. In developing the Intel Joule platform, we had a lot to consider. We listened to our user community and designed this module with all the feedback from the Intel® Edison module in mind.

We wanted to create a hardware platform that is so powerful and feature-rich that you can create applications that weren’t previously possible. For that reason, we decided to integrate the latest in Intel® processor technology and design into an SoC, which means developers get a platform that idles at very low power consumption, but also can handle a very intense workload when needed.

We decided to continue with the modular architecture, since we feel that is the best way to allow developers to quickly get their ideas to market. Developers can start by building a prototype with one of our Intel Joule development platforms, then take the same module and outsource a custom carrier board. Using this approach, developers avoid most of the expense and effort of traditional product design. We want developers to concentrate on identifying their opportunity, writing their code, and getting their product to market.

The pre-certification that comes with the module means that instead of spending their time and precious resources designing a custom board and going through the full certification process for Wi-Fi* or Bluetooth* products, developers can focus on improving their product. We are also working to develop an ecosystem of expansion boards and partners to enable rapid development in a variety of industries. We invite you to participate and contribute to the fleshing out of this ecosystem with modems, FPGA extensions, and whatever new ideas you’ll come up with.

The Intel Joule module offers a robust software stack, including a pre-installed Linux-based OS tailored for IoT based on the Yocto* Project, with support for the Intel® IoT Developer Kit sensor libraries available now. Windows* 10 IoT Core and Snappy Ubuntu Core*, along with Intel® RealSense software support will be coming later this year. Developers have access to the Intel® System Studio IoT Edition for C++ and the Intel® XDK for JavaScript development. For those who are familiar with the Intel Edison platform, porting code over should be straightforward.

Now more than ever, we are at the cusp of a great period of innovation in a myriad of fields. The Intel Joule module opens up new possibilities. What will you create?

Sales are live from our reseller partners NewEgg and Mouser, with others coming online as production ramps over the next few weeks.

Firmwave’s Success Story for Solar-Powered Outdoor Signage


Firmwave® Edge

The Firmwave® Edge intelligent sensor platform, powered by the latest Intel® Quark™ MCUs, enables easier maintenance of remotely located, solar-powered outdoor advertising signage, delivering low-power wireless sensor network connectivity and secure data collection across IoT gateways and networks.

Challenge

Gaining access to remotely located and inaccessible outdoor advertising signage, such as billboards, was proving inefficient and costly for Solar AdTek, a leading provider of solar control units and LED lighting for on and off-grid outdoor advertising display illumination. Just sending technicians to these often remote locations to download device logs, re-programme or reconfigure devices or conduct routine troubleshooting and maintenance was a time-consuming and costly practice.

Solution

The Firmwave® Edge intelligent sensor platform is a suite of customizable, pre-validated hardware and firmware modules for accelerating the design, development, and deployment of the next wave of tiny, intelligent wireless sensors. Combining Firmwave® Edge Cellular with the flexibility of Vodafone GDSP and the power of Intel® Quark™ MCUs enabled Firmwave to deliver a low-cost, low-power smart solar control unit that can be quickly and easily deployed and connected to the cloud to deliver remote device monitoring and control and advanced Edge analytics.

"The choice of using Intel Quark products as part of our solution for Solar AdTek was an easy one. Our solutions demand robust, scalable and secure hardware platforms. Intel® Quark™ MCUs provide us with low power consumption and the robust analytics engine we need to optimise our products and deliver versatile and reliable wireless sensor network solutions that ensured we could deliver the right solution to meet Solar AdTek’s particular needs for remote device management and control." Fintan Mc Govern, CEO at Firmwave Ltd.

Simpler, Smarter Solutions

There are many challenges and complexities facing customers who want to deploy smart connected devices in a secure and cost-effective manner. Not least among these is the need for expertise in connectivity, hardware, and firmware, as well as the complexity of network commissioning, security, maintenance, and remote device management. An end-to-end solution that cuts out expensive bespoke development of both hardware and firmware, and allows the customer to move quickly from prototype to a robust and scalable product, is the driving force behind Firmwave® Edge. Firmwave® Edge sensors also support tiny operating systems, including Zephyr™, a small, open source, scalable real-time operating system.

Working together, Solar AdTek and Firmwave have designed a low-cost, low-power smart solar control unit based on Firmwave Edge’s modular hardware and firmware architecture powered by Intel Quark MCUs. These modules radically reduce the need for hardware and firmware customization and enable Solar AdTek to securely collect a wide range of sensor data, from temperature and humidity to breakage detection, as well as battery management data and advertisement viewer numbers. The ultra-low-power Edge sensors leverage the Quark on-board sensor subsystem, allowing decisions to be made at the Edge, simplifying integration and reducing the dependency on larger, costlier gateways. Advanced sensor data and Edge analytics provide Solar AdTek with valuable business insights, including passing traffic and advertising viewer numbers.

Monitoring of operational efficiencies, as well as remote device maintenance and manageability, were also key requirements for this project. Based on the Firmwave Edge Cellular reference design powered by Intel Quark MCUs and leveraging the Vodafone GDSP platform, Solar AdTek has achieved a platform that allows them to realise significant time and cost savings by enabling remote access to device logs, device configuration, and firmware upgrade capabilities. Now, when a device or lighting failure occurs, Solar AdTek maintenance teams can be alerted immediately and quickly address the problem, thus reducing the need for routine on-site visits.

Reliable, versatile solutions for smart solar control units

Firmwave successfully designed a globally deployable solution for Solar AdTek for their smart connected solar solutions for outdoor advertising signage based on the Firmwave Edge Cellular reference design. Key to the success of this project was the practicality of allowing remote devices to be maintained and controlled over a globally dispersed network of devices and to reliably collect sensor data at the Edge and transmit this securely over the network.

Strong supporting relationships

Through close collaboration with Intel and strong supporting channels, Firmwave were well positioned to design one of the first customisable, off-the-shelf solutions suitable for large-scale wireless sensor network deployments. In March 2016, Firmwave launched Firmwave Edge, the first Edge sensor network platform based on Intel® Quark™ (view full press release), and has since expanded the range by launching the Edge 2.0 and Edge 3.0 series. Visit www.firmwave.com for more information.

ipo: warning #11021: unresolved symbols referenced a dynamic library


Reference number: DPD200411050

Product versions:
    Intel® Parallel Studio XE 2015

OS affected:
    Linux*

Problem description:

During the final link stage of an application built with the "-ipo" option, there may be warning messages like:

ipo: warning #11021: unresolved hwloc_get_obj_by_depth
        Referenced in /usr/lib64/openmpi/lib/libmpi.so
ipo: warning #11021: unresolved hwloc_topology_get_depth
        Referenced in /usr/lib64/openmpi/lib/libmpi.so

All the unresolved symbols in the #11021 warning messages come from dynamic libraries that are not explicitly provided on the link command line with the "-l" option. They are referenced indirectly by the application through another dynamic library. The executable can still be generated and run.

Solution:

This issue is known, and we are working on fixing it. When the fix is available, this article will be updated with the information.

A workaround to eliminate these warning messages is to explicitly add the dynamic library containing the unresolved symbols to the link command line with the "-l" option.

Connecting the Intel® IoT Gateway to Microsoft* Azure*


This guide will walk you through adding the IoT Cloud repository to your Intel® IoT Gateway and adding support for Microsoft* Azure* so you can begin developing applications for this platform in your programming language of choice.

Prerequisites

  • Intel® IoT Gateway Technology running IDP 3.1 or above with internet access
  • A development device (e.g. laptop) on the same network as the Intel® IoT Gateway
  • Terminal access to the Intel® IoT Gateway from your development device
  • Microsoft Azure account: https://portal.azure.com/

Please see the following documentation for setting up your Intel® IoT Gateway: https://software.intel.com/en-us/node/633284

Adding the IoT Cloud repository to your Intel® IoT Gateway

1.  Access the console on your gateway either using a monitor and keyboard connected directly or SSH (recommended).

2.  Add the GPG key for the cloud repository using the following command:

rpm --import http://iotdk.intel.com/misc/iot_pub.key

3.  On your development device (e.g. laptop), open a web browser and load the IoT Gateway Developer Hub interface by entering the IP address of your gateway in the address bar.

Tip: You can find your gateway’s IP address using the ‘ifconfig’ command.

4. Log in to the IoT Gateway Developer Hub interface using the credentials root:root.

IoT Gateway Developer Hub Login dialog box

5.  Add the IoT Cloud repository.

a.  Go to the Packages section and click the Add Repo + button.

Populate the fields with the following information and click Add Repository:

Name: IoT_Cloud

URL: http://iotdk.intel.com/repos/iot-cloud/wrlinux7/rcpl13

Finally, click the Update Repositories button to update the package list.

Adding Microsoft* Azure* support to your Intel® IoT Gateway

1.  Click the Add Packages + button to bring up the list of packages you can install.

2.  Search for cloud-azure using the search box at the top of the package window. Click the Install button next to the packagegroup-cloud-azure entry.

Setup an Azure* IoT Hub

1.  In a browser, navigate to the Azure* Portal at https://portal.azure.com and log in to your Azure account.

2. Create a new IoT Hub.

a.  Select New in the top left of the Azure Portal. Select Internet of Things from the list, and then select IoT Hub.

b.  Give your new IoT Hub a unique Name and select which Pricing and scale tier you require.

c.  In the Resource Group section select Create New and give it a unique name in the box provided.

d.  Select the ‘Location’ nearest to you and click Create.

It will take some time to deploy your new Resource Group and IoT Hub, but eventually you should get a Deployments succeeded notification.

e.  Once your new Resource Group has been created, navigate into it by selecting the Resource groups option in the left-hand panel and selecting the Resource Group you just created from the list.

f.  Select the IoT Hub you just created from the list, then click the Keys icon. Select iothubowner from the policy list, then click the Copy button next to the Connection string-primary key to copy your IoT Hub connection string to the clipboard.

Create a new Azure* IoT Device

Tip: To make copying connection strings easier, it is recommended that you use SSH to connect to your gateway or access the command line through the Intel Developer Hub interface. If you are accessing the command line of your gateway directly using a keyboard and mouse, you will need to manually enter the connection string in the next section.

1.  Enter the following in your gateway’s console to add it to your IoT Hub:

iothub-explorer "[YOUR CONNECTION STRING]" create IntelIoTGateway --connection-string

If the device is successfully added, the command prints the new device’s details, including its device connection string. Copy and save the device connection string; you will need it when you configure the Azure IoT Hub node in Node-RED.

2. Load the Node-RED interface. Go to the Administration section of the IoT Gateway Developer Hub and click Launch under the Node-RED icon.

3.  Configure a Node-RED flow.

a.  Drag an ‘inject’, ‘function’ and ‘azureiothub’ node from the nodes panel on the left into the current flow (you may need to scroll down in the nodes panel).

b.  Arrange and connect the nodes as in the screenshot above. Here we have an inject node which will send a trigger at a specified interval to a function which will randomly generate a number to send to your Azure IoT Hub. First we need to configure the nodes.

c.  Double-click the timestamp node to bring up the configuration dialog box. Change the settings so they match the screenshot below, and click Ok when done. This sets the node to send a trigger every 5 seconds.

d.  Double-click on the ‘function’ node to bring up the configuration dialog box. Here we are going to add some simple code to generate a random number which we can send to our Azure IoT Hub. In reality this could be a sensor reading, for example.

e.  In the ‘Function’ box enter the following code as in the screenshot below:

msg.payload = Math.round(Math.random() * 100);
return msg;

Click the Ok button when done, to close the configuration dialog box.
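Note that a Node-RED Function node must return the msg object for the message to continue through the flow. Simulated outside Node-RED under plain Node.js (an assumption made here for testing), the Function node's logic looks like this:

```javascript
// Simulation of the Function node: attach a random 0-100 reading to msg.
// (In Node-RED, only the body of this function goes in the Function node.)
function randomReading(msg) {
    msg.payload = Math.round(Math.random() * 100);
    return msg; // required so Node-RED forwards the message to the next node
}

const msg = randomReading({});
console.log(msg.payload); // an integer between 0 and 100 inclusive
```

In the deployed flow, this stands in for a real sensor reading such as the temperature values used earlier in this guide.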

f.  Double-click the Azure IoT Hub node to bring up the configuration dialog box again. Add the device connection string you copied/saved earlier into the Connection String box and click Ok. (Make sure you enter the device connection string and not the one for your IoT Hub!)

4.  Now your flow is configured you can deploy your flow. Click the Deploy button in the top right of the screen.

If everything is working correctly, the status of the Azure IoT Hub node should change to Connected and eventually Sent message, as in the screenshot below. A message will be sent to your Azure IoT Hub every five seconds while the flow is running.

If you navigate back to the Azure* Portal and load your IoT Hub instance, you will see the message count in the Usage tile increasing.

5.  Monitor device events (Optional). 

You can monitor devices in your IoT Hub using the iothub-explorer utility from the command line.

There is also a Windows utility called Device Explorer if you prefer a visual application: https://github.com/Azure/azure-iot-sdks/blob/master/tools/DeviceExplorer/doc/how_to_use_device_explorer.md

To monitor device events in the command line on your gateway, run the following command:

iothub-explorer "[YOUR IOT HUB CONNECTION STRING]" monitor-events [YOUR DEVICE NAME]

If the Node-RED flow is deployed and running, you should see the random numbers being sent to Azure* in the console output, similar to the screenshot below.

Get YOUR game analyzed at Austin Games Conference!


The Intel® Graphics Performance Analyzers team is offering hands-on performance tuning for games at the Austin Game Developers Conference! Participants will have the opportunity to have their game analyzed by our performance optimization experts and Intel® GPA. Each is a one-on-one, 90-minute session in which you can ask questions about performance, learn Intel® GPA, and understand performance opportunities within YOUR game!

What you need to do:

  • Send an email to sierra.reid@intel.com if you are interested and to get more details.
  • Bring your application with you to the Austin Game Developers Conference (.apk, .exe). If the game is available on Steam, please provide a Steam key before the event so we can download the game on the analysis system.

At the workshop:

  • Meet us to get your game ready to be analyzed by our experts.  Together we will get the analysis system ready with your game and Intel® GPA. 
  • Once the system is ready, our experts will analyze your game with you one on one for 90 minutes.
  • Once the 90 minutes are up, you will have a better understanding of your game’s performance and how to use Intel® GPA!

Using Enclaves from .NET: Making ECALLS with Callbacks via OCALLS


One question about Intel® Software Guard Extensions (Intel® SGX) that comes up frequently is how to mix enclaves with managed code on Microsoft Windows* platforms, particularly with the C# language. While enclaves themselves must be 100 percent native code and the enclave bridge functions must be 100 percent native code with C (and not C++) linkages, it is possible, indirectly, to make an ECALL into an enclave from .NET and to make an OCALL from an enclave into a .NET object. There are multiple solutions for accomplishing these tasks, and this article and its accompanying code sample demonstrate one approach.

Mixing Managed Code and Native Code with C++/CLI

Microsoft Visual Studio* 2005 and later offers three options for calling unmanaged code from managed code:

  • Platform Invocation Services, commonly referred to by developers as P/Invoke
  • COM
  • C++/CLI

P/Invoke is good for calling simple C functions in a DLL, which makes it a reasonable choice for interfacing with enclaves, but writing P/Invoke wrappers and marshaling data can be difficult and error-prone. COM is more flexible than P/Invoke, but it is also more complicated; that additional complexity is unnecessary for interfacing with the C bridge functions required by enclaves. This code sample uses the C++/CLI approach.

C++/CLI offers significant convenience by allowing the developer to mix managed and unmanaged code in the same module, creating a mixed-mode assembly which can in turn be linked to modules comprised entirely of either managed or native code. Data marshaling in C++/CLI is also fairly easy: for simple data types it is done automatically through direct assignment, and helper methods are provided for more complex types such as arrays and strings. Data marshaling is, in fact, so painless in C++/CLI that developers often refer to the programming model as IJW (an acronym for “it just works”).

The trade-off for this convenience is that there can be a small performance penalty due to the extra layer of functions, and it does require that you produce an additional DLL when interfacing with Intel SGX enclaves.


Figure 1:Minimum component makeup of an Intel® Software Guard Extensions application written in C# and C++/CLI.

Figure 1 illustrates the component makeup of a C# application when using the C++/CLI model. The managed application consists of, at minimum, a C# executable, a C++/CLI DLL, the native enclave bridge DLL, and the enclave DLL itself.

The Sample Application

The sample application provides two functions that execute inside of an enclave: one calls CPUID, and the other generates random data in 1KB blocks and XORs them together to produce a final 1KB block of random bytes. This is a multithreaded application, and you can run all three tasks simultaneously. The user interface is shown in Figure 2.


Figure 2:Sample application user interface.

To build the application you will need the Intel SGX SDK. This sample was created using the 1.6 Intel SGX SDK and built with Microsoft Visual Studio 2013. It targets the .NET framework 4.5.1.

The CPUID Tab

On the CPUID panel, you enter a value for EAX to pass to the CPUID instruction. When you click Query, the program executes an ECALL on the current thread and runs the sgx_cpuid() function inside the enclave. Note that sgx_cpuid() does, in turn, make an OCALL to execute the CPUID instruction, since CPUID is not a legal instruction inside an enclave. This OCALL is automatically generated for you by the Edger8r tool when you build your enclave. See the Intel SGX SDK Developer Guide for more information on the sgx_cpuid() function.

The RDRAND Tab

On the RDRAND panel you can generate up to two simultaneous background threads. Each thread performs the same task: it makes an ECALL to enter the enclave and generates the target amount of random data using the sgx_read_rand() function in 1 KB blocks. Each 1 KB block is XORed with the previous block to produce a final 1 KB block of random data that is returned to the application (the first block is XORed with a block of 0s).

For every 1 MB of random data that is generated, the function also executes an OCALL to send the progress back up to the main application via a callback. The callback function then runs a thread in the UI context to update the progress bar.

Because this function runs asynchronously, you can have both threads in the UI active at once and even switch to the CPUID tab to execute that ECALL while the RDRAND ECALLs are still active.
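The enclave-side logic described above — XOR-chaining 1 KB blocks, reporting progress once per MB, and treating a callback return of 0 as a cancellation request — can be sketched in plain C++ outside the SGX SDK. The names here (fill_random, gen_random_chained) are illustrative stand-ins, and a seeded std::mt19937 replaces sgx_read_rand():

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>

// Hypothetical stand-in for sgx_read_rand(): fills a 1 KB block with
// pseudo-random bytes (seeded, so the sketch is deterministic).
static void fill_random(std::array<uint8_t, 1024> &block, std::mt19937 &gen)
{
    std::uniform_int_distribution<int> dist(0, 255);
    for (auto &b : block) b = static_cast<uint8_t>(dist(gen));
}

// Generate 'kb' 1 KB blocks, XOR-chaining them into 'result', which starts
// as all zeros (so the first block is XORed with zeros). Every 1024 blocks
// (1 MB) report progress; a callback return of 0 requests cancellation.
// Returns the number of blocks actually processed.
int gen_random_chained(int kb, std::array<uint8_t, 1024> &result,
                       const std::function<int(int, int)> &progress)
{
    std::mt19937 gen(12345);
    std::array<uint8_t, 1024> block;
    result.fill(0);

    for (int i = 0; i < kb; ++i) {
        fill_random(block, gen);
        for (std::size_t j = 0; j < block.size(); ++j) result[j] ^= block[j];

        // Be polite and only call back once per MB.
        if (!(i % 1024)) {
            if (progress(i + 1, kb) == 0) return i;  // cancelled
        }
    }
    return kb;
}
```

With kb = 2048 (2 MB), the callback fires twice (at blocks 0 and 1024); returning 0 from it stops the loop early, mirroring how the sample propagates the cancel button down to the enclave.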

Overall Structure

The application is made up of the following components, three of which we’ll examine in detail:

  • C# application. A Windows Forms*-based application that implements the user interface.
  • EnclaveLink.dll. A mixed-mode DLL responsible for marshaling data between .NET and native code. This assembly contains two classes: EnclaveLinkManaged and EnclaveLinkNative.
  • EnclaveBridge.dll. A native DLL containing the enclave bridge functions. These are pure C functions.
  • Enclave.dll (Enclave.signed.dll). The Intel SGX enclave.

There is also a fifth component, sgx_support_detect.dll, which is responsible for the runtime check of Intel SGX capability. It ensures that the application exits gracefully when run on a system that does not support Intel SGX. We won’t be discussing this component here, but for more information on how it works and why it’s necessary, see the article Properly Detecting Intel® Software Guard Extensions in Your Applications.

The general application flow is as follows: the enclave is not created immediately when the application launches. Instead, the application initializes some global variables for referencing the enclave and creates a mutex. When a UI event occurs, the first thread that needs to run an enclave function checks whether the enclave has already been created and, if not, launches it. All subsequent threads and events reuse that same enclave. To keep the sample application architecture relatively simple, the enclave is not destroyed until the program exits.
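That lazy, thread-safe creation pattern can be sketched in standalone C++ — hypothetical names throughout, with std::mutex standing in for the Win32 named mutex and a counter standing in for sgx_create_enclave():

```cpp
#include <mutex>

// Hypothetical handle type standing in for sgx_enclave_id_t.
using enclave_id_t = unsigned long long;

// Stand-in for sgx_create_enclave(); the counter lets us observe that the
// enclave is launched exactly once.
static int g_creates = 0;
static enclave_id_t fake_create_enclave() { ++g_creates; return 42ULL; }

static std::mutex g_mutex;
static enclave_id_t g_eid = 0;
static bool g_launched = false;

// Mirrors get_enclave() in EnclaveLinkNative: the first caller creates the
// enclave under the lock; every later caller reuses the cached id.
enclave_id_t get_enclave()
{
    std::lock_guard<std::mutex> lock(g_mutex);
    if (!g_launched) {
        g_eid = fake_create_enclave();
        g_launched = true;
    }
    return g_eid;
}
```

However many threads race to the first ECALL, only one of them pays the cost of enclave creation; the rest block briefly on the mutex and then reuse the same id.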

The C# Application

The main executable is written in C#. It requires a reference to the EnclaveLink DLL in order to execute the C/C++ methods that eventually call into the enclave.

On startup, the application calls static methods to prepare the application for the enclave, and then closes it on exit:

        public FormMain()
        {
            InitializeComponent();
            // This doesn't create the enclave, it just initializes what we need
            // to do so in a multithreaded environment.
            EnclaveLinkManaged.init_enclave();
        }

        ~FormMain()
        {
            // Destroy the enclave (if we created it).
            EnclaveLinkManaged.close_enclave();
        }

These two functions are simple wrappers around functions in EnclaveLinkNative and are discussed in more detail below.

When either the CPUID or RDRAND functions are executed via the GUI, the application creates an instance of class EnclaveLinkManaged and executes the appropriate method. The CPUID execution flow is shown below:

        private void buttonCPUID_Click(object sender, EventArgs e)
        {
            int rv;
            UInt32[] flags = new UInt32[4];
            EnclaveLinkManaged enclave = new EnclaveLinkManaged();

            // Query CPUID and get back an array of 4 32-bit unsigned integers

            rv = enclave.cpuid(Convert.ToInt32(textBoxLeaf.Text), flags);
            if (rv == 1)
            {
                textBoxEAX.Text = String.Format("{0:X8}", flags[0]);
                textBoxEBX.Text = String.Format("{0:X8}", flags[1]);
                textBoxECX.Text = String.Format("{0:X8}", flags[2]);
                textBoxEDX.Text = String.Format("{0:X8}", flags[3]);
            }
            else
            {
                MessageBox.Show("CPUID query failed");
            }
        }

The callbacks for the progress bar in the RDRAND execution flow are implemented using a delegate, which creates a task in the UI context to update the display. The callback methodology is described in more detail later.

        Boolean cancel = false;
        progress_callback callback;
        TaskScheduler uicontext;

        public ProgressRandom(int mb_in, int num_in)
        {
            enclave = new EnclaveLinkManaged();
            mb = mb_in;
            num = num_in;
            uicontext = TaskScheduler.FromCurrentSynchronizationContext();
            callback = new progress_callback(UpdateProgress);

            InitializeComponent();

            labelTask.Text = String.Format("Generating {0} MB of random data", mb);
        }

        private int UpdateProgress(int received, int target)
        {
            Task.Factory.StartNew(() =>
            {
                progressBarRand.Value = 100 * received / target;
                this.Text = String.Format("Thread {0}: {1}% complete", num, progressBarRand.Value);
            }, CancellationToken.None, TaskCreationOptions.None, uicontext);

            return (cancel) ? 0 : 1;
        }

The EnclaveLink DLL

The primary purpose of the EnclaveLink DLL is to marshal data between .NET and unmanaged code. It is a mixed-mode assembly that contains two objects:

  • EnclaveLinkManaged, a managed class that is visible to the C# layer
  • EnclaveLinkNative, a native C++ class

EnclaveLinkManaged contains all of the data marshaling functions, and its methods have variables in both managed and unmanaged memory. It ensures that only unmanaged pointers and data get passed to EnclaveLinkNative. Each instance of EnclaveLinkManaged contains an instance of EnclaveLinkNative, and the methods in EnclaveLinkManaged are essentially wrappers around the methods in the native class.

EnclaveLinkNative is responsible for interfacing with the enclave bridge functions in the EnclaveBridge DLL. It also is responsible for initializing the global enclave variables and handling the locking.

#define MUTEX L"Enclave"

static sgx_enclave_id_t eid = 0;
static sgx_launch_token_t token = { 0 };
static HANDLE hmutex;
int launched = 0;

void EnclaveLinkNative::init_enclave()
{
	hmutex = CreateMutex(NULL, FALSE, MUTEX);
}

void EnclaveLinkNative::close_enclave()
{
	if (WaitForSingleObject(hmutex, INFINITE) != WAIT_OBJECT_0) return;

	if (launched) en_destroy_enclave(eid);
	eid = 0;
	launched = 0;

	ReleaseMutex(hmutex);
}

int EnclaveLinkNative::get_enclave(sgx_enclave_id_t *id)
{
	int rv = 1;
	int updated = 0;

	if (WaitForSingleObject(hmutex, INFINITE) != WAIT_OBJECT_0) return 0;

	if (launched) *id = eid;
	else {
		sgx_status_t status;

		status= en_create_enclave(&token, &eid, &updated);
		if (status == SGX_SUCCESS) {
			*id = eid;
			rv = 1;
			launched = 1;
		} else {
			rv= 0;
			launched = 0;
		}
	}
	ReleaseMutex(hmutex);

	return rv;
}

The EnclaveBridge DLL

As the name suggests, this DLL holds the enclave bridge functions. This is a 100 percent native assembly with C linkages, and the methods from EnclaveLinkNative call into these functions. Essentially, they marshal data and wrap the calls in the mixed mode assembly to and from the enclave.
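The shape of a bridge function can be sketched as follows. The names en_cpuid and e_cpuid_stub are hypothetical, and the stub stands in for the ECALL proxy that the Edger8r tool would generate; the point is the extern "C" signature built only from plain C types:

```cpp
#include <cstdint>

// Hypothetical stand-ins for SDK types; the real bridge uses sgx_status_t
// and the ECALL proxies generated by the Edger8r tool.
using status_t = int;
constexpr status_t STATUS_SUCCESS = 0;
using enclave_id_t = uint64_t;

// Stand-in for the generated ECALL proxy e_cpuid(): fills the flags array
// with predictable values so the wrapper can be exercised.
static status_t e_cpuid_stub(enclave_id_t, int *rv, int leaf, uint32_t flags[4])
{
    for (int i = 0; i < 4; ++i) flags[i] = static_cast<uint32_t>(leaf + i);
    *rv = 1;
    return STATUS_SUCCESS;
}

// The bridge function: C linkage and plain C types in the signature, so the
// mixed-mode DLL can call it without creating a circular dependency and the
// enclave edge code can link against it.
extern "C" int en_cpuid(enclave_id_t eid, int leaf, uint32_t flags[4])
{
    int rv = 0;
    status_t status = e_cpuid_stub(eid, &rv, leaf, flags);
    return (status == STATUS_SUCCESS) ? rv : 0;
}
```

Keeping C++ classes and exceptions out of these signatures is what lets EnclaveLinkNative call down without any knowledge of the enclave's generated edge code.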

The OCALL and the Callback Sequence

The most complicated piece of the sample application is the callback sequence used by the RDRAND operation. The OCALL must propagate from the enclave all the way up the application to the C# layer. The task is to pass a reference to a managed class instance method (a delegate) down to the enclave so that it can be invoked via the OCALL. The challenge is to do that within the following restrictions:

  1. The enclave is in its own DLL, which cannot depend on other DLLs.
  2. The enclave only supports a limited set of data types.
  3. The enclave can only link against 100 percent native functions with C linkages.
  4. There cannot be any circular DLL dependencies.
  5. The methodology must be thread-safe.
  6. The user must be able to cancel the operation.

The Delegate

The delegate is prototyped inside of EnclaveLinkManaged.h along with the EnclaveLinkManaged class definition:

public delegate int progress_callback(int, int);

public ref class EnclaveLinkManaged
{
	array<BYTE> ^rand;
	EnclaveLinkNative *native;

public:
	progress_callback ^callback;

	EnclaveLinkManaged();
	~EnclaveLinkManaged();

	static void init_enclave();
	static void close_enclave();

	int cpuid(int leaf, array<UINT32>^ flags);
	String ^genrand(int mb, progress_callback ^cb);

	// C++/CLI doesn't support friend classes, so this is exposed publicly even though
	// it's only intended to be used by the EnclaveLinkNative class.

	int genrand_update(int generated, int target);
};

When each ProgressRandom object is instantiated, a delegate is assigned in the variable callback, pointing to the UpdateProgress instance method:

    public partial class ProgressRandom : Form
    {
        EnclaveLinkManaged enclave;
        int mb;
        Boolean cancel = false;
        progress_callback callback;
        TaskScheduler uicontext;
        int num;

        public ProgressRandom(int mb_in, int num_in)
        {
            enclave = new EnclaveLinkManaged();
            mb = mb_in;
            num = num_in;
            uicontext = TaskScheduler.FromCurrentSynchronizationContext();
            callback = new progress_callback(UpdateProgress);

            InitializeComponent();

            labelTask.Text = String.Format("Generating {0} MB of random data", mb);
        }

This variable is passed as an argument to the EnclaveLinkManaged object when the RDRAND operation is requested:

        public Task<String> RunAsync()
        {
            this.Refresh();

            // Create a thread using Task.Run

            return Task.Run<String>(() =>
            {
                String data;

                data= enclave.genrand(mb, callback);

                return data;
            });
        }

The genrand() method inside of EnclaveLinkManaged saves this delegate to the property “callback”. It also creates a GCHandle that refers to the object itself, which prevents the garbage collector from collecting it and yields an opaque handle value that can be safely passed through native code. This handle is passed as a pointer to the native object.

This is necessary because we cannot directly store a handle to a managed object as a member of an unmanaged class.

String ^EnclaveLinkManaged::genrand(int mb, progress_callback ^cb)
{
	UInt32 rv;
	int kb= 1024*mb;
	String ^mshex = gcnew String("");
	unsigned char *block;
	// Marshal a handle to the managed object to a system pointer that
	// the native layer can use.
	GCHandle handle= GCHandle::Alloc(this);
	IntPtr pointer= GCHandle::ToIntPtr(handle);

	callback = cb;
	block = new unsigned char[1024];
	if (block == NULL) return mshex;

	// Call into the native layer. This will make the ECALL, which executes
	// callbacks via the OCALL.

	rv= (UInt32) native->genrand(kb, pointer.ToPointer(), block);

In the native object, we now have a pointer to the managed object, which we save in the member variable managed.

Next, we use a feature of C++11 to create a std::function reference that is bound to a class method. Unlike standard C function pointers, this std::function reference points to the class method in our instantiated object, not to a static or global function.

DWORD EnclaveLinkNative::genrand (int mkb, void *obj, unsigned char rbuffer[1024])
{
	using namespace std::placeholders;
	auto callback= std::bind(&EnclaveLinkNative::genrand_progress, this, _1, _2);
	sgx_status_t status;
	int rv;
	sgx_enclave_id_t thiseid;

	if (!get_enclave(&thiseid)) return 0;

	// Store the pointer to our managed object as a (void *). We'll Marshall this later.

	managed = obj;

	// Retry if we lose the enclave due to a power transition
again:
	status= en_genrand(thiseid, &rv, mkb, callback, rbuffer);

Why do we need this layer of indirection? Because the next layer down, EnclaveBridge.dll, cannot have a linkage dependency on EnclaveLink.dll as this would create a circular reference (where A depends on B, and B depends on A). EnclaveBridge.dll needs an anonymous means of pointing to our instantiated class method.

Inside en_genrand() in EnclaveBridge.cpp, this std::function is converted to a void pointer. Enclaves only support a subset of data types, and they don’t support any of the C++11 extensions regardless. We need to convert the std::function pointer to something the enclave will accept. In this case, that means passing the pointer address in a generic data buffer. Why use void instead of an integer type? Because the size of a std::function pointer varies by architecture.

typedef std::function<int(int, int)> progress_callback_t;

ENCLAVENATIVE_API sgx_status_t en_genrand(sgx_enclave_id_t eid, int *rv, int kb, progress_callback_t callback, unsigned char *rbuffer)
{
	sgx_status_t status;
	size_t cbsize = sizeof(progress_callback_t);

	// Pass the callback to the enclave as an opaque, sized buffer.
	status = e_genrand(eid, rv, kb, (void *)&callback, cbsize, rbuffer);

	return status;
}
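The whole indirection chain — binding an instance method with std::bind, shipping it across a boundary as an anonymous (void *, size) pair, and recasting it on the far side — can be reproduced in a few lines of standalone C++. The names (Native, o_progress_stub, run_once) are illustrative, with no SGX types involved:

```cpp
#include <cstddef>
#include <functional>

using progress_callback_t = std::function<int(int, int)>;

// Stand-in for the OCALL: it receives the callback only as an anonymous
// pointer plus a size, exactly as the enclave boundary does, and recasts
// it before invoking.
static int o_progress_stub(void *cbref, std::size_t sz, int progress, int target)
{
    if (cbref == nullptr || sz != sizeof(progress_callback_t)) return 1;
    progress_callback_t *callback = static_cast<progress_callback_t *>(cbref);
    return (*callback)(progress, target);
}

// A class with an instance method, mirroring EnclaveLinkNative::genrand_progress.
struct Native {
    int last_progress = 0;
    int on_progress(int progress, int /*target*/) {
        last_progress = progress;
        return 1;  // 1 = keep going, 0 = cancel
    }
};

// Bind the instance method, then pass it down as (void *, size) and back up.
int run_once(Native &n)
{
    using namespace std::placeholders;
    progress_callback_t callback = std::bind(&Native::on_progress, &n, _1, _2);
    return o_progress_stub(static_cast<void *>(&callback),
                           sizeof(progress_callback_t), 512, 1024);
}
```

Because the std::function is bound to a specific object instance, the layer that finally invokes it needs no header from, or linkage to, the layer that created it — which is exactly what breaks the circular-dependency problem described above.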

Note that we not only must allocate this data buffer, but also tell the Edger8r tool how large the buffer is. That means we need to pass the size of the buffer in as an argument, even though it is never explicitly used.

Inside the enclave, the callback parameter literally just gets passed through and out the OCALL. The definition in the EDL file looks like this:

enclave {
	from "sgx_tstdc.edl" import *;

    trusted {
        /* define ECALLs here. */

		public int e_cpuid(int leaf, [out] uint32_t flags[4]);
		public int e_genrand(int kb, [in, size=sz] void *callback, size_t sz, [out, size=1024] unsigned char *block);
    };

    untrusted {
        /* define OCALLs here. */

		int o_genrand_progress ([in, size=sz] void *callback, size_t sz, int progress, int target);
    };
};

The callback starts unwinding in the OCALL, o_genrand_progress:

typedef std::function<int(int, int)> progress_callback_t;

int o_genrand_progress(void *cbref, size_t sz, int progress, int target)
{
	progress_callback_t *callback = (progress_callback_t *) cbref;

	// Recast as a pointer to our callback function.

	if (callback == NULL) return 1;

	// Propagate the cancellation condition back up the stack.
	return (*callback)(progress, target);
}

The callback parameter, cbref, is recast as a std::function binding and then executed with our two arguments: progress and target. This points back to the genrand_progress() method inside of the EnclaveLinkNative object, where the GCHandle is recast to a managed object reference and then executed.

int __cdecl EnclaveLinkNative::genrand_progress (int generated, int target)
{
	// Marshal a pointer to a managed object to native code and convert it to an object pointer we can use
	// from CLI code

	EnclaveLinkManaged ^mobj;
	IntPtr pointer(managed);
	GCHandle mhandle;

	mhandle= GCHandle::FromIntPtr(pointer);
	mobj= (EnclaveLinkManaged ^)mhandle.Target;

	// Call the progress update function in the Managed version of the object. A retval of 0 means
	// we should cancel our operation.

	return mobj->genrand_update(generated, target);
}

The next stop is the managed object. Here, the delegate that was saved in the callback class member is used to call up to the C# method.

int EnclaveLinkManaged::genrand_update(int generated, int target)
{
	return callback(generated, target);
}

This executes the UpdateProgress() method, which updates the UI. This delegate returns an int value of either 0 or 1, which represents the status of the cancellation button: 

        private int UpdateProgress(int received, int target)
        {
            Task.Factory.StartNew(() =>
            {
                progressBarRand.Value = 100 * received / target;
                this.Text = String.Format("Thread {0}: {1}% complete", num, progressBarRand.Value);
            }, CancellationToken.None, TaskCreationOptions.None, uicontext);

            return (cancel) ? 0 : 1;
        }

A return value of 0 means the user has asked to cancel the operation. This return code propagates back down the application layers into the enclave. The enclave code looks at the return value of the OCALL to determine whether or not to cancel:

        // Make our callback. Be polite and only do this every MB.
        // (Assuming 1 KB = 1024 bytes, 1MB = 1024 KB)
        if (!(i % 1024)) {
            status = o_genrand_progress(&rv, callback, sz, i + 1, kb);
            // rv == 0 means we got a cancellation request
            if (status != SGX_SUCCESS || rv == 0) return i;
         } 

Enclave Configuration

The default configuration for an enclave is to allow a single thread. As the sample application can run up to three threads in the enclave at one time—the CPUID function on the UI thread and the two RDRAND operations in background threads—the enclave configuration needed to be changed. This is done by setting the TCSNum parameter to 3 in Enclave.config.xml. If this parameter is left at its default of 1, only one thread can enter the enclave at a time, and simultaneous ECALLs will fail with the error code SGX_ERROR_OUT_OF_TCS.

<EnclaveConfiguration>
  <ProdID>0</ProdID>
  <ISVSVN>0</ISVSVN>
  <StackMaxSize>0x40000</StackMaxSize>
  <HeapMaxSize>0x100000</HeapMaxSize>
  <TCSNum>3</TCSNum>
  <TCSPolicy>1</TCSPolicy>
  <DisableDebug>0</DisableDebug>
  <MiscSelect>0</MiscSelect>
  <MiscMask>0xFFFFFFFF</MiscMask>
</EnclaveConfiguration>

Summary

Mixing Intel SGX with managed code is not difficult, but it can involve a number of intermediate steps. The sample C# application presented in this article represents one of the more complicated cases: multiple DLLs, multiple threads originating from .NET, locking in native space, OCALLS, and UI updates based on enclave operations. It is intended to demonstrate the flexibility that application developers really have when working with Intel SGX, in spite of their restrictions.

Improve Video Quality, Build Extremely Efficient Encoders & Decoders with Intel® VPA & Intel® SBE


Video Codec Developers: This could be your magic encoder/decoder ring. We're excited to announce the new Intel® Video Pro Analyzer 2017 (Intel® VPA) and Intel® Stress Bitstreams and Encoder 2017 (Intel® SBE), which you can use to enhance the brilliance of your video quality and build extremely efficient, robust encoders and decoders. Get the scoop and more technical details on these advanced Intel video analysis tools below.

Learn more: Intel® VPA  |  Intel® SBE

 

Enhance Video Quality & Streaming for AVC, HEVC, VP9 & MPEG-2 with Intel VPA

Improving your encoder's video quality and compliance becomes faster and easier. Intel® VPA, a comprehensive video analysis toolset to inspect, debug and optimize the encode/decode process for AVC, HEVC, VP9 and MPEG-2, brings efficiency and multiple UI enhancements in its 2017 edition. A few of the top new features include:

Optimized Performance & Efficiency

  • HEVC file indexing makes Intel VPA faster and easier to use, with better performance and responsiveness when loading and executing debug optimizations, and quicker switching between frames.
  • MPEG-2 error resilience improvements (previously delivered for HEVC and VP9 analysis) 
  • Significant improvements in decode processing time: 30% faster for HEVC and 60% faster for AVC, along with AVC playback optimization. This includes skipping some intermediate processing when the user clicks frames to decode in the GUI.
  • Video Quality Caliper provides more stream information, and has faster playback speed. 

Enhanced Playback & Navigation

New performance enhancements in the 2017 release include decoder performance optimization with good gains for linear playback and indexing (for HEVC) to facilitate very fast navigation within the stream. Playback for HEVC improved 1.4x, and for AVC improved 2.2x.1

  • Performance analysis for HEVC and AVC playback (blue bars) consists of the ratio of average times to load one Time Lapse Footage American Cities sequence, 2160p @ 100 frames.
  • Performance analysis for HEVC Random Navigation (orange bar) improved by 12x and consists of the ratio of latency differences to randomly access the previous frame from the current frame, measured on 2160p HEVC/AVC video.

 

UI Enhancements​

  • Filtering of error messages and new settings to save fixes.
  • Additional options for save/load, including display/decode order, fields/frame, YUV mode, and more.
  • Improved GUI picture caching.

And don't forget: with this advanced video analysis tool, you can innovate for UHD with BT.2020 support. For the full list of Intel VPA features, visit the product site. Versions for Linux*, Windows*, and Mac OS X* are available.

For current users, Download the Intel VPA 2017 Edition Now

If you are not already using Intel VPA -  Get a Free Trial2

More Resources - Get Started Optimizing Faster

 


 

Build Compliant HEVC & AVS2 Decoders with new Intel SBE 2017  

Intel SBE is a set of streams and tools for VP9, HEVC, and AVS2 for extensive validation of the decoders, transcoders, players, and streaming solutions. You can also create custom bitstreams for testing and optimize stream base for coverage and usage efficiency. The new 2017 release delivers:
  • Improved HEVC coverage, including syntax coverage that ensures decoders are in full conformance with the standard, plus long-term reference generation support for video conferencing.
     
  • Random Encoder for AVS2 Main and Main 10. (This can be shipped only to members of the AVS2 committee; the AVS2 format is broadly used in the People's Republic of China.)
     
  • Compliance with the recently finalized AVS2 standard.

Learn more by visiting the Intel SBE site

Take a test drive of Intel SBE -  Download a Free Evaluation2


 

1Baseline configuration: Intel® VPA 2017 vs. 2016 running on Microsoft Windows* 8.1. Intel Customer Reference Platform with Intel® Core™ i7-5775C (3.4 GHz, 32 GB DDR3 DRAM). Gigabyte Z97-HD3 Desktop board, 32GB (4x8GB DDR3 PC3-12800 (800MHz) DIMM), 500GB Intel SSD, Turbo Boost Enabled, and HT Enabled. Source: Intel internal measurements as of August 2016.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/performance. Features and benefits may require an enabled system and third party hardware, software or services. Consult your system provider.

Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804.

2Note that in evaluation package, the number of streams and/or other capabilities may be limited. Contact Intel Sales if you need a review without these limitations. 


Could Your Next App Be for B2B?


If you’re building a game, a photo-editing app, or a calendar app, you’re likely developing it for consumers, end users who will find it through the iOS app store or the Google Play store. Previously, we’ve talked about how to get noticed in this saturated market. But another option you may want to consider is creating apps for the B2B space.

What Is B2B?

B2B just means business to business — when one business sells or provides a service to another business. As an app developer, this means that instead of targeting end consumers, you target businesses. The app might be for internal use, something that the sales force or field managers use to manage their workload, or it might be something that the business offers to their own customers as part of the product or service they’re providing. B2B apps are tools that allow businesses to work faster and more efficiently.

Why Do Businesses Need B2B Apps?

Every industry is unique, with its own pain points and specific needs, but no matter what service or product a company is providing, nearly all businesses have a need to run their operations more effectively. This is especially true today, where people often work remotely, and need to carry tools with them into the field. Whether they're used internally or distributed to another end user, mobile apps enable businesses, and their customers, to do business from anywhere, at any time, with less paperwork and less room for error.

More and more businesses are discovering that with apps, they can:

  • Offer extra convenience or functionality to customers. Some of the most familiar business apps supplement or support the business’s product or service. Think of a bank offering an app to manage deposits, or a dog-walker providing an app to schedule walks with their clients.
     
  • Give customers an enhanced shopping experience. Electronic catalogs and buying guides can streamline sales by giving customers key product details and comparisons and an easy way to buy from wherever they are.
     
  • Increase efficiency and responsiveness in the field. Sales teams can easily pull up the latest information on each customer, maximizing their chance to close sales, provide support, and build strong relationships.
     
  • Provide customers with the latest information—and deals. App updates allow businesses to push timely content and offers to already-interested customers.
     
  • Learn about customers’ needs. Analytics within the app can give businesses useful information about how it’s being used, and what their customers need.
     
  • Reduce paperwork. Not only is it more environmentally friendly, there’s much less chance of losing or misplacing important documents.
     

What’s the Benefit for Developers?

We’ve outlined some of the reasons that a business might want an app, but why should you be interested in this market as a developer?

  • Less competition.
     
  • Higher price point.
     

Lucrative Areas for B2B Apps

If the B2B angle sounds interesting to you, here’s a brief overview of some of the main areas you may want to start thinking about:

  • Ordering – Replace paper-based catalogs and enable on-the-spot ordering.
     
  • Inventory – Manage complicated inventory in order to showcase products with confidence.
     
  • Quote generation – Consider all relevant details about a project, and generate an accurate quote in the field.
     
  • Geolocation – Connect sales or support staff in the field with customers who need service. Efficiently deploying these resources saves time and money, and increases customer satisfaction.
     
  • CRM – Access customer data from anywhere, enabling reps to provide accurate service and build long-term relationships.
     
  • Electronic signature – Close deals on the spot by enabling and managing electronic signatures.
     

B2B App Business Models

There are two main business models most frequently used by B2B developers, depending on your idea, and the market you’re targeting. 

  • App store. If your app is something that many different business customers might be able to use, you can create the app and then put it up for sale in the Microsoft Store, or in the business category of the iOS App Store or Google Play Store.
     
  • Custom. For more specialized apps, you can work directly with a business to create a custom app for them to use with their employees or customers.
     

Apps Are Not a One-Time Sale!

Apps need to be maintained and updated in order to continue functioning. This is even more important to businesses than to consumers. To be successful in this area, it's important to invest in the long-term relationship and build trust that you will be there to keep everything running smoothly. Even if you don't make many functional updates, you'll still need to stay on top of platform changes, updating for the latest versions of iOS and Android.

Got a great idea for a B2B app? Check back for future articles to help you market and sell your app to B2B customers.

DPDK Pdump in Open vSwitch* with DPDK


This article describes the concept of DPDK pdump, how it can be tested, and the benefits the feature brings to Open vSwitch* (OVS) with the Data Plane Development Kit (DPDK). This article was written with users of OVS in mind who want to know more about the feature and for those who want to monitor traffic on DPDK devices in their OVS DPDK setup.

Note: DPDK pdump in OVS with DPDK is available on both the OVS master branch and the 2.6 release branch. Users can download the OVS master branch as a zip or the 2.6 branch as a zip; installation steps are available for both the OVS master branch and the 2.6 branch.

DPDK Pdump

The pdump library was introduced in DPDK v16.07 to allow users of DPDK-based applications to monitor and capture the traffic passing on DPDK devices.

The pdump library uses a client-server model. The DPDK-based application (OVS DPDK) acts as the server and is responsible for initializing the pdump framework. OVS DPDK will initialize the pdump framework on startup if the 'pmd_pcap' and 'pdump' DPDK configuration options are set in DPDK when linking with OVS. This process is described in the installation documentation as well as in the Configuration Steps section of this article.

A separate, secondary DPDK process must be launched that acts as the client and is responsible for enabling and disabling the packet capture. An example of an existing application that can be used as the client is the dpdk-pdump application, which can be found in the 'app' directory of the DPDK. Further details of its usage can be found in the Configuration Steps section and in the DPDK documentation.

A performance decrease is expected when a monitoring application like dpdk-pdump is used.

The DPDK programmer's guide has a section dedicated to the pdump library which contains more information about how it works.

Test Environment

The following describes how to set up an OVS DPDK configuration with one physical 'dpdk' port and the steps to set up the dpdk-pdump application to enable traffic capture on that port.

Figure 1 shows the test environment configuration.


Figure 1:Open vSwitch* with the Data Plane Development Kit configuration with one physical port and the librte_pdump library initialized. The dpdk-pdump DPDK sample application is being used to capture traffic passing on 'dpdk0' and saving the information in 'pkts.pcap'.

The setup used in this article consists of the following hardware and software components:

Processor

Intel® Xeon® processor E5-2695 v3 @ 2.30 GHz

Kernel

4.2.8-200

OS

Fedora* 22

Data Plane Development Kit

v16.07

Open vSwitch*

92690eae8aac24bba499da921206852951581836

Configuration Steps

  1. Build OVS with DPDK as described in the installation docs. Make sure DPDK is built with the following configuration options set:

    CONFIG_RTE_LIBRTE_PMD_PCAP=y
    CONFIG_RTE_LIBRTE_PDUMP=y

  2. Configure the switch as described in the Test Environment section, with one physical 'dpdk' port.

    ovs-vsctl add-br br0
    ovs-vsctl set Bridge br0 datapath_type=netdev
    ovs-vsctl add-port br0 dpdk0
    ovs-vsctl set Interface dpdk0 type=dpdk

  3. Launch the switch. Navigate to the 'app/pdump' directory in the DPDK. 'make' the application and launch like so:

    sudo ./build/app/dpdk_pdump -- --pdump port=0,queue=*,rx-dev=/tmp/pkts.pcap --server-socket-path=/usr/local/var/run/openvswitch

  4. Send some traffic to dpdk0 via traffic generator or otherwise.
  5. Inspect the contents of 'pkts.pcap' using a tool that can interpret pcap files. One example is tcpdump:

    $ tcpdump -r pkts.pcap
    reading from file /tmp/pkts.pcap, link-type EN10MB (Ethernet)
    13:14:42.4270163 IP 2.2.2.2.0 > 1.1.1.1.0: Flags [none], seq 0:6, win 0, length 6
    13:14:44.126555 IP 2.2.2.2.0 > 1.1.1.1.0: Flags [none], seq 0:6, win 0, length 6

More information about the dpdk-pdump application and its usage can be found in the DPDK documentation; links are provided in the 'Additional Information' section.

Conclusion

In this article we described the DPDK pdump library and how it can be leveraged in OVS in order to capture traffic passing on DPDK ports.

Additional Information

DPDK pdump library: DPDK documentation

DPDK pdump sample application: DPDK documentation

Have a question? Feel free to follow up with the query on the Open vSwitch discussion mailing list.

To learn more about OVS with DPDK, check out the following videos and articles on Intel® Developer Zone and Intel® Network Builders University.

QoS Configuration and usage for Open vSwitch* with DPDK

Open vSwitch with DPDK Architectural Deep Dive

DPDK Open vSwitch: Accelerating the Path to the Guest

About the Author

Ciara Loftus is a network software engineer with Intel. Her work is primarily focused on accelerated software switching solutions in user space running on Intel® architecture. Her contributions to OVS with DPDK include the addition of vHost User ports, NUMA-aware vHost User and DPDK v16.07 support.

Intel® Xeon Phi™ Processor Software Archive


On this page you will find the last one or two releases of the Intel® Xeon Phi™ Processor Software. The most recent release is found here: https://software.intel.com/en-us/articles/xeon-phi-software, and we recommend customers use the latest release wherever possible.

 


N-1 release for the Intel® Xeon Phi™ Processor Software 1.4.x series

Intel® Xeon Phi™ Processor Software 1.4.0 release for Linux

Version                               | Downloads available | Size  | MD5 Checksum
xppsl_1.4.0 (released: July 20, 2016) | CentOS 7.2          | 487MB | 77f83c57ab2663ba53704da27d4356a2
                                      | RedHat 7.2          | 31MB  | 0aabeb4b816a055e4abf2d947ce6e0a9
                                      | SuSE 12.0           | 324MB | d63b6d417f68deaae90072118e03dfb3
                                      | SuSE 12.1           | 323MB | 2c932c3f481373bf66ef7c2ae6b0fbbb

 

Documentation link             | Description                                                                           | Last Updated On | Size (approx)
releasenotes-linux.txt         | English - Release Notes                                                               | July 2016       | ~7KB
EULA.txt                       | End User License Agreement (IMPORTANT: Read Before Downloading, Installing, or Using) | July 2016       | ~30KB
Outbound_License_Agreement.pdf | License                                                                               | July 2016       | ~280KB
UserGuide.pdf                  | Intel® Xeon Phi™ Processor Software User's Guide                                      | July 2016       | ~850KB
micperf User's Guide           | micperf User's Guide                                                                  | July 2016       | ~1MB

Intel® Xeon Phi™ Processor Software 1.4.0 release for Microsoft* Windows

Version                               | Downloads available | Size | MD5 Checksum
xppsl_1.4.0 (released: July 20, 2016) | Microsoft* Windows  | 21MB | d6051d5d683613e139a662b6579c4b30

 

Documentation link       | Description             | Last Updated On | Size (approx)
releasenotes-windows.txt | English - Release Notes | July 2016       | ~6KB

Offload Over Fabric Software 1.4.0 release for Linux

Downloads available              | Size  | MD5 Checksum
CentOS 7.2-Offload               | 7MB   | 0e14ee091a03a2dd5000dcec264320ea
RedHat 7.2-Offload               | 7MB   | e9f6e0645bce15c3ea9c6e22632d65cf
SuSE 12.0-Offload                | 3MB   | 49297aebe98bbafcaed62d7293256771
SuSE 12.1-Offload                | 3MB   | f408c72a7634877fbb2fca88cb76c257
Offload Over Fabric User's Guide | 480KB |

Caffe* Optimized for Intel® Architecture: Applying Modern Code Techniques


Improving the computational performance of a deep learning framework

Authors

Vadim Karpusenko, Ph.D., Intel Corporation
Andres Rodriguez, Ph.D., Intel Corporation
Jacek Czaja, Intel Corporation
Mariusz Moczala, Intel Corporation

Abstract

This paper demonstrates a special version of Caffe* — a deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC) — that is optimized for Intel® architecture. This version of Caffe, known as Caffe optimized for Intel architecture, is currently integrated with the latest release of Intel® Math Kernel Library 2017, is optimized for Intel® Advanced Vector Extensions 2, and will include Intel® Advanced Vector Extensions 512 instructions. This solution is supported by Intel® Xeon® processors and Intel® Xeon Phi™ processors, among others. This paper includes performance results for a CIFAR-10* image-classification dataset, and it describes the tools and code modifications that can be used to improve computational performance for the BVLC Caffe code and other deep learning frameworks.

Introduction

Deep learning is a subset of general machine learning that in recent years has produced groundbreaking results in image and video recognition, speech recognition, natural language processing (NLP), and other big-data and data-analytics domains. Recent advances in computation, large datasets, and algorithms have been key ingredients behind the success of deep learning, which works by passing data through a series of layers, with each layer extracting features of increasing complexity.

Figure 1. Each layer in a deep network is trained to identify features of higher complexity—this figure shows a small subset of the features of a deep network projected down to the pixel space (the gray images on the left) and the corresponding images (colored images on the right) that activate those features.
Zeiler, Matthew D. and Fergus, Rob. New York University, Department of Computer Science. “Visualizing and Understanding Convolutional Networks.” 2014. https://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf.

Supervised deep learning requires a labeled dataset. Three popular types of supervised deep networks are multilayer perceptrons (MLPs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). In these networks, the input is passed through a series of linear and non-linear transformations as it progresses through each layer, and an output is produced. An error and the respective cost of the error are then computed before a gradient of the costs of the weights and activations in the network is computed and iteratively backward propagated to lower layers. Finally, the weights or models are updated based on the computed gradient.

In MLPs, the input data at each layer (represented by a vector) is first multiplied by a dense matrix unique to that layer. In RNNs, the dense matrix (or matrices) is the same for every layer (the layer is recurrent), and the length of the network is determined by the length of the input signal. CNNs are similar to MLPs, but they use a sparse matrix for the convolutional layers. This matrix multiplication is represented by convolving a 2-D representation of the weights with a 2-D representation of the layer’s input. CNNs are popular in image recognition, but they are also used for speech recognition and NLP. For a detailed explanation of CNNs, see “CS231n Convolutional Neural Networks for Visual Recognition” at http://cs231n.github.io/convolutional-networks/.

Caffe

Caffe* is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and community contributors. This paper refers to that original version of Caffe as “BVLC Caffe.”

In contrast, Caffe optimized for Intel® architecture is a specific, optimized fork of the BVLC Caffe framework. Caffe optimized for Intel architecture is currently integrated with the latest release of Intel® Math Kernel Library (Intel® MKL) 2017, and it is optimized for Intel® Advanced Vector Extensions 2 (Intel® AVX2) and will include Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions, which are supported by Intel® Xeon® processors and Intel® Xeon Phi™ processors, among others. For a detailed description of compiling, training, fine-tuning, testing, and using the various tools available, read “Training and Deploying Deep Learning Networks with Caffe* Optimized for Intel® Architecture” at https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture.

Intel would like to thank Boris Ginsburg for his ideas and initial contribution to the OpenMP* multithreading implementation of Caffe* optimized for Intel® architecture.

This paper describes the performance of Caffe optimized for Intel architecture compared to BVLC Caffe running on Intel architecture, and it discusses the tools and code modifications used to improve computational performance for the Caffe framework. It also shows performance results from using the CIFAR-10* image-classification dataset (https://www.cs.toronto.edu/~kriz/cifar.html) and the CIFAR-10 full-sigmoid model that composes layers of convolution, max and average pooling, and batch normalization (https://github.com/BVLC/caffe/blob/master/examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt).

Figure 2. Example of CIFAR-10* dataset images

To download the source code for the tested Caffe frameworks, visit the following:

Image Classification

The CIFAR-10 dataset consists of 60,000 color images, each with dimensions of 32 × 32, equally divided and labeled into the following 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The classes are mutually exclusive; there is no overlap between different types of automobiles (such as sedans or sports utility vehicles [SUVs]) or trucks (which includes only big trucks)—neither group includes pickup trucks (see Figure 2).

When Intel tested the Caffe frameworks, we used the CIFAR-10 full-sigmoid model, a CNN model with multiple layers including convolution, max pooling, batch normalization, fully connected, and softmax layers. For layer descriptions, refer to the Code Parallelization with OpenMP* section.

Initial Performance Profiling

One method for benchmarking Caffe optimized for Intel architecture and BVLC Caffe is using the time command, which computes the layer-by-layer forward and backward propagation time. The time command is useful for measuring the time spent in each layer and for providing the relative execution times for different models:

./build/tools/caffe time \
    --model=examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt \
    -iterations 1000

In this context, an iteration is defined as one forward and backward pass over a batch of images. The previous command returns the average execution time per iteration for 1,000 iterations per layer and for the entire network. Figure 3 shows the full output.

Figure 3. Output from the Caffe* time command

In our testing, we used a dual-socket system with one Intel Xeon processor E5-2699 v3 at 2.30 GHz per socket, 18 physical cores per CPU, and Intel® Hyper-Threading Technology (Intel® HT Technology) disabled. This dual-socket system had 36 cores in total, so the default number of OpenMP* threads, specified by the OMP_NUM_THREADS environment variable, was 36 for our tests, unless otherwise specified (note that we recommend letting Caffe optimized for Intel architecture automatically specify the OpenMP environment rather than setting it manually). The system also had 64 GB of DDR4 memory installed, operating at a frequency of 2,133 MHz.

Using those numbers, this paper demonstrates the performance results of code optimizations made by Intel engineers. We used the following tools for performance monitoring:

  • Callgrind* from Valgrind* toolchain
  • Intel® VTune™ Amplifier XE 2017 beta

Intel VTune Amplifier XE tools provide the following information:

  • Functions with the highest total execution time (hotspots)
  • System calls (including task switching)
  • CPU and cache usage
  • OpenMP multithreading load balance
  • Thread locks
  • Memory usage

We can use the performance analysis to find good candidates for optimization, such as code hotspots and long function calls. Figure 4 shows important data points from the Intel VTune Amplifier XE 2017 beta summary analysis running 100 iterations. The Elapsed Time, Figure 4 top, is 37 seconds.

This is the time that the code takes to execute on the test system. The CPU Time, shown below Elapsed Time, is 1,306 seconds—this is slightly less than 37 seconds multiplied by 36 cores (1,332 seconds). CPU Time is the combined duration sum of all threads (or cores, because hyper-threading was disabled in our test) contributing to the execution.

Figure 4. Intel® VTune™ Amplifier XE 2017 beta analysis; summary for BVLC Caffe* CIFAR-10* execution

The CPU Usage Histogram, Figure 4 bottom, shows how often a given number of threads ran simultaneously during the test. Most of the time, only a single thread (a single core) was running—14 seconds out of the 37-second total. The rest of the time, we had a very inefficient multithreaded run with less than 20 threads contributing to the execution.

The Top Hotspots section of the execution summary, Figure 4 middle, gives an indication of what is happening here. It lists function calls and their corresponding combined CPU times. The kmp_fork_barrier function is an internal OpenMP function for implicit barriers, and it is used to synchronize thread execution. With the kmp_fork_barrier function taking 1,130 seconds of CPU time, this means that during 87 percent of the CPU execution time, threads were spinning at this barrier without doing any useful work.

The source code of the BVLC Caffe package contains no #pragma omp parallel code line. In the BVLC Caffe code, there is no explicit use of the OpenMP library for multithreading. However, OpenMP threads are used inside of the Intel MKL to parallelize some of the math-routine calls. To confirm this parallelization, we can look at a bottom-up tab view (see Figure 5 and review the function calls with Effective Time by Utilization [at the top] and the individual thread timelines [at the bottom]).

Figure 5 shows the function-call hotspots for BVLC Caffe on the CIFAR-10 dataset.

Figure 5. Timeline visualization and function-call hotspots for BVLC Caffe* CIFAR-10* dataset training

The gemm_omp_driver_v2 function—part of libmkl_intel_thread.so—is a general matrix-matrix multiplication (GEMM) implementation of Intel MKL. This function uses OpenMP multithreading behind the scenes. Optimized Intel MKL matrix-matrix multiplication is the main function used for forward and backward propagation—that is, for weight calculation, prediction, and adjustment. Intel MKL initializes OpenMP multithreading, which usually reduces the computation time of GEMM operations. However, in this particular case—convolution for 32 × 32 images—the workload is not big enough to efficiently utilize all 36 OpenMP threads on 36 cores in a single GEMM operation. Because of this, a different multithreading-parallelization scheme is needed, as will be shown later in this paper.

To demonstrate the overhead of OpenMP thread utilization, we run code with the OMP_NUM_THREADS=1 environment variable, and then compare the execution times for the same workload: 31.1 seconds instead of 37 seconds (see the Elapsed Time section in Figure 4 and Figure 6 top). By using this environment variable, we force OpenMP to create only a single thread and to use it for code execution. The resulting almost six seconds of runtime difference in the BVLC Caffe code implementation provides an indication of the OpenMP thread initialization and synchronization overhead.

Figure 6. Intel® VTune™ Amplifier XE 2017 beta analysis summary for BVLC Caffe* CIFAR-10* dataset execution with a single thread: OMP_NUM_THREADS=1

With this analysis setup, we identified three main candidates for performance optimization in the BVLC Caffe implementation: the im2col_cpu, col2im_cpu, and PoolingLayer::Forward_cpu function calls (see Figure 6 middle).

Code Optimizations

The Caffe optimized for Intel architecture implementation for the CIFAR-10 dataset is about 13.5 times faster than BVLC Caffe code (20 milliseconds [ms] versus 270 ms for forward-backward propagation). Figure 7 shows the results of our forward-backward propagation averaged across 1,000 iterations. The left column shows the BVLC Caffe results, and the right column shows the results for Caffe optimized for Intel architecture.

Figure 7. Forward-backward propagation results

For an in-depth description of these individual layers, refer to the Neural-Network-Layers Optimization Results section below.

For more information about defining calculation parameters for layers, visit http://caffe.berkeleyvision.org/tutorial/layers.html.

The following sections describe the optimizations used to improve the performance of various layers. Our techniques followed the methodology guidelines of Intel® Modern Code Developer Code, and some of these optimizations rely on Intel MKL 2017 math primitives. The optimization and parallelization techniques used in Caffe optimized for Intel architecture are presented here to help you better understand how the code is implemented and to empower code developers to apply these techniques for other machine learning and deep learning applications and frameworks.

Scalar and Serial Optimizations

Code Vectorization

After profiling the BVLC Caffe code and identifying hotspots—function calls that consumed most of the CPU time—we applied optimizations for vectorization. These optimizations included the following:

  • Basic Linear Algebra Subprograms (BLAS) libraries (switch from Automatically Tuned Linear Algebra System [ATLAS*] to Intel MKL) 
  • Optimizations in assembly (Xbyak just-in-time [JIT] assembler) 
  • GNU Compiler Collection* (GCC*) and OpenMP code vectorization

BVLC Caffe has the option to use Intel MKL BLAS function calls or other implementations. For example, the GEMM function is optimized for vectorization, multithreading, and better cache traffic. For better vectorization, we also used Xbyak — a JIT assembler for x86 (IA-32) and x64 (AMD64* or x86-64). Xbyak currently supports the following list of vector-instruction sets: MMX™ technology, Intel® Streaming SIMD Extensions (Intel® SSE), Intel SSE2, Intel SSE3, Intel SSE4, floating-point unit, Intel AVX, Intel AVX2, and Intel AVX-512.

The Xbyak assembler is an x86/x64 JIT assembler for C++, a library created specifically for developing code efficiently. It is provided as header-only code and can dynamically assemble x86 and x64 mnemonics.

Generating binary code just in time, while the code is running, allows for several optimizations, such as quantization (an operation that divides the elements of a given array by the elements of a second array) and polynomial calculation (an operation that builds actions from constants, the variable x, add, sub, mul, div, and so on). With support for the Intel AVX and Intel AVX2 vector-instruction sets, Xbyak can achieve a better vectorization ratio in the code implementation of Caffe optimized for Intel architecture. The latest version of Xbyak supports the Intel AVX-512 vector-instruction set, which can improve computational performance on the Intel Xeon Phi processor x200 product family. This improved vectorization ratio allows Xbyak to process more data simultaneously with single instruction, multiple data (SIMD) instructions, which utilize data parallelism more efficiently.

We used Xbyak to vectorize the pooling operation, which improved the performance of the pooling layer significantly. If the pooling parameters are known, we can generate assembly code that handles a particular pooling model for a specific pooling window or pooling algorithm. The result is plain assembly that is proven to be more efficient than compiled C++ code.

Generic Code Optimizations

Other serial optimizations included:

  • Reducing algorithm complexity
  • Reducing the amount of calculations
  • Unwinding loops

Common-code elimination is one of the scalar optimization techniques that we applied during the code optimization. This was done in order to predetermine what can be calculated outside of the innermost for-loop.

For example, consider the following code snippet:

for (int h_col = 0; h_col < height_col; ++h_col) {
  for (int w_col = 0; w_col < width_col; ++w_col) {
    int h_im = h_col * stride_h - pad_h + h_offset;
    int w_im = w_col * stride_w - pad_w + w_offset;

In the third line of this code snippet, for the h_im calculation, we are not using a w_col index of the innermost loop. But this calculation will still be performed for every iteration of the innermost loop. Alternatively, we can move this line outside of the innermost loop with the following code:

for (int h_col = 0; h_col < height_col; ++h_col) {
  int h_im = h_col * stride_h - pad_h + h_offset;
  for (int w_col = 0; w_col < width_col; ++w_col) {
    int w_im = w_col * stride_w - pad_w + w_offset;

CPU-Specific, System-Specific, and Other Generic Code-Optimization Techniques

The following additional generic optimizations were applied:

  • Improved im2col_cpu/col2im_cpu implementation
  • Complexity reduction for batch normalization
  • CPU/system-specific optimizations
  • Use one core per computing thread
  • Avoid thread movement

Intel VTune Amplifier XE 2017 beta identified the im2col_cpu function as one of the hotspot functions—making it a good candidate for performance optimization. The im2col_cpu function is a common step in performing direct convolution as a GEMM operation for using the highly optimized BLAS libraries. Each local patch is expanded to a separate vector, and the whole image is converted to a larger (more memory-intensive) matrix whose rows correspond to the multiple locations where filters will be applied.

One of the optimization techniques for the im2col_cpu function is index-calculation reduction. The BVLC Caffe code had three nested loops for going through image pixels:

for (int c_col = 0; c_col < channels_col; ++c_col)
  for (int h_col = 0; h_col < height_col; ++h_col)
    for (int w_col = 0; w_col < width_col; ++w_col)
      data_col[(c_col*height_col+h_col)*width_col+w_col] = // ...

In this code snippet, BVLC Caffe was originally calculating the corresponding index of the data_col array element, although the indexes of this array are simply processed sequentially. Therefore, four arithmetic operations (two additions and two multiplications) can be substituted by a single index-incrementation operation. In addition, the complexity of the conditional check can be reduced due to the following:

/* This function uses casting from int to unsigned to check whether the
value of parameter a is greater than or equal to zero and lower than
the value of parameter b. Parameter b has a signed type but is always
positive, so its value is always lower than 0x800..., whereas casting
a negative value of a converts it to a value higher than 0x800...
The cast allows one condition to be used instead of two. */
inline bool is_a_ge_zero_and_a_lt_b(int a, int b) {
  return static_cast<unsigned>(a) < static_cast<unsigned>(b);
}

In BVLC Caffe, the original code had the conditional check if (x >= 0 && x < N), where x and N are both signed integers, and N is always positive. By converting the type of those integer numbers into unsigned integers, the interval for the comparison can be changed. Instead of running two compares with logical AND, a single comparison is sufficient after type casting:

if (((unsigned) x) < ((unsigned) N))

To avoid thread movement by the operating system, we used the OpenMP affinity environment variable, KMP_AFFINITY=compact,granularity=fine. Compact placement of neighboring threads can improve performance of GEMM operations because all threads that share the same last-level cache (LLC) can reuse previously prefetched cache lines with data.

For cache-blocking-optimization implementations and for data layout and vectorization, please refer to the following publication: http://arxiv.org/pdf/1602.06709v1.pdf.

Code Parallelization with OpenMP*

Neural-Network-Layers Optimization Results

The following neural network layers were optimized by applying OpenMP multithreading parallelization to them:

  • Convolution
  • Deconvolution
  • Local response normalization (LRN)
  • ReLU
  • Softmax
  • Concatenation
  • Utilities for OpenBLAS* optimization—such as the vPowx (y[i] = x[i]^β) operation, caffe_set, caffe_copy, and caffe_rng_bernoulli
  • Pooling
  • Dropout
  • Batch normalization
  • Data
  • Eltwise

Convolution Layer

The convolution layer, as the name suggests, convolves the input with a set of learned weights or filters, each producing one feature map in the output image. In Caffe optimized for Intel architecture, the forward pass is parallelized across the images in a batch, which prevents under-utilization of hardware when a single set of input feature maps does not provide enough work for all threads.

template <typename Dtype>
void ConvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& \
      bottom, const vector<Blob<Dtype>*>& top) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  // If we have more threads available than batches to be processed, then
  // we are wasting resources (batch sizes lower than 36 on Xeon E5),
  // so we instruct MKL to run single-threaded while OpenMP parallelizes
  // over the images in the batch.
  for (int i = 0; i < bottom.size(); ++i) {
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* top_data = top[i]->mutable_cpu_data();
#ifdef _OPENMP
    #pragma omp parallel for num_threads(this->num_of_threads_)
#endif
      for (int n = 0; n < this->num_; ++n) {
        this->forward_cpu_gemm(bottom_data + n*this->bottom_dim_,
                               weight,
                               top_data + n*this->top_dim_);
        if (this->bias_term_) {
          const Dtype* bias = this->blobs_[1]->cpu_data();
          this->forward_cpu_bias(top_data + n * this->top_dim_, bias);
        }
      }
  }
}

We process k = min(num_threads, batch_size) sets of input feature maps; for example, k im2col operations happen in parallel, along with k calls to Intel MKL. Intel MKL is switched to a single-threaded execution flow automatically, and overall performance is better than it was when Intel MKL was processing one batch with multiple threads. This behavior is defined in the source code file src/caffe/layers/base_conv_layer.cpp; the optimized OpenMP multithreading implementation shown above is from src/caffe/layers/conv_layer.cpp.

Pooling or Subsampling

Max-pooling, average-pooling, and stochastic-pooling (not implemented yet) are different methods for downsampling, with max-pooling being the most popular method. The pooling layer partitions the results of the previous layer into a set of usually non-overlapping rectangular tiles. For each such sub-region, the layer then outputs the maximum, the arithmetic mean, or (in the future) a stochastic value sampled from a multinomial distribution formed from the activations of each tile.

Pooling is useful in CNNs for three main reasons:

  • Pooling reduces the dimensionality of the problem and the computational load for upper layers.
  • Pooling lower layers allows the convolutional kernels in higher layers to cover larger areas of the input data and therefore learn more complex features; for example, a lower-layer kernel usually learns to recognize small edges, whereas a higher-layer kernel might learn to recognize sceneries like forests or beaches.
  • Max-pooling provides a form of translation invariance. Out of eight possible directions in which a 2 × 2 tile (the typical tile for pooling) can be translated by a single pixel, three will return the same max value; for a 3 × 3 window, five will return the same max value.
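As a concrete illustration of the downsampling itself, the following standalone C++ sketch (our own, not Caffe code; `max_pool_2x2` is a hypothetical name) applies 2 × 2 max-pooling with stride 2 to a single feature map stored in row-major order:

```cpp
#include <algorithm>
#include <vector>

// 2x2 max-pooling with stride 2 on a single H x W feature map
// (H and W assumed even for brevity). Output is (H/2) x (W/2).
std::vector<float> max_pool_2x2(const std::vector<float>& in, int H, int W) {
    std::vector<float> out((H / 2) * (W / 2));
    for (int y = 0; y < H / 2; ++y) {
        for (int x = 0; x < W / 2; ++x) {
            // Take the maximum over the 2x2 tile.
            float m = in[(2 * y) * W + 2 * x];
            m = std::max(m, in[(2 * y) * W + 2 * x + 1]);
            m = std::max(m, in[(2 * y + 1) * W + 2 * x]);
            m = std::max(m, in[(2 * y + 1) * W + 2 * x + 1]);
            out[y * (W / 2) + x] = m;
        }
    }
    return out;
}
```

Average-pooling replaces the maximum by the arithmetic mean of the tile; the loop structure is identical.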

Pooling works on a single feature map, so we used Xbyak to write an efficient assembly procedure that performs average or max pooling on one or more input feature maps. This pooling procedure can be applied to a whole batch of input feature maps when the procedure is run in parallel with OpenMP.

The pooling layer is parallelized with OpenMP multithreading; because images are independent, they can be processed in parallel by different threads:

#ifdef _OPENMP
  #pragma omp parallel for collapse(2)
#endif
  for (int image = 0; image < num_batches; ++image)
    for (int channel = 0; channel < num_channels; ++channel)
      generator_func(bottom_data, top_data, top_count, image, image+1,
                        mask, channel, channel+1, this, use_top_mask);
}

With the collapse(2) clause, the #pragma omp parallel for directive applies to both nested for-loops: OpenMP combines the iteration over images in the batch and over image channels into a single loop, and parallelizes that combined loop.
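To see what the collapse is doing, here is a small standalone C++ sketch (illustrative only, not Caffe code) that builds the same flattened iteration space by hand and recovers the (image, channel) pair from the combined index, which is exactly the space OpenMP distributes across threads:

```cpp
#include <utility>
#include <vector>

// Illustration of what OpenMP's collapse(2) does: the two loop counters
// are fused into one iteration space of size num_batches * num_channels,
// which the runtime can then split evenly across threads.
std::vector<std::pair<int, int>> collapsed_iterations(int num_batches,
                                                      int num_channels) {
    std::vector<std::pair<int, int>> iters;
    for (int t = 0; t < num_batches * num_channels; ++t) {
        int image = t / num_channels;    // recovered outer-loop index
        int channel = t % num_channels;  // recovered inner-loop index
        iters.emplace_back(image, channel);
    }
    return iters;
}
```

Collapsing matters when num_batches alone is smaller than the thread count: the fused space exposes num_batches × num_channels independent work items instead of just num_batches.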

Softmax and Loss Layer

The loss (cost) function is the key component in machine learning that guides the network training process by comparing a prediction output to a target or label and then readjusting weights to minimize the cost by calculating gradients—partial derivatives of the weights with respect to the loss function.

The softmax (normalized exponential) function is the gradient-log normalizer of the categorical probability distribution. In general, this is used to calculate the possible results of a random event that can take on one of K possible outcomes, with the probability of each outcome separately specified. Specifically, in multinomial logistic regression (a multi-class classification problem), the input to this function is the result of K distinct linear functions, and the predicted probability for the jth class for sample vector x is:

P(y = j | x) = exp(xᵀw_j) / Σ_{k=1..K} exp(xᵀw_k)
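A minimal, numerically stable C++ sketch of this computation (our own illustration; Caffe's actual implementation differs) subtracts the maximum input before exponentiating, which leaves the result unchanged but prevents exp() from overflowing:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax over a vector of logits z:
// p[j] = exp(z[j] - max(z)) / sum_k exp(z[k] - max(z)).
std::vector<double> softmax(const std::vector<double>& z) {
    double zmax = z[0];
    for (double v : z) zmax = std::max(zmax, v);

    double sum = 0.0;
    std::vector<double> p(z.size());
    for (std::size_t j = 0; j < z.size(); ++j) {
        p[j] = std::exp(z[j] - zmax);  // shifted exponent cannot overflow
        sum += p[j];
    }
    for (double& v : p) v /= sum;  // normalize so the outputs sum to 1
    return p;
}
```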

OpenMP multithreading, applied to these calculations, parallelizes the work by having a master thread fork a specified number of worker threads and divide the task among them; the threads then run concurrently on different processors. For example, in the following code, the per-channel divisions by the calculated norm are data-independent, so they are parallelized across channels:

    // division
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int j = 0; j < channels; j++) {
      caffe_div(inner_num_, top_data + j*inner_num_, scale_data,
              top_data + j*inner_num_);
    }

Rectified Linear Unit (ReLU) and Sigmoid: Activation/Neuron Layers

ReLUs are currently the most popular non-linear functions used in deep learning algorithms. Activation/neuron layers are element-wise operators that take one bottom blob and produce one top blob of the same size. (A blob is the standard array and unified memory interface for the framework. As data and derivatives flow through the network, Caffe stores, communicates, and manipulates the information as blobs.)

The ReLU layer takes an input value x and computes the output as x for positive values, scaling negative values by negative_slope:

f(x) = max(x, 0) + negative_slope × min(x, 0)

The default parameter value for negative_slope is zero, which is equivalent to the standard ReLU function of taking max(x, 0). Due to the data-independent nature of the activation process, each blob can be processed in parallel as shown below:

template <typename Dtype>
void ReLULayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  const int count = bottom[0]->count();
  Dtype negative_slope=this->layer_param_.relu_param().negative_slope();
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (int i = 0; i < count; ++i) {
    top_data[i] = std::max(bottom_data[i], Dtype(0))
        + negative_slope * std::min(bottom_data[i], Dtype(0));
  }
}

Similar parallel calculations can be used for backward propagation, as shown below:

template <typename Dtype>
void ReLULayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[0]) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    const int count = bottom[0]->count();
    Dtype negative_slope=this->layer_param_.relu_param().negative_slope();
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int i = 0; i < count; ++i) {
      bottom_diff[i] = top_diff[i] * ((bottom_data[i] > 0)
          + negative_slope * (bottom_data[i] <= 0));
    }
  }
}

In the same fashion, the sigmoid function S(x) = 1 / (1 + exp(-x)) can be parallelized in the following way:

#ifdef _OPENMP
  #pragma omp parallel for
#endif
  for (int i = 0; i < count; ++i) {
    top_data[i] = sigmoid(bottom_data[i]);
  }

Because Intel MKL does not provide math primitives to implement ReLUs, we tried to add this functionality by implementing a performance-optimized version of the ReLU layer in assembly code (via Xbyak). However, we found no visible gain on Intel Xeon processors, perhaps due to limited memory bandwidth; parallelizing the existing C++ code was good enough to improve the overall performance.

Conclusion

The previous section discussed various components and layers of neural networks and how blobs of processed data in these layers were distributed among available OpenMP threads and Intel MKL threads. The CPU Usage Histogram in Figure 8 shows how often a given number of threads ran concurrently after our optimizations and parallelizations were applied.

With Caffe optimized for Intel architecture, the number of simultaneously operating threads is significantly increased. The execution time on our test system dropped from 37 seconds in the original, unmodified run to only 3.6 seconds with Caffe optimized for Intel architecture—improving the overall execution performance by more than 10 times.

Figure 8. Intel® VTune™ Amplifier XE 2017 beta analysis summary of the Caffe* optimized for Intel® architecture implementation for CIFAR-10* training

As shown in the Elapsed Time section (Figure 8, top), there is still some Spin Time present during the execution of this run. As a result, the execution's performance does not scale linearly with the increased thread count (in accordance with Amdahl's law). In addition, there are still serial execution regions in the code that are not parallelized with OpenMP multithreading. Re-initialization of OpenMP parallel regions has been significantly optimized in the latest OpenMP library implementations, but it still introduces non-negligible performance overhead. Moving OpenMP parallel regions into the main function of the code could potentially improve performance even more, but it would require significant code refactoring.

Figure 9 summarizes the described optimization techniques and code rewriting principles that we followed with Caffe optimized for Intel architecture.

Figure 9. Step-by-step approach of Intel® Modern Code Developer Code

In our testing, we used Intel VTune Amplifier XE 2017 beta to find hotspots: good code candidates for optimization and parallelization. We implemented scalar and serial optimizations, including common-code elimination and reduction/simplification of arithmetic operations for loop index and conditional calculations. Next, we optimized the code for vectorization following the general principles described in "Auto-vectorization in GCC" (https://gcc.gnu.org/projects/tree-ssa/vectorization.html). The JIT assembler Xbyak allowed us to use SIMD operations more efficiently.

We implemented multithreading with an OpenMP library inside the neural-network layers, where data operations on images or channels were data-independent. The last step in implementing the Intel Modern Code Developer Code approach involved scaling the single-node application for many-core architectures and a multi-node cluster environment. This is the main focus of our research and implementation at this moment. We also applied optimizations for memory (cache) reuse for better computational performance. For more information see: http://arxiv.org/pdf/1602.06709v1.pdf. Our optimizations for the Intel Xeon Phi processor x200 product family included the use of high-bandwidth MCDRAM memory and utilization of the quadrant NUMA mode.

Caffe optimized for Intel architecture not only improves computational performance, but it enables you to extract increasingly complex features from data. The optimizations, tools, and modifications included in this paper will help you achieve top computational performance from Caffe optimized for Intel architecture.



Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, visit intel.com/performance.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice Revision #20110804

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

Intel technologies may require enabled hardware, specific software, or services activation. Check with your system manufacturer or retailer.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting intel.com/design/literature.htm.

This sample source code is released under the Intel Sample Source Code License Agreement.

Intel, the Intel logo, Intel Xeon Phi, VTune, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2016 Intel Corporation. All rights reserved.

0816/VK/PRW/PDF 334759-001US


Intel® Quark™ microcontroller D2000 – How to communicate with MATLAB over UART


Introduction

Intel® System Studio 2016 for Microcontrollers is an integrated tool suite for developing, optimizing, and debugging systems and firmware for the Intel® Quark™ microcontroller D2000 and Intel® Quark™ SE microcontroller development boards, which offers a microcontroller core to enable applications from device control to edge sensing IoT solutions.

 

Objective of this document

This document explains how to communicate with MATLAB® over the UART of the D2000. Most IoT applications collect sensor data and send it to a back-end analytics server, but it is important to analyze the actual sensor data before sending it to the server. During the development phase, a developer would check the reliability of data acquisition from the sensor on a serial terminal such as PuTTY, or in Excel after importing the data into a spreadsheet. One popular tool for analyzing such data visually is MATLAB®, and this document shows how to configure the Intel® Quark™ microcontroller D2000's UART so that MATLAB® can collect the sensor data and plot it in visual axes.

Intel® Quark™ microcontroller D2000 – Design consideration pin multiplexing of a GPIO application


Introduction

Intel® System Studio 2016 for Microcontrollers is an integrated tool suite for developing, optimizing, and debugging systems and firmware for the Intel® Quark™ microcontroller D2000 and Intel® Quark™ SE microcontroller development boards, which offers a microcontroller core to enable applications from device control to edge sensing IoT solutions. This document shows the design consideration while developing application using Intel® System Studio 2016 for Microcontrollers.

 

Objective of this document

This document describes which configurations need to be considered to implement your own customized GPIO application that uses other peripheral interfaces such as I2C, SPI, PWM, and ADC channels. Because the Intel® Quark™ microcontroller D2000 has a limited physical pinout, compatible with the Arduino UNO 3 header pinout, you need to carefully configure each pin according to how your application interfaces with a sensor or an actuator.

Intel® Memory Protection Extensions on Windows® 10: A Tutorial


Introduction

Beginning with the Intel® 6th generation Core™ processor, Intel has introduced Intel® Memory Protection Extensions (Intel® MPX), a new extension to the instruction set architecture that aims to enhance software security by helping to protect against buffer overflow attacks. In this article, we discuss buffer overflow, and then give step-by-step details on how application developers can prevent their apps from suffering from buffer overflow attacks on Windows® 10. Intel MPX works for both traditional desktop apps and Universal Windows Platform* apps.

Prerequisites

To run the samples discussed in this article, you’ll need the following hardware and software:

  • A computer (desktop, laptop, or any other form factor) with Intel® 6th generation Core™ processor and Microsoft Windows 10 OS (November 2015 update or greater; Windows 10 version 1607 is preferred)
  • Intel MPX enabled in UEFI (if the option is available)
  • Intel MPX driver properly installed
  • Microsoft Visual Studio* 2015 (update 1 or later IDE; Visual Studio 2015 update 3 is preferred)

Buffer Overflow

C/C++ code is by nature more susceptible to buffer overflows. For example, in the following code the string operation function “strcpy” in main() will put the program at risk for a buffer overflow attack.

#include "stdafx.h"
#include <iostream>
#include <time.h>
#include <stdlib.h>
#include <string.h> // for strcpy and strcmp

using namespace std;

void GenRandomUname(char* uname_string, const int uname_len)
{
	srand(time(NULL));
	for (int i = 0; i < uname_len; i++)
	{
		uname_string[i] = (rand() % ('9' - '0' + 1)) + '0';
	}
	uname_string[uname_len] = '\0';
}

int main(int argnum, char** args)
{
	char user_name[16];
	GenRandomUname(user_name, 15);
	cout << "random gentd user name: "<< user_name << endl;

	char config[10] = { '\0' };
	strcpy(config, args[1]);

	cout << "config mem addr: "<< &config << endl;
	cout << "user_name mem addr: "<< &user_name << endl;

	if (0 == strcmp("ROOT", user_name))
	{
		cout << "Buffer Overflow Attacked!"<< endl;
		cout << "Uname changed to: "<< user_name << endl;
	}
	else
	{
		cout << "Uname OK: "<< user_name << endl;
	}
	return 0;
}

To be more accurate, if we compile and run the above sample as a C++ console application, passing CUSM_CFG as an argument, the program will run normally and the console will show the following output:

Figure 1 Buffer Overflow

But if we rerun the program passing CUSTOM_CONFIGUREROOT as an argument, the output will be “unexpected” and the console will show a message like this:

Figure 2 Buffer Overflow

This simple example shows how a buffer overflow attack works. The reason why there can be unexpected output is that the function call of strcpy does not check the bonds of the destination array. Although compilers usually give several extra bytes to arrays for memory alignment purpose, buffer overflow may still happen if the source array is long enough. In this case, a piece of the runtime memory layout of the program looks like this (the result of different compilers or compile options may vary):

Figure 3

Intel Memory Protection Extensions

With the help of Intel MPX, we can avoid the buffer overflow security issue simply by adding the compile option /d2MPX to the Visual Studio C++ compiler.

Figure 4

After recompiling with the Intel MPX option, the program is able to defend against buffer overflow attacks. If we try running the recompiled program with CUSTOM_CONFIGUREROOT argument, a runtime exception will arise and cause the program to exit.

Figure 5

Let’s dig into the generated assembly code to see what Intel MPX has done with the program. From the results, we can see that many of the instructions related to Intel MPX have been inserted into the original instructions to detect buffer overflows at runtime.

Figure 6

Now let’s look in more detail at the instructions related to Intel MPX:

bndmk: Creates LowerBound (LB) and UpperBound (UB) in the bounds register (%bnd0) in the code snapshot above.
bndmov: Fetches the bounds information (upper and lower) out of memory and puts it in a bounds register.
bndcl: Checks the lower bounds against an argument (%rax) in the code snapshot above.
bndcu: Checks the upper bounds against an argument (%rax) in the code snapshot above.
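Conceptually, the check these instructions perform in hardware looks like the following software sketch (our own illustration of the idea; this is not what the compiler actually emits):

```cpp
#include <cstddef>
#include <stdexcept>

// Software analogue of the MPX bounds check: a pointer is validated
// against the lower and upper bounds of its object before being
// dereferenced. Intel MPX records the bounds in a bounds register
// (bndmk/bndmov) and performs the two comparisons in hardware
// (bndcl for the lower bound, bndcu for the upper bound).
template <typename T>
T checked_read(const T* base, std::size_t len, const T* p) {
    const T* lb = base;            // LowerBound, as recorded by bndmk
    const T* ub = base + len - 1;  // UpperBound, as recorded by bndmk
    if (p < lb) throw std::out_of_range("below lower bound");  // ~bndcl
    if (p > ub) throw std::out_of_range("above upper bound");  // ~bndcu
    return *p;
}
```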

Troubleshooting

If MPX is not working properly,

  1. Double-check the versions of your CPU, OS, and Visual Studio 2015. Boot the PC into the UEFI settings to check if there is any Intel MPX switch; turn on the switch if needed.
  2. Confirm that the Intel MPX driver is properly installed and functioning in the Windows* Device Manager (Figure 7).
  3. Check that the compiled executable contains instructions related to Intel MPX. Insert a break point, and then run the program. When the break point is hit, right-click with the mouse, and then click Go To Disassembly. A new window will display for viewing the assembly code.

Figure 8

Conclusion

Intel MPX is a new hardware solution that helps defend against buffer overflow attacks. Compared with software solutions such as AddressSanitizer (https://code.google.com/p/address-sanitizer/), from an application developer’s point of view, Intel MPX has many advantages, including the following:

  • Detects when a pointer points out of the object but still points to valid memory.
  • Intel MPX is more flexible; it can be used in certain modules without affecting any other modules.
  • Compatibility with legacy code is much higher for code instrumented with Intel MPX.
  • A single binary version can still be released, because of the particular instruction encoding: the instructions related to Intel MPX are executed as NOPs (no operations) on unsupported hardware or operating systems.

On Intel® 6th generation Core™ Processor and Windows 10, benefiting from Intel MPX for applications is as simple as adding a compiler option, which can help enhance application security without hurting the application’s backward compatibility.

Related Articles

Intel® Memory Protection Extensions Enabling Guide:

https://software.intel.com/en-us/articles/intel-memory-protection-extensions-enabling-guide

References

[1] AddressSanitizer: https://code.google.com/p/address-sanitizer/

About the Author

Fanjiang Pei is an application engineer in the Client Computing Enabling Team, Developer Relations Division, Software and Solutions Group (SSG). He is responsible for enabling security technologies of Intel such as Intel MPX, Intel® Software Guard Extensions, and more.

Texture Space Caching and Reconstruction for Ray Tracing


By Jacob Munkberg, Jon Hasselgren, Petrik Clarberg, Magnus Andersson and Tomas Akenine-Möller
Intel Corporation

Abstract

We present a texture space caching and reconstruction system for Monte Carlo ray tracing. Our system gathers and filters shading on-demand, including querying secondary rays, directly within a filter footprint around the current shading point. We shade on local grids in texture space with primary visibility decoupled from shading.

Unique filters can be applied per material, where any terms of the shader can be chosen to be included in each kernel. This is a departure from recent screen space image reconstruction techniques, which typically use a single, complex kernel with a set of large auxiliary guide images as input. We show a number of high performance use cases for our system, including interactive denoising of Monte Carlo ray tracing with motion/defocus blur, spatial and temporal shading reuse, cached product importance sampling, and filters based on linear regression in texture space.

Cordova Whitelisting with Intel® XDK for AJAX and Launching External Apps


Cordova CLI 5.1.1 and Higher

Starting with Apache* Cordova* CLI 5.1, the whitelisting security model that restricts and permits access to other domains from the app has changed. It is recommended that before you move your app to production you include a whitelist of the domains to which your app needs access.

Android

Starting with Cordova Android 4.0, your Android app's security policy is managed through a Whitelist Plugin and standard W3C Content Security Policy (CSP) directives. The Android Cordova whitelist plugin understands three distinct whitelist tags:

  1. <access> tag for Network Requests
  2. <allow-intent> tag for Intent Requests
  3. <allow-navigation> for Navigation
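Taken together, a config.xml fragment using all three tags might look like the following sketch (the domains are placeholders):

```xml
<!-- Network requests (content fetching, AJAX/XHR) -->
<access origin="https://api.example.com" />
<!-- URLs the app may ask the system to open in external apps -->
<allow-intent href="tel:*" />
<allow-intent href="https://*/*" />
<!-- Pages the app's own webview may navigate to -->
<allow-navigation href="https://example.com/*" />
```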

CSP directives are set by including a meta-tag in the <head> section of your index.html file. An Introduction to Content Security Policy is a good place to go to understand how to configure and apply these whitelist rules to your app. The CSP Playground is also a very useful site for learning about CSP and validating your CSP rules.

iOS

Unlike Android, your Cordova iOS app's whitelist security policy is managed directly by the cordova-ios framework. Cordova iOS versions prior to 4.0 used only the W3C Widget Access specification for domain whitelisting (i.e., the <access> tag). Starting with Cordova iOS 4.0, your Cordova iOS app's whitelist uses the <access> tag, as before, and adds support for two additional tags: <allow-intent> and <allow-navigation> as described in the Whitelist Plugin.

Starting with iOS 9, a scheme called App Transport Security (ATS) is used to implement whitelist rules. Cordova automatically converts your <access> and <allow-navigation> tags to their equivalent ATS directives. When used with iOS apps, the <access> and <allow-navigation> tags support two new attributes that provide extra security for a domain whose security attributes you control. They have their equivalents in ATS:

  1. minimum-tls-version
  2. requires-forward-secrecy

See the ATS Technote for more details.
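For example, a hardened entry for a domain you control might look like this in config.xml (the domain and attribute values are placeholders; see the Whitelist Plugin documentation for the accepted values):

```xml
<access origin="https://secure.example.com"
        minimum-tls-version="TLSv1.2"
        requires-forward-secrecy="true" />
```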

Windows

On Windows platforms, Cordova continues to use the W3C Widget Access specification to enforce domain whitelisting, which is built into the Cordova Windows framework.

See the following section for information regarding CSP directives and the Windows platforms.

Content Security Policy (CSP)

CSP is managed by the webview runtime (the built-in web runtime on which your Cordova app executes). Network requests include such actions as retrieving images from a remote server, performing AJAX requests (XHR), etc. CSP controls are specified in a single meta tag in your html files. Most Cordova apps are single-page apps, meaning they have only a single index.html file. If your app contains multiple html files, it is recommended that you use the CSP <meta> tag on all of your pages.

Android version 4.4 (KitKat) and above supports the use of CSP (the Android 4.4 native webview is based on Chromium 30). If you are using the Android Crosswalk webview, CSP is supported on Android version 4.0 (Ice Cream Sandwich) and later (the Crosswalk webviews are also based on Chromium).

Apple iOS 7.1 and later supports the use of CSP directives (Apple iOS devices run on the Safari webview).

Windows Phone 8.x devices provide partial support via the X-Content-Security-Policy directive (Windows Phone 8.x devices run on the IE10 and IE11 mobile webviews). Windows 10 devices include full support for standard CSP directives (Windows Phone 10 and Windows 10 tablets run on the Edge webview).

It is recommended that you use CSP whenever possible!

To get started with CSP, you can include the following very long and overly permissive directive in the <head> section of your index.html file:

<meta http-equiv="Content-Security-Policy" content="default-src 'self' 'unsafe-eval' data: blob: filesystem: ws: gap: cdvfile: https://ssl.gstatic.com *; style-src * 'unsafe-inline'; script-src * 'unsafe-inline' 'unsafe-eval'; img-src * data: 'unsafe-inline'; connect-src * 'unsafe-inline'; child-src *; ">

There is no single CSP directive that can be recommended for all applications. The correct CSP directive is the one that provides the access you need while simultaneously insuring the protection necessary to keep your app from being compromised and exposing customer or user data.
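As one possible starting point for tightening, a more restrictive policy that permits the app's own files plus XHR to a single API host (the hostname is a placeholder) could look like:

```html
<meta http-equiv="Content-Security-Policy"
      content="default-src 'self' data: gap:; style-src 'self' 'unsafe-inline'; connect-src 'self' https://api.example.com; img-src 'self' data:">
```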

This StackOverflow post is very helpful to read as an introduction to how Content Security Policy rules work.

Intel XDK 3088 and Higher

Starting with Intel XDK version 3088, the UI provided to specify whitelist entries has changed to accommodate changes in Cordova whitelist rules. Please read the rest of this document to understand how to specify whitelist entries in the Intel XDK.

Network Request Whitelist (<access>):

Network Request controls which network requests, such as content fetching or AJAX (XHR), are allowed to be made from within the app. For those webviews that support CSP, it is recommended that you use CSP. This whitelist entry is intended for older webviews that do not support CSP.

These whitelist specifications are defined in a Cordova CLI config.xml file using the <access origin> tag. Within the Intel XDK UI you specify your URLs in the Build Settings section of the Projects tab. For example, to specify http://mywebsite.com as a whitelisted URL:

Networkwhitelist5.4.1

By default, only requests to file:// URLs are allowed, but Cordova applications by default include access to all websites. It is recommended that you provide your whitelist before publishing your app.

Intent Whitelist (<allow-intent>):

The intent whitelist controls which URLs the app is allowed to ask the system (i.e., the webview) to open. By default, no external URLs are allowed. This applies to inline hyperlinks and calls to the window.open() function (note: if you are using the inAppBrowser, it may change the behavior of window.open(), especially regarding whitelist rules). Your app can open "hyperlinks" like a browser (for http:// and https:// URLs) and can "open" other apps via hyperlinks, such as the phone, SMS, email, maps, etc.

To allow your app to launch external apps through a URL or via window.open(), specify your rules in the Build Settings section of the Projects tab. 

Navigation Whitelist (<allow-navigation>):

The navigation whitelist rules control which URLs the application webview can be navigated to. Only top level navigations are allowed, with the exception of Android, where it also applies to iframes for non-http(s) schemes. By default, you can only navigate to file:// URLs.

Additional Whitelist Settings for iOS ATS:

The UI whitelist settings for iOS are similar to those described above, with the addition of an ATS setting. When you click the "Edit ATS settings" link you can specify ATS settings for the Network Request and Navigation whitelist rules on your iOS 9 device. ATS settings do not apply to iOS 8 and earlier devices.

Most users should not have to change the ATS settings and can use the default values. For more details about ATS you can read this tutsplus.com article or search the web for additional articles.

The ATS settings dialog looks like this:

Windows Platform Whitelist Rules:

Windows platforms use the W3C Widget Access for whitelisting (that is, the <access> tag). Windows 10 also supports the <allow-navigation> tag. The rules for those tags are consistent with those described above. The Windows platforms also support CSP whitelist rules, which were described in the CSP section above.

Intel XDK versions prior to 3088:

Navigation Whitelist:

The navigation whitelist controls which URLs the WebView can be navigated to. (Only top-level navigations are allowed, with the exception of Android, where it also applies to iframes for non-http(s) schemes.) By default, you can only navigate to file:// URLs. To allow other URLs, the <allow-navigation> tag is used in the config.xml file. With the Intel® XDK you need not specify this in config.xml; the Intel XDK automatically generates config.xml from the Build Settings.

In the Intel® XDK you specify the URL that you would like the WebView to be navigated to under Build Settings > Android > Cordova CLI X.Y > Whitelist > Cordova Whitelist > Navigation. For example: http://google.com

CLI5.1.1AndroidNavigation.png

Intent Whitelist:

The intent whitelist controls which URLs the app is allowed to ask the system to open. By default, no external URLs are allowed. This applies only to hyperlinks and calls to window.open(). The app can open a browser (for http:// and https:// URLs) or other apps like phone, SMS, email, maps, etc. To allow the app to launch external apps through a URL, or launch the inAppBrowser through window.open(), the <allow-intent> tag is used in config.xml; but again, you need not specify this in config.xml, as the Intel® XDK takes care of it through the Build Settings.

In the Intel® XDK, specify the URLs you want to whitelist for external applications under Build Settings > Android > Cordova CLI X.Y > Whitelist > Cordova Whitelist > Intent. For example: http://example.com, tel:*, or sms:*
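Those Intent entries correspond to <allow-intent> rules in the generated config.xml, sketched below (illustrative values; the Intel XDK generates the actual file from the Build Settings):

```xml
<widget xmlns="http://www.w3.org/ns/widgets" id="com.example.app" version="1.0.0">
    <!-- Let the system open this site in the external browser -->
    <allow-intent href="http://example.com/*" />
    <!-- Let the app hand off to the phone dialer and SMS apps -->
    <allow-intent href="tel:*" />
    <allow-intent href="sms:*" />
</widget>
```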

[Screenshot: CLI5.1.1AndroidIntent.png — Intent whitelist build settings]

Network Request Whitelist:

The Network Request whitelist controls which network requests, such as content fetching or AJAX (XHR) calls, are allowed to be made from within the app. For WebViews that support CSP, it is recommended that you use CSP instead; this whitelist exists for older WebViews that do not support CSP. The whitelist is defined in config.xml using the <access origin> tag, but once again in the Intel® XDK you provide the URL under Build Settings > Android > Cordova CLI X.Y > Whitelist > Cordova Whitelist > Network Request. For example: http://mywebsite.com

By default, only requests to file:// URLs are allowed, but Cordova application templates include access to all websites by default. It is recommended that you tighten your whitelist before publishing your app.
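In config.xml terms, tightening the whitelist means replacing the permissive wildcard rule with explicit origins, for example (illustrative origins; the Intel XDK writes the actual rules from your Build Settings):

```xml
<widget xmlns="http://www.w3.org/ns/widgets" id="com.example.app" version="1.0.0">
    <!-- Permissive template default: allows requests to any origin -->
    <!-- <access origin="*" /> -->
    <!-- Tightened rule: allow network requests only to your own backend -->
    <access origin="http://mywebsite.com" />
</widget>
```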

[Screenshot: CLI5.1.1AndroidNetwork.png — Network Request whitelist build settings]

Content Security Policy:

Content Security Policy (CSP) controls which network requests, such as images and AJAX requests (XHR), are allowed to be made directly by the WebView. It is specified through <meta> tags in your HTML files, and it is recommended that you include a CSP <meta> tag on all of your pages. The system WebView supports CSP from Android KitKat (4.4) onward, while the Crosswalk WebView supports CSP on all Android versions.

For example include this in your index.html file.

<meta http-equiv="Content-Security-Policy" content="default-src 'self' data: gap: cdvfile: https://ssl.gstatic.com; style-src 'self' 'unsafe-inline'; media-src *">


For the Microsoft Windows* platforms, the W3C Widget Access standard is also used, and the build settings for whitelisting are as follows.

[Screenshot: Windows W3C Widget Access build settings (CLI 5.1.1)]

Cordova CLI 4.1.2

Cordova CLI 4.1.2 is no longer supported by the Intel XDK. Please update your project to use CLI 5.4.1 or later.

 
