Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB)

Principal Investigators:

Dr. Thomas Steinke

Thomas is head of the HPC department at the Zuse Institute Berlin (ZIB). His research interests include high-performance computing, heterogeneous systems for scientific and data analytics applications, and parallel simulation methods. Thomas co-founded the OpenFPGA initiative in 2004, and he leads the Intel® Parallel Computing Center (Intel® PCC) at ZIB. He received his doctorate in Theoretical Chemistry from the Humboldt-Universität zu Berlin in 1990.

 

Florian Wende

Florian is part of the Distributed Algorithms and Supercomputing department at Zuse Institute Berlin (ZIB). He is interested in accelerator and many-core computing with applications in Computer Science and Computational Physics. His focus is on load balancing of irregular parallel computations and on close-to-hardware code optimization. He received a Diploma degree in Physics from Humboldt-Universität zu Berlin and a Bachelor degree in Computer Science from Freie Universität Berlin.

 

Matthias Noack

Matthias is part of the Distributed Algorithms and Supercomputing group at Zuse Institute Berlin (ZIB). His interests include parallel programming models, heterogeneous architectures, and scientific computing. He developed the Heterogeneous Active Messages (HAM) framework, which provides efficient offloading, both locally and over fabric, for multi- and many-core processors. Matthias currently focuses on runtime compilation techniques, portable programming methods for vectorization, as well as optimization and scaling of the Hierarchical Equations of Motion (HEOM) method.

 

Description:

Intel Corporation and Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB) have set up a "Research Center for Many-core High-Performance Computing" at ZIB. The center will foster the uptake of current and next-generation Intel many- and multi-core technology in high-performance computing and big data analytics. The Intel® PCC at ZIB focuses on a diverse set of codes, including VASP, which is targeted at atomic-scale materials modelling.

The activities of the "Research Center for Many-core High-Performance Computing" focus on enhancing selected workloads with impact on the HPC community, improving their performance and scalability on many-core processor technologies and platform architectures. The selected applications cover a wide range of scientific disciplines, including materials science and nanotechnology, atmosphere and ocean flow dynamics, astrophysics, quantum physics, drug design, particle physics, and big data analytics. Novel programming models and algorithms will be evaluated for the parallelization of the workloads on many-core processors.

The workload optimization for many-core processors is supported by related research activities at ZIB, where novel programming models and algorithms for many-core architectures are developed and evaluated.

Furthermore, the parallelization work is complemented by dissemination and education activities within the Northern German HPC Alliance "HLRN" to overcome the barriers involved in introducing upcoming highly parallel processor and platform technologies.

"We are delighted to enter into a multi-year cooperation with Intel" said Prof. Alexander Reinefeld, head of the computer science department at Zuse Institute Berlin. "Our goal is to port and optimize selected HPC codes for Intel many-core processors with a special focus on maximum performance and scalability"

Publications:

Related Websites:

IPCC @ ZIB: Strategic Overview
IPCC @ ZIB: Project

Additional Sites:

AVX512 vs AVX2 on Intel® Xeon Phi™ processor family (Knights Landing) – ISC'16 IXPUG Workshop, 06/2016

Dynamic SIMD Vector Lane Scheduling – ISC'16 IXPUG Workshop, 06/2016

On Enhancing 3D-FFT Performance in VASP – CUG'16, London, UK, 05/2016

Dynamic SIMD Scheduling – SC'15 IXPUG BoF, 11/2015

Explicit Vectorization in VASP – IXPUG, 09/2015

OpenCL: There and Back Again – IXPUG, 09/2015

Improving Thread Parallelism and Asynchronous Communication in VASP – IXPUG, 09/2015

Runtime Kernel Compilation for efficient vectorisation – IXPUG, 09/2015

Language Impact on Vectorization: Vector Programming in Fortran – ISC'15 IXPUG Workshop, 07/2015

How Effective is SIMD in Case of Divergent Code Execution? – ISC'15 IXPUG BoF, 07/2015

Efficient SIMD-code generation with OpenCL and OpenMP 4.0 – ISC'15 IXPUG BoF, 07/2015

Hierarchical Equations of Motion: What we can learn from OpenCL – SC'14 Birds of a Feather, 11/2014

Integration of Intel® Xeon Phi™ Servers into the HLRN-III Complex: Experiences, Performance and Lessons Learned – CUG'14, 05/2014


Alternatives to Using the Intel® XDK to Develop Node.js* IoT Applications

The Intel® XDK provides a cross-development environment for creating Node.js* IoT applications that run on headless embedded Linux* systems. The tools used to assemble this environment within the Intel® XDK are standard open-source tools. This article provides a starting point for assembling a similar set of tools for developing Node.js applications on headless IoT Linux devices.

Intel® XDK IoT Development Components

The part of the Intel® XDK that supports IoT application development includes the following key functional elements:

  • An editor that is JavaScript* friendly.
  • Tools to connect to headless IoT Linux devices.
  • Samples that can be used to learn how to interact with I/O sensors.
  • A means to remotely debug your Node.js application.

Useful alternatives to the first three components are described below. An alternative to the last component (remotely debugging your Node.js application) is described in a companion article titled Using Chrome DevTools to Debug your Remote IoT Node.js Application.

JavaScript* Friendly Editors

A variety of free JavaScript friendly editors can be used as alternatives to the Intel® XDK. In addition to the free open-source Brackets* editor from Adobe*, the editor built into the Intel® XDK, there is also the open-source Atom* editor from GitHub* and the free and open-source Visual Studio* Code editor from Microsoft*.

If you have been using the Intel® XDK, you are not required to continue using the Brackets editor. Ultimately, you should pick an editor that works for you; any of the free alternatives mentioned above will work, as will many popular fee-based code editors such as WebStorm* by JetBrains* and Sublime Text* by Sublime HQ*.

Brackets

The editor built into the Intel® XDK is the Brackets open-source editor, which is sponsored by Adobe and was designed expressly for development of HTML5 (JavaScript, CSS and HTML) applications. In fact, the Brackets editor itself is an HTML5 app! The Brackets editor can be extended via many open-source extensions. Some extensions to consider adding to Brackets are:

Use this "javascript development with brackets" search link to find many other suggestions.

The Brackets editor runs on Microsoft Windows*, Apple macOS* and most distributions of the Linux* OS.

Atom

GitHub has created an open-source editor called Atom that, like Brackets, is also built on HTML5 technologies. It has many free packages which can be used to extend the editor. Some extensions to consider adding to Atom are:

NOTE: Of particular interest is the Nuclide package for Atom. This "add-on" provides built-in support for Chrome DevTools and remote development of Node.js applications, akin to that which is provided by the Intel® XDK.

Use this "javascript development with atom" search link to find many other suggestions.

The Atom editor runs on Microsoft Windows, Apple macOS and most distributions of the Linux OS.

Visual Studio* Code

The Visual Studio* Code editor is a general-purpose open-source programming editor from Microsoft that includes many features that are specific to developing JavaScript applications, as well as extensions for JavaScript and Node.js application development. Despite this editor's name, a Microsoft Visual Studio* license is not required to use the editor, nor does its use depend on having Visual Studio installed on your system. It is a standalone editor. Some extensions to consider adding to Visual Studio Code are:

Use this "javascript development with visual studio code" search link to find many other suggestions.

The Visual Studio Code editor runs on Microsoft Windows, Apple macOS and most distributions of the Linux OS.

Connecting to Headless IoT Devices

On the Intel® XDK IoT Develop tab there are several features to help you locate a headless Linux IoT device on your network, log in to your IoT device, and transfer project files to your device. You can use the following tools and techniques to perform similar tasks if you are working outside of the Intel® XDK:

  • ssh to remotely login to an IoT device.
  • MDNS (aka Bonjour or Avahi) to locate IoT devices by name.
  • SSHFS to share project files with your IoT device.

Enabling Remote Login using SSH

Remote login to your Linux IoT target device is most easily accomplished using SSH. This allows you to get a shell prompt over the network, from your host development system to your IoT target device. Using this feature requires an SSH client on your development machine (such as PuTTY on many Windows machines) and an SSH server on your IoT target.

NOTE: Some IoT targets include a serial TTY port that can be used to obtain access to a login shell. Technically, this is not a "remote login" because you must physically connect your host development machine to your IoT target, usually via a serial over USB connection or a conventional RS-232 or "TTL" serial port. You will need to use this serial TTY port to install and configure an SSH server if your IoT device does not support the use of a keyboard and monitor.

Depending on the Linux distribution that is installed on your IoT target device, there may already be an SSH server installed. If you are using an IoT device that you previously used with the Intel® XDK, it should already have an SSH server running on it. If not, you may have to install an SSH server on your IoT device to enable remote login using SSH.

If there is no sshd (or equivalent ssh server) running on your Linux IoT device, type the following command at a bash command-line prompt on your IoT device (these instructions assume an Ubuntu* distribution, other Linux distributions may require different instructions):

$ sudo apt install openssh-server

NOTE: The Intel® XDK requires that you enable remote login to your IoT device as root, because the xdk-daemon assumes, and the mraa library requires, that your IoT applications run with root privileges. This is an overly restrictive limitation that is not otherwise required. If you want to enable remote login as root, search for "enabling root login via ssh" for details.

If you do not want to add a root user to your remote IoT device, a convenient way to "become root" temporarily is to use the "sudo -E bash" command. This will allow you to act like a root user and will retain your regular user's environment and home directory. When you no longer need to "be root" simply type "exit" to return to your regular user shell session.
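
For example, a session might look like the following (a hypothetical transcript; the "#" prompt indicates commands running with root privileges):

$ sudo -E bash
# whoami
root
# exit
exit
$ whoami
my-iot-username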

Once the SSH service has been enabled on your IoT target device, you can use your favorite SSH client to log in to your IoT device over the network. Depending on your network configuration, you may have to identify your IoT device by its numerical IP address to establish a remote SSH connection. See the next section for a solution to that problem.

From an Apple macOS* machine, a Linux* desktop, or a Microsoft Windows® 10 machine that has "Bash on Windows" enabled, the simplest way to remotely log in to your IoT device is by using the "ssh" command (substitute the username and IP address for your IoT device):

$ ssh my-iot-username@192.168.2.15

The IP address 192.168.2.15 shown above is a placeholder and will probably not work! You must use the IP address of your IoT device to establish a remote ssh connection.

If you do not have "Bash on Windows" enabled (or you have an older version of Windows) you can install the MinGW MSYS utilities for a copy of the ssh command that will run directly from a Windows command prompt or from within the MSYS bash prompt (remember to add the MSYS utilities to your PATH). Otherwise, a free and popular alternative to ssh for Windows development hosts is the PuTTY ssh and telnet client.

Adding MDNS Services to Your IoT Device

If you are working on a small or unmanaged network (such as the typical home network), you likely do not have a name server to help locate your IoT device by name. Unless you have specifically configured your router's DHCP services to provide a fixed IP address to your IoT device, the IP address associated with your IoT device is subject to change.

To address this problem, add MDNS (Avahi) services to your Linux IoT device. This technique is used by the Intel® XDK to identify and locate IoT devices. The use of MDNS is limited to those situations where your development system (e.g., your laptop) and your IoT device reside on the same subnet (which is typical of most home networks).

NOTE: The precise instructions to install MDNS on your target IoT device will vary as a function of the specific Linux distribution that is installed on your IoT device. The instructions below assume your IoT device is running Ubuntu.

If your IoT target device is running Ubuntu, type the following command on your remote IoT device shell (i.e., you are logged into your IoT device via a remote shell using ssh/PuTTY or the IoT device's open TTY serial port, if it has one):

$ sudo -E apt install avahi-daemon avahi-autoipd avahi-utils

Then test the MDNS service by typing the following commands via the same remote shell on your IoT device:

$ sudo systemctl is-active avahi-daemon.service
active
$ ping $HOSTNAME.local
PING my-iot-ubuntu.local (10.7.188.149) 56(84) bytes of data.
64 bytes from my-iot-ubuntu.intel.com (10.7.188.149): icmp_seq=1 ttl=64 time=0.051 ms
64 bytes from my-iot-ubuntu.intel.com (10.7.188.149): icmp_seq=2 ttl=64 time=0.090 ms

If you see results similar to those above, your MDNS service is running on your IoT device.

With the MDNS service installed and running on your IoT device, you can quickly identify your IoT device using the special local domain that is employed by MDNS/Avahi/Bonjour (assuming, of course, that you know the hostname of your IoT device and your IoT device and development system are on the same local subnet).

On a typical Ubuntu Desktop or Apple macOS machine the MDNS/Bonjour/Avahi services and utilities are already installed. In that case, type the following command at a terminal shell on your Apple or Ubuntu Linux development machine:

$ ping my-iot-hostname.local

Where "my-iot-hostname" is the base hostname of your IoT device. For example, using the results of the earlier on-device"ping $HOSTNAME.local" test, the following command typed on your development system would locate and identify that IoT device:

$ ping my-iot-ubuntu.local

A Windows machine needs the Apple Bonjour Service installed to perform this ping test. The Bonjour service is typically included with iTunes or any one of a variety of other applications that employ MDNS (typically software designed to support networked printers, media servers and NAS drives on unmanaged networks). To determine if the Bonjour service is already running on your Windows system, type the following at a Windows command prompt:

> net start | find "Bonjour"
   Bonjour Service

If you see the output "Bonjour Service" it means MDNS/Avahi/Bonjour is installed and running on your Windows system. In that case, the ping test described above will also work from a Windows command line.

If the Bonjour service is not running on your Windows system, it may have been disabled or you may need to install it. To install the Bonjour service, install iTunes or use the "Bonjour Print Services for Windows" installer available at support.apple.com/bonjour.

For more information about installing and configuring MDNS/Avahi/Bonjour, see the following:

Using SSHFS to Share Files

When developing applications for a headless IoT target, it is generally easier to run your development tools (editor, package manager, debugger, etc.) on your development host (your laptop or desktop machine) and then copy the application you are developing to your IoT device, where you can then run it. You may be able to run your development tools directly on your IoT device, but if your IoT target is headless, has limited RAM and storage space, or is physically inaccessible, using your IoT device as your development host is not practical.

Developing Node.js JavaScript applications for your IoT device is more difficult when you separate the development host system from the target runtime system. Because Node.js applications are not compiled into a single executable, like a typical C/C++ application, you need a way to efficiently and accurately copy all the Node.js application files to your IoT device for testing and debugging.

With the Intel® XDK, development is performed locally on your host dev machine and your project is copied to your IoT target device to run and debug. In essence, the Intel® XDK would TAR the local project folder on your host, copy it to your target device and unTAR it on the target device (into a folder named "/node_app_slot"). To duplicate this process manually, you could do something like the following (assuming a folder named "node_app_slot" already exists on the target device).

First, on the host development machine:

$ cd my-project-folder
$ tar cvzhf my-project.tgz *
$ scp my-project.tgz iot-username@iot-hostname.local:
$ ssh iot-username@iot-hostname.local

Then, on the remote IoT device (using the remote ssh channel that was opened in the prior step):

$ cd node_app_slot
$ rm -rf *
$ tar xvf ../my-project.tgz

After your application files have been copied to your IoT device you can commence with package installation (if required) and running and debugging the application.

Obviously, the above process is very tedious if you are making many small changes to your code and/or need to install NPM packages as part of the installation of the application onto your IoT device.

NOTE: It is important that installation of node packages, especially those that include installation of binary components, be done on the target device, not on the host. This ensures that the binary components of the node package match the architecture and OS of the system that will be using them (i.e., your IoT target device).

If the architecture and the OS on your target IoT device match your host development machine, you can perform the installation of NPM packages on your host, before copying your project to the IoT target.

There are many optimizations that can be made to this process, such as copying across only those files that have been added or changed. Another optimization is to use the "scp -r" option and skip the tar/untar step. Even faster would be to use the "rsync" tool to keep your host and target project folders mirrored and in sync, as sketched below (keep in mind the issues regarding NPM package installation). Trying to manage this process by hand is clumsy and error-prone.
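
As a sketch of the rsync approach (folder names and hostname are placeholders, and the --exclude option leaves NPM package installation to the target device), a single command keeps the device's copy of the project mirrored with your host:

$ rsync -avz --delete --exclude node_modules my-project-folder/ iot-username@iot-hostname.local:node_app_slot/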

Using a Network File System, Instead

Rather than using the copy method described above, you can manage a single set of sources and project files by using a shared network file system. There are essentially two ways to approach this:

  1. Export an application project folder on the target and mount it on the host
  2. Export an application project folder on the host and mount it on the target

Option #2 has the advantage of ensuring your project source code "lives" on your host development system. This makes it easy to manage and back up your application source code files.

Option #1 has the advantage of being easier to set up and implement on a variety of IoT devices. This option also ensures that the application code to be run resides on the device that is running the code, which is important if you experience network file system delays and/or disconnects while running and testing your application. These network interruptions are a significant problem for scripting languages that are interpreted at runtime, such as Node.js applications.

NOTE: The simplest way to implement option #1 is to use SSHFS. Since you already have an SSH server on your IoT target, there is no additional software to install on your target IoT device (such as a Samba server or similar network file system). This is especially helpful for working with resource limited IoT devices.

To understand how to use SSHFS for this task, read the first page of this article. You may need to install additional software on your host development system to make SSHFS work; for a review of how to install that software, see this excellent installation article.

In essence, once you have the SSHFS software installed on your host development system, you will issue a command that looks something like the following (on your host development machine):

$ sshfs user@my-iot-device.local: my-host-mount-folder

This will result in mounting your IoT device's remote file system and making it visible at <my-host-mount-folder> on your host development system.

Once mounted, you can use your favorite editor and other tools to work on the application. You will also need to initiate a parallel SSH connection to your remote IoT device to deal with installing NPM packages and running the application (on your IoT device). Be sure to make backups of the sources located in this remote IoT project folder, either onto another location on your host system or into a cloud repository, to ensure that you do not lose your source code in the event something happens to your IoT device!
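
When you are finished working, unmount the remote folder on your host development system (a minimal sketch, assuming the mount point used above; the exact command depends on your host OS):

$ fusermount -u my-host-mount-folder    # on Linux
$ umount my-host-mount-folder           # on macOS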

For help using a remote Chrome DevTools session to debug your remote Node.js IoT application, see this companion article.

NOTE: The Nuclide package for Atom "add-on" also provides built-in support for Chrome DevTools and remote development of Node.js applications, similar to the SSHFS method described above, but within a complete development environment.

JavaScript* Samples Written for Node.js* IoT Devices

The IoT Node.js samples included with the Intel® XDK are published in the Intel managed IoT DevKit GitHub* account and GoMobile GitHub* account, specifically, in these repos:

The majority of these samples rely on the open-source MRAA and UPM libraries for access to local device I/O (a few samples require only the MRAA library to function). The MRAA and UPM Linux libraries provide access to low speed sensors, serial communication ports and actuator outputs. They are written in C/C++ and include API bindings for the Python, JavaScript (Node.js) and Java* languages.
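
As a minimal sketch of the JavaScript (Node.js) bindings (this assumes the mraa node module is installed, as described in the next section, and an LED wired to GPIO pin 13; pin numbers vary by board), a blink program can be as small as this:

// blink.js - toggle an LED once per second via the MRAA bindings
var mraa = require('mraa');       // MRAA JavaScript bindings
var led = new mraa.Gpio(13);      // open the GPIO pin (board-specific number)
led.dir(mraa.DIR_OUT);            // configure the pin as an output
var state = false;
setInterval(function () {
    state = !state;
    led.write(state ? 1 : 0);     // 1 lights the LED, 0 turns it off
}, 1000);

Run it on the IoT device with "sudo node blink.js" (as noted earlier, MRAA requires root privileges).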

A special service called imraa is included with MRAA to utilize an Arduino 101* (branded Genuino 101* outside the U.S.) board attached to your IoT target device via USB as an "I/O extender" for easy prototyping of IoT "edge device" applications, by way of the firmata sketch software.

For additional details regarding the MRAA and UPM libraries, see the following pages:

Installing MRAA and UPM

Before you can use these samples, you need to confirm that you have the MRAA library and (optionally) the appropriate UPM libraries installed on your IoT target device. To determine if mraa is installed as a global node module, and that it can be included in a Node.js application, type the following at your IoT device prompt:

$ npm -g list --depth=0
/usr/lib
├── mraa@1.6.1
└── npm@2.15.11
$ node
> var x = require('mraa')
undefined
> x.getVersion()
'v1.6.1'
> .exit

The specific versions reported by your system may vary from those shown above.

NOTE: Installing the MRAA and UPM libraries globally is not a requirement; it is done only as a convenience, especially if you are developing many applications that depend on these libraries. If you prefer, you can install the necessary MRAA and UPM libraries as local node modules within your application project workspace.

The following commands assume the use of an Ubuntu IoT device. For other Linux distributions, see the MRAA installation instructions.

At the time this article was written, the MRAA library did not support Node.js 7.x or higher. Please check the MRAA library README for the latest information regarding which versions of Node.js are supported by MRAA.

$ sudo add-apt-repository ppa:mraa/mraa
$ sudo apt update
$ sudo apt install libmraa1 libmraa-dev mraa-tools mraa-imraa python-mraa python3-mraa

The update command (middle line above) may take some time to complete; be patient!

To confirm that the imraa service was successfully installed, type the following on your IoT target device command-line:

$ which imraa
/usr/bin/imraa

Now install the mraa node module (the version number you see reported may vary from below):

$ sudo npm -g install mraa

> mraa@1.6.1 install /usr/lib/node_modules/mraa
> node-gyp rebuild

...many compile messages, with many warnings...

  SOLINK_MODULE(target) Release/obj.target/mraa.node
  COPY Release/mraa.node
make: Leaving directory '/usr/lib/node_modules/mraa/build'
mraa@1.6.1 /usr/lib/node_modules/mraa

Do not be alarmed by the many warning messages during the compilation phase of the installation. This is normal.

NOTE: Install only those UPM modules you need for the samples that interest you. It is not necessary to install every UPM library module available in the UPM repo.

Additional Resources

Using Chrome* DevTools to Debug your Remote IoT Node.js* Application

If you prefer using Chrome* DevTools to debug your JavaScript* applications, rather than the command-line debug tool built into Node.js*, you should consider using one of the two options described below: the inspect support built into recent versions of Node.js, or the Node Inspector module.

Both options support connecting Chrome DevTools, in the Chrome browser on your host development system, to the remote Node.js application running on your headless IoT target device. Additional Node.js debugging options are described on the Debugging Node.js Apps documentation page.

For help setting up a remote Node.js for IoT JavaScript development environment, see the companion article titled Alternatives to Using the Intel® XDK to Develop Node.js* IoT Applications.

NOTE: The Nuclide* package for Atom* "add-on" provides built-in support for Chrome DevTools and remote development of Node.js applications within a convenient development environment. You will need to install the Atom editor by GitHub* to use this tool.

Using Chrome* Inspect for Node.js*

This debug technique requires:

  • Chrome Browser version 55 or higher on your host development system
  • Node.js version 6.3.0 or higher on your IoT target device
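
To check which version of Node.js is installed on your IoT target device, type the following at the device's command prompt (the version shown here is only an example):

$ node --version
v6.10.3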

If you are using older versions of the above components, consider using the Node Inspector debug solution, described in the next major section of this document.

If your versions of Node.js and Chrome meet the requirements stated above, there is nothing more that needs to be installed onto your target IoT device or your host development system.

Configuring Chrome*

Some initial configuration in the Chrome browser is required to make this work. Open the Chrome browser on your host development system and type the following into the URL address bar:

chrome://inspect

Then, make sure the Discover network targets box is checked and push the Configure... button located to the right of that checkbox item (as shown in the image below).

The Configure... button will bring up a Target discovery settings dialog similar to the one shown below. In this dialog you can add the names (or IP addresses) of your IoT target devices, appended with port 9229 (for example, my-iot-hostname.local:9229).

In this example, we are using an IoT device named "de3815-paf2" which is addressable as "de3815-paf2.local" on the local network via MDNS/Avahi/Bonjour (see this companion article for more information about configuring MDNS). If you have a corporate name service on your network the device might be named something like "my-iot-hostname.mycompany.com" or it might be accessible simply as "my-iot-hostname" without any domain qualifier. Use any name that responds to a "ping" command, or use the device IP address if it is statically assigned and you know it will never change.

Using Chrome* Inspect

With "chrome://inspect" configured and running, you can get started debugging your remote IoT application in Chrome DevTools. Using ssh (or PuTTY), log into your IoT target and start the application in "debug mode" by typing the following:

$ node --inspect --debug-brk <my-app-name>.js

In the example below, the application to be debugged is named main.js and is located in the ~/node-debug-test folder. 

The application being debugged in this example uses the MRAA library, which requires that the application be run as root; hence the sudo -E bash command, which is simply a convenient way to get a root prompt.

The --debug-brk option pauses your application at the first line of executable code (newer Node.js releases combine the two options into the single --inspect-brk flag). Additional command-line options are described in the Node.js debug doc pages.
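
If you just want a trivial stand-in application to exercise the debug connection (a hypothetical example that requires neither MRAA nor root privileges), something like the following will do:

// main.js - prints a message once per second for ten seconds
var count = 0;
var timer = setInterval(function () {
    console.log('tick', ++count);    // try setting a breakpoint on this line
    if (count >= 10) clearInterval(timer);
}, 1000);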

NOTE: DO NOT open the long "chrome-devtools://devtools/..." URL! That only works if the application and Chrome are on the same system. In this case the application is running on your IoT target device and Chrome is on your host development system.

Once the application has been started in "debug mode" it should appear in the chrome://inspect tab in the Chrome browser window on your host development system. If you do not have Chrome running, start it and enter chrome://inspect into the URL address bar to see a list of debuggable devices and applications.

Click the blue inspect link that appears in the chrome://inspect tab.

In this example you can see the application waiting to be debugged is listed under the title "Remote Target #DE3815-PAF2.LOCAL" and lists the Node.js version (v6.10.3) followed by the name of the application (main.js) and the location on the IoT file system of that application (/home/ubuntu/node-debug-test/main.js).

Clicking the inspect link should bring up an instance of Chrome DevTools for Node.js, similar to that shown below. Obviously, the details of the Chrome DevTools window will vary as a function of the application you are debugging. This is a view of the "blink LED" sample application borrowed from the Intel® XDK.

From here you can use a version of Chrome DevTools tailored for use with Node.js to single-step, break and inspect objects within your application. Notice also that you have access to the JavaScript console for quick tests and debugging.

For help with using Chrome DevTools, see these doc pages on the Google Developers site.

NOTE: If you use console.log() messages in your application, they should appear in the Chrome DevTools console, and they will also appear in your remote IoT device's ssh console. This is especially of value if your application ends immediately after issuing some console.log() messages, in which case those messages will appear in the ssh console but may get dropped from the remote Chrome DevTools console, due to a sudden loss of the debug connection.

Using Node Inspector

This debug technique works best with:

  • Chrome Browser on your host development system
  • Node.js version 6.2.x or lower on your IoT target device

NOTE: If your IoT target device is running Node.js version 6.3.0 or higher, consider using the Chrome Inspect for Node debug solution described in the previous section. It provides a superior debug environment and is easier to set up.

Installing Node Inspector

Before you can use this technique, you must first install the Node Inspector node module on your IoT target. Log into your IoT target device, using ssh or PuTTY, and type the following command:

$ npm install -g node-inspector

No additional software needs to be installed on your host development system, other than the Chrome browser.

Starting a Node Inspector Debug Session

At the remote login shell on your IoT target device, type the following:

$ node-debug --debug-brk --cli --web-host 192.168.0.2 my-app.js

Where the IP address listed after the --web-host option is your IoT target device's IP address and my-app.js is the name of the Node.js application you are going to run and debug on your IoT system.

If you are uncertain of your IoT device's IP address, type "ping $HOSTNAME" on the remote login shell of your IoT target device and use the IP address that is reported by that command.
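
For example (hypothetical hostname and address; yours will differ):

$ ping $HOSTNAME
PING my-iot-ubuntu (10.7.188.149) 56(84) bytes of data.
64 bytes from my-iot-ubuntu (10.7.188.149): icmp_seq=1 ttl=64 time=0.049 ms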

In the example below, the application to be debugged is named main.js and is located in the ~/node-debug-test folder.

The application being debugged in this example uses the MRAA library, which requires that the application be run as root; hence the sudo -E bash command, which is simply a convenient way to get a root prompt.

The --debug-brk option pauses your application at the first line of executable code and the --cli option makes sure node-debug does not attempt to automatically open a browser on your remote IoT device. Additional command-line options are described in the Node Inspector README.

Once the application has been started in "debug mode" it prints a URL (e.g., "http://192.168.20.37:8080/?port=5858" in the example above). Copy that URL into the Chrome browser window on your host development system to open a copy of Chrome DevTools, as shown below.

The details of the Chrome DevTools window you see will vary as a function of the application you are debugging. This is a view of the "blink LED" sample application borrowed from the Intel® XDK.

From here you can use Chrome DevTools to single-step, break and inspect objects within your remote Node.js application. Notice also that you have access to the JavaScript console for quick tests and debugging.

For help with using Chrome DevTools, see these doc pages on the Google Developers site.

NOTE: If you use console.log() messages in your application, they should appear in the Chrome DevTools console, and they will also appear in your remote IoT device's ssh console. This is especially of value if your application ends immediately after issuing some console.log() messages, in which case those messages will appear in the ssh console but may get dropped from the remote Chrome DevTools console, due to a sudden loss of the debug connection.

Additional Resources

Sharing VR Through Green Screen Mixed Reality Video

Virtual Reality (VR) is an amazing experience. However, it’s also a solo experience that can be hard to describe to anyone yet to don a headset and make the leap into that virtual world. As VR continues to expand its horizons in games, art, and a whole string of commercial applications from real estate to health, one of the enduring challenges for its cheerleaders is effectively showcasing the experience to those without access to VR hardware. Regular 2D videos shot from the first-person perspective of the user don’t do justice to the real experience; their limited field of view prevents them from truly giving a sense of the immersion into a 360-degree world.

To tackle this, VR hardware producers, application developers, and video makers have created a new VR video production paradigm, with green-screen, mixed-reality video. Shot from a third-person perspective, the technique allows the production of 2D videos that show the user in the heart of the experience—immersed in a virtual world, and interacting with the elements in it—in a way that first-person perspective videos simply can’t do. Until the day when every home has a VR headset, green-screen, mixed-reality video is likely to remain the best way to share the incredible VR experiences developers around the world are creating.

Figure 1: Screenshot from the mixed reality HTC Vive* VR demo trailer released in April 2016.

This paper introduces developers (and anyone else interested in the medium) to the basic principles and techniques for the creation of green-screen, mixed-reality videos for VR experiences. It will look at the hardware and software stack, and the process of enabling and producing mixed-reality video for VR games and applications, with a view to equipping developers to take their first steps with the technique. Additional companion articles and videos will follow in the future, and readers can stay up-to-date with developments by joining the Intel® Game Dev program at software.intel.com/gamedev.

Trailers and Streamers

Even though virtual reality has benefited from waves of hype and attention over recent years, it is still a relatively young technology. In that context, the technique of creating VR mixed-reality video is very new, having only come to the fore since early 2016, when HTC Vive* released its own VR mixed-reality demo video, and mixed-reality trailers appeared for games—including Job Simulator* from Owlchemy Labs, and Fantastic Contraption* from Northway Games.

The applications of VR will continue to be explored for many years to come, but, right now, the ability to capture the essence of the experience in a 2D video using mixed reality is of great interest to developers—such as those behind Owlchemy Labs’ Rick and Morty: Virtual Rick-ality*—as they seek to communicate the appeal of their experience to a broad audience, and to differentiate from the growing number of VR games hitting the market. It’s also a vital tool in effectively showcasing VR apps like Google’s 3D virtual art creation tool Tilt Brush*, as demonstrated in season two of SoulPancake’s Art Attack* series on YouTube*. 

Figure 2: Screenshot from a mixed-reality video for Art Attack* in which artist Daron Nefcy uses Google Tilt Brush*.

Mixed-reality video is, without doubt, the best technique to use in a promotional trailer for a VR experience—as the work of one of the genre’s leading trailer producers, Kert Gartner, amply demonstrates. Legions of streamers and YouTubers—including Barnacules Nerdgasm* and DashieGames*—have begun to exploit this technology to great effect in their videos of playing VR experiences, bringing a new dimension to the presentation, and leaving increasing numbers of VR converts in their wake.

Figure 3: Screenshot from Kert Gartner’s mixed-reality trailer for Space Pirate Trainer* by I-Illusions.

From a developer’s point of view, enabling a VR app or game for mixed-reality video creation requires at least a degree of forethought, and potentially some more serious programming. In the context of a developer’s limited resources, it may not always seem the highest priority—but it’s worth making time for in the schedule.

“Being able to show the experience to people outside of the headset is the biggest benefit of this technique,” said Josh Bancroft, Community Manager in the Developer Relations Division at Intel. “A secondary benefit is that you’re enabling that vast army of content creators, streamers, and YouTubers by making it easy and attractive for them to stream your VR game. That helps you increase your reach and get more people seeing, and hopefully playing, your game.”

Josh and his colleague Jerry Makare, who runs the Developer Relations Division video team at Intel, have been working with VR mixed-reality video for over a year, including setting up a live demo of the technology for developers to try at the 2017 Game Developers Conference. Josh and his team’s natural attraction to any exciting new technology drew them to the first examples of the technique, with the ultimate goal of helping to make it more accessible and to facilitate its adoption by their community of developers.

Building the Mixed Reality Stack

Mixed-reality video of a VR experience requires two central components that must be perfectly synchronized with one another, in high quality. The components include live footage of the user interacting with the app, and footage from the virtual environment generated by the VR application or game. The tasks required include running the VR app (including the generation of an additional third-person camera view), capturing the live green-screen video, chroma key processing, compositing, encoding, and output of the final mixed-reality video for recording or streaming.

Virtual Eyes

The first stage requires instructing the VR app or game software to add a virtual third-person, in-game camera, which points toward the player’s virtual position in the app environment. This is vital to being able to show footage of the user actually within the virtual game environment, and is in addition to the standard first-person camera that produces the immersive 360-degree image the user sees in the headset. In software terms, this additional camera is implemented in much the same way that any virtual camera is placed in a game, namely by setting the seven variables that decide where it is, and what it sees, in the virtual 3D volume: X, Y, and Z position; X, Y, and Z rotation; and field of view (that is, how narrow or wide the shot is).
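
As an illustration of those seven values (a sketch using the three.js library purely as an example; a real VR title would set them through its game engine or VR plugin):

// Hypothetical third-person camera, illustrated with three.js
var THREE = require('three');
var camera = new THREE.PerspectiveCamera(55, 16 / 9, 0.1, 100); // 55-degree field of view
camera.position.set(1.5, 1.7, 2.0);      // X, Y, Z position in the 3D volume
camera.rotation.set(0, Math.PI / 8, 0);  // X, Y, Z rotation, in radians
camera.fov = 60;                         // widen or narrow the shot
camera.updateProjectionMatrix();         // apply the field-of-view change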

In a VR application, the user’s first-person camera position and rotation is controlled by the movement of the headset, as registered by the sensors in the physical space around the user, while the field of view is set by the developer depending on what the experience is, and how much of the environment is optimal for them to see. Defining an additional, virtual third-person camera is a relatively straightforward task in most game engines; the new camera must be adjusted to give the desired view of the user within the environment—that is, not too close that we lose sight of them easily when they move, and not too wide that they’re too far away or start to become lost in the environment.

In addition to the first-person headset view and the third-person view, the app also needs to be instructed to output two further views, namely background and foreground views, as seen from the new virtual third-person camera. These are the layers that will be combined with the live-action footage of the user for compositing the final mixed-reality output.

Go Green

Once the app’s virtual cameras and video outputs have been dealt with, it’s time to look at what needs to be done in the real world. The first prerequisite is a suitable space that’s big enough for the user to move around in, as required by the app or game. To create the full mixed-reality video experience, where the user is shown fully immersed in the environment of the app, it is necessary to shoot the user on a green screen, which can then be removed in the video-processing stage using a chroma key filter.

Figure 4: The green-screen studio setup used by the Intel team at Computex in Taipei, May 2017.

The area covered by the green screen needs to be large enough that the physical camera can move around the space without getting in the way of the user, and without reaching the edges of the volume—otherwise, a sudden burst of real-life studio interior can appear on the edges of the final composited image, breaking the immersive illusion. The larger the closed green space, the more options the camera has to move around the user. The lighting of the green space also needs to be as even as possible across all surfaces in order that the chroma key filter can accurately register and remove all the green color from the image, without any patches remaining.

It is possible, however, to create very effective mixed-reality video without using green screen, depending on the VR experience in question. For example, a VR art application such as Tilt Brush is exclusively about the creation of foreground elements, so there is no requirement regarding the background. This means that while the user could be shot on a green screen with any background added in virtually, they could just as easily be shot in any physical environment with a suitable background. The foreground objects would appear superimposed on the real environment (as in augmented reality), and the user would be able to interact with them.

Figure 5: Screenshot from Kert Gartner’s mixed-reality trailer for Fantastic Contraption*, made without green screen.

Physical Camera

When the new in-app third-person camera and the physical shooting space are figured out, the next step is setting up the physical camera. A webcam could be used if it’s the only thing available, but the quality of the final video will be directly impacted by the quality of the camera used; a DSLR or professional video camera outputting a signal at a resolution of 1080p or higher will deliver a significantly better result. The crucial part at this stage is binding the physical camera to the newly created virtual camera so that they see the same thing. This requires an exacting calibration process.

Figure 6: The upper image is the raw green-screen footage; the lower image is the final composited image with background and foreground layers added in real time.

The third-person image from the app needs to be seamlessly composited with the image from the physical camera, so that the elements align precisely—particularly the player’s hands, which will be holding the two VR controllers that are tracked in space. If this isn’t done correctly, it’s possible to end up with virtual in-game hands, or other handheld items that are anything from a few centimeters to a couple of feet away from the user’s real hands. This, of course, completely breaks the illusion of immersion and makes the resulting video look very odd.

In the case of the HTC Vive that Josh and his team have worked with extensively, the virtual and physical cameras are bound together by attaching a Vive Tracker* (a hockey puck-like sensor) or a Vive controller to the physical camera so that the Vive sensors can map space to sync with the virtual in-game camera. This third tracker, or controller, attached to the physical camera is the linchpin between the virtual and physical worlds, facilitating the entire process.

Calibration

The calibration process is concerned with lining up the in-game virtual camera with the physical camera. The seven key variables (as described previously for the virtual third-person camera) need to be perfectly matched so that the two cameras are pointed in exactly the same direction, see the same thing with the same field of view, and track perfectly with each other when they move. These variables are the X, Y, and Z positions; the X, Y, and Z rotation; and the field of view.

The calibration process can be done by simple trial and error: changing the values and iterating until the position of the controllers matches the virtual hands in the app. Manual calibration is extremely difficult, however, and as the efficacy of the mixed-reality illusion relies on the accuracy of this calibration process, it’s better to enlist help where possible.

Figure 7: Screenshot from MixCast VR Studio* showing the seven values for position, rotation, and field of view.

A number of tools are available, but, for the various demos and related work produced to date, Josh and Jerry have been using MixCast VR Studio* by Blueprint Tools. It’s a purpose-built suite of tools developed to help those producing VR and mixed-reality videos, and includes good calibration support. It works by first being told which input device your camera is, then using its green-screen chroma key functionality to support the calibration process.

Many Hands

According to Josh, calibration works best when it is a two-person process. Prior to starting, it’s important to ensure that the physical camera and the attached tracker are as level as possible. “Use a bubble level, or a phone with a compass app, to tell you whether it’s level or not,” said Josh. “The process will be more precise, and much easier, if you start with the camera and the tracker as close to perfectly level and square as you can.”

Next, one of the hand controllers, or a tracker puck, is fixed on the body of the camera, and MixCast is told which device is the one attached to the camera. Then Quick Setup is launched, and two sets of crosshairs appear on screen, which need to be lined up. Once that’s done, a click begins the calculations for the field of view and the rough position and approximate alignment of the virtual and real cameras, followed by a fair amount of necessary fine-tuning to make it perfect.

Josh also emphasized the importance of the user position when going through the calibration process. “When you’re standing in front of the camera, try to stand perfectly square to it, so your shoulders are square and you’re lined up with the center of the lens,” he explained. “It makes things that much easier in terms of not having to compensate for those positional differences when you’re trying to line up everything in three dimensions.”

Figure 8: Using MixCast VR Studio* to perform the calibration process.

Once everything is lined up inside MixCast VR Studio, it will look right when the user picks up and interacts with objects—and when the physical camera is moved around in the real world, the virtual in-app camera will move with it in perfect sync. The calibration values can then be copied from MixCast VR Studio to the configuration file and reused later, or used with any application that has the same kind of mixed-reality enablement (at the time of writing, this includes most Unity* titles that use the SteamVR* plugin). As long as the camera and the tracker stay physically locked together, and their physical relationship doesn’t change in space, those values will remain accurate, although fine tuning may be required. In live demo environments, Josh and Jerry recalibrate the cameras at least once a day to ensure accuracy.

Compositing

At this point, the app is providing background and foreground visual feeds, and the position, rotation, and field of view of the virtual and physical cameras are synchronized, allowing the user to interact with the VR app and to have their actions accurately rendered in space from a third-person perspective. The next task is to bring the physical camera feed (green screen) into the PC using a video capture device, along with the two feeds from the app (background and foreground).

The image feeds can be brought into any software program capable of performing the chroma key and compositing operations required to produce a single mixed-reality video output. A number of software suites designed for streamers and video producers can perform the necessary tasks, including XSplit Broadcaster* and OBS Studio*, with Josh and Jerry having worked primarily with the latter.

The third-person view from the app is displayed in a single window divided into quarters, comprised of the third-person background and foreground views; the standard first-person headset view; and a fourth view, which is an alpha mask. The first-person view is not required for the mixed-reality compositing, but provides a useful reference of what the user is actually seeing, and can be used to cut to during live streaming and recording situations. The alpha mask is also non-essential unless there is black in the foreground image, but, while being relatively complex to implement, can improve the overall visual quality by making edges smoother.

Figure 9: Screenshot from OBS Studio* showing the four quadrants from the Rick and Morty: Virtual Rick-ality* game. Top-left is foreground, bottom-left is background, top-right is the mask, and bottom-right is the first-person view.

An important point regarding the window that displays the quadrant screen is that it needs to be at least four times the resolution of the final video output (for example, 4K for a final 1080p output). This is because the compositing software takes the individual quadrants of the screen into the compositing process, meaning that the images used for compositing are a quarter of the entire screen size. Anything less than 4K, and the final video will be sub-1080p HD resolution.

The individual screen quadrants, and the live camera green-screen feed, are then all brought into OBS Studio. The chroma key filter is applied to the live footage to remove the green background (much as when YouTubers show only their head and shoulders superimposed onto game footage), giving a cutout of the user, which can be placed directly on top of the app background layer. The foreground layer from the app engine is comprised of the foreground elements on a black background, so a key-color filter is applied to remove the black, creating a foreground layer with a transparent background that can be directly applied onto the image as the final third layer. This will work unless there is black in the foreground image, in which case the key-color filter will remove foreground elements that need to be retained. In this case, the alpha mask layer should be used instead of the key-color filter. The final composited video can then be encoded and output for recording and/or a live stream.

Figure 10: Using the key-color filter in OBS Studio* to remove the unwanted black area of the foreground layer.

Extreme Megatasking

The entire process of producing mixed-reality VR video is extremely hungry when it comes to processing power. Josh describes it as extreme megatasking—a collection of processes that go well beyond the average requirements for a PC. Individually, running a VR game and video compositing are already extremely demanding tasks. With this process, the system needs to do both, in parallel, at a minimum of 4K resolution, while maintaining a high frame rate of 90 frames per second (to lower the risk of motion sickness), and simultaneously encoding and outputting the signal for streaming and recording.

The load on the CPU and graphics processing unit (GPU) is such that, in normal circumstances, it’s too much for a single computer to handle at anything better than low quality. At the Game Developers Conference 2017, Josh built a custom PC to the highest possible specification without going into the realms of the extreme. Equipped with a sixth-generation, water-cooled, quad-core Intel® Core™ i7-6700K processor overclocked to run at 4.2 gigahertz, a top-of-the-range Nvidia GTX* 1080 GPU, and a fast solid-state drive, “it could handle any game you threw at it with ease, including VR titles,” said Josh.

However, it wasn’t enough to handle the end-to-end workflow for creating VR mixed-reality videos without seriously compromising video quality. A second system was brought in to handle the encoding and streaming, while the main rig ran the VR app and performed the capture and compositing.

There is, however, a new option emerging, in the form of the recently announced Intel® Core™ X-series processors with 12 and 18 cores, which are now rolling out commercially. Josh and his team have run the entire green-screen, VR mixed-reality video workflow successfully on the systems, both at the world premiere of the Intel Core i9 processors at Computex Taipei in May 2017, and a couple of weeks later at the Electronic Entertainment Expo (E3) in Los Angeles. These powerful processors reduce the need to split tasks across multiple machines while maintaining the quality—greatly simplifying the process, and allowing creators to replace multiple PCs and specialized equipment with a single, very powerful PC.

Figure 11: Intel stage presentation at Computex in Taipei, May 2017, showing the green screen studio (top-left), and the live mixed-reality stream (top-right).

Enabling for Mixed Reality

Josh, Jerry, and the team at Intel have the most experience working with HTC Vive VR applications built in Unity using the SteamVR plugin, which facilitates the requirements for mixed-reality video—the third-person virtual camera, binding it to a controller or tracker, and outputting the background and foreground layers. Using the Unity 5* engine with the SteamVR plugin on HTC Vive ensures that the heavy lifting of mixed-reality enablement is already done for the developer. 

Because of the number of different platforms and engines, and the immaturity of the technology, there is currently no single, consistent way to implement the mixed-reality workflow across all of them, and more or less programming may be required depending on which is used. While it’s understood that other platforms and engines—such as Unreal Engine* and Oculus Rift*—are developing their mixed-reality offerings, Josh recommends checking directly with the makers for up-to-date information regarding their specific capabilities and requirements.

One useful tip from Josh for developers who want to optimize their app for mixed reality is to avoid drawing hands in the game (or at least make it possible to switch them on and off), and stick to objects that the hand can hold instead. This is because if there is a hand, it’s never going to be completely matched with the user’s real hand, with the result being that it simply looks off, and breaks the illusion of immersion in the virtual environment. “If there’s no virtual hand, and your real hand is seen picking up objects or interacting with the world, it’s usually close enough that the illusion works,” explained Josh.


Figure 12: In this example, the user is holding blasters and no virtual hands are visible, which supports the overall illusion.

Real Potential

The Intel team sees enormous future potential for the kind of VR mixed-reality video-production techniques that have been pioneered over the past 18 months by Vive, Google*, Kert Gartner, and an increasing number of independent developers, streamers, and YouTubers.

“There’s a lot of cool potential here for filmmaking and storytelling,” enthused Josh. “I can’t imagine it’s going to be very long before we start seeing the first films that are produced in VR mixed reality, with a virtual environment that people can be immersed in, and mixed reality used to tell a story inside that world.”

Josh can also envisage its adoption in journalism and weather reporting, with a reporter pictured live in an environment that would in reality be uninhabitable—for example, the eye of a storm, or a contaminated site. Meanwhile, Jerry has his eyes on a different prize: “I wonder how episodic TV shows would look if you could insert yourself into them somehow; for example, live shows that you join in VR, and stream your own version of.”

Running with that thought, Josh expects to see the adoption of mixed reality in the world of eSports, as more VR titles follow in the footsteps of magical dueling simulator The Unspoken*. “Imagine being able to essentially put yourself in the arena through remote VR technology, then doing your own commentary that you stream live, in mixed reality,” said Josh. “I think there’s a ton of potential.”

As developers begin to understand the value of mixed-reality video to communicate their VR experience to a wider audience, and its uses extend beyond the gaming and app world, the market is going to open up for specialist video production companies using the technology to show those experiences to their absolute best advantage.

Big Picture

“In the Developer Relations Division of the Software and Service Group at Intel, we live by the idea that these advanced processors—with billions of transistors—really aren’t good for much more than converting electricity into heat without providing great software experiences,” said Josh.

To help ensure those experiences get made, Josh, Jerry, Bob, and the team are committed to working closely with the developers lighting up Intel’s silicon. In the field of VR, there is probably no better way to showcase a new experience than with a mixed-reality video, which is why the team has been exploring the technique’s potential, and working to share their knowledge and inspirations with as many developers as possible.

“We want to work with VR developers to make their VR experiences enabled for mixed reality, and blow people away with these trailers,” said Josh. “We’re trying to help developers make the most amazing software experience possible, because those amazing software experiences are what unlock the potential of the hardware products.

“We’re always talking to developers large and small, working with them, and getting their input,” continued Josh. “We listen, and we try to make the things that will help them improve their software. That is the heart of what we do.”

More stories, tutorials, case studies, and other related materials are planned around VR mixed-reality technologies. To stay up-to-date with all the latest news, join the Intel developer program at: https://software.intel.com/gamedev.

Exploit Nested Parallelism with OpenMP* Tasking Model


The new-generation Intel® Xeon® Scalable processor family (formerly code-named Skylake-SP), Intel’s most scalable processor line, has up to 28 cores per socket with options to scale from 2 to 8 sockets. The Intel® Xeon Phi™ processor provides massive parallelism with up to 72 cores per unit. Hardware keeps introducing more parallelism capabilities, and software must be written to exploit them.

This is not always easy, however: a program may lack enough parallel tasks, temporary memory use may balloon as the thread count grows, or the load may be imbalanced. In such cases, nested parallelism can help scale the number of parallel tasks at multiple levels. It can also contain the growth of temporary space by sharing memory, parallelizing at one level while enclosed by another parallel region.

There are actually two ways to enable nested parallelism with OpenMP*. One is explicitly documented in the OpenMP specification: set the OMP_NESTED environment variable or call the omp_set_nested runtime routine. There are some good examples and explanations of this topic in the online tutorial: OpenMP Lab on Nested Parallelism and Task.
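As a rough sketch of this first approach (illustrative code, not taken from the tutorial), nesting can be switched on from within the program; setting OMP_NESTED=TRUE in the environment has the same effect:

#include <omp.h>

int main()
{
    omp_set_nested(1);              /* enable nested parallelism */
    omp_set_max_active_levels(2);   /* optionally cap the nesting depth */
    /* ... nested parallel regions, as in the examples below ... */
    return 0;
}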

The other is to use the OpenMP tasking model. Compared to the worksharing constructs in OpenMP, the tasking constructs provide more flexibility in supporting various kinds of parallelism: they can be nested inside a parallel region, inside other task constructs, or inside worksharing constructs. With the introduction of taskloop reduction and taskgroup, tasking becomes even more useful.

Here we use an example to demonstrate how to apply nested parallelism in different ways.

void fun1()
{
    for (int i=0; i<80; i++)
        ...
}


int main()
{
#pragma omp parallel
    {
#pragma omp for
        for (int i=0; i<100; i++)
            ...

#pragma omp for
        for (int i=0; i<10; i++)
            fun1();
    }
}

In the above example, the 2nd loop in main has a small trip count and can be distributed to at most 10 threads with omp for. However, the loop inside fun1 has 80 iterations, and fun1 is called 10 times from the main loop, so the product of the two trip counts yields 800 iterations in total! This offers far more potential parallelism, provided parallelism can be exploited at both levels.

Here is how nested parallel regions work:

void fun1()
{
#pragma omp parallel for
    for (int i=0; i<80; i++)
        ...
}

int main()
{
#pragma omp parallel
    {
#pragma omp for
        for (int i=0; i<100; i++)
            ...

#pragma omp for
        for (int i=0; i<10; i++)
            fun1();
    }
}

The problem with this implementation is that you may either have too few threads for the 1st main loop, which has the larger trip count, or create an exploding number of threads for the 2nd main loop when OMP_NESTED=TRUE. The simple solution is to split the parallel region in main and create a separate region for each loop, each with a distinct thread number specified, as sketched below.
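As a minimal sketch of that split (the num_threads values are illustrative assumptions chosen so the two levels multiply out to the loop sizes above):

#include <omp.h>

void fun1()
{
#pragma omp parallel for num_threads(8)   /* inner region: 8 threads per outer thread */
    for (int i=0; i<80; i++)
        ...
}

int main()
{
    omp_set_nested(1);                    /* allow the inner regions to fork */

#pragma omp parallel num_threads(100)     /* wide region sized for the 1st loop */
    {
#pragma omp for
        for (int i=0; i<100; i++)
            ...
    }

#pragma omp parallel num_threads(10)      /* narrow region: 10 x 8 = 80 threads in total */
    {
#pragma omp for
        for (int i=0; i<10; i++)
            fun1();
    }
}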

In contrast, here's how omp tasking works:

void fun1()
{
#pragma omp taskloop
     for (int i=0; i<80; i++)
         ...
}

int main()
{
#pragma omp parallel
    {
#pragma omp for
        for (int i=0; i<100; i++)
            ...
      
#pragma omp for
        for (int i=0; i<10; i++)
            fun1();
    }
}

As you can see, you don't have to worry about how the thread count changes between the 1st and 2nd main loops. Even though only a small number of threads (10) pick up iterations of the 2nd main loop, the remaining available threads can execute the tasks generated by omp taskloop in fun1.

In general, OpenMP nested parallel regions distribute work by creating (forking) more threads. In OpenMP, the parallel region is the only construct that determines the number of execution threads and controls thread affinity. With nested parallel regions, each thread in the parent region yields multiple threads in the enclosed region, which multiplies the total thread count.

OpenMP tasking shows another way to expose parallelism: adding more tasks instead of more threads. Although the thread count remains as specified at the entry of the parallel region, the additional tasks created by the nested tasking constructs can be distributed to, and executed by, any available or idle threads in the team of the same parallel region. This makes it possible to use the full capability of all threads and to improve load balance automatically.

With the introduction of omp taskloop, omp taskloop reduction, omp taskgroup, and omp taskgroup reduction, the OpenMP tasking model becomes an even more powerful way to support nested parallelism. For more details on these new features in the OpenMP 5.0 Technical Report, please refer to OpenMP* 5.0 support in Intel® Compiler 18.0.
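As a small illustration of taskloop with reduction (an assumption-laden sketch: it requires a compiler implementing the OpenMP 5.0 taskloop reduction clause, and the loop body is made up for the example):

#include <stdio.h>

int main()
{
    double sum = 0.0;

#pragma omp parallel
#pragma omp single
#pragma omp taskloop reduction(+: sum) grainsize(1000)  /* tasks of ~1000 iterations each */
    for (int i=0; i<100000; i++)
        sum += 1.0/(i + 1);

    printf("sum = %f\n", sum);
    return 0;
}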

Please note that some known issues have been reported for nested parallelism with the reduction clause in the initial 18.0 release. A fix is expected in 2018 Update 1, which will be available soon.

Monte Carlo European Option Pricing for Intel® Xeon® Processors


Download [10KB]

Introduction

This is an update of the article Monte Carlo European Option Pricing with RNG Interface for Intel® Xeon Phi™ Coprocessor, which covered the technical details of the Monte Carlo methods and the random number generation with Intel® Math Kernel Library (Intel® MKL). In this article, we discuss the performance of the workload updated for Intel® Xeon® Scalable processors.

Code Changes

This section describes the changes in the source code from the previous version.

The source has been modified to read input from the user so that it runs across various Intel® Xeon® platforms.

Another change is a different method of random number stream generation. The new source uses a SIMD-oriented Fast Mersenne Twister pseudorandom number generator for improved performance. The random numbers generated are single precision for both the single- and double-precision versions.
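As a rough sketch of how such a stream can be created with the Intel® MKL vector statistics (VSL) interface (illustrative only, not the workload's actual code):

#include <mkl_vsl.h>

int main()
{
    VSLStreamStatePtr stream;
    float rnd[4096];

    /* SIMD-oriented Fast Mersenne Twister (SFMT19937) basic generator */
    vslNewStream(&stream, VSL_BRNG_SFMT19937, 7777);

    /* fill the buffer with single-precision standard normal variates */
    vsRngGaussian(VSL_RNG_METHOD_GAUSSIAN_ICDF, stream, 4096, rnd, 0.0f, 1.0f);

    vslDeleteStream(&stream);
    return 0;
}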

Code Access

Monte Carlo European Option Pricing for Intel® Xeon® processors is maintained by Nimisha Raut and available under the Intel Sample Source Code License Agreement.

To access the code and test workloads, download the MonteCarlo.tar.gz file attached to this article.

Build and Run Directions

Here are the steps for rebuilding the program:

  1. Install Intel® Parallel Studio on your system.
  2. Untar the MonteCarlo.tar.gz file.
  3. Type make to build the binaries for single and double precision.
    • For single precision: MonteCarloInsideBlockingSP.x
    • For double precision: MonteCarloInsideBlockingDP.x

    Where x is an abbreviation for the platform, as below:

    • avx512 – for Intel Xeon Scalable processors supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set extensions
    • avx2 – for systems supporting Intel® Advanced Vector Extensions 2 instruction set extensions
    • knl - for Intel® Xeon Phi™ processor x200 product family
    • sse4_2 – for systems supporting Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2) instruction set extensions
    • scalar - scalar version
  4. Make sure the host machine is powered by Intel Xeon processors.
    $ lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                80
    On-line CPU(s) list:   0-79
    Thread(s) per core:    2
    Core(s) per socket:    20
    Socket(s):             2
    NUMA node(s):          2
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 85
    Model name:            Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
    Stepping:              4
    CPU MHz:               1000.000
    BogoMIPS:              4792.94
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              1024K
    L3 cache:              28160K
    NUMA node0 CPU(s):     0-19,40-59
    NUMA node1 CPU(s):     20-39,60-79
  5. Set the environment variables.

    export OMP_NUM_THREADS=<maximum threads supported on the system>

    export KMP_AFFINITY=compact,granularity=fine
     

  6. Run the MonteCarlo SP and DP versions. Below is an example run with output.
    $ ./MonteCarloInsideBlockingDP.avx512 112 1344k 256k 8k
    Monte Carlo European Option Pricing Double Precision
    
    Build Time       = Oct 26 2017 14:03:13
    Path Length      = 262144
    Number of Options= 1376256
    Block Size       = 8192
    Worker Threads   = 112
    
    Starting options pricing...
    Parallel simulation completed in 10.342098 seconds.
    Validating the result...
    L1_Norm          = 4.816983E-04
    Average RESERVE  = 12.554841
    Max Error        = 8.945331E-02
    Test passed
    ==========================================
    Time Elapsed = 10.342098
    Opt/sec      = 133073.192723
    ==========================================

Performance across Generations of Intel® Xeon® processors

Figure: Monte Carlo European option pricing performance across generations of Intel® Xeon® processors (performance chart).

References

  1. Case Study: Achieving High Performance on Monte Carlo European Option Using Stepwise Optimization Framework
  2. Recipe: Monte Carlo European Option Pricing for Intel® Xeon Phi™ Processor
  3. Monte Carlo European Option with Pre-generated Random Numbers for Intel® Xeon Phi™ Coprocessor

About the Author

Nimisha Raut is currently a software engineer with Intel’s Financial Services Engineering team in the Intel Software and Services Group. Her major interest is parallel programming and performance analysis on Intel processors and coprocessors and Nvidia GPGPUs. She received a Master’s degree in Computer Engineering from Clemson University and a Bachelor’s degree in Electronics Engineering from Mumbai University, India.

Recipe: Building NAMD on Intel® Xeon® and Intel® Xeon Phi™ Processors on a Single Node


For cluster runs, please refer to the recipe: Building NAMD on Intel® Xeon® and Intel® Xeon Phi™ Processors on cluster

Purpose

This recipe describes a step-by-step process for getting, building, and running NAMD (scalable molecular dynamics code) on the Intel® Xeon Phi™ processor and Intel® Xeon® processor E5 family to achieve better performance.

Introduction

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecule systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.

NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms. Below are the details for how to build NAMD on the Intel Xeon Phi processor and Intel Xeon processor E5 family. You can learn more about NAMD at http://www.ks.uiuc.edu/Research/namd/.

Building and Running NAMD on the Intel® Xeon® Processor E5-2697 v4 (formerly Broadwell (BDW)), Intel® Xeon Phi™ Processor 7250 (formerly Knights Landing (KNL)), and Intel® Xeon® Gold 6148 Processor (formerly Skylake (SKX))

Download the code

  1. Download the latest NAMD source code from this site: http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
  2. Download the Charm++ 6.7.1 version.

    a. You can get Charm++ from the NAMD source code of the Nightly Build version.

    b. Or download it separately: http://charmplusplus.org/download/

  3. Download the fftw3 version: http://www.fftw.org/download.html

    Version 3.3.4 is used in this run.

  4. Download the apoa1 and stmv workloads: http://www.ks.uiuc.edu/Research/namd/utilities/

Build the binaries

  1. Set environment for compilation:
    CC=icc; CXX=icpc; F90=ifort; F77=ifort
    export CC CXX F90 F77
    source /opt/intel/compiler/<version>/compilervars.sh intel64​
  2. Build fftw3:

    a. cd <fftw_root_path>

    b. ./configure --prefix=<fftw_install_path> --enable-single --disable-fortran CC=icc

       Use -xCORE-AVX512 for SKX, -xMIC-AVX512 for KNL, and -xCORE-AVX2 for BDW.

    c. make CFLAGS="-O3 -xMIC-AVX512 -fp-model fast=2 -no-prec-div -qoverride-limits" clean install
  3. Build a multicore version of Charm++:

    a. cd <charm_root_path>

    b. ./build charm++ multicore-linux64 iccstatic --with-production "-O3 -ip"
  4. Build NAMD:

    a. Modify the arch/Linux-x86_64-icc to look like the following (select one of the FLOATOPTS options depending on the CPU type):

    NAMD_ARCH = Linux-x86_64
    CHARMARCH = multicore-linux64-iccstatic
    
    # For KNL
    FLOATOPTS = -ip -xMIC-AVX512  -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
    
    # For SKX
    FLOATOPTS = -ip -xCORE-AVX512  -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
    
    # For BDW
    FLOATOPTS = -ip -xCORE-AVX2  -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
    
    CXX = icpc -std=c++11 -DNAMD_KNL
    CXXOPTS = -static-intel -O2 $(FLOATOPTS)
    CXXNOALIASOPTS = -O3 -fno-alias $(FLOATOPTS) -qopt-report-phase=loop,vec -qopt-report=4
    CXXCOLVAROPTS = -O2 -ip
    CC = icc
    COPTS = -static-intel -O2 $(FLOATOPTS)
    

    b. Compile NAMD:

    i. ./config Linux-x86_64-icc --charm-base <charm_root_path> --charm-arch multicore-linux64-iccstatic --with-fftw3 --fftw-prefix <fftw_install_path> --without-tcl --charm-opts -verbose

    ii. gmake -j

Other system setup

  1. Change the kernel settings for KNL: "nmi_watchdog=0 rcu_nocbs=2-271 nohz_full=2-271". Here is one way to change the settings (this could be different for every system):

    a. To be safe, first save your original grub.cfg:

    cp /boot/grub2/grub.cfg /boot/grub2/grub.cfg.ORIG

    b. In "/etc/default/grub", append the following to "GRUB_CMDLINE_LINUX":

    nmi_watchdog=0 rcu_nocbs=2-271 nohz_full=2-271

    c. Save your new configuration:

    grub2-mkconfig -o /boot/grub2/grub.cfg 

    d. Reboot the system. After logging in, verify the settings with "cat /proc/cmdline".

  2. Change the next lines in the *.namd file for both workloads:

    numsteps 1000

    outputtiming 20

    outputenergies 600

Run NAMD

  • on SKX/BDW (ppn = 40 / ppn = 72, correspondingly):
    ./namd2 +p $ppn apoa1/apoa1.namd +pemap 0-($ppn-1)
  • on KNL (ppn = 136 (2 hyper threads per core), MCDRAM in flat mode, similar performance in cache):
    numactl -p 1 ./namd2 +p $ppn apoa1/apoa1.namd +pemap 0-($ppn-1)

KNL example:

numactl -p 1 <namd_root_path>/Linux-KNL-icc/namd2 +p 136 apoa1/apoa1.namd +pemap 0-135

Performance results reported in the Intel Salesforce repository (ns/day; higher is better):

Workload | 2S Intel® Xeon® Processor E5-2697 v4 18c 2.3 GHz (ns/day) | Intel® Xeon Phi™ Processor 7250 bin1 (ns/day) | Intel® Xeon Phi™ Processor 7250 versus 2S Intel® Xeon® Processor E5-2697 v4 (speedup)
stmv | 0.45 | 0.55 | 1.22x
apoa1 | 5.5 | 6.18 | 1.12x

Workload | 2S Intel® Xeon® Gold 6148 Processor 20c 2.4 GHz (ns/day) | Intel® Xeon Phi™ Processor 7250 versus 2S Intel® Xeon® Processor E5-2697 v4 (speedup)
stmv | 0.73 | 1.44x
apoa1 original | 7.68 | 1.43x
apoa1 | 8.70 | 1.44x

Systems configuration

Processor | Intel® Xeon® Processor E5-2697 v4 | Intel® Xeon® Gold 6148 Processor | Intel® Xeon Phi™ Processor 7250
Stepping | 1 (B0) | 1 (B0) | 1 (B0) Bin1
Sockets / TDP | 2S / 290W | 2S / 300W | 1S / 215W
Frequency / Cores / Threads | 2.3 GHz / 36 / 72 | 2.4 GHz / 40 / 80 | 1.4 GHz / 68 / 272
DDR4 | 8x16 GB 2400 MHz (128 GB) | 12x16 GB 2666 MHz (192 GB) | 6x16 GB 2400 MHz
MCDRAM | N/A | N/A | 16 GB Flat
Cluster/Snoop Mode/Mem Mode | Home | Home | Quadrant/flat
Turbo | On | On | On
BIOS | GRRFSDP1.86B0271.R00.1510301446 | | GVPRCRB1.86B.0010.R02.1608040407
Compiler | ICC-2017.0.098 | ICC-2016.4.298 | ICC-2017.0.098
Operating System | Red Hat Enterprise Linux* 7.2 (3.10.0-327.el7.x86_64) | Red Hat Enterprise Linux 7.3 (3.10.0-514.6.2.0.1.el7.x86_64.knl1) | Red Hat Enterprise Linux 7.2 (3.10.0-327.22.2.el7.xppsl_1.4.1.3272.x86_64)

Recipe: Building NAMD on Intel® Xeon® and Intel® Xeon Phi™ Processors for multi-node runs


Download [96KB]

For single-node runs, refer to the recipe: Building NAMD on Intel® Xeon® and Intel® Xeon Phi™ processors

Purpose

This recipe describes a step-by-step process for getting, building, and running NAMD (scalable molecular dynamics code) on the Intel® Xeon Phi™ processor and Intel® Xeon® processor family to achieve better performance.

Introduction

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecule systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.

NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms. Below are the details for how to build NAMD on Intel Xeon Phi processor and Intel Xeon processor E5 family. You can learn more about NAMD at http://www.ks.uiuc.edu/Research/namd/.

Building and Running NAMD for Cluster on the Intel® Xeon® Processor E5-2697 v4 (formerly Broadwell (BDW)), Intel® Xeon Phi™ Processor 7250 (formerly Knights Landing (KNL)), and Intel® Xeon® Gold 6148 Processor (formerly Skylake (SKX))

Download the code

  1. Download the latest NAMD source code from this site: http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
  2. Download Open Fabric Interfaces (OFI). NAMD uses Charm++/OFI for multi-node.
    • You can use the installed OFI library, which comes with the IFS package, or download and build it manually.
    • To check the version of the installed OFI use the “fi_info --version” command (OFI1.4.2 was used here).
    • The OFI library can be downloaded from https://github.com/ofiwg/libfabric/releases.
  3. Download Charm++ with OFI support:

    From here: http://charmplusplus.org/download/
    or
    git clone: http://charm.cs.illinois.edu/gerrit/charm.git

  4. Download the fftw3 version: http://www.fftw.org/download.html

    Version 3.3.4 is used in this run.

  5. Download the apoa1 and stmv workloads: http://www.ks.uiuc.edu/Research/namd/utilities/

Build the Binaries

  1. Set the environment for compilation:
    CC=icc; CXX=icpc; F90=ifort; F77=ifort
    export CC CXX F90 F77
    source /opt/intel/compiler/<version>/compilervars.sh intel64
  2. Build the OFI library (you can skip this step if you want to use the installed OFI library):
    1. cd <libfabric_root_path>
    2. ./autogen.sh
    3. ./configure --prefix=<libfabric_install_path> --enable-psm2
    4. make clean && make -j12 all && make install
    5. A custom OFI build can then be used via LD_PRELOAD or LD_LIBRARY_PATH:

    export LD_LIBRARY_PATH=<libfabric_install_path>/lib:${LD_LIBRARY_PATH}
    mpiexec.hydra …
    or
    LD_PRELOAD=<libfabric_install_path>/lib/libfabric.so mpiexec.hydra …

  3. Build fftw3:
    1. cd <fftw_root_path>
    2. ./configure --prefix=<fftw_install_path> --enable-single --disable-fortran CC=icc
      Use -xCORE-AVX512 for SKX, -xMIC-AVX512 for KNL, and -xCORE-AVX2 for BDW.
    3. make CFLAGS="-O3 -xMIC-AVX512 -fp-model fast=2 -no-prec-div -qoverride-limits" clean install
  4. Build multi-node version of Charm++:
    1. cd <charm_root_path>
    2. ./build charm++ ofi-linux-x86_64 icc smp --basedir <libfabric_root_path> --with-production "-O3 -ip" -DCMK_OPTIMIZE
  5. Build NAMD:
    1. Modify the arch/Linux-x86_64-icc to look like the following (select one of the FLOATOPTS options depending on the CPU type):
      NAMD_ARCH = Linux-x86_64
      CHARMARCH = ofi-linux-x86_64-smp-icc
      
      # For KNL
      FLOATOPTS = -ip -xMIC-AVX512  -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
      
      # For SKX
      FLOATOPTS = -ip -xCORE-AVX512  -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
      
      # For BDW
      FLOATOPTS = -ip -xCORE-AVX2  -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
      
      CXX = icpc -std=c++11 -DNAMD_KNL
      CXXOPTS = -static-intel -O2 $(FLOATOPTS)
      CXXNOALIASOPTS = -O3 -fno-alias $(FLOATOPTS) -qopt-report-phase=loop,vec -qopt-report=4
      CXXCOLVAROPTS = -O2 -ip
      CC = icc
      COPTS = -static-intel -O2 $(FLOATOPTS)
    2. Compile NAMD
      1. ./config Linux-x86_64-icc --charm-base <charm_root_path> --charm-arch ofi-linux-x86_64-smp-icc --with-fftw3 --fftw-prefix <fftw_install_path> --without-tcl --charm-opts -verbose
      2. cd Linux-x86_64-icc
      3. make clean && gmake -j
  6. Build memopt NAMD binaries:

    Like the BDW/KNL build, but with the extra option "--with-memopt" passed to config.

Other Setup

Change the next lines in the *.namd file for both the stmv and apoa1 workloads:
numsteps 1000
outputtiming 20
outputenergies 600

Run the Binaries

  1. Set the environment for launching:
    1. source /opt/intel/compiler/<version>/compilervars.sh intel64
    2. source /opt/intel/impi/<version>/intel64/bin/mpivars.sh
    3. Specify the host names to run on in the "hosts" file.
    4. export MPIEXEC="mpiexec.hydra -hostfile ./hosts"
    5. export PSM2_SHAREDCONTEXTS=0 (if you use PSM2 < 10.2.85)
  2. Launch the task (for example with N nodes, with 1 process per node and PPN cores):
    1. $MPIEXEC -n N -ppn 1 ./namd2 +ppn (PPN-1) <workload_path> +pemap 1-(PPN-1) +commap 0
      
      For example, for BDW (PPN=72):
      $MPIEXEC -n 8 -ppn 1 ./namd2 +ppn 71 <workload_path> +pemap 1-71 +commap 0
      
      For example, for KNL (PPN=68, without hyper-threads):
      $MPIEXEC -n 8 -ppn 1 ./namd2 +ppn 67 <workload_path> +pemap 1-67 +commap 0
      
      For example, for KNL (with 2 hyper-threads per core):
      $MPIEXEC -n 8 -ppn 1 ./namd2 +ppn 134 <workload_path> +pemap 0-66+68 +commap 67
    2. For KNL with MCDRAM in flat mode:
      $MPIEXEC -n N -ppn 1 numactl -p 1 ./namd2 +ppn (PPN-1) <workload_path> +pemap 1-(PPN-1) +commap 0

Remarks

To achieve better scaling on multiple nodes, increase the number of communication threads (1, 2, 4, 8, 13, 17). For example, the following is a command for N KNL nodes with 17 processes per node and 8 threads per process (7 worker threads and 1 communication thread):

$MPIEXEC -n $(($N*17)) -ppn 17 numactl -p 1 ./namd2 +ppn 7 <workload_path> +pemap 0-67,68-135:4.3 +commap 71-135:4

Basic Charm++/OFI knobs (should be added as NAMD parameters)

  • +ofi_eager_maxsize: (default: 65536) Threshold between the buffered and RMA paths.
  • +ofi_cq_entries_count: (default: 8) Maximum number of entries to read from the completion queue with each call to fi_cq_read().
  • +ofi_use_inject: (default: 1) Whether to use buffered send.
  • +ofi_num_recvs: (default: 8) Number of pre-posted receive buffers.
  • +ofi_runtime_tcp: (default: off) During the initialization phase, the OFI EP names need to be exchanged among all nodes. By default, the exchange is done with both PMI and OFI. If this flag is set, the exchange is done with PMI only.

For example:

$MPIEXEC -n 2 -ppn 1 ./namd2 +ppn 1 <workload_path> +ofi_eager_maxsize 32768 +ofi_num_recvs 16

Best performance results reported on a cluster of up to 128 Intel® Xeon Phi™ processor nodes (ns/day; higher is better):

Workload / Nodes (2 HT) | 1 | 2 | 4 | 8 | 16
stmv (ns/day) | 0.55 | 1.05 | 1.86 | 3.31 | 5.31

Workload / Nodes (2 HT) | 8 | 16 | 32 | 64 | 128
stmv.28M (ns/day) | 0.152 | 0.310 | 0.596 | 1.03 | 1.91

 


Intel® Trace Analyzer and Collector Release Notes


This page provides the current Release Notes for Intel® Trace Analyzer and Collector. The notes are categorized by year, from newest to oldest, with individual releases listed within each year.

Click a version to expand it into a summary of new features and changes in that version since the last release, and access the download buttons for the detailed release notes, which include important information, such as pre-requisites, software compatibility, installation instructions, and known issues.

You can copy a link to a specific version's section by clicking the chain icon next to its name.

All files are in PDF format - Adobe Reader* (or compatible) required.
To get product updates, log in to the Intel® Software Development Products Registration Center.
For questions or technical support, visit Intel® Software Developer Support.

2018

Update 1

Release Notes for Linux* | Release Notes for Windows*

Overview:

  • Fix for the --summary CLI option.
  • Performance improvements in Imbalance Diagram building.
Initial Release

Release Notes for Linux* | Release Notes for Windows*

Overview:

  • Support for OpenSHMEM* applications.
  • MPI Performance Snapshot distribution model change.
  • Feature removals.

2017

Update 4

Release Notes for Linux* | Release Notes for Windows*

Overview:

  • Bug fixes.
Update 3

Release Notes for Linux* | Release Notes for Windows*

Overview:

  • Bug fixes.
Update 2

Release Notes for Linux* | Release Notes for Windows*

Overview:

  • Enhancements in function color selection on timelines.
Update 1

Release Notes for Linux* | Release Notes for Windows*

Overview:

  • Zooming support with a mouse wheel on timelines.
Initial Release

Release Notes for Linux* | Release Notes for Windows*

Overview:

  • New OTF2 to STF converter.
  • New library for collecting MPI load imbalance.

Intel® MPI Library Release Notes


This page provides the current Release Notes for Intel® MPI Library. The notes are categorized by year, from newest to oldest, with individual releases listed within each year.

Click a version to expand it into a summary of new features and changes in that version since the last release, and access the download buttons for the detailed release notes, which include important information, such as pre-requisites, software compatibility, installation instructions, and known issues.

You can copy a link to a specific version's section by clicking the chain icon next to its name.

All files are in PDF format - Adobe Reader* (or compatible) required.
To get product updates, log in to the Intel® Software Development Products Registration Center.
For questions or technical support, visit Intel® Software Developer Support.

2018

Update 1

Linux* Release Notes | Windows* Release Notes

  • Startup performance improvements.
Initial Release

Linux* Release Notes | Windows* Release Notes

  • Hydra startup improvements.
  • Improved support for Intel® Omni-Path Architecture.
  • Support removal for the Intel® Xeon Phi™ coprocessor (code named Knights Corner).
  • New deprecations.

2017

Update 4

Linux* Release Notes | Windows* Release Notes

  • Performance tuning for processors based on Intel® microarchitecture codenamed Skylake and for Intel® Omni-Path Architecture.
  • Deprecated support for the IPM statistics format.
Update 3

Linux* Release Notes | Windows* Release Notes

  • Hydra startup improvements.
  • Default fabrics list change.
Update 2

Linux* Release Notes | Windows* Release Notes

  • New environment variables: I_MPI_HARD_FINALIZE and I_MPI_MEMORY_SWAP_LOCK.
Update 1

Linux* Release Notes | Windows* Release Notes

  • PMI-2 support for SLURM*, improved SLURM support by default.
  • Improved mini-help and diagnostic messages, man1 pages for mpiexec.hydra, hydra_persist, and hydra_nameserver.
  • New deprecations.
Initial Release

Linux* Release Notes | Windows* Release Notes

  • Support for the MPI-3.1 standard.
  • New topology-aware collective communication algorithms.
  • Effective MCDRAM (NUMA memory) support.
  • Controls for asynchronous progress thread pinning.
  • Performance tuning.
  • New deprecations.

5.1

Update 3 Build 223

Linux* Release Notes

  • Fix for issue with MPI_Abort call on threaded applications (Linux* only)
Update 3

Linux* Release Notes | Windows* Release Notes

  • Fixed shared memory problem on Intel® Xeon Phi™ processor (codename: Knights Landing)
  • Added new algorithms and selection mechanism for nonblocking collectives
  • Added new psm2 option for Intel® Omni-Path fabric
  • Added I_MPI_BCAST_ADJUST_SEGMENT variable to control MPI_Bcast
  • Fixed long count support for some collective messages
  • Reworked the binding kit to add support for Intel® Many Integrated Core Architecture and support for ILP64 on third party compilers
  • The following features are deprecated in this version of the Intel MPI Library. For the complete list of all deprecated and removed features, visit our deprecation page.
    • SSHM
    • MPD (Linux*)/SMPD (Windows*)
    • Epoll
    • JMI
    • PVFS2
Update 2

Linux* Release Notes | Windows* Release Notes

  • Intel® MPI Library now supports YARN* cluster manager (Linux* only)
  • DAPL library UCM settings are automatically adjusted for MPI jobs of more than 1024 ranks, resulting in more stable job start-up (Linux* only)
  • ILP64 support enhancements, support for MPI modules in Fortran 90
  • Added the direct receive functionality for the TMI fabric (Linux* only)
  • Single copy intra-node communication using Linux* supported cross memory attach (CMA) is now default (Linux* only)
Update 1

Linux* Release Notes | Windows* Release Notes

  • Changes to the named-user licensing scheme. See more details in the Installation Instructions section of Intel® MPI Library Installation Guide.
  • Various bug fixes for general stability and performance.
Initial Release

Linux* Release Notes | Windows* Release Notes

  • Added support for OpenFabrics Interface* (OFI*) v1.0 API
  • Added support for Fortran* 2008
  • Updated the default value for I_MPI_FABRICS_LIST
  • Added brand new Troubleshooting chapter to the Intel® MPI Library User's Guide
  • Added new application-specific features in the Automatic Tuner and Hydra process manager
  • Added support for the MPI_Pcontrol feature for improved internal statistics
  • Increased the possible space for MPI_TAG
  • Changed the default product installation directories
  • Various bug fixes for general stability and performance

Intel® Trace Analyzer and Collector Release Notes for Windows* OS


Overview

Intel® Trace Analyzer and Collector is a powerful tool for analyzing MPI applications, which essentially consists of two parts:

  • Intel® Trace Collector is a low-overhead tracing library that performs event-based tracing in applications at runtime. It collects data about the application's MPI and serial or OpenMP* regions, and can trace a custom set of functions. The product is completely thread safe and integrates with C/C++, Fortran, and multithreaded processes, with and without MPI. Additionally, it can check for MPI programming and system errors.
  • Intel® Trace Analyzer is a GUI-based tool that provides a convenient way to monitor application activities gathered by the Intel Trace Collector. You can view the desired level of detail, quickly identify performance hotspots and bottlenecks, and analyze their causes.

To receive technical support and updates, you need to register your product copy. See Technical Support below.

What's New

Intel® Trace Analyzer and Collector 2018 Update 1

  • Fix for the --summary CLI option.
  • Performance improvements in Imbalance Diagram building.

Intel® Trace Analyzer and Collector 2018

  • MPI Performance Snapshot is no longer a part of Intel Trace Analyzer and Collector and is available as a separate product. See http://www.intel.com/performance-snapshot for details.
  • Removed the macOS* support.
  • Documentation is now removed from the product package and is available online.

Intel® Trace Analyzer and Collector 2017 Update 4

  • Bug fixes.

Intel® Trace Analyzer and Collector 2017 Update 3

  • Bug fixes.

Intel® Trace Analyzer and Collector 2017 Update 2

  • Enhancements in function color selection on timelines.

Intel® Trace Analyzer and Collector 2017 Update 1

  • Added zooming support with a mouse wheel on timelines.
  • Deprecated support for the ITF format.

Intel® Trace Analyzer and Collector 2017

Key Features

  • Advanced GUI: user-friendly interface, high-level scalability, support of STF trace data
  • Aggregation and Filtering: detailed views of runtime behavior grouped by functions or processes
  • Fail-Safe Tracing: improved functionality on prematurely terminated applications with deadlock detection
  • Intel® MPI Library Interface: support of tracing on internal MPI states, support of MPI-IO
  • Correctness Checking: check for MPI and system errors at runtime (including distributed memory checking)
  • ROMIO*: extended support of MPI-2 standard parallel file I/O
  • Comparison feature: compare two trace files and/or two regions (in one or two trace files)
  • Command line interface for the Intel Trace Analyzer

System Requirements

Hardware Requirements

  • Systems based on the Intel® 64 architecture, in particular:
    • Intel® Core™ processor family
    • Intel® Xeon® E5 v4 processor family recommended
    • Intel® Xeon® E7 v3 processor family recommended
    • 2nd Generation Intel® Xeon Phi™ Processor (formerly code named Knights Landing)
  • 1 GB of RAM per core (2 GB recommended)
  • 1 GB of free hard disk space

Software Requirements

  • Operating systems:
    • Microsoft* Windows Server* 2008, 2008 R2, 2012, 2012 R2, 2016
    • Microsoft* Windows* 7, 8.x, 10
  • MPI implementations:
    • Intel® MPI Library 5.0 or newer
  • Compilers:
    • Intel® C++/Fortran Compiler 15.0 or newer (required for OpenMP* support)
    • Microsoft* Visual Studio* Compilers 2013, 2015, 2017

Known Issues and Limitations

  • Tracing of MPI applications that include MPI_Comm_spawn function calls is not supported.
  • Intel® Trace Analyzer may get into an undefined state if too many files are opened at the same time.
  • In some cases, symbol information may appear incorrectly in the Intel® Trace Analyzer if you discarded symbol information from object files.
  • MPI Correctness Checking is available with the Intel® MPI Library only.

Technical Support

Every purchase of an Intel® Software Development Product includes a year of support services, which provides Priority Support at our Online Service Center web site.

In order to get support you need to register your product in the Intel® Registration Center. If your product is not registered, you will not receive priority support.

Intel® Trace Analyzer and Collector Release Notes for Linux* OS


Overview

Intel® Trace Analyzer and Collector is a powerful tool for analyzing MPI applications, which essentially consists of two parts:

  • Intel® Trace Collector is a low-overhead tracing library that performs event-based tracing in applications at runtime. It collects data about the application's MPI and serial or OpenMP* regions, and can trace a custom set of functions. The product is completely thread safe and integrates with C/C++, Fortran, and multithreaded processes, with and without MPI. Additionally, it can check for MPI programming and system errors.
  • Intel® Trace Analyzer is a GUI-based tool that provides a convenient way to monitor application activities gathered by the Intel Trace Collector. You can view the desired level of detail, quickly identify performance hotspots and bottlenecks, and analyze their causes.

To receive technical support and updates, you need to register your product copy. See Technical Support below.

What's New

Intel® Trace Analyzer and Collector 2018 Update 1

  • Fix for the --summary CLI option.
  • Performance improvements in Imbalance Diagram building.

Intel® Trace Analyzer and Collector 2018

  • Added support for OpenSHMEM* applications.
  • MPI Performance Snapshot is no longer a part of Intel Trace Analyzer and Collector and is available as a separate product. See http://www.intel.com/performance-snapshot for details.
  • Removed the macOS* support.
  • Removed support for the Intel® Xeon Phi™ coprocessor (code named Knights Corner).
  • Removed support for the indexed trace file (ITF) format.
  • Documentation is now removed from the product package and is available online.

Intel® Trace Analyzer and Collector 2017 Update 4

  • GStreamer* dependencies removal.

Intel® Trace Analyzer and Collector 2017 Update 3

  • Bug fixes.

Intel® Trace Analyzer and Collector 2017 Update 2

  • Enhancements in function color selection on timelines.

Intel® Trace Analyzer and Collector 2017 Update 1

  • Added zooming support with a mouse wheel on timelines.
  • Deprecated support for the ITF format.

Intel® Trace Analyzer and Collector 2017

  • Introduced an OTF2 to STF converter otf2-to-stf (preview feature).
  • Introduced a new library for collecting MPI load imbalance (libVTim).
  • Introduced a new API function VT_registerprefixed.
  • Custom plug-in framework is now removed.
  • All product samples are moved online to https://software.intel.com/en-us/product-code-samples.

Key Features

  • Advanced GUI: user-friendly interface, high-level scalability, support of STF and OTF2 trace data
  • Aggregation and Filtering: detailed views of runtime behavior grouped by functions or processes
  • Fail-Safe Tracing: improved functionality on prematurely terminated applications with deadlock detection
  • Intel® MPI Library Interface: support of tracing on internal MPI states, support of MPI-IO
  • Correctness Checking: check for MPI and system errors at runtime (including distributed memory checking)
  • ROMIO*: extended support of MPI-2 standard parallel file I/O
  • Comparison feature: compare two trace files and/or two regions (in one or two trace files)
  • Command line interface for the Intel Trace Analyzer

System Requirements

Hardware Requirements

  • Systems based on the Intel® 64 architecture, in particular:
    • Intel® Core™ processor family
    • Intel® Xeon® E5 v4 processor family recommended
    • Intel® Xeon® E7 v3 processor family recommended
    • 2nd Generation Intel® Xeon Phi™ Processor (formerly code named Knights Landing)
  • 1 GB of RAM per core (2 GB recommended)
  • 1 GB of free hard disk space

Software Requirements

  • Operating systems:
    • Red Hat* Enterprise Linux* 6, 7
    • Fedora* 23, 24
    • CentOS* 6, 7
    • SUSE* Linux Enterprise Server* 11, 12
    • Ubuntu* LTS 14.04, 16.04
    • Debian* 7, 8
  • MPI implementations:
    • Intel® MPI Library 5.0 or newer
  • Compilers:
    • Intel® C++/Fortran Compiler 15.0 or newer (required for OpenMP* support)
    • GNU*: C, C++, Fortran 77 3.3 or newer, Fortran 95 4.4.0 or newer

Known Issues and Limitations

  • Static Intel® Trace Collector libraries require Intel® MPI Library 5.0 or newer.
  • Tracing of MPI applications that include MPI_Comm_spawn function calls is not supported.
  • Intel® Trace Analyzer may get into an undefined state if too many files are opened at the same time.
  • In some cases, symbol information may appear incorrectly in the Intel® Trace Analyzer if you discarded symbol information from object files.
  • MPI Correctness Checking is available with the Intel® MPI Library only.
  • Intel® Trace Analyzer requires libpng 1.2.x (libpng12.so), otherwise the Intel Trace Analyzer GUI cannot be started.
  • Intel® Trace Analyzer and Collector does not support Fortran applications or libraries compiled with the -nounderscore option. Only functions with one or two underscores at the end of the name are supported. See details on Fortran naming conventions at https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gfortran/Naming-conventions.html

Technical Support

Every purchase of an Intel® Software Development Product includes a year of support services, which provides Priority Support at our Online Service Center web site.

In order to get support you need to register your product in the Intel® Registration Center. If your product is not registered, you will not receive Priority Support.

Intel® MPI Library Release Notes for Windows* OS


Overview

Intel® MPI Library is a multi-fabric message passing library based on ANL* MPICH3* and OSU* MVAPICH2*.

Intel® MPI Library implements the Message Passing Interface, version 3.1 (MPI-3) specification. The library is thread-safe and provides MPI standard-compliant multithreading support.

To receive technical support and updates, you need to register your product copy. See Technical Support below.

Product Contents

  • The Intel® MPI Library Runtime Environment (RTO) contains the tools you need to run programs, including the scalable process management system (Hydra), supporting utilities, and dynamic libraries.
  • The Intel® MPI Library Development Kit (SDK) includes all of the Runtime Environment components and compilation tools: compiler wrapper scripts (mpicc, mpiicc, etc.), include files and modules, static libraries, debug libraries, and test codes.

What's New

Intel® MPI Library 2018 Update 1

  • Bug fixes.

Intel® MPI Library 2018

  • Deprecated support for the IPM statistics format.
  • Hard finalization is now the default.
  • Documentation has been removed from the product and is now available online.

Intel® MPI Library 2017 Update 4

  • Minor changes.

Intel® MPI Library 2017 Update 3

  • Minor changes.

Intel® MPI Library 2017 Update 2

  • Added an environment variable I_MPI_HARD_FINALIZE.

Intel® MPI Library 2017 Update 1

  • Support for topology-aware collective communication algorithms (I_MPI_ADJUST family).
  • Deprecated support for cross-OS launches.

Intel® MPI Library 2017

  • Support for the MPI-3.1 standard.
  • Removed the SMPD process manager.
  • Removed the SSHM support.
  • Deprecated support for the Intel® microarchitectures older than the generation codenamed Sandy Bridge.
  • Bug fixes and performance improvements.
  • Documentation improvements.

Key Features

  • MPI-1, MPI-2.2 and MPI-3.1 specification conformance.
  • MPICH ABI compatibility.
  • Support for any combination of the following network fabrics:
    • RDMA-capable network fabrics through DAPL*, such as InfiniBand* and Myrinet*.
    • Sockets, for example, TCP/IP over Ethernet*, Gigabit Ethernet*, and other interconnects.
  • (SDK only) Support for Intel® 64 architecture clusters using:
    • Intel® C++/Fortran Compiler 14.0 and newer.
    • Microsoft* Visual C++* Compilers.
  • (SDK only) C, C++, Fortran 77, and Fortran 90 language bindings.
  • (SDK only) Dynamic linking.

System Requirements

Hardware Requirements

  • Systems based on the Intel® 64 architecture, in particular:
    • Intel® Core™ processor family
    • Intel® Xeon® E5 v4 processor family recommended
    • Intel® Xeon® E7 v3 processor family recommended
  • 1 GB of RAM per core (2 GB recommended)
  • 1 GB of free hard disk space

Software Requirements

  • Operating systems:
    • Microsoft* Windows Server* 2008, 2008 R2, 2012, 2012 R2, 2016
    • Microsoft* Windows* 7, 8.x, 10
  • (SDK only) Compilers:
    • Intel® C++/Fortran Compiler 15.0 or newer
    • Microsoft* Visual Studio* Compilers 2013, 2015, 2017
  • Batch systems:
    • Microsoft* Job Scheduler
    • Altair* PBS Pro* 9.2 or newer
  • Recommended InfiniBand* software:
    • Windows* OpenFabrics* (WinOF*) 2.0 or newer
    • Windows* OpenFabrics* Enterprise Distribution (winOFED*) 3.2 RC1 or newer for Microsoft* Network Direct support
    • Mellanox* WinOF* Rev 4.40 or newer
  • Additional software:
    • The memory placement functionality for NUMA nodes requires the libnuma.so library and the numactl utility installed. numactl should include numactl, numactl-devel, and numactl-libs.

Known Issues and Limitations

  • Cross-OS runs using ssh from a Windows* host fail. Two workarounds exist:
    • Create a symlink on the Linux* host that looks identical to the Windows* path to pmi_proxy.
    • Start hydra_persist on the Linux* host in the background (hydra_persist &) and use -bootstrap service from the Windows* host. This requires that the Hydra service also be installed and started on the Windows* host.
  • Support for Fortran 2008 is not implemented in Intel® MPI Library for Windows*.
  • Enabling statistics gathering may result in increased time in MPI_Finalize.
  • In order to run a mixed OS job (Linux* and Windows*), all binaries must link to the same single- or multithreaded MPI library. The single- and multithreaded libraries are incompatible with each other and should not be mixed. Note that the pre-compiled binaries for the Intel® MPI Benchmarks are inconsistent (the Linux* version links to the multithreaded library, the Windows* version to the single-threaded one), so at least one must be rebuilt to match the other.
  • If communication between two existing MPI applications is established using the process attachment mechanism, the library does not verify that the same fabric has been selected for each application. This situation may cause unexpected application behavior. Set the I_MPI_FABRICS variable to the same values for each application to avoid this issue.
  • If your product redistributes the mpitune utility, provide the msvcr71.dll library to the end user.
  • The Hydra process manager has some known limitations such as:
    • stdin redirection is not supported for the -bootstrap service option.
    • Signal handling support is restricted. It could result in hanging processes in memory in case of incorrect MPI job termination.
    • Cleaning up the environment after an abnormal MPI job termination by means of mpicleanup utility is not supported.
  • ILP64 is not supported by MPI modules for Fortran 2008.
  • When using the -mapall option, if some of the network drives require a password and it is different from the user password, the application launch may fail.

Technical Support

Every purchase of an Intel® Software Development Product includes a year of support services, which provides Priority Support at our Online Service Center web site.

In order to get support you need to register your product in the Intel® Registration Center. If your product is not registered, you will not receive Priority Support.

Intel® MPI Library Release Notes for Linux* OS


Overview

Intel® MPI Library is a multi-fabric message passing library based on ANL* MPICH3* and OSU* MVAPICH2*.

Intel® MPI Library implements the Message Passing Interface, version 3.1 (MPI-3) specification. The library is thread-safe and provides MPI standard-compliant multithreading support.

To receive technical support and updates, you need to register your product copy. See Technical Support below.

Product Contents

  • The Intel® MPI Library Runtime Environment (RTO) contains the tools you need to run programs, including the scalable process management system (Hydra), supporting utilities, and shared (.so) libraries.
  • The Intel® MPI Library Development Kit (SDK) includes all of the Runtime Environment components and compilation tools: compiler wrapper scripts (mpicc, mpiicc, etc.), include files and modules, static (.a) libraries, debug libraries, and test codes.

What's New

Intel® MPI Library 2018 Update 1

  • Improved startup performance on many/multicore systems (I_MPI_STARTUP_MODE).
  • Bug fixes.

Intel® MPI Library 2018

  • Improved startup times for Hydra when using shm:ofi or shm:tmi.
  • Hard finalization is now the default.
  • The default fabric list is changed when Intel® Omni-Path Architecture is detected.
  • Added environment variables: I_MPI_OFI_ENABLE_LMT, I_MPI_OFI_MAX_MSG_SIZE, I_MPI_{C,CXX,FC,F}FLAGS, I_MPI_LDFLAGS, I_MPI_FORT_BIND.
  • Removed support for the Intel® Xeon Phi™ coprocessor (code named Knights Corner).
  • I_MPI_DAPL_TRANSLATION_CACHE is now disabled by default.
  • Deprecated support for the IPM statistics format.
  • Documentation is now online.

Intel® MPI Library 2017 Update 4

  • Performance tuning for processors based on Intel® microarchitecture codenamed Skylake and for Intel® Omni-Path Architecture.

Intel® MPI Library 2017 Update 3

  • Hydra startup improvements (I_MPI_JOB_FAST_STARTUP).
  • Default value change for I_MPI_FABRICS_LIST.

Intel® MPI Library 2017 Update 2

  • Added environment variables I_MPI_HARD_FINALIZE and I_MPI_MEMORY_SWAP_LOCK.

Intel® MPI Library 2017 Update 1

  • PMI-2 support for SLURM*, improved SLURM support by default.
  • Improved mini help and diagnostic messages, man1 pages for mpiexec.hydra, hydra_persist, and hydra_nameserver.
  • Deprecations:
    • Intel® Xeon Phi™ coprocessor (code named Knights Corner) support.
    • Cross-OS launches support.
    • DAPL, TMI, and OFA fabrics support.

Intel® MPI Library 2017

  • Support for the MPI-3.1 standard.
  • New topology-aware collective communication algorithms (I_MPI_ADJUST family).
  • Effective MCDRAM (NUMA memory) support. See the Developer Reference, section Tuning Reference > Memory Placement Policy Control for more information.
  • Controls for asynchronous progress thread pinning (I_MPI_ASYNC_PROGRESS).
  • Direct receive functionality for the OFI* fabric (I_MPI_OFI_DRECV).
  • PMI2 protocol support (I_MPI_PMI2).
  • New process startup method (I_MPI_HYDRA_PREFORK).
  • Startup improvements for the SLURM* job manager (I_MPI_SLURM_EXT).
  • New algorithm for MPI-IO collective read operation on the Lustre* file system (I_MPI_LUSTRE_STRIPE_AWARE).
  • Debian Almquist (dash) shell support in compiler wrapper scripts and mpitune.
  • Performance tuning for processors based on Intel® microarchitecture codenamed Broadwell and for Intel® Omni-Path Architecture (Intel® OPA).
  • Performance tuning for Intel® Xeon Phi™ Processor and Coprocessor (code named Knights Landing) and Intel® OPA.
  • OFI latency and message rate improvements.
  • OFI is now the default fabric for Intel® OPA and Intel® True Scale Fabric.
  • MPD process manager is removed.
  • Dedicated pvfs2 ADIO driver is disabled.
  • SSHM support is removed.
  • Support for the Intel® microarchitectures older than the generation codenamed Sandy Bridge is deprecated.
  • Documentation improvements.

Key Features

  • MPI-1, MPI-2.2 and MPI-3.1 specification conformance.
  • Support for Intel® Xeon Phi™ processors (formerly code named Knights Landing).
  • MPICH ABI compatibility.
  • Support for any combination of the following network fabrics:
    • Network fabrics supporting Intel® Omni-Path Architecture (Intel® OPA) devices, through either Tag Matching Interface (TMI) or OpenFabrics Interface* (OFI*).
    • Network fabrics with tag matching capabilities through Tag Matching Interface (TMI), such as Intel® True Scale Fabric, Infiniband*, Myrinet* and other interconnects.
    • Native InfiniBand* interface through OFED* verbs provided by Open Fabrics Alliance* (OFA*).
    • Open Fabrics Interface* (OFI*).
    • RDMA-capable network fabrics through DAPL*, such as InfiniBand* and Myrinet*.
    • Sockets, for example, TCP/IP over Ethernet*, Gigabit Ethernet*, and other interconnects.
  • (SDK only) Support for Intel® 64 architecture and Intel® MIC Architecture clusters using:
    • Intel® C++/Fortran Compiler 14.0 and newer.
    • GNU* C, C++ and Fortran 95 compilers.
  • (SDK only) C, C++, Fortran 77, Fortran 90, and Fortran 2008 language bindings.
  • (SDK only) Dynamic or static linking.

System Requirements

Hardware Requirements

  • Systems based on the Intel® 64 architecture, in particular:
    • Intel® Core™ processor family
    • Intel® Xeon® E5 v4 processor family recommended
    • Intel® Xeon® E7 v3 processor family recommended
    • 2nd Generation Intel® Xeon Phi™ Processor (formerly code named Knights Landing)
  • 1 GB of RAM per core (2 GB recommended)
  • 1 GB of free hard disk space

Software Requirements

  • Operating systems:
    • Red Hat* Enterprise Linux* 6, 7
    • Fedora* 23, 24
    • CentOS* 6, 7
    • SUSE* Linux Enterprise Server* 11, 12
    • Ubuntu* LTS 14.04, 16.04
    • Debian* 7, 8
  • (SDK only) Compilers:
    • GNU*: C, C++, Fortran 77 3.3 or newer, Fortran 95 4.4.0 or newer
    • Intel® C++/Fortran Compiler 15.0 or newer
  • Debuggers:
    • Rogue Wave* Software TotalView* 6.8 or newer
    • Allinea* DDT* 1.9.2 or newer
    • GNU* Debuggers 7.4 or newer
  • Batch systems:
    • Platform* LSF* 6.1 or newer
    • Altair* PBS Pro* 7.1 or newer
    • Torque* 1.2.0 or newer
    • Parallelnavi* NQS* V2.0L10 or newer
    • NetBatch* v6.x or newer
    • SLURM* 1.2.21 or newer
    • Univa* Grid Engine* 6.1 or newer
    • IBM* LoadLeveler* 4.1.1.5 or newer
    • Platform* Lava* 1.0
  • Recommended InfiniBand* software:
    • OpenFabrics* Enterprise Distribution (OFED*) 1.5.4.1 or newer
    • Intel® True Scale Fabric Host Channel Adapter Host Drivers & Software (OFED) v7.2.0 or newer
    • Mellanox* OFED* 1.5.3 or newer
  • Virtual environments:
    • Docker* 1.13.0
  • Additional software:
    • The memory placement functionality for NUMA nodes requires the libnuma.so library and the numactl utility. The numactl installation should include numactl, numactl-devel, and numactl-libs.

Notes for Cluster Installation

When installing the Intel® MPI Library on all the nodes of your cluster without using a shared file system, you need to establish a passwordless SSH connection between the cluster nodes. This process is described in detail in the Intel® Parallel Studio XE Installation Guide (see section 2.1).

Known Issues and Limitations

  • If you observe performance degradation with an MPI application that uses RMA functionality, set I_MPI_SCALABLE_OPTIMIZATION=0 to regain performance.
  • The I_MPI_JOB_FAST_STARTUP variable takes effect only when shm is selected as the intra-node fabric.
  • ILP64 is not supported by MPI modules for Fortran* 2008.
  • If the application terminates abnormally (for example, on a signal), manually remove leftover files from the /dev/shm/ directory with:
    rm -r /dev/shm/shm-col-space-*
  • If a large number of communicators (more than 10,000) is used simultaneously per node, increase the maximum number of memory mappings with one of the following methods:
    • echo 1048576 > /proc/sys/vm/max_map_count
    • sysctl -w vm.max_map_count=1048576
    • disable shared memory collectives by setting the variable: I_MPI_COLL_INTRANODE=pt2pt
  • On some Linux* distributions Intel® MPI Library may fail for non-root users due to security limitations. This was observed on Ubuntu* 12.04, and could impact other distributions and versions as well. Two workarounds exist:
    • Enable ptrace for non-root users with:
      echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
    • Revert the Intel® MPI Library to an earlier shared memory mechanism, which is not impacted, by setting: I_MPI_SHM_LMT=shm
  • Ubuntu* does not allow attaching a debugger to a non-child process. In order to use -gdb, this behavior must be disabled by setting the sysctl value in /proc/sys/kernel/yama/ptrace_scope to 0.
  • Cross-OS runs using ssh from a Windows* host fail. Two workarounds exist:
    • Create a symlink on the Linux* host that looks identical to the Windows* path to pmi_proxy.
    • Start hydra_persist on the Linux* host in the background (hydra_persist &) and use -bootstrap service from the Windows* host. This requires that the Hydra service also be installed and started on the Windows* host.
  • The OFA fabric and certain DAPL providers may not work or provide worthwhile performance with the Intel® Omni-Path Fabric. For better performance, try choosing the OFI or TMI fabric.
  • Enabling statistics gathering may result in increased time in MPI_Finalize.
  • In systems where some nodes have only Intel® True Scale Fabric or Intel® Omni-Path Fabric available, while others have both Intel® True Scale and e.g. Mellanox* HCAs, automatic fabric detection will lead to a hang or failure, as the first type of nodes will select ofi/tmi, and the second type will select dapl as the internode fabric. To avoid this, explicitly specify a fabric that is available on all the nodes.
  • In order to run a mixed OS job (Linux* and Windows*), all binaries must link to the same single- or multithreaded MPI library. The single- and multithreaded libraries are incompatible with each other and must not be mixed. Note that the pre-compiled binaries for the Intel® MPI Benchmarks are inconsistent (the Linux* version links to the multithreaded library, the Windows* version to the single-threaded one), so at least one must be rebuilt to match the other.
  • Intel® MPI Library does not support using the OFA fabric over an Intel® Symmetric Communications Interface (Intel® SCI) adapter. If you are using an Intel SCI adapter, such as with Intel® Many Integrated Core Architecture, you will need to select a different fabric.
  • The TMI and OFI fabrics over PSM do not support messages larger than 2^32 - 1 bytes. If you have messages larger than this limit, select a different fabric.
  • If a communication between two existing MPI applications is established using the process attachment mechanism, the library does not control whether the same fabric has been selected for each application. This situation may cause unexpected applications behavior. Set the I_MPI_FABRICS variable to the same values for each application to avoid this issue.
  • Do not load thread-safe libraries through dlopen(3).
  • Certain DAPL providers may not function properly if your application uses system(3), fork(2), vfork(2), or clone(2) system calls, or functions based upon them. For example, system(3) may fail with the OFED* DAPL provider on Linux* kernel versions earlier than the official 2.6.16. On compatible kernel versions, set the RDMAV_FORK_SAFE environment variable to enable the OFED workaround.
  • MPI_Mprobe, MPI_Improbe, and MPI_Cancel are not supported by the TMI and OFI fabrics.
  • You may get an error message at the end of a checkpoint-restart enabled application, if some of the application processes exit in the middle of taking a checkpoint image. Such an error does not impact the application and can be ignored. To avoid this error, set a larger number than before for the -checkpoint-interval option. The error message may look as follows:
    [proxy:0:0@hostname] HYDT_ckpoint_blcr_checkpoint (./tools/ckpoint/blcr/
    ckpoint_blcr.c:313): cr_poll_checkpoint failed: No such process
    [proxy:0:0@hostname] ckpoint_thread (./tools/ckpoint/ckpoint.c:559):
    blcr checkpoint returned error
    [proxy:0:0@hostname] HYDT_ckpoint_finalize (./tools/ckpoint/ckpoint.c:878)
     : Error in checkpoint thread 0x7
  • Intel® MPI Library requires the presence of the /dev/shm device in the system. To avoid failures related to the inability to create a shared memory segment, make sure the /dev/shm device is set up correctly.
  • Intel® MPI Library uses TCP sockets to pass the stdin stream to the application. If you redirect a large file, the transfer can take a long time and cause the communication to hang on the remote side. To avoid this issue, pass large files to the application as command line options.
  • DAPL auto provider selection mechanism and improved NUMA support require dapl-2.0.37 or newer.
  • If you set I_MPI_SHM_LMT=direct, the setting has no effect if the Linux* kernel version is lower than 3.2.
  • When using the Linux boot parameter isolcpus with an Intel® Xeon Phi™ processor using default MPI settings, an application launch may fail. If possible, change or remove the isolcpus Linux boot parameter. If it is not possible, you can try setting I_MPI_PIN to off.
  • In some cases, collective calls over the OFA fabric may provide incorrect results. Try setting I_MPI_ADJUST_ALLGATHER to a value between 1 and 4 to resolve the issue.

Technical Support

Every purchase of an Intel® Software Development Product includes a year of support services, which provides Priority Support at our Online Service Center web site.

In order to get support you need to register your product in the Intel® Registration Center. If your product is not registered, you will not receive Priority Support.

Download Documentation: Intel® Parallel Studio XE (Current and Previous)


This page provides downloadable documentation packages for all editions of Intel® Parallel Studio XE (Cluster, Composer, Professional). 

Each package includes documentation for Intel Parallel Studio XE components, such as compilers (Intel C++ Compiler, Intel Fortran Compiler), libraries (e.g., Intel Math Kernel Library, Intel Integrated Performance Primitives), performance analyzers (e.g., Intel VTune Amplifier, Intel Inspector, Intel Advisor), and others. The full list of included components and respective documentation formats is available in the readme file in each package.

The packages provide downloadable copies of the web documentation formats and do not include the documents shipped offline with the product (e.g., Getting Started pages, Installation Guides).

To get product updates, log in to the Intel® Software Development Products Registration Center.
For questions or technical support, visit Intel® Software Developer Support.


Introduction to Hyperscan


Hyperscan is a high performance regular expression matching library from Intel that runs on x86 platforms and offers support for Perl Compatible Regular Expressions (PCRE) syntax, simultaneous matching of groups of regular expressions, and streaming operations. It is released as open source software under a BSD license. Hyperscan presents a flexible C API and a number of different modes of operation to ensure its applicability in real networking scenarios. Moreover, a focus on efficient algorithms and the use of Intel® Streaming SIMD Extensions (Intel® SSE) enables Hyperscan to achieve high matching performance. It is suitable for usage scenarios such as deep packet inspection (DPI), intrusion detection systems (IDS), intrusion prevention systems (IPS), and firewalls, and has been deployed in network security solutions worldwide. Hyperscan has also been integrated into widely used open-source IDS and IPS products like Snort* and Suricata*.

Under the Hood

Hyperscan’s workflow can be divided into two parts: compile time and run-time.

Compile time

Hyperscan comes with a regular expression compiler written in C++. As shown in Figure 1, it takes regular expressions as input. Depending on the available Intel® architecture platform features, user-defined modes, and pattern features, Hyperscan generates a corresponding pattern database through a complex graph analysis and optimization process. The generated database can also be serialized and stored in memory for later use by the run-time.


Figure 1: Hyperscan compilation process


Figure 2: Hyperscan run-time

Run-time

The Hyperscan run-time is developed in C. Figure 2 shows a high-level block diagram of the main components of the run-time. You need to pre-allocate a scratch space for temporary information used during scanning, and then use the compiled database to call Hyperscan's scan APIs to trigger internal matching engines (nondeterministic finite automaton (NFA), deterministic finite automaton (DFA), and so on) to match the corpus. Hyperscan accelerates these engines with the help of single instruction, multiple data (SIMD) instructions provided by the Intel processor, and matches are delivered to the user application for processing via a user-provided callback function. Since the Hyperscan pattern database is read-only, users can share the database between multiple CPU cores or multiple threads to enhance matching scalability.
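
As a minimal sketch of this sequence in block mode (the pattern, corpus, and on_match handler are illustrative, and error handling is abbreviated; the hs_* calls are the standard Hyperscan C API):

    #include <stdio.h>
    #include <hs.h>

    /* Called by hs_scan() for every match; returning non-zero stops the scan. */
    static int on_match(unsigned int id, unsigned long long from,
                        unsigned long long to, unsigned int flags, void *ctx) {
        printf("match for pattern %u ending at offset %llu\n", id, to);
        return 0;
    }

    int main(void) {
        hs_database_t *db = NULL;
        hs_compile_error_t *err = NULL;
        hs_scratch_t *scratch = NULL;
        const char *data = "xxxxabcxxxx";

        /* Compile time: build the pattern database. */
        if (hs_compile("abc", HS_FLAG_DOTALL, HS_MODE_BLOCK, NULL,
                       &db, &err) != HS_SUCCESS) {
            fprintf(stderr, "compile failed: %s\n", err->message);
            hs_free_compile_error(err);
            return 1;
        }

        /* Run-time: pre-allocate scratch space, then scan the corpus. */
        hs_alloc_scratch(db, &scratch);
        hs_scan(db, data, 11, 0, scratch, on_match, NULL);

        hs_free_scratch(scratch);
        hs_free_database(db);
        return 0;
    }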

Features

Versatile Functionality

Hyperscan supports cross-compilation for multiple Intel processors, with specific optimizations for different instruction sets. It has no operating system restrictions, supports both virtual machine and container scenarios, covers most PCRE syntax, and supports complex expressions that include constructs such as ".*" and "[^>]*". Different modes of operation (streaming, block, and vectored) are available to meet the requirements of different scenarios. If requested through the use of a per-pattern flag, Hyperscan can find the starting and ending positions of the matching data in the input stream. For more information, see the current version of the Hyperscan Developer Reference Guide.

Large-scale Matching

Hyperscan can match against large rule sets, with capacity depending on pattern complexity. Unlike most regular expression engines, Hyperscan supports multi-pattern matching: after you assign a unique ID to each rule, Hyperscan compiles the rules into a single database and reports the IDs of all matching rules during the matching process.
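
As a sketch of the multi-pattern interface (the two rules and their IDs are illustrative), hs_compile_multi() takes parallel arrays of expressions, per-pattern flags, and per-pattern IDs:

    /* Each expression carries its own flags and ID; matches report the ID. */
    const char *exprs[] = { "foo[0-9]+", "bar.*baz" };
    unsigned int flags[] = { HS_FLAG_DOTALL, HS_FLAG_DOTALL };
    unsigned int ids[]   = { 1001, 1002 };
    hs_database_t *db = NULL;
    hs_compile_error_t *err = NULL;

    if (hs_compile_multi(exprs, flags, ids, 2, HS_MODE_BLOCK, NULL,
                         &db, &err) != HS_SUCCESS) {
        fprintf(stderr, "compile failed: %s\n", err->message);
        hs_free_compile_error(err);
    }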


Figure 3: Data scattered in different units in time order

Streaming mode

Hyperscan supports three modes of operation: block mode, streaming mode, and vectored mode. Block mode is the most straightforward, where a single contiguous block of data is scanned, with matches returned to the caller as they are found. Streaming mode is designed for cross-packet matching in networking scenarios where the data to be scanned is broken up into multiple packets. In streaming mode, Hyperscan can save the match state for the current data block and use it as the initial match state when a new data block arrives. As shown in Figure 3, streaming mode guarantees the consistency of the final matches regardless of how the “xxxxabcxxxxxxxdefx“ data is split into packets over time. In addition, Hyperscan can compress the saved match state to reduce the application’s memory footprint. Streaming mode operation provides a simple way to scan data that arrives over a period of time without requiring you to buffer and rescan packets or limit scanning to a fixed window of historical data. Finally, there is vectored mode, which offers scanning in sequence of a set of data blocks that are not contiguous in memory.
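
As a sketch of the streaming calls (assuming a database compiled with HS_MODE_STREAM and reusing the scratch space and on_match handler from the block-mode sketch above; pkt1/pkt2 and their lengths stand for arriving packet payloads):

    /* Open a stream; Hyperscan keeps the match state between scan calls. */
    hs_stream_t *stream = NULL;
    hs_open_stream(db, 0, &stream);

    /* Scan each packet as it arrives; matches may span packet boundaries. */
    hs_scan_stream(stream, pkt1, pkt1_len, 0, scratch, on_match, NULL);
    hs_scan_stream(stream, pkt2, pkt2_len, 0, scratch, on_match, NULL);

    /* Closing the stream reports any matches pending at end of data. */
    hs_close_stream(stream, scratch, on_match, NULL);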

High Performance and Scalability

Hyperscan requires the Intel® Streaming SIMD Extensions 3 instruction set at a minimum and makes use of SIMD instructions to accelerate matching performance. Below, we provide a brief summary of a publicly-available performance demo, Performance Analysis of Hyperscan with hsbench.

We use three different pattern sets for this analysis.

  • Snort Literals is a set of 3,316 literal patterns extracted from the sample ruleset included with the Snort* 3 network intrusion detection system.
  • Snort PCREs is a set of 847 regular expressions, also extracted from the sample ruleset included with Snort 3, taken from rules targeted at HTTP traffic.
  • Teakettle 2500 is a set of 2,500 synthetic patterns generated with a script that produces regular expressions of limited complexity.

We tested these pattern sets on alexa200.db, a large traffic sample constructed from a PCAP capture of an automated Web browser browsing a subset of the top sites listed on Alexa*.

These pattern sets and corpora are available at https://01.org/blogs/jpviiret/2017/performance-analysis-hyperscan-hsbench.

Figure 4 shows Hyperscan's matching performance (Gbps) in block mode on the Intel® Xeon® processor E5-2699 v4 @ 2.20 GHz.


Figure 4: Hyperscan performance in block mode on different rule sets.

Figure 4 shows that Hyperscan achieves good single-core performance across the different rule sets. It also scales well: matching performance grows almost linearly as the number of cores in use increases.

Integration of Hyperscan and the DPDK


Figure 5: Performance of Hyperscan and Data Plane Development Kit integration

The Data Plane Development Kit (DPDK) enables high-speed network packet processing and forwarding, and is widely applied in the industry. Hyperscan and the DPDK can be integrated into a high-performance DPI solution. Figure 5 shows the performance data of the integrated solution. In the test, we used real patterns and HTTP traffic as input. The integration delivers high performance, and at larger packet sizes throughput can reach wire speed in this test.

Summary

Hyperscan provides a flexible, easy-to-use library that enables you to match large numbers of patterns simultaneously with high performance and good scalability, and it offers functionality well suited to network packet processing. The integration of Hyperscan and the DPDK also provides a mature, efficient foundation for DPI, IDS, IPS, and other related products.

About the Author

Xiang Wang is a software engineer working on Hyperscan at Intel. His major areas of focus include automata theory and regular expression matching. He works on a pattern matching engine optimized by Intel architecture that is at the core of DPI, IDS, IPS, and firewalls in the network security domain.

Using Intel Software and Google Cloud Platform


Do you develop software in the cloud? We are working with Google to provide a great experience on Google Cloud Platform for users of Intel software.

As a small first step, we have published docker images for Intel Python in Google's Container Registry. Thanks to the registry's much faster download speeds than Docker Hub, pulling images is 20-30% faster, depending on the instance type and the image you are downloading. For more information, please read the instructions for using Intel Python from containers. Intel Python includes numpy, scipy, and scikit-learn optimized for the Intel Skylake processors available on Google Cloud Platform.

For the convenience of Linux users on Google Cloud Platform, we provide Intel Python, Intel MKL, and other performance libraries from yum and apt servers. The software is free to use, including for commercial use.

Look for more to come as we roll out other features to make developing and running in the cloud better with Intel Software and Google Cloud Platform.

 

Enabling Intel® MKL in PETSc applications


    PETSc (Portable, Extensible Toolkit for Scientific Computation) is an open source suite of data structures and routines for the parallel solution of scientific applications modeled by partial differential equations. Starting with release 3.8, PETSc users can benefit from enabling Intel® MKL sparse linear operations inside their applications. The latest version of PETSc provides analogues of the AIJ and BAIJ matrix formats that call Intel® MKL Sparse BLAS Inspector-Executor kernels for matrix-vector multiplication.

    The Inspector-Executor API for Sparse BLAS in Intel MKL is a two-stage approach that divides sparse operations into analysis and execution steps. During the initial analysis stage, the API inspects the matrix sparsity pattern, applies matrix structure changes, and converts the matrix to an internal format. The internal format is chosen based on the sparsity pattern to enable better parallelism and vectorization at the execution stage. In the execution stage, subsequent routine calls reuse this information to improve performance.
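
Outside of PETSc, the two stages map onto the Intel MKL Sparse BLAS Inspector-Executor API roughly as follows (a sketch for double-precision CSR data; the dimensions m and n, the CSR arrays, and the vectors x and y are assumed to be set up by the caller):

    #include "mkl_spblas.h"

    /* Analysis stage: create a CSR handle, describe expected usage, optimize. */
    sparse_matrix_t A;
    struct matrix_descr descr;
    descr.type = SPARSE_MATRIX_TYPE_GENERAL;

    mkl_sparse_d_create_csr(&A, SPARSE_INDEX_BASE_ZERO, m, n,
                            rows_start, rows_end, col_indx, values);
    mkl_sparse_set_mv_hint(A, SPARSE_OPERATION_NON_TRANSPOSE, descr, 1000);
    mkl_sparse_optimize(A);

    /* Execution stage: repeated calls reuse the optimized internal format. */
    mkl_sparse_d_mv(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, A, descr, x, 0.0, y);

    mkl_sparse_destroy(A);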

    With this update, PETSc users can easily switch to the MKL Sparse BLAS Inspector-Executor API for sparse linear algebra operations and get a performance benefit in most PETSc solvers.

The following MKL functionality is currently supported in PETSc:

  • MKL BLAS/LAPACK for basic linear algebra operations
  • MKL PARDISO and the Parallel Direct Sparse Solver for Clusters as direct solvers
  • MKL Sparse BLAS IE for AIJ and BAIJ matrix operations (PETSc 3.8 and later)

How to use PETSc with MKL:

  • Configure PETSc with MKL Sparse BLAS by adding --with-blas-lapack-dir=/path/to/mkl to the configuration line. To use MKL PARDISO, PETSc should also be configured with --with-mkl_pardiso-dir=/path/to/mkl. For example:

    ./configure --with-blas-lapack-dir=/path/to/mkl --with-mkl_pardiso-dir=/path/to/mkl

More information on PETSc installation can be found here.

  • When running a PETSc application, pass -mat_type aijmkl or -mat_type baijmkl to the executable, or set the matrix type with a MatSetType(A,MATAIJMKL) / MatSetType(A,MATBAIJMKL) call in the source code. This enables MKL Sparse BLAS IE as the default package for all sparse matrix operations. For example:

    ./ex100 -mat_type aijmkl

Or in the source code:

    ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
    ierr = MatSetType(A,MATAIJMKL);CHKERRQ(ierr);
    ierr = MatSetUp(A);CHKERRQ(ierr);

  • To run a PETSc application with MKL PARDISO as a direct solver, run the code with -pc_type lu -pc_factor_mat_solver_package mkl_pardiso.

For more information, see the PETSc examples.

Intel® Math Kernel Library (Intel® MKL) and pkg-config tool


    The pkg-config tool[1] is widely used in application makefiles. Intel® Math Kernel Library (Intel® MKL) provides pkg-config metadata files for this tool starting with the Intel MKL 2018 Update 1 release.

    The Intel MKL pkg-config metadata files cover only the most popular Intel MKL configuration on 64-bit Linux/macOS/Windows operating systems for C.

The metadata files are named after the Intel MKL configuration they cover, following the pattern mkl-<linking>-<interface>-<threading>; for example, mkl-dynamic-lp64-iomp covers dynamic linking, the LP64 interface, and the Intel® OpenMP threading library.

pkg-config is a helper tool that provides the necessary options for compiling and linking an application against a library. For example, if you want to build your source code test.c with Intel MKL, call the pkg-config tool with the name of the Intel MKL pkg-config metadata file as an input parameter. The full compile and link line would be:

Linux OS:  icc test.c `pkg-config --cflags --libs mkl-dynamic-lp64-iomp`

Windows OS:  for /F "delims=," %i in ('pkg-config --cflags --libs mkl-dynamic-lp64-iomp') do icl test.c %i

    For the mkl-dynamic-lp64-iomp Intel MKL configuration (dynamic linking, LP64 interface, and Intel® OpenMP threading library), the pkg-config tool generates the following full command line on Linux OS:

    icc test.c -I/opt/intel/compilers_and_libraries/linux/mkl/include  -L/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64_lin -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -L/opt/intel/compilers_and_libraries/linux/mkl/../compiler/lib/intel64_lin -liomp5 -lpthread -lm -ldl

To get the compilation line for Intel MKL, call pkg-config with the --cflags option:

    pkg-config --cflags mkl-dynamic-lp64-iomp

To get the link line for Intel MKL, call pkg-config with the --libs option:

    pkg-config --libs mkl-dynamic-lp64-iomp

    The pkg-config tool helps avoid large hard-coded link lines inside makefiles and makes it easy to change the linking line by using another pkg-config metadata file as an input parameter for the pkg-config tool or by adjusting the metadata file itself.
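
For illustration, a minimal makefile fragment along these lines (a sketch, assuming a single source file test.c and the mkl-dynamic-lp64-iomp configuration) could be:

    # Let pkg-config supply the Intel MKL compile and link flags.
    CC     = icc
    CFLAGS = $(shell pkg-config --cflags mkl-dynamic-lp64-iomp)
    LDLIBS = $(shell pkg-config --libs mkl-dynamic-lp64-iomp)

    test: test.c
    	$(CC) $(CFLAGS) -o $@ $< $(LDLIBS)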

To adjust compilation and link options in the pkg-config metadata file:

  1. Go to the <mkl_install_dir>/mkl/bin/pkgconfig directory.
  2. Specify the ${prefix} variable, which contains the full path to the Intel MKL directory; change it if Intel MKL is installed in a different location.
  3. Specify Libs: the link line that is returned by pkg-config --libs. You can get the preferred link line for your Intel MKL configuration using the Intel MKL Link Line Advisor or the offline Intel MKL link line tool. In the link line returned by the advisor, be sure to change the external environment variable ${MKLROOT} to the internal pkg-config ${prefix} variable, since the metadata file will not work with the external environment variable. (You can, however, set variables externally with the pkg-config tool; see the pkg-config man page for more information.)
  4. Specify Cflags: the compiler options that are returned by pkg-config --cflags. You can update them the same way (see step 3).

Example of Intel MKL pkg-config metadata file mkl-dynamic-lp64-iomp for Linux OS:

    prefix=/opt/intel/compilers_and_libraries/linux/mkl
    exec_prefix=${prefix}
    libdir=${exec_prefix}/lib/intel64_lin
    omplibdir=${exec_prefix}/../compiler/lib/intel64_lin
    includedir=${prefix}/include

    # info
    Name: mkl
    Description: Intel(R) Math Kernel Library
    Version: 2018 Update 1
    URL: https://software.intel.com/en-us/mkl

    # Link line
    Libs: -L${libdir} -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -L${omplibdir} -liomp5 -lpthread -lm -ldl

    # Compiler line
    Cflags: -I${includedir}

    To use the Intel MKL pkg-config metadata files, add their full path to PKG_CONFIG_PATH so that the pkg-config tool can locate them.

    Use the Intel MKL environment script to set up the entire Intel MKL environment.

 

Intel® Security Dev API: 1.1 Get Started Guide

Pre-Installation Requirements

The Intel® Security Dev API pre-installation requirements are listed below:

  • Hardware: Your development machine must have a Trusted Platform Module (TPM) version 2.0, or IBM TSS for TPM 2.0 simulation. In all cases, simulation is strictly for development purposes and must never be used on the target device.
  • Internet connection: Active and properly configured to reach the internet.
  • Operating system: Ubuntu 16.04 (64-bit desktop version).
  • Trusted Platform Module (TPM) Software Stack (TSS): Deployment to actual TPM modules requires the TPM2-TSS package, located here: https://github.com/intel/tpm2-tss/blob/master/README.md
  • BIOS settings: You may need to enable your TPM in the BIOS of your development machine. Consult your motherboard manufacturer for instructions on how to access the BIOS.

Get Started with Intel Security Dev API

Note: Because files are installed in /opt/, all commands must be run as root, or via sudo.

  1. These instructions assume that you have installed TSS, as discussed in the pre-installation requirements section. If you have not installed TSS, you must do so before installing the API.
  2. Unpack the archive by running:

    tar -xvf IntegrationBuild_TPM_###_isecInstall.tar.gz

    (Replace ### with the build number)

    • This will unpack the following files:

      isecsdk_linux_1.1.bin

      GetStarted.txt

  3. Make the .bin file executable by running the following:

    chmod +x ./isecsdk_linux_1.1.bin

  4. If the installer is started with no arguments, it will attempt a fresh install of the Intel® Security Dev API for TPM development.

    sudo -E ./isecsdk_linux_1.1.bin

Note: The installation will fail if a previous installation of the SDK is found. To resolve the problem, uninstall the SDK as described below, and then attempt to install the SDK again by following the instructions in this section.

To access the command-line help, run sudo -E ./isecsdk_linux_1.1.bin Help

After installing TSS and the SDK, navigate to /opt/intel/isecsdk/api/go to install the Go language package.

  1. Get Go at https://golang.org/dl/

  2. Extract the archive into /usr/local, creating a Go tree in /usr/local/go. For example:

     tar -C /usr/local -xzf go$VERSION.$OS-$ARCH.tar.gz

  3. Update your PATH:

     export PATH=$PATH:/usr/local/go/bin

  4. Define a workspace directory and set your GOPATH:

     export GOPATH=<workspace>/go

  5. Create the Go <workspace>/src directory and change into it:

     mkdir -p $GOPATH/src
     cd $GOPATH/src

  6. The isecGo.tar.gz file is provided with the isec installer and is located in /opt/intel/isecsdk/api/go. Untar it by running:

     tar xzvf /opt/intel/isecsdk/api/go/isecGo.tar.gz

  7. Build the sample by running:

     cd $GOPATH/src/intel.com/isec/samples/rsakey
     go build -tags "tpm debug"

Remove the ./keys directory if it exists:

  • sudo rm -rf ./keys

Start the app:

  • sudo ./rsakey

Congratulations! The Intel® Security Dev API is ready to use. Please see the developer's guide, located at <TBD>.

Legal and Disclaimers

INTEL CONFIDENTIAL

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

Intel, the Intel logo, and Intel Core are trademarks of Intel Corporation in the U.S. and/or other countries.

 