Channel: Intel Developer Zone Articles

Using the Intel® NUC and Ubuntu* to Build a Cloud-Connected Sensor Application


Introduction

This paper will show you how to install Ubuntu* Linux* on an Intel® NUC and use it to build a modular IoT application with cloud connectivity. You'll see how to read real-time sensor data from a temperature and humidity sensor, view it locally on the Intel® NUC, and transmit it to Amazon Web Services* IoT. AWS IoT is used here as an example cloud back-end, where data can be integrated with other cloud applications. You'll build and deploy the application as a snap package — a universal Linux package format that makes it easy to deploy and manage Linux applications running on devices such as the Intel® NUC.

Setup and Prerequisites

  • Intel® NUC connected to a LAN network with Internet connectivity.
  • Computer keyboard and monitor connected to the Intel® NUC (needed for Ubuntu install).
  • RH-USB temperature and humidity sensor.
  • A developer workstation or laptop.
  • An active Amazon Web Services* IoT account.

Installing Ubuntu* on the Intel® NUC

We'll use Ubuntu Server 16.04 to support both application development and run-time deployment on the Intel® NUC. Make sure the BIOS has been updated to the latest version, and configure the BIOS for a Linux operating system with the internal eMMC drive enabled.

Follow the Alternative install: Ubuntu Server 16.04 LTS installation instructions to download an ubuntu-server-16.04-XXXXXX.iso image file and install it on the Intel® NUC. The installation process involves loading the image file onto a USB drive, plugging the USB drive into the Intel® NUC, booting from the USB drive, and installing Ubuntu into the /dev/mmcblk0 internal flash drive. These operations are done using a temporary computer monitor and USB keyboard connected to the Intel® NUC. During the installation process you'll be asked to set a hostname and username/password for Ubuntu. You can set them to your own preferences, however for this example we will be using ubuntu for the hostname, ubuntu for the username, and ubuntu for the password.

Once the operating system is installed, remove the installation USB drive and reboot the Intel® NUC so it boots Ubuntu from the internal eMMC drive. Log into the Intel® NUC as the ubuntu user using either the keyboard and monitor, or by logging in from a developer workstation over the network using ssh.

Update the base operating system by running the following commands.

$ sudo apt update
$ sudo apt -y upgrade
$ sudo reboot

Allow the Intel® NUC to reboot.

Connect RH-USB Sensor

The RH-USB sensor is an industrial temperature and humidity sensor with a serial interface that connects via USB. Plug the RH-USB connector into the USB jack on the front of the Intel® NUC. Once it's plugged in, Ubuntu will create a serial tty device named /dev/ttyUSB0. Confirm that the device was created by running the following command and inspecting the output (the option is a lowercase letter L, not the digit one).

$ ls -l /dev/ttyUSB*
crw-rw---- 1 root dialout 188, 0 May 10 12:37 /dev/ttyUSB0

Give the ubuntu user read/write access to the sensor's serial device by running the following command:

$ sudo usermod -a -G dialout ubuntu

You must log out and log back in as the ubuntu user to pick up the permission change.

We'll use the screen command to verify the sensor is working. screen is a terminal program that allows you to type characters and view responses. Install and run screen using the following commands:

$ sudo apt install -y screen
$ screen /dev/ttyUSB0 9600

After screen has started, press Enter and you should see the '>' character appear. That is the RH-USB command prompt. Type the letters PA and press Enter. You won't see the letters PA echoed, but you should see a set of digits like the following.

>50.4,72.8>

The digits are the humidity (%) and temperature (degrees F) readings from the RH-USB, separated by a comma.
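Parsing that reply is straightforward. The helper below is a hypothetical sketch (not the repository's actual code) of splitting the comma-separated reading the way rhaws.js must:

```javascript
// Hypothetical helper (not the actual rhaws.js code): parse the
// "humidity,temperature" line the RH-USB returns after a PA command.
function parseRhUsb(line) {
  // Strip any '>' prompt characters, e.g. ">50.4,72.8>" -> "50.4,72.8"
  const cleaned = line.replace(/>/g, '').trim();
  const [humidity, temperature] = cleaned.split(',');
  return { humidity, temperature };
}

// Example: parseRhUsb('>50.4,72.8>') returns { humidity: '50.4', temperature: '72.8' }
```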

Exit the screen program by typing Control-A followed by backslash and then typing y to confirm exiting screen.

Creating a Snap

Snaps are self-contained Linux packages that facilitate deploying complete applications on Linux systems such as Ubuntu. They can contain application programs along with the prerequisite commands and libraries needed by the application. This alleviates the need to manually install prerequisite packages on target systems in order to run the application. Snaps also make it easier to check for and deploy new versions of applications, as well as roll back to earlier versions if there is a problem.
We'll create our sensor application as a self-contained snap package. Snaps can be built on a separate development computer and then deployed on a target computer, or built and deployed on the same computer, which is how we'll do it on the Intel® NUC.

Install the snap build tools using the following command:

$ sudo apt install -y snapcraft build-essential git
Snapcraft can be used to create and initialize a new snap; however, we'll clone an existing application repository and use that to initialize our files. The application is written in Node.js* and uses the AWS* IoT Device SDK for JavaScript* to provide functions for securely transmitting data to AWS IoT.

$ cd ~
$ git clone https://github.com/gregtoth/snap-rhaws.git

The name of our snap is rhaws, and the source files making up the application are:
  • snap/snapcraft.yaml - defines the overall Snap package.
  • src/package.json - identifies required Node.js support libraries.
  • src/rhaws.js - the Node.js application that reads the sensor and sends data to the cloud.
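For orientation, the snapcraft.yaml for a snap like this one might look roughly as follows. This is a hedged sketch, not the repository's exact file; the nodejs plugin and the daemon setting are assumptions based on how the snap behaves once installed:

```yaml
name: rhaws
version: '1.0.0'
summary: RH-USB sensor to AWS IoT bridge
description: Reads the RH-USB sensor and publishes readings to AWS IoT.
grade: devel
confinement: devmode

apps:
  rhaws:
    command: rhaws
    daemon: simple    # run as a service under systemd

parts:
  rhaws:
    plugin: nodejs
    source: src/
```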
The rhaws.js application implements the following functions:
  • Initialize the RH-USB sensor serial port and load AWS IoT security credentials.
  • Periodically send a PA command to the RH-USB to read temperature and humidity.
  • Read and parse the temperature and humidity values returned by the RH-USB.
  • Generate a JSON payload message containing the time and raw data, and transmit it to AWS IoT.
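The payload format can be seen in the journalctl output later in this article. Building it can be sketched like this (field names taken from that output; the function name is hypothetical, not from rhaws.js):

```javascript
// Sketch of assembling the JSON payload published to AWS IoT.
// Field order matches the log output: time, temperature, humidity.
function buildPayload(humidity, temperature, now) {
  return JSON.stringify({
    time: now.toISOString(),
    temperature: temperature,
    humidity: humidity
  });
}

// Example:
// buildPayload('50.1', '73.6', new Date('2017-05-23T19:07:19.672Z'))
// -> '{"time":"2017-05-23T19:07:19.672Z","temperature":"73.6","humidity":"50.1"}'
```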
Build the snap using the snapcraft command:

$ cd ~/snap-rhaws
$ snapcraft

The build process downloads and assembles prerequisite components, including Node.js and the NPM packages listed in package.json. It then builds a snap package and writes the result to the file rhaws_1.0.0_amd64.snap. The snap file is a self-contained package containing Node.js, dependent libraries, and our sensor application, ready to deploy to the Intel® NUC.

Configure and Load AWS* IoT Keys

Before we can connect to AWS IoT, we must configure access keys and load them on the Intel® NUC. The access keys are not stored directly in the application or the snap package, both for security reasons and because they will be different for each user.

Log into the AWS IoT cloud console from your developer workstation and navigate to Security > Certificates. Create a new certificate and download the certificate file, private key file and root CA file. Activate the certificate and attach a policy that allows (at a minimum) connecting to AWS IoT and publishing to the 'nuc/temperature' topic.
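For reference, a minimal AWS IoT policy along these lines would satisfy that requirement. The region and account ID below are placeholders; scope the resources to your own ARNs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iot:Connect",
      "Resource": "arn:aws:iot:us-east-1:123456789012:client/mynuc"
    },
    {
      "Effect": "Allow",
      "Action": "iot:Publish",
      "Resource": "arn:aws:iot:us-east-1:123456789012:topic/nuc/temperature"
    }
  ]
}
```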

On the Intel® NUC, create a directory to hold the AWS IoT keys using the following commands.

$ cd ~
$ mkdir aws-iot
$ chmod 755 aws-iot

Copy the downloaded certificate file, private key file, and root CA file from your developer workstation to the Intel® NUC using a tool such as sftp, and place them in the /home/ubuntu/aws-iot directory on the Intel® NUC. Rename the files as follows, so they match what the rhaws.js application program expects:

  • Certificate: mynuc-certificate.pem.crt
  • Private key: mynuc-private.pem.key
  • Root CA: rootca.pem

Adjust the access permissions on the files using this command.

$ chmod 644 /home/ubuntu/aws-iot/*

Install and Test the Snap

Install the snap we built earlier by using the snap install command:

$ cd ~/snap-rhaws
$ sudo snap install --devmode rhaws_1.0.0_amd64.snap

Installed snaps are listed using the snap list command:

$ snap list
Name   Version  Rev   Developer  Notes
core   16-2     1689  canonical  -
rhaws  1.0.0    x1               devmode

The snap is configured to run as a service so it will automatically start running after it's installed, as well as when the Intel® NUC is rebooted. You can check the run state of the rhaws snap using the systemctl command:

$ systemctl status -l snap.rhaws.rhaws.service

In the AWS IoT cloud console navigate to the Test function and create a subscription to the nuc/temperature topic. You should see messages being received every 3 seconds containing a JSON object with time, temperature and humidity values.

On the Intel® NUC you can view the sensor data and transmission status by monitoring the log file where output from the rhaws.js program is written. Snaps run under systemd on Ubuntu and log output goes to the systemd log. Use the journalctl command to view systemd log messages; the -f option tails the log to display new entries. Type Control-C to stop viewing log output. The logs are useful for troubleshooting application problems.

$ journalctl -u snap.rhaws.rhaws.service -f
May 23 15:07:19 ubuntu snap[12602]: RH-USB: 50.1,73.6
May 23 15:07:19 ubuntu snap[12602]: Publish nuc/temperature: {"time":"2017-05-23T19:07:19.672Z","temperature":"73.6","humidity":"50.1"}
May 23 15:07:22 ubuntu snap[12602]: RH-USB: 50.1,73.6
May 23 15:07:22 ubuntu snap[12602]: Publish nuc/temperature: {"time":"2017-05-23T19:07:22.695Z","temperature":"73.6","humidity":"50.1"}

The rhaws application can be manually stopped and started using the systemctl command.

$ sudo systemctl stop snap.rhaws.rhaws.service
$ sudo systemctl start snap.rhaws.rhaws.service

Remove the Snap

Use the snap remove command to stop the snap and remove it from the system.

$ sudo snap remove rhaws
rhaws removed
$ snap list
Name  Version  Rev   Developer  Notes
core  16-2     1689  canonical  -

Where to Go From Here

This application provides the basic foundation for creating and running a sensor application on the Intel® NUC as a snap package under Ubuntu. It continuously reads temperature and humidity data from the sensor and transmits the data to AWS IoT in the cloud. Once the data is in the cloud it can be further processed and stored by other applications, manipulated via rules, or visualized using a number of different tools. 

We developed the rhaws snap package in developer mode, which allows us to create and deploy the snap without having to sign it and publish it to the Ubuntu Store. Snaps can be signed and published so that other people can find and use them without having to build the snap themselves. Publishing involves creating an account on the Ubuntu Store, creating a signing key, signing the snap, and uploading it to the store. Developer mode also relaxes package security constraints that are typically tightened before a snap is published.

Snaps can also be deployed to an Intel® NUC running Ubuntu Core, which is a lighter-weight version of Ubuntu. In that case, the installed Ubuntu Core image must contain support for accessing and using the USB serial port through which the RH-USB communicates.

More information about snaps, snapcraft, and Ubuntu running on the Intel® NUC can be found at https://www.ubuntu.com and https://software.intel.com.


SoftLabNSK Builds a Universal, Ultra HD Broadcast Solution


SoftLab-NSK develops complete TV broadcast automation solutions that work with the 4K format and HEVC compression and include functionality for video encoding. When the company wanted to expand its flagship Forward T* line of playout servers, it needed the most efficient solution for video transcoding. After a thorough study, it chose technologies from Intel that support decoding, processing, encoding, and broadcasting 4K HEVC video from the output of the playout server:

  • Intel® Quick Sync Video, which uses the dedicated media processing capabilities of Intel® Graphics Technology for fast decoding and encoding, enabling the processor to complete other tasks and improving system responsiveness.
  • Intel® Media SDK, part of Intel® Media Server Studio. This cross-platform API is for developing media applications on Windows* and embedded Linux* for encoding Ultra HD video.

Learn all about it in our new case study.

Winners of the 2017 Intel® Level Up Game Dev Contest


It was another great year for the Intel® Level Up Game Dev Contest! We had a record number of submissions from around the world. It was tough narrowing the hundreds of exciting games down to the 25 finalists, and then down to the top 5 finalists in 5 genres. The finalists were then given to the external panel of judges for scoring and selection.

We would like to thank everyone who participated in the contest in 2017. We had some truly amazing entries this year, and it was a lot of fun reviewing your games and seeing so many fun games in development across the world. A special thanks also to our contest judges for 2017 and our great sponsors, Epic Games, Razer, and Green Man Gaming; we couldn't do it without you!

Without any further ado, the winners are…

Game of the Year & Best Puzzle/Physics Game – Resynth by Polyphonic LP

Best Platformer Game & Best Use of Game Physics – Pepper Grinder by Riv Hester

Best Adventure / Role Playing & Best Art Design – Cat Quest by The Gentlebros

Best Action Game – Megaton Rainfall by Pentadimensional Games

Best Game – Open Genre – Paperbark by Paperhouse Games

Best Sound – Yankai's Peak by Kenny Sun

Best Game with 3D Graphics – Stardrop by Joure Visser

Best Character Design – The Adventure Pals by Massive Monster

Python issues with the Intel® MPI Library up to version 5.1.3


Intel® MPI Library versions before 2017 (<= 5.1.3) use Python 2 scripts for some infrastructure components, mostly related to the deprecated MPD startup mechanism. One example is the mpiexec script (a symbolic link to mpiexec.py), which uses the general Python shebang sequence.

#!/usr/bin/env python

Because these scripts were written for Python 2, they are not necessarily Python 3 compatible. Therefore, issues might appear if the user is working in a Python 3 environment, whether implicit or explicit.

In order to work around the problem, the user can either use the Hydra startup infrastructure or make sure that the default Python environment is Python 2.

The current Python version can be determined using:

$ python -V

Manufacturing Package Fault Detection Using Deep Learning


Executive Summary

Intel's Software and Services Group (SSG) engineers recently worked with assembly and test factory engineers on a proof of concept focused on adopting deep learning technology based on Caffe* for manufacturing package fault detection. The results proved that neural network technology can be applied to silicon manufacturing. They also showed that the Intel® architecture platform has competitive performance and can easily be used to provide both neural network training and inference support.

Background

Silicon packaging, one aspect of semiconductor manufacturing, is a complex and expensive process that requires high quality. During the packaging process, various factors, such as fingerprints, scratches, and stains, can cause cosmetic damage. This damage must be manually inspected to determine whether it exceeds the threshold of allowable damage. As the final checkpoint on the product line, with more than 11 criteria rules, the inspection process is subjective due to the complexity of the damage scenarios, and prone to human error and inconsistency.

A deep neural network has been proven to outperform traditional methods in terms of image processing. Although topologies such as GoogLeNet have shown good accuracy on general ImageNet tests, questions remain as to whether these topologies can be used for high-quality manufacturing tests. Typically neural network training is done on a GPU, and whether the Intel® architecture platform can provide similar capability is yet another question. This proof of concept (PoC) provided positive answers to these questions.

Problem Statement

This PoC aimed to reduce the human review rate for package cosmetic damage at the final inspection point, while keeping the false negative rate at the same level as the human rate. The input was package photos, and the goal was to perform binary classification on each of them, indicating whether the package was rejected or passed. Manual inspection followed a set of rejection criteria covering damage of particular shapes, locations, and sizes exceeding a particular threshold. The manual inspection required a low false negative rate, and the majority of the inspected photos passed. The unbalanced pass/reject photo ratio, coupled with complicated judgment criteria, made the manual inspection work tedious and prone to errors.

Solutions

SSG first proposed the GoogLeNet V1 topology, based on a convolutional neural network (CNN). This topology balances training/inference time and testing accuracy, making it well suited for image classification. To tailor the topology, we took only the green channel as input and reduced the fully connected layer's class count from 1,000 to 2 for binary classification. The required top-1 accuracy is much stricter than that of the standard GoogLeNet V1 on ImageNet-1k (approximately 68.7 percent).

The training was supervised learning. We took about 4,000 images as input, labeled them as either passed or rejected, and then rotated each image 36 times, 10 degrees each time, for data augmentation purposes. We did not augment the images at different scales, since the damage criteria were size sensitive. We fed the input into GoogLeNet V1 for training. For the output result, we classified each original image by an ensemble of its 36 rotated-image results.
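The per-image ensemble step can be sketched as follows. This is an illustration under assumed probability outputs, not the PoC's actual code:

```javascript
// Illustrative sketch: ensemble the 36 per-rotation "reject" probabilities
// for one original image by averaging them and applying a decision threshold.
function ensembleDecision(rejectProbs, threshold = 0.5) {
  const mean = rejectProbs.reduce((sum, p) => sum + p, 0) / rejectProbs.length;
  return mean >= threshold ? 'rejected' : 'passed';
}
```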

Topology                        | Customized GoogLeNet V1                                     | Standard GoogLeNet V1
Number of image classes         | 2 (passed or rejected)                                      | 1,000 (ImageNet-1k)
Number of training input images | 14,400 (4,000 × 36)                                         | 1.2 million (ImageNet-1k)
Top-1 accuracy                  | Human-level false negative (far greater than 68.7 percent) | 68.7 percent

In addition to CNN, we added region-based CNN (RCNN) to meet the strict false negative rate goal, which is typical for manufacturing. This goal is hard to achieve with a single CNN classification model. A major problem is that the human decision of “passed” or “rejected” is likely inconsistent with the class boundaries (for example, damages with a size around the threshold). These ambiguous labels confuse CNN. We first tackled the problem by sacrificing the false positive rate to reduce the false negative rate. To do this, we relabeled the input images to put images around the boundary into the rejected class. However, the results were still not satisfying. Even though the false negatives went down, the false positives went up rapidly. Eventually we decided to add RCNN to enhance the detection accuracy.

RCNN can detect an object's location, size, and type. We used the ZFNet-based Faster RCNN model. We leveraged the class probability output of CNN and further categorized the image samples into three classes: strong passed, rejected, and weak passed. The weak passed category, which has low confidence per CNN probability, is likely a result of ambiguous training, so we took the weak passed class output and fed it into the RCNN network. Any input with a detectable defect was put in the rejected class; otherwise the input was put in the strong passed class.
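The three-way routing described above can be sketched like this. The 0.9/0.1 confidence thresholds are assumptions for illustration, not values from the PoC:

```javascript
// Illustrative routing: use the CNN's reject probability to split samples,
// and send low-confidence "weak passed" images to RCNN defect detection.
function route(cnnRejectProb, rcnnFindsDefect) {
  if (cnnRejectProb >= 0.9) return 'rejected';       // high-confidence reject
  if (cnnRejectProb <= 0.1) return 'strong passed';  // high-confidence pass
  // "weak passed": low CNN confidence, defer to the RCNN result
  return rcnnFindsDefect ? 'rejected' : 'strong passed';
}
```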

All rejected images were inspected again by humans, who served as the final gatekeepers.

inspection flowchart

Results

The final output from the two concatenated networks was revealing. The false negative rate consistently met the expected human-level accuracy over months of testing on live manufacturing data. The false positive rate was approximately 30 percent, which means that 70 percent of the manual inspection effort was saved. The human inspectors used to identify one rejected image out of 10 input images; now that rate is one out of three, which significantly reduces their workload.

We compared the prediction result from pure CNN to the CNN and RCNN approach. RCNN helped to reduce the false negative rate to the strict target, while still keeping the false positive within an acceptable range. Since human inspectors may also misclassify the image samples around the criteria boundary, the model also served as a cross-check to educate human inspectors and improve their inspection quality.

The deep neural network solution, based on Caffe optimized for Intel architecture, was developed by SSG and handed over to factory engineers in Intel's Technology and Manufacturing Group (TMG), who played the role of domain experts. Even without artificial intelligence knowledge, TMG engineers are able to easily fine-tune the network with new samples. Our practice showed that by gradually adding new unseen samples to the training set and retraining the models, this deep learning-based solution can deliver consistent performance over time.

Intel® Architecture for Training

The solution was initially designed to run on a GPU, but later a decision was made to migrate it to the Intel architecture platform. The requirement is to finish the GoogLeNet V1 training within 15 hours. With a little migration effort, we can run GoogLeNet V1 based on Caffe optimized for Intel architecture on the Intel® Xeon® processor E5-2699 and Intel® Xeon® Platinum 8180 processor platforms. Both these platforms can provide time to train within the requirement range.

Hardware                                       | TFLOPS | Software                                                                                 | Batch Size | Images per Second | Time to Train
Intel® Xeon® processor E5-2699                 | 3.1    | Berkeley Vision and Learning Center Caffe (CPU mode)                                     | 36         | 1.5               | 55 days (estimated)
Intel Xeon processor E5-2699                   | 3.1    | Caffe optimized for Intel® architecture with the Intel® Math Kernel Library (Intel® MKL) | 36         | 173               | 11.5 hours
Intel® Xeon® Platinum 8180 processor (2.5 GHz) | 8.2    | Caffe optimized for Intel architecture with the Intel MKL                                | 36         | 266               | 7.5 hours

Conclusion

This PoC demonstrated that deep learning technology can be applied to the manufacturing field with high-quality requirements. The architecture of CNN can learn the sophisticated features from the input images for classification. The combination of CNN and RCNN can provide low false negative rates with a reasonable false positive rate.

The PoC also proved that the Intel architecture platform can be used for real deep learning application training. Both the Intel Xeon processor E5-2699 and Intel Xeon Platinum 8180 processor platforms can meet user requirements. Caffe optimized for Intel architecture improved the training performance by 100 times on Intel Xeon Platinum 8180 over Berkeley Vision and Learning Center Caffe on Intel Xeon processor E5-2699. This significant performance improvement has made training on Intel architecture a reality.

More than Just a Pretty Game (With Dinosaurs)


The original article is published by Intel Game Dev on VentureBeat*: More than Just a Pretty Game (With Dinosaurs). Get more game dev news and related topics from Intel on VentureBeat.

“It came about as an accident…I wish there was a better answer,” is Pubgames Director Chris Murphy’s candid response to explaining the origin of the studio in 2011. Based in Melbourne, Australia, the founding developers considered their options while completing their final year of university. “We were doing double degrees in Computer Science and Multimedia,” says Murphy, “and there were things we didn’t know, but we didn’t know at the time exactly what we didn’t know.”

Solving this conundrum was relatively simple as it turned out. “We started building games for the sake of building games, to try and figure out what would go wrong,” he adds.

Their first game, a mobile space shooter called BlastPoints, led to the group getting noticed and them setting up a small business to formalize the organization. As an official company, they could then apply for government grants that assisted in getting the group into production. “Now we have our name on a bunch of titles and a semi-stable business,” says Murphy.

With what Murphy describes as the “snowball now rolling downhill,” it seemed the team may have solved the problem of knowing what they were doing. But as for the true complexities of game development, even with the lessons learned, Murphy and Lead Gameplay Programmer Luke O’Byrne both chuckle at the notion that “we still haven’t.”

The initial smaller-scale successes afforded the chance for the team to take on its first big PC project. “We pushed so heavily to do Primal Carnage: Extinction,” says Murphy. “Aside from it being a really fun concept of being dinosaurs that eat your friends, we saw it as a great place to see how an economy would go in games.” The game, released in 2014, still maintains an enthusiastic audience, and the studio continues to add significant upgrades.

It wasn’t just the dinosaur fights that attracted Pubgames to Extinction, but the opportunity to figure out new methods of making the game development economy work. “We were one of the first in the world to use Valve’s economy system, and suspect through our compliance—and tears and sweat—they learned about the Steam inventory service, and others. Having players able to trade and sell between each other generated so many things that we never expected,” says Murphy.

Among these ideas was the concept of giving most of the game content away for free. Then, with players trading among themselves, purchasing packages, and playing the game for the opportunity to get rarer and rarer items, the system reflected some of the new ideas on indie game economies. “We also ended up with a great, engaged community,” says O’Byrne.


Above: Primal Carnage Extinction—capturing dinosaurs while wearing your cool shades.

Diversify: A key to success

Statistics suggest that more games were released on Steam in the last year than in the whole of its previous history, yet total revenues stayed the same. That means more games are earning the same overall amount of money, spread more thinly among the numerous options out there for gamers.

So, the Pubgames team accepts that finding that hit idea becomes more of a challenge. “Being given a blank slate and blank check doesn’t tend to lead to the most creative product. Having some limitations can help the creativity to focus on what the players want, what they should expect,” says O’Byrne.

That could mean game-styled or game-related content, but sold to clients in enterprise and training businesses.

“We’ve done some pretty weird stuff,” says Murphy. “Our work has always been super-varied. One project could be minor contract stuff, then we’re designing dynamic weather systems in the next, and then a full-blown PC and PS4 title in the next.”

It’s a diversification plan that fits with the notion of ensuring that each project doesn’t simply pay for itself, but also helps take the sting out of the next project becoming some terrifying gamble.

“The panic is always there,” says Murphy, “but we’ve been really lucky. Because of how we started, there were some hard years, so it has been a long, hard road. Hoping for the best and planning for the worst.”

“There’s been a real shift to using game technologies and ideas in non-game areas,” says O’Byrne. At a recent VR conference, it was noted that a common theme was taking game-style content and applying it to work for training, or for museum exhibits, and many more functions that make non-game teaching aids and other content interactive.

They also note a trend in the Australian indie development community where teams in Melbourne have jumped into the mobile space and VR (many studios share communal offices in combined workspaces and incubators), while the Sydney studios are applying their game technologies to enterprise work and marketing.

“Without the overseas financial backing, teams have been forced to go for smaller projects,” says O’Byrne. “But doing ten of those over, say, ten years can really push company growth versus two or three big game projects in that time.”


Above: Primal Carnage Onslaught will enjoy a full release as the marketplace develops with more headsets and more powerful PCs in the hands of gamers.

VR too soon?

Exhaustively exploring new technologies is part of the kick that drives these computer science majors, and the flexibility of a small indie studio to pivot into a new space is a luxury not available to the larger AAA publishers. “If Augmented Reality looks like it’s actually going to be a thing, then we can chase it,” says Murphy.

That was partly the case with the team’s follow-up to Primal Carnage Extinction with Onslaught, which pitched the dino fighting action into the VR realm.

“We went early with VR, that’s fair to say, hoping to capitalize on the early audience, and our community of players in Extinction,” says O’Byrne. “Onslaught started as a weekend R&D project. What can we do with VR? Very quickly we had a fun project and realized we could do this,” he adds.

However, the Early Access results weren’t performing as the team had hoped so they are candid about having to sideline the project for a while and turn attention to some of the diversified other projects. They did confirm that the game is still in active development.

Such are the benefits of a nimble, small studio, and challenges of a small studio’s project not performing as hoped. It’s this flexibility of the marketplace that keeps the team open-minded about where it sees itself in the future.

“Eighteen months ago, I would have said something very different,” says Murphy, “but internally we have an enterprise project we can’t talk about just yet. We also released a successful training course. I wouldn’t have said a year ago that the company products are dinosaurs and enterprise software.”

“We’ll always look to what can be achieved with new technologies because it’s always a challenge creatively and intellectually,” says O’Byrne. So is assessing the influence on game design as well as the marketing of platforms like YouTube and Twitch that they accept can change the dynamic of the industry. “Open-world horror games like PUBG and DayZ aren’t successful just because of the genre, but because they play well to the audience with the crazy stuff that can happen, and they can find absurd things absolutely hilarious,” adds Murphy.

So, this group of university friends continues to shift wherever it needs to in order to survive as a successful indie studio, remain working together, and keep playing with the new, wonderful toys that come along.

Seems like one happy accident.

Bah VR! Holograms are the Future


The original article is published by Intel Game Dev on VentureBeat*: Bah VR! Holograms are the Future. Get more game dev news and related topics from Intel on VentureBeat.

My brain does hurt a little bit right now.

I don’t think it can fully grasp the future vision that Euclideon CEO Bruce Dell is peddling. Well, peddling suggests a sales job rather than a real life working product; but he says that his team has solved a huge problem with hologram tables, and will solve many more of the challenges in the years ahead.

So, time to take a deep breath and get my inadequate head around the story of Euclideon, and in particular Dell’s work in graphics technology that—if proven all above board and kosher—is destined to revolutionize many industries around the world, including gaming.

But first, to be clear, Euclideon is not making a game! This is about technology. It’s about atoms. It’s about graphics processing without using a graphics card. It’s about holograms. It’s about hologram tables supporting multiple users who only need to be wearing a simple pair of sunglasses.

Why that’s a big deal? We’ll get to that. First, where did Euclideon come from?

“Twenty years ago, I believed there was another way to make 3D graphics work,” says Dell. “They’re all done with polygons, but I thought they could be made out of little atoms…the problem is that the more of them you have, the slower your computer will run.”

Okay, so much for the familiar gaming notion of graphics built from polygons with depth supplied by bump maps — now we’re talking about atoms.

If I wasn’t on the other end of a call, but sitting in front of Dell, he would have witnessed one of those blank-faced nods that suggests understanding, but the lack of PhDs in computer science, graphics, and god knows what else, in fact is saying, “I’ll just believe you because I don’t know how to pursue the details of the atom as a graphics image.”


Above: The sunglasses are a simple method of viewing the hologram images.

Adds Dell: “I saw Donkey Kong Country and misunderstood that these 3D graphics were being made and recorded as flat sprites. I thought they were being processed with all the lighting, etc. So, I thought 3D graphics were a long way ahead.”

Working his theories—unpopular or unexpected though they may be (in fact, some considered it futile research)—Dell pursued his belief in the atom graphics process. “After 13 years, I had achieved and refined it so much it was working extremely well,” he says. The apparent breakthrough led to articles in significant technical magazines such as New Scientist and Popular Science, which boosted Dell’s profile and ultimately resulted in folks from one of the top tech companies in Australia joining his crusade.

“Now we had unlimited graphics power,” he says of the ultimate result of this technology. It also caused great debate, if not outright online arguments, with someone Dell describes—without naming names—as the head of Minecraft stating that it couldn’t be done, while the head of Crytek suggested it could; it had simply been considered impossible.

But the technology was not going to be applied to games. Instead, Dell focused on laser scanning and the massive graphics and processing requirements of geospatial companies. These companies would take a laser scanner to a location and scan it to create a 3D model. For example, scanning a house so that it could be shown remotely by a realtor. The problem was that the laser scanners in use at the time had limitations.

“Take a laser scanner into a house, and when you moved, you’d lose 99% of the points, and they would redraw when you stopped,” Dell explains. But with the unlimited ability to run what’s called point cloud data, it was surmised that you could run the scanners on locations much larger than a house.

Related to this was the understanding that you cannot stream 3D graphics; they need to be loaded and then they can be used. Dell’s technology addressed that conundrum. “We progressed the technology to the point where it could stream 3D graphics off the hard drive and over the internet,” says Dell. The application of this achievement has not gone unnoticed around the world, with Dell telling us that companies and countries as diverse as the French railway system, the Hungarian and Turkish governments, and the Tokyo traffic authority are all using the technology.


Above: The device in his hand allows the image to be scaled and manipulated easily.

No to VR, AR, or HoloLens…

Now arrives VR with all its potential and all its challenges as a technology. Dell stated that Oculus Chief Scientist Michael Abrash suggested at a presentation that VR technology would require 1,000 times more polygons before games looked as good as the best on PC and console. (Abrash apparently suggested that solving that issue with VR technology would at least generate a ton of rewarding jobs for scientists and researchers over the next few years, so if you wanted a silver lining…)

For Dell, VR reminded him of 3D TVs: they seemed like a good idea at the time, right until it was proven that nobody wanted them. And for the business and military clients he works with, the barrier was wearing the bulky headsets.

Augmented reality looked more intriguing to Dell, but ultimately posed the same challenge for businesses reluctant to use the required headsets. Microsoft’s HoloLens was also intriguing, but Dell understood that hologram tables had a problem: one user could see the image perfectly, but a second user would have a distorted view.

“We went looking to see if there was a way to do a hologram table that could support more than one user,” Dell says. “We went into very weird ends of science that went into changing the fields of light-waves, then changing them back for different users,” he added. More importantly, he states: “We’ve actually been successful.”

“We’ve been able to produce a table where more than one person can see it…I can see the front of the building and you could see the back,” says Dell. With the table R&D now complete, Dell states that “we’re the first in the world to make a multi-user hologram table where you can just wear simple sunglasses.”

When do we see the proof? Dell is currently raising capital for the manufacturing that Euclideon will handle itself, courtesy of its 240 different component suppliers.

If that capital is raised and manufacturing begins, the first tables, 1.5 meters by 1.5 meters, will roll off the assembly line and into businesses such as real estate companies, who will map out a shopping center in hologram form before breaking ground on actual construction. Dell says that he’s already received requests for tables up to 4×4 meters, which presents new projection problems that his R&D team needs to resolve in the months and years ahead.


Above: Prototype model of the hologram table.

But the claim that any amount of data can be streamed anywhere presents a future where businesses share one hologram table and multiple users will be able to assess the content. For the games industry, Dell suggests the technology means much better graphics running much faster than they currently do with polygons and graphics cards. “A company like Crytek has a lot of artists, but when we can go out to a forest with a laser scanner and scan it, it means the cost of resources goes down dramatically,” he suggests.

Dell intends for Euclideon to be the leading hologram company in the world, blending the atom graphics technology and internet streaming into these devices. He may even be open to sharing the technology for others to use. It’s bold, ambitious, and controversial. Dell even suggests that the company is figuring out—maybe even has figured out—how to allow a hologram table to be visible without the use of glasses at all.

If you scoff, you’re not alone, but if it just so happens to be the real deal, the potential for games and for businesses of all types is staggering.

You get your head around all that?

Chasing the VR American Dream. From Australia


The original article is published by Intel Game Dev on VentureBeat*: Chasing the VR American Dream. From Australia. Get more game dev news and related topics from Intel on VentureBeat.

Above: Logo/banner for the Australian game The American Dream.

A VR game with an Americana vibe and the distinctive visuals of the 1950s, being made by a small indie team in Melbourne, Australia? Didn’t quite see that one coming. Oh, and your hands are replaced by guns, so you shoot everything to achieve even the simplest tasks. So yes, you must also hang in there for the social commentary from overseas folks passing along their takes on certain aspects of American culture. It may not be for everyone.

"We take you through life as a gun-toting American!" says Nicholas McDonnell, Managing Director and artist at Samurai Punk, with a chuckle.

American Dream VR boldly makes a statement of political satire in a game that's sure to be met with loud opinions from across the political spectrum. Sure, it's a way of getting noticed, but it grows out of the type of game this fledgling team wanted to make; it's not just a cynical attention grab.

McDonnell and co-founder Winston Tang had been developing games together since mid-2013, working on game jams outside their day job. In 2014 they took the brave step to start out together with a game that McDonnell describes as a "mash-up of Tetris." It proved, like any first try might, to be good practice. "It was good to get a failure done since we learned a lot of things about how to make games and how to market them (or more accurately how not to market them)," he adds.

Next up was Screencheat, a well-received split-screen action game that displayed plenty of its own innovative gameplay ideas and quirky humor. It allowed the addition of two more staffers, bringing the total to five, and the collective vision of what Samurai Punk wanted to represent took a more solid shape. "We're trying to do the things people wouldn't do," McDonnell says. "We're going into a small market [VR] with a political satire, taking the piss out of American gun culture. We want to make games that have something to say, be they social commentary or interesting gameplay mechanics or styles."

Like many similar indies, they ply their trade out of a co-working space in Melbourne. (And like many, there can be quite a bit of background noise when conducting interviews, prompting a "Hey, Winston, calm your farm!" retort from McDonnell. [Insert your own Australian accent to further appreciate the potential craziness of small studios in Melbourne.])

Sales of these games, along with the government funding, investments, and small local grants that McDonnell wrangles in his role as managing director, have allowed them to turn their attention to VR as the ideal platform for American Dream.

Animated gif of gun, teddy bear and toddler
Above: A slightly different take on the ‘gun show.'

How to make the American Dream

"AD started out honestly as a joke," McDonnell freely states. "We were joking that we play a lot of shooters, and that led to questions like ‘what does the Call Of Duty guy do when he goes home? Does he shoot open the door, grab a beer, and shoot off the cap?’" It's all part of the statement that guns are the principal characters in a lot of shooters, and when you stretch that quirky notion in a particular direction you end up with guns for hands and shooting to perform actions as basic as making a cup of coffee and flipping burgers in American Dream VR.

"Tracked controllers started offering new options for what we needed," says McDonnell of the move towards VR after managing to snag a PlayStation VR kit. "Traditional FPS games are like a point-and-click with a gun. So, for American Dream, VR made more sense for the game design. It's certainly not a financial decision, but it looks like there is steady growth across the board in VR."

Animated gif of gun, assembly rollers, dog
Above: This amusement park ‘ride' is sponsored by the gun companies to teach American families about gun uses for everyone from flipping burgers to brewing coffee.

Learning programming in VR was only one of the significant challenges for the small team. "Each platform has its own approach to support," explains McDonnell, "so Vive is more community driven; PlayStation is more traditional with developer forums, private support, and account managers."

There are also differences in the tracking ranges between the systems that caused the team to make some important gameplay decisions for American Dream. "We could start at Vive, then bring down the tracking on Oculus, and then down to PS4…and one developer did this and that just sounded like a fucking nightmare. We didn't want to create extra work for ourselves since we were already learning this new programming for VR."

This led to a change from moving around the game world to being driven around in a bullet-shaped cart. "It's set in a kind of Epcot Center run by gun companies that are showing Americans how to use guns," says McDonnell. Guiding your path is the game's narrator, Buddy Washington, who talks through the mouth of a Labrador. You can't make this up.

Screenshot of 4th of July event, gun, dog, fireworks

Above: After flipping burgers, fire the gun at the fireworks to celebrate Fourth of July.


"Once we built the ability to sit in a room and shoot, we thought of the things you do every day," says McDonnell. "You eat, drink coffee, go to work, take a shit, have kids. So, then it's about thinking what's funny and physically interactive? You're not shooting at moving targets, it's more like you're completing a set of tasks. Like the burger flipping level, which is getting the burger on the bun, and then get it taken out to the customer…all by shooting at certain pieces of the environment."

Through this mechanic, the game takes you on a journey through the stages of life as perceived in a 1950s-style America. "A lot of the inspiration was from straight video games because we played a lot of shooters," adds McDonnell, "where a game like Half-Life has you shoot a lock to open a door or a vending machine to get a can."

Unsurprisingly, the community reaction has been mixed. "YouTube comments are pretty much split 50-50," says McDonnell, clearly proud of that achievement. "Either it's funny for taking the piss, or some people are ‘what is this?' or ‘stupid Australians', and some, ‘yeah, fuck America,'" he adds.

A special Fourth of July trailer illustrated fairly clearly how the mechanics will work. Whatever the reaction, it represents a risk and an opportunity for the small group who look to a future that still has a lot of questions with unclear solutions.

"Do we dive into a three-year game; do we just disappear into the ether? Or do we keep whatever momentum might happen from American Dream and move straight onto the next thing?" questions McDonnell.

I think we can probably guess the answer to that one.


Coarray Fortran 32-bit doesn't work on 64-bit Microsoft* Windows


Version : Intel® Visual Fortran Compiler 17.0, 18.0

Operating System : Microsoft* Windows 10 64-bit, Microsoft* Windows Server 2012 R2 64-bit

Problem Description : Coarray Fortran 32-bit doesn't work on Microsoft* Windows 10 or Microsoft* Windows Server 2012 R2 (64-bit operating systems only) because the required utilities “mpiexec.exe” and “smpd.exe” do not work properly.

Resolution Status :

It is a compatibility issue. You need to change the compatibility properties in order to run “mpiexec.exe” and “smpd.exe” correctly. The following workaround should resolve the problem:

1. Go to folder where your “mpiexec.exe” and “smpd.exe” files are located.
2. For both files follow these steps:

  • Right click > Properties > Compatibility Tab
  • Make sure the “Run this program in compatibility mode for:” box is checked and Windows Vista (Service Pack 2) is chosen.
  • Click Apply and close the Properties window.

The 32-bit Coarray Fortran application should work correctly once all steps have been followed.
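If the same fix must be applied on many machines, the dialog steps above can also be captured as a registry merge file. This is a sketch only: the install paths below are placeholders you must replace with the actual locations of the two executables, and “VISTASP2” is the registry token for the Windows Vista (Service Pack 2) compatibility mode.

```reg
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers]
"C:\\path\\to\\mpiexec.exe"="VISTASP2"
"C:\\path\\to\\smpd.exe"="VISTASP2"
```

Per-user settings go under HKEY_CURRENT_USER as shown; setting them machine-wide would use the corresponding HKEY_LOCAL_MACHINE key instead.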

Intel® Xeon® Scalable Processor Cryptographic Performance


Executive Summary

The new Intel® Xeon® Scalable processor family provides dramatically improved cryptographic performance for data at rest and in transit. Many Advanced Encryption Standard (AES)1 based encryption schemes will immediately benefit from the 75 percent improvement in Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI) latency. In addition to improvements in existing technologies, the new Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction family brings up to 3X performance gains1 over previous-generation Intel® Advanced Vector Extensions 2 (Intel® AVX2) implementations of secure hashing. These performance gains for cryptographic primitives improve throughput for intensive workloads in markets such as networking and storage, lowering the barrier to making encryption ubiquitous.

Overview

With strong security becoming a ubiquitous data center application prerequisite, any associated cryptographic performance tax takes compute resources away from the main function. Intel’s focus on providing primitives in every core to accelerate cryptographic algorithms has helped alleviate this burden and enabled demanding workloads to achieve remarkable throughput on general-purpose servers2. One example of this is the continued performance increase of Intel® AES-NI since its original launch in 2010. This focus also holds for the introduction of new features that provide significant gains over previous generations1.

With Intel® Xeon Scalable Processors, the improved Intel AES-NI design and introduction of Intel® AVX-512 brings a new level of cryptographic performance to the data center. This paper examines the gains seen in two modes of AES operation, Galois counter mode (GCM) and cipher block chaining (CBC), as a result of the Intel AES-NI improvements. The impact of Intel AVX-512 will be demonstrated with the secure hashing algorithms (SHA-1, SHA-256, and SHA-512)3, in particular comparing the new results against Intel® AVX2 based implementations from the previous Haswell/Broadwell generation of Intel Xeon processors.

Intel® Xeon® Scalable Processor Improvements

The cryptographic performance enhancements seen in the Intel Xeon Scalable processors are due to new instructions, micro architectural updates, and novel software implementations. Intel AVX-512 doubles the instruction operand size from 256 bits in Intel AVX2 to 512 bits. In addition to the 2X increase in amount of data that can be processed at once, powerful new instructions such as VPTERNLOG enable more complex operations to be executed per cycle. Combining Intel AVX-512 with the multibuffer software technique for parallel processing of data streams spectacularly improves SHA performance. The latency reduction of the AES Encrypt and AES Decrypt (AESENC/AESDEC) instructions along with the improved microarchitecture have shown gains in both parallel and serial modes of AES operation.

AES

The new Intel® Xeon® Scalable processor significantly reduces the latency of AES instructions, from seven cycles in the previous Xeon v4 generation down to four cycles. This reduction benefits serial modes of AES operation, such as Cipher Block Chaining (CBC) encrypt. As with most new Intel® microarchitectures, improvements in the core design manifest as appreciable performance gains. For optimized implementations of AES GCM, the parallel paths of the AESENC and PCLMULQDQ instructions have improved to the point where the authentication path is almost free.
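A back-of-envelope model shows how this latency drop maps to measured CBC results. The sketch below is an illustrative assumption, not a measurement: it treats CBC encrypt as purely latency-bound, with AES-128 running 10 rounds per 16-byte block and each block waiting on the previous one.

```python
# Latency-bound estimate for serial AES-128 CBC encrypt (illustrative
# model): cycles/byte ~= rounds * AESENC latency / block size in bytes.
def cbc_cycles_per_byte(rounds, aesenc_latency, block_bytes=16):
    return rounds * aesenc_latency / block_bytes

est_v4 = cbc_cycles_per_byte(10, 7)   # 4.375 cycles/byte on 7-cycle AESENC
est_sp = cbc_cycles_per_byte(10, 4)   # 2.5 cycles/byte on 4-cycle AESENC
```

The small gap between these estimates and the measured 4.44 and 2.64 cycles/byte reported later in this paper is the per-block overhead (loads, stores, chaining XOR) that the model ignores.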

SHA

Moving from Intel AVX2 to Intel AVX-512 implementations of the SHA family brings benefits beyond the doubling of data buffers that can be processed at once. Two key additions are the expansion of registers available from the 16 256-bit YMMs in Intel AVX2 to the 32 512-bit ZMMs in Intel AVX-512 and the more powerful instructions in Intel AVX-512. With more registers available, the message schedule portion of SHA can be stored in registers and no longer has to be saved on the stack. With more powerful instructions, the number of instructions that need to be executed is reduced and the dependencies are eliminated.

A closer examination of the power of the VPTERNLOG instruction can be illustrated on the SHA-256 Ch and Maj functions. Table 1 shows the Ch and Maj functions along with the Boolean logic table.

Table 1. SHA-256 Ch and Maj function logic tables.

Ch (e, f, g)  = (e & f) ^ (~e & g)
Maj (a, b, c) = (a & b) ^ (a & c) ^ (b & c)

e f g | Result (0xCA)      a b c | Result (0xE8)
0 0 0 |      0             0 0 0 |      0
0 0 1 |      1             0 0 1 |      0
0 1 0 |      0             0 1 0 |      0
0 1 1 |      1             0 1 1 |      1
1 0 0 |      0             1 0 0 |      0
1 0 1 |      0             1 0 1 |      1
1 1 0 |      1             1 1 0 |      1
1 1 1 |      1             1 1 1 |      1
The VPTERNLOG instruction takes three operands and an immediate specifying the Boolean logic function to execute. Tables 2 and 3 compare the Intel AVX2 and Intel AVX-512 instruction sequences for the SHA-256 Ch and Maj functions. Note that register-to-register copies are generally free in the microarchitecture.

Table 2. Intel® AVX2 and Intel® AVX-512 instruction sequence for the SHA-256 Ch function.

Ch (e, f, g) = (e & f) ^ (~e & g)
Note: this is equivalent to ((f ^ g) & e) ^ g

Intel® AVX2               Intel® AVX-512
vpxor ch, f, g            vmovdqa32 ch, e
vpand ch, ch, e           vpternlogd ch, f, g, 0xCA
vpxor ch, ch, g

Table 3. Intel® AVX2 and Intel® AVX-512 instruction sequence for the SHA-256 Maj function.

Maj (a, b, c) = (a & b) ^ (a & c) ^ (b & c)
Note: this is equivalent to ((a ^ c) & b) | (a & c)

Intel® AVX2               Intel® AVX-512
vpxor maj, a, c           vmovdqa32 maj, a
vpand maj, maj, b         vpternlogd maj, b, c, 0xE8
vpand tmp, a, c
vpor maj, maj, tmp

Performance Gains

Cryptographic performance of Intel Xeon Scalable processors shows per-core gains of 1.18X to over 3X compared to the previous Xeon v4 processors. The performance of some of the most commonly used cryptographic algorithms in secure networking and storage is highlighted using two popular open source libraries, OpenSSL*4 and the Intel® Intelligent Storage Acceleration Library (Intel® ISA-L)5.

Methodology

In order to maximize reproducibility and allow performance to be projected onto processors with different frequencies and core counts, the results are reported in cycles/byte (lower is better). The platforms are tuned for performance, and turbo mode is disabled to keep the core frequency consistent across runs. To get throughput in bytes per second for a specific processor, divide the processor’s frequency by the reported cycles/byte value. For total system performance, multiply that value by the number of cores, since these results scale nearly linearly with core count.
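The conversion described above can be sketched as follows; the 2.1 GHz frequency and 22-core count used here are illustrative assumptions, not the tested configuration:

```python
# Convert a cycles/byte result into throughput, per core or per system.
def throughput_gbs(freq_hz, cycles_per_byte, cores=1):
    # bytes/second = frequency / cycles-per-byte; scale linearly by cores
    return freq_hz / cycles_per_byte * cores / 1e9  # GB/s

per_core = throughput_gbs(2.1e9, 0.65)       # AES-128-GCM on one core
full_cpu = throughput_gbs(2.1e9, 0.65, 22)   # assuming linear core scaling
```

At 0.65 cycles/byte, a single 2.1 GHz core would process roughly 3.2 GB/s, and the linear-scaling assumption extrapolates that to the whole socket.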

The processors used for these performance tests are the Intel® Xeon® Gold 6152 processor and Intel Xeon processor E5-2695 v4, each with 16 GB of memory and running Ubuntu* 16.04.1.

OpenSSL*

Results shown in Table 4 are collected from the OpenSSL v1.1.0f speed application on 8 KB buffers using the following commands:

openssl speed -mr -evp aes-128-cbc
openssl speed -mr -evp aes-128-gcm

Table 4. OpenSSL* speed results for AES CBC Encrypt and AES GCM (cycles/byte).

Algorithm             Xeon v4    Xeon Scalable    Xeon Scalable Gain
AES-128-CBC Encrypt   4.44       2.64             1.68
AES-128-GCM           0.77       0.65             1.18
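The gain column is simply the ratio of the two cycles/byte figures (lower is better, so gain = old / new), which can be checked directly:

```python
# Gain = (cycles/byte on Xeon v4) / (cycles/byte on Xeon Scalable).
def gain(old_cpb, new_cpb):
    return round(old_cpb / new_cpb, 2)

assert gain(4.44, 2.64) == 1.68   # AES-128-CBC Encrypt
assert gain(0.77, 0.65) == 1.18   # AES-128-GCM
```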

Intel® ISA-L

Results shown in Table 5 are collected from the Intel ISA-L crypto version v2.19.0 cold cache performance tests using the following commands:

make perfs
make perf

Table 5. Intel® ISA-L performance test results for SHA Multibuffer (cycles/byte).

Algorithm    Xeon v4    Xeon Scalable    Xeon Scalable Gain
SHA-1        1.13       0.44             2.55
SHA-256      2.60       0.87             2.97
SHA-512      3.24       1.07             3.03

Figure 1. Single core Xeon Scalable Processor performance gain over previous generation Xeon v4.

Conclusion

The new Intel Xeon Scalable processors continue the tradition of lowering the computational burden of cryptographic algorithms. By incorporating the open source optimized cryptographic software libraries profiled in this paper, your application will take advantage of the best performance from the latest processor features.

Acknowledgements

We thank Jim Guilford, Ilya Albrekht, and Greg Tucker for their contributions to the optimized code. We also thank Jon Strang for preparing the Intel Xeon platforms in the performance tests.

References

1. “Federal Information Processing Standards Publication 197 Advanced Encryption Standard” http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf

2. “6WIND Boosts IPsec with Intel Xeon Scalable Processors” http://www.6wind.com/wp-content/uploads/2017/07/6WIND-Purley-Solution-Brief.pdf

3. “Federal Information Processing Standards Publication 180-4 Secure Hash Standard” http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf

4. OpenSSL https://github.com/openssl/openssl

5. Intel Intelligent Storage Acceleration Library Crypto Version https://github.com/01org/isa-l_crypto

1 – Performance claims based on measured data and methodology outlined in the “Performance Gains” section of this document

Infographic: Get Started with AI Development Today


Read our latest infographic to discover the opportunities presented by artificial intelligence for professional developers. Whether you're interested in applications that use deep learning, machine learning, or broader AI capabilities, now is the time to learn AI skills. As our data shows, AI is now more accurate than humans for selected applications, and the AI market is expected to boom over the next three to eight years.

Our flowchart shows you how you can get started with training and deploying your AI model today. The infographic introduces you to the Deep Learning Training Tool Beta, which uses a single interface from framework installation to model deployment.

Discover more AI resources here

Intel® Integrated Performance Primitives Release Notes and New Features


This page provides the current Release Notes for Intel® Integrated Performance Primitives. The notes are categorized by year, from newest to oldest, with individual releases listed within each year.

Click a version to expand it into a summary of new features and changes in that version since the last release, and access the download buttons for the detailed release notes, which include important information, such as pre-requisites, software compatibility, installation instructions, and known issues.

You can copy a link to a specific version's section by clicking the chain icon next to its name.

To get product updates, log in to the Intel® Software Development Products Registration Center.
For questions or technical support, visit Intel® Software Developer Support.

2018

Initial Release

Release Notes

What's New in Intel® IPP 2018:

  • Added new functions to support the LZ4 data compression and decompression. This release also introduces the patch files for LZ4 source to provide drop-in optimization with the Intel® IPP functions.
  • Introduced the standalone cryptography packages. The cryptography functions no longer depend on the main Intel® IPP packages, and can be used without the main Intel® IPP packages.
  • Introduced the optimization code for the GraphicsMagick source. The code can provide drop-in optimization on GraphicsMagick with the Intel® IPP functions:
    • The code supports GraphicsMagick version 1.3.25, and provides optimization for the following GraphicsMagick APIs: ResizeImage, ScaleImage, GaussianBlurImage, FlipImage, and FlopImage.
    • The optimization code can improve the APIs performance by up to 4x, depending on the functionality, input parameters, and processors.
  • Made the Integration Wrappers APIs part of the Intel® IPP packages.
  • Computer Vision:
    • Added the 64-bit data length support for Canny edge detection functions (ippiCanny_32f8u_C1R_L).
  • Color Conversion:
    • Added the ippiDemosaicVNG functions that support the demosaicing algorithm with VNG interpolation.
  • Cryptography:
    • Added the Elliptic Curves key generation and Elliptic Curves based Diffie-Hellman shared secret functionality.
    • Added the Elliptic Curves sign generation and verification functionalities for the DSA, NR, and SM2 algorithms.
  • Performance:
    • Extended optimization for the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and Intel® Advanced Vector Extensions 2 (Intel® AVX2) instruction sets.
    • Improved performance of LZO data compression functions on Intel® AVX2 and Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2).
  • Other Changes:
    • Removed support for Intel® Pentium® III processor. The minimal supported instruction set is Intel® Streaming SIMD Extensions 2 (Intel® SSE2).
    • Removed support for the Intel® Xeon Phi™ x100 product family coprocessor (formerly code name Knights Corner) in this release.

      The Intel® Xeon Phi™ x100 product family coprocessor (formerly code named Knights Corner) was officially announced end of life in January 2017. As part of the end of life process, the support for this family will only be available in the Intel® Parallel Studio XE 2017 version. Intel® Parallel Studio XE 2017 will be supported for a period of 3 years ending in January 2020 for the Intel® Xeon Phi™ x100 product family. Support will be provided for those customers with active support.

Known Issues:

  • The release only provides optimization with Intel® SSE4.2 and later instruction sets for the morphology image processing functions. Users may notice performance degradation in those functions on some older processors. If optimization for older instruction sets in the morphology functions is important for your applications, please submit your feedback through product support.

Threading Notes:

  • To support internal threading in the Intel® IPP functions, Intel® IPP provides the Threading Layer APIs in the platform-aware functions. These APIs can support both 64-bit object sizes (for large images and signal data) and internal threading in Intel® IPP functions. Check the “Threading Layer Functions” part of the Intel® IPP Developer Reference for more information on these APIs. Your feedback on extending the new threading functions is welcome.
  • The legacy Intel IPP threaded libraries are available through custom installation, and code written with these libraries will still work as before. However, the threaded libraries will not gain new threading functions; new threading will be developed only in the new Intel® IPP threading layer APIs. Applications are encouraged to use the new Intel® IPP threading layer APIs or to implement external threading.

System Requirements

For information about the Intel® IPP system requirements, please visit Intel® Integrated Performance Primitives (Intel® IPP) 2018 System Requirements page.

2017

Update 3

Release Notes

What's New in Intel® IPP 2017 Update 3:

  • Fixed some known problems in Intel® IPP Cryptography functions
  • Added support for Microsoft Visual Studio* 2017 on Windows*.
  • Added support for installation from Conda* repositories.
Update 2

Release Notes

What's New in Intel® IPP 2017 Update 2:

  • Added new functions in ZLIB to support user-defined Huffman tables, which make it possible to increase the ZLIB compression ratio at the fastest compression level.
  • Increased LZO compression performance by 20% to 50%. Added level 999 support in LZO decompression.
  • Introduced support for Intel® Xeon Phi™ processor x200 (formerly Knights Landing) leverage boot mode in the Intel IPP examples.
  • Added an example code on building custom dispatcher for the processor-specific optimization codes.
  • Fixed a number of internal and external defects. Visit the Intel® IPP 2017 bug fixes for more information.
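The compression-level tradeoff these zlib and LZO changes tune can be illustrated with stock zlib from Python's standard library (illustrative only; this exercises plain zlib, not the Intel® IPP patches):

```python
import zlib

# Lower levels compress faster but less tightly; higher levels trade
# speed for ratio. Both round-trip losslessly.
data = b"the quick brown fox jumps over the lazy dog " * 2048
fastest = zlib.compress(data, 1)   # fastest standard level
best = zlib.compress(data, 9)      # best ratio, slowest

assert zlib.decompress(fastest) == data == zlib.decompress(best)
assert len(best) <= len(fastest) < len(data)
```

The IPP-patched zlib keeps this standard interface while accelerating the levels themselves, which is why it can be used as a drop-in replacement.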
Update 1

Release Notes

What's New in Intel® IPP 2017 Update 1:

  • Added support for Intel® Xeon Phi™ processor x200 (formerly Knights Landing) leverage boot mode on Windows.
  • Added the following new functions in the cryptography domain:
    • Added functions for the finite field GF(p) arithmetic, and the elliptic curves over the finite field GF(p).
    • Added ippsECCPBindGxyTblStd functions that allow controlling the memory size for the elliptic curves over GF(p).
  • Fixed a number of internal and external defects. Visit the Intel® IPP 2017 bug fixes for more information.
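To illustrate what "elliptic curves over the finite field GF(p)" means in practice, here is a toy pure-Python sketch of curve arithmetic and an EC Diffie-Hellman-style shared secret. The curve, base point, and keys are made-up small values for illustration only and have nothing to do with the Intel® IPP Cryptography API:

```python
# Toy curve y^2 = x^3 + 2x + 3 over GF(97); points are (x, y) or None
# for the point at infinity. For illustration only -- never use tiny
# hand-picked parameters for real cryptography.
P, A, B = 97, 2, 3
G = (3, 6)  # on the curve: 6^2 = 36 = 27 + 6 + 3 (mod 97)

def add(p1, p2):
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # inverse points sum to infinity
    if p1 == p2:     # tangent slope for doubling
        s = (3 * x1 * x1 + A) * pow(2 * y1, P - 2, P) % P
    else:            # chord slope for distinct points
        s = (y2 - y1) * pow(x2 - x1, P - 2, P) % P
    x3 = (s * s - x1 - x2) % P
    return (x3, (s * (x1 - x3) - y1) % P)

def mul(k, pt):      # double-and-add scalar multiplication
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc

dA, dB = 5, 7                       # toy private keys
QA, QB = mul(dA, G), mul(dB, G)     # public keys
assert mul(dA, QB) == mul(dB, QA)   # both sides derive the same secret
```

The final assertion is the Diffie-Hellman property: dA·(dB·G) and dB·(dA·G) are the same point, so both parties compute an identical shared secret.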
Initial Release

Release Notes

What's New in Intel® IPP 2017:

  • Added support for Intel® Xeon Phi™ processor x200 (formerly Knights Landing) leverage boot mode on Windows.
  • Added Intel® IPP Platform-Aware APIs to support 64-bit parameters for image dimensions and vector length on 64-bit platforms and 64-bit operating systems:
    • This release provides 64-bit data length support in the memory allocation, data sorting, image resizing, and image arithmetic functions.
    • Intel® IPP Platform-Aware APIs support external tiling and threading by processing tiled images, which enables you to create effective parallel pipelines at the application level.
  • Introduced new Integration Wrappers APIs for some image processing and computer vision functions as a technical preview. The wrappers provide the easy-to-use C and C++ APIs for Intel® IPP functions, and they are available as a separate download in the form of source and pre-built binaries.
  • Performance and Optimization:
    • Extended optimization for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set on Intel® Many Integrated Core Architectures (Intel® MIC Architectures). Please see the Intel® IPP Functions Optimized for Intel® AVX-512 article for more information.
    • Extended optimization for Intel® AVX-512 instruction set on Intel® Xeon® processors.
    • Extended optimization for Intel® Advanced Vector Extensions 2 (Intel® AVX2) instruction set on the 6th Generation Intel® Core™ processors. Please see the Intel® IPP Functions Optimized for Intel® AVX2 article for more information.
    • Extended optimization for Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2) instruction set on Intel® Atom™ processors.
  • Data Compression:
    • Added the patch files for the zlib source to provide drop-in optimization with Intel® IPP functions. The patches now support zlib versions 1.2.5.3, 1.2.6.1, 1.2.7.3, and 1.2.8.
    • Significantly improved performance of zlib compression functions on the standard compression modes.
    • Introduced a new fastest zlib data compression mode, which can significantly improve compression performance with only a small sacrifice in compression ratio.
  • Signal Processing:
    • Added the ippsIIRIIR functions that perform zero-phase digital IIR filtering.
    • Added 64-bit data length support to the ippsSortRadixAscend and ippsSortRadixDescend functions.
    • Added unsigned integer data support to the ippsSortRadixAscend, ippsSortRadixDescend, ippsSortRadixIndexAscend and ippsSortRadixIndexDescend functions.
  • Image Processing:
    • Added the ippiScaleC functions to support image data scaling and shifting for different data types by using 64-bit floating multiplier and offset.
    • Added the ippiMulC64f functions to support image data multiplication by a 64-bit floating point value.
  • Removed the tutorial from the installation package, and its sample code and documentation are now provided online.
  • Threading Notes: Though the Intel® IPP threaded libraries are not installed by default, they are available through the custom installation, so code written with these libraries will still work as before. However, the multi-threaded libraries are deprecated, and moving to external threading is recommended. Your feedback on this is welcome.
  • Installation on IA-32 architecture hosts is no longer supported, and the Intel IPP packages for Intel® 64 architecture hosts include both 64-bit and 32-bit Intel IPP libraries.
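The zlib improvements above apply behind the standard zlib API, so existing compression code keeps working unchanged. As an illustrative sketch of the level-versus-ratio tradeoff that the new fastest mode exploits (shown here with Python's bundled zlib module, which wraps the same API, rather than the Intel-patched C library):

```python
import zlib

# Repetitive sample payload; real data will compress differently.
data = b"the quick brown fox jumps over the lazy dog " * 200

fast = zlib.compress(data, 1)  # fastest level: less CPU time, larger output
best = zlib.compress(data, 9)  # slowest level: best compression ratio

# Both levels are lossless -- decompression recovers the original bytes.
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data

print(f"original={len(data)} fast={len(fast)} best={len(best)}")
```

Because the drop-in patch keeps the API identical, applications pick the level as usual and the optimized code paths are selected internally.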

9.0

Update 4

Release Notes

What's New in Intel® IPP 9.0 Update 4:

Update 3

Release Notes

What's New in Intel® IPP 9.0 Update 3:

  • Improved zlib decompression performance for small data for Intel® 64 architectures.
  • Fixed a number of internal and external defects, including a memory corruption problem in the ippiSet_16u_C1R functions.
Update 2

Release Notes

What's New in Intel® IPP 9.0 Update 2:

  • Image Processing:
    • Added the contiguous volume format (C1V) support to the following 3D data processing functions: ipprWarpAffine, ipprRemap, and ipprFilter.
    • Added the ippiFilterBorderSetMode function to support high accuracy rounding mode in ippiFilterBorder.
    • Added the ippiCopyMirrorBorder function for copying the image values by adding the mirror border pixels.
    • Added mirror border support to the following filtering functions: ippiFilterBilateral, ippiFilterBoxBorder, ippiFilterBorder, ippiFilterSobel, and ippiFilterScharr.
    • Kernel coefficients in the ippiFilterBorder image filtering functions are used in direct order, which is different from the ippiFilter functions in the previous releases.
  • Computer Vision:
    • Added 32-bit floating point input data support to the ippiSegmentWatershed function.
    • Added mirror border support to the following filtering functions: ippiFilterGaussianBorder, ippiFilterLaplacianBorder, ippiMinEigenVal, ippiHarrisCorner, ippiPyramidLayerDown, and ippiPyramidLayerUp.
  • Signal Processing:
    • Added the ippsThreshold_LTAbsVal function, which uses the vector absolute value.
    • Added the ippsIIRIIR64f functions to perform zero-phase digital IIR filtering.
  • The multi-threaded libraries only depend on the Intel® OpenMP* libraries; their dependencies on the other Intel® Compiler runtime libraries were removed.
  • Fixed a number of internal and external defects.
Update 1

Release Notes

What's New in Intel® IPP 9.0 Update 1:

  • Enabled stack protection to enhance the security of Intel® IPP functions on Linux*. To link with Intel® IPP libraries, glibc version 2.4 or higher is now required.
  • Added the following new functions in the Signal Processing, Color Conversion, and Cryptography domains:
    • In-place functions for normalizing the elements of a vector: ippsNormalize.
    • Functions for computing the minimum or maximum absolute value of a vector: ippsMinAbs and ippsMaxAbs.
    • Functions for BGR to YCbCr420 color format conversion: ippiBGRToYCbCr420.
    • Functions for pseudorandom number generation optimized with the Intel® RDRAND instruction.
  • Optimized the following functions on Intel® Advanced Vector Extensions 2 (Intel® AVX2) both for Intel® 64 and IA-32 Architectures:
    • Signal Processing: ippsSumLn, ippsNormalize, ippsMinAbs, and ippsMaxAbs.
    • Image Processing: ippiConvert_32s16s, ippiHOG_16s32f_C1R, and ippiSwapChannels_32s_C3C4R.
    • Color Conversion: ippiColorToGray and YCbCr to RGB/BGR conversion functions.
  • Improved the LZO decompression function ippsDecodeLZO performance for Intel® Quark™ processors.
  • Fixed the position-independent code (PIC) problem in the Linux* dynamic libraries. The shared libraries now provide full PIC symbols.
Initial Release

Release Notes

What's New in Intel® IPP 9.0:

  • Extended optimization for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set in the Computer Vision and Image Processing functions.
  • Extended optimization for Intel® Atom™ processors in the Computer Vision and Image Processing functions.
  • Added optimization for Intel® Quark™ processors to the Cryptography and Data Compression functions.
  • Introduced the new Cryptography functions for SM2/SM3/SM4 algorithms.
  • Added a custom dynamic library building tool, which enables users to build the dynamic library containing the selected Intel® IPP functions.
  • Added the new APIs to support external threading.
  • Improved the CPU dispatcher by using target processor features instead of processor types. Static linkage no longer requires an explicit call to the processor initialization function.
  • Provided the new native libraries for 64-bit Android* applications, and replaced the old ones from the Linux* binary.
  • Removed internal memory allocation in the single-threaded libraries.
  • The single-threaded libraries no longer depend on the Intel® Compiler runtime libraries. The multi-threaded libraries depend only on the Intel® OpenMP* libraries.
  • Image Processing domain changes:
    • New implementation of perspective warping functions.
    • New image filtering functions with border support: ippiFilterMedianBorder, ippiFilterMaxBorder, ippiFilterMinBorder, ippiFilterLaplaceBorder, ippiFilterHipassBorder, ippiFilterSharpenBorder.
    • New implementation of image rotation functionality with the ippiGetRotateTransform and ippiWarpAffine APIs.
    • 3D data processing functions are now available in the image processing domain.
  • Some Intel IPP domains and functions are now legacy:
    • The following Intel® IPP domains are legacy and have been removed from the main packages: Audio Coding (ippAC), Video Coding (ippVC), Speech Coding (ippSC), Image Compression (ippJP), Data Integrity (ippDI), Generated Transforms (ippGEN), Small Matrices (ippMX), and Realistic Rendering (ippRR). This means these domains won't be optimized for new architectures (the latest optimizations target Intel® Advanced Vector Extensions 2), and newly detected performance and stability issues won't be fixed. See the alternative suggestions for the deprecated functions.
    • Some Intel IPP functions, including the functions for internal memory allocation, are deprecated in the main package. See the alternatives for the deprecated functions.
  • Fixed a number of internal and external defects.

Intel® Hardware Accelerated Execution Manager (Intel® HAXM)


Intel® Hardware Accelerated Execution Manager (Intel® HAXM) is a hardware-assisted virtualization engine (hypervisor) that uses Intel® Virtualization Technology (Intel® VT) to speed up Android* app emulation on a host machine. In combination with Android x86 emulator images provided by Intel and the official Android SDK Manager, Intel HAXM allows for faster Android emulation on Intel VT enabled systems.

The following platforms are supported by Intel HAXM:

Microsoft Windows*
Windows® 10 (32/64-bit), Windows 8* and 8.1* (32/64-bit), Windows 7* (32/64-bit)

Installation Guide and System Requirements - Windows

haxm-windows_v6_2_1.zip (6.2.1)

Description:
System Driver
(Aug 21, 2017)

Size: 2,632 KB

Checksums:
(MD5)
4c72a68333ff1c98cb8b129506b12435
(SHA-1)
b943fee2ed8e10a5d9ed7d4368650cbf93522908
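To confirm that a downloaded installer matches the checksums published above before running it, a small hashing helper is enough. A minimal sketch (the filename in the comment is the one listed above; adjust the path to wherever the file was saved):

```python
import hashlib

def file_digest(path, algo="sha1", chunk_size=65536):
    """Hash a file incrementally so large downloads need not fit in memory."""
    digest = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

# Compare against the published values before installing, e.g.:
# file_digest("haxm-windows_v6_2_1.zip", "md5")  == "4c72a68333ff1c98cb8b129506b12435"
# file_digest("haxm-windows_v6_2_1.zip", "sha1") == "b943fee2ed8e10a5d9ed7d4368650cbf93522908"
```

If either digest differs from the published value, the download is corrupt or tampered with and should not be installed.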

 

macOS*
Mac* OS X® 10.10 (Yosemite) or higher, up to macOS* 10.12 (Sierra)

Installation Guide and System Requirements – macOS

haxm-macosx_v6_2_1.zip (6.2.1)

Description:
System Driver
(Aug 21, 2017)

Size: 217 KB

Checksums:
(MD5)
7246d2ce3237531829e9aaad8a8a4639
(SHA-1)
8d787efadcf518e94de080ad5919424b1afc84af

 

Linux*

Installation Guide and System Requirements - Linux

Vectorization and Array Contiguity with the Intel® Fortran Compiler


Subroutine dummy arguments can be pointers or assumed shape arrays, e.g.:

SUBROUTINE SUB(X, Y)
    REAL, DIMENSION(:)          :: X  ! assumed shape array
    REAL, DIMENSION(:), POINTER :: Y  ! pointer

This avoids the need to pass parameters such as array bounds explicitly. The Fortran standard allows the actual arguments to be non-contiguous array sections or pointers, e.g.:

    CALL SUB(A(1:100:10))   ! non-unit stride
    CALL SUB(B(2:4,:))      ! incomplete columns

Therefore, the compiler cannot assume that consecutive elements of X and Y are adjacent in memory and so cannot blindly issue efficient vector loads for several elements at once.

If you know that such dummy arguments will always be contiguous in memory, you can use the CONTIGUOUS keyword to tell the compiler, and it will generate more efficient code, e.g.:

  REAL, DIMENSION(:), CONTIGUOUS          :: X  ! assumed shape array
  REAL, DIMENSION(:), CONTIGUOUS, POINTER :: Y  ! pointer

However, the calling routine also needs to know that the arrays are contiguous (see https://software.intel.com/en-us/videos/effective-parallel-optimizations-with-intel-fortran). When multiple routines are involved, it may be simpler to use a command line switch to tell the compiler that assumed shape arrays and/or pointers are always contiguous. The version 18 compiler supports the new options:

   -assume contiguous_assumed_shape and -assume contiguous_pointer (Linux*)
   /assume:contiguous_assumed_shape and /assume:contiguous_pointer (Windows*)

These will cause the compiler to assume that all such objects are contiguous in memory.
In some cases where contiguity is unknown, the version 18 compiler may generate alternative code versions for the contiguous and non-contiguous cases and check the stride at run-time to determine which version to execute.

Consider the following example (shown for Linux but applicable to other OS):

  subroutine sub(a,b)
    real, pointer, dimension(:) :: a,b
    integer :: i,n
   
    n = size(a,1)
!$OMP SIMD  
    do i=1,n
       a(i) = log(b(i))
    enddo
  end subroutine sub

ifort -c -qopt-report=3 -qopt-report-file=stderr sub.f90

LOOP BEGIN at sub.f90(7,5)
   remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
   remark #15346: vector dependence: assumed ANTI dependence between b%2e%2e(0) (8:15) and a(i) (8:8)
LOOP END

When compiled without OpenMP, the loop is not vectorized because the compiler must assume that the pointers A and B might alias each other (the data they point to might overlap). This can be overcome by activating the OpenMP SIMD directive with -qopenmp-simd, which tells the compiler it can assume there is no overlap and no dependency:

ifort -c -qopenmp-simd -qopt-report=3 -qopt-report-file=stderr sub.f90

LOOP BEGIN at sub.f90(7,5)
   remark #15328: vectorization support: non-unit strided load was emulated for the variable <b(i)>, stride is unknown to compiler   [ sub.f90(8,19) ]
   remark #15329: vectorization support: non-unit strided store was emulated for the variable <a(i)>, stride is unknown to compiler   [ sub.f90(8,8) ]
   remark #15305: vectorization support: vector length 4
   remark #15309: vectorization support: normalized vectorization overhead 0.007
   remark #15301: OpenMP SIMD LOOP WAS VECTORIZED
   remark #15452: unmasked strided loads: 1
   remark #15453: unmasked strided stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 106
   remark #15477: vector cost: 35.500
   remark #15478: estimated potential speedup: 2.980
   remark #15482: vectorized math library calls: 1
   remark #15488: --- end vector cost summary ---
LOOP END

The loop has been vectorized successfully, with an estimated speed-up, but the compiler had to generate non-unit-strided loads and stores because it did not know whether A and B were contiguous.

If we assert to the compiler that the pointer arguments are contiguous:

ifort -c -qopenmp-simd -assume contiguous_pointer -qopt-report=4  -qopt-report-file=stderr sub.f90

LOOP BEGIN at sub.f90(7,5)
   remark #15389: vectorization support: reference b(i) has unaligned access   [ sub.f90(8,19) ]
   remark #15388: vectorization support: reference a(i) has aligned access   [ sub.f90(8,8) ]
   remark #15381: vectorization support: unaligned access used inside loop body
   remark #15305: vectorization support: vector length 4
   remark #15309: vectorization support: normalized vectorization overhead 0.179
   remark #15301: OpenMP SIMD LOOP WAS VECTORIZED
   remark #15442: entire loop may be executed in remainder
   remark #15449: unmasked aligned unit stride stores: 1
   remark #15450: unmasked unaligned unit stride loads: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 106
   remark #15477: vector cost: 19.500
   remark #15478: estimated potential speedup: 5.120
   remark #15482: vectorized math library calls: 1
   remark #15488: --- end vector cost summary ---
LOOP END

The compiler is able to vectorize the loop using unit stride loads and stores and the estimated speed-up increases accordingly. (Note that this is only an estimate based on what is known at compile time; the actual speed-up is influenced by many factors, such as data location and alignment, and can be substantially different). Using the CONTIGUOUS keyword instead of the command line switch would have the same effect.

In conclusion, if you know that pointer arrays or assumed shape dummy arguments will always correspond to contiguous memory, you can help the compiler vectorize more efficiently by telling it so. Use either the CONTIGUOUS keyword or the command line switches -assume contiguous_assumed_shape (/assume:contiguous_assumed_shape) or -assume contiguous_pointer (/assume:contiguous_pointer), which are new in the version 18 compiler.

Intel® Math Kernel Library Release Notes and New Features


This page provides the current Release Notes for Intel® Math Kernel Library. The notes are categorized by year, from newest to oldest, with individual releases listed within each year.

Each version's entry below summarizes the new features and changes in that release since the previous one. The detailed release notes include important information such as prerequisites, software compatibility, installation instructions, and known issues.

To get product updates, log in to the Intel® Software Development Products Registration Center.
For questions or technical support, visit Intel® Software Developer Support.


2018

Initial Release

What’s New in Intel® Math Kernel Library (Intel® MKL) version 2018

  • BLAS Features:
    • Introduced compact GEMM and TRSM functions (mkl_{s,d,c,z}gemm_compact and mkl_{s,d,c,z}trsm_compact) to work on groups of matrices in compact format and service functions to support the new format
    • Introduced optimized integer matrix-matrix multiplication routines GEMM_S8U8S32 and GEMM_S16S16S32 to work with quantized matrices for all architectures.
  • BLAS Optimizations: 
    • Optimized SGEMM and SGEMM packed for Intel® Xeon Phi™ processors based on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) with support for the AVX512_4FMAPS and AVX512_4VNNIW instructions
    • Optimized GEMM_S8U8S32 and GEMM_S16S16S32 for Intel® AVX2, Intel® AVX-512, and Intel® Xeon Phi™ processors based on Intel® AVX-512 with support for the AVX512_4FMAPS and AVX512_4VNNIW instruction groups
  • Deep Neural Network:
    • Added support for non-square pooling kernels
    • Improved performance of large non-square kernels on Intel® Xeon Phi™ processors
    • Optimized conversions between plain (nchw, nhwc) and internal data layouts
  • LAPACK:
    • Added the following improvements and optimizations for small matrices (N<16):
      • Direct Call feature extended with Cholesky and QR factorizations providing significant performance boost
      • Introduced LU and Inverse routines without pivoting with significantly better performance: mkl_?getrfnp and mkl_?getrinp
      • Introduced Compact routines for much faster solving of multiple matrices packed together: mkl_?getr[f|i]np_compact, mkl_?potrf_compact and mkl_?geqrf_compact
    • Added ?gesvd, ?geqr/?gemqr, and ?gelq/?gemlq optimizations for tall-and-skinny and short-and-wide matrices
    • Added optimizations for the ?pbtrs routine
    • Added optimizations for the ?potrf routine for the Intel® Threading Building Blocks layer
    • Added optimizations for CS decomposition routines: ?dorcsd and ?orcsd2by1
    • Introduced factorization and solve routines based on Aasen's algorithm: ?sytrf_aa/?hetrf_aa, ?sytrs_aa/?hetrs_aa
    • Introduced new (faster) _rk routines for symmetric indefinite (or Hermitian indefinite) factorization with the bounded Bunch-Kaufman (rook) pivoting algorithm
  • ScaLAPACK:
    • Added optimizations (2-stage band reduction) for p?syevr/p?heevr routines for JOBZ=’N’ (eigenvalues only) case
  • FFT:
    • Introduced Verbose support for FFT domain, which enables users to capture the FFT descriptor information for Intel MKL
    • Improved performance of 2D real-to-complex and complex-to-real transforms on Intel® Xeon® processors supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server) and on Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
    • Improved performance of 3D complex-to-complex transforms on Intel® Xeon® processors supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server) and on Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
  • Intel® Optimized High Performance Conjugate Gradient Benchmark:         
    • New version of benchmark with Intel® MKL API
  • Sparse BLAS:
    • Introduced a symmetric Gauss-Seidel preconditioner
    • Introduced a symmetric Gauss-Seidel preconditioner with ddot calculation of the resulting and initial arrays
    • Added a sparse matvec routine with ddot calculation of the resulting and initial arrays
    • Added a sparse syrk routine with both OpenMP* and Intel® Threading Building Blocks support
    • Improved performance of Sparse MM and MV functionality for Intel® AVX-512 Instruction Set
  • Direct Sparse Solver for Cluster:
    • Added support for the transpose solver
  • Vector Mathematics:
    • Added 24 new functions: v?Fmod, v?Remainder, v?Powr, v?Exp2, v?Exp10, v?Log2, v?Logb, v?Cospi, v?Sinpi, v?Tanpi, v?Acospi, v?Asinpi, v?Atanpi, v?Atan2pi, v?Cosd, v?Sind, v?Tand, v?CopySign, v?NextAfter, v?Fdim, v?Fmax, v?Fmin, v?MaxMag, and v?MinMag, including optimizations for processors based on Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
  • Data Fitting:
    • Cubic spline-based interpolation in the ILP64 interface was optimized by up to 8x on Intel® Xeon® processors supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server) and by up to 2.5x on Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
  • Documentation:
    • Starting with this version of Intel® MKL, most of the documentation for Parallel Studio XE is only available online at https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation. You can also download it from the Intel Registration Center > Product List > Intel® Parallel Studio XE Documentation
  • Intel continually evaluates the markets for our products in order to provide the best possible solutions to our customers' challenges. As part of this ongoing evaluation process, Intel has decided not to offer Intel® Xeon Phi™ 7200 Coprocessor (codenamed Knights Landing Coprocessor) products to the market.
    • Given the rapid adoption of Intel® Xeon Phi™ 7200 processors, Intel has decided to not deploy the Knights Landing Coprocessor to the general market.
    • Intel® Xeon Phi™ Processors remain a key element of our solution portfolio for providing customers the most compelling and competitive solutions possible.
  • Support for the Intel® Xeon Phi™ x100 product family coprocessor (formerly code name Knights Corner) is removed in this release. The Intel® Xeon Phi™ x100 product family coprocessor (formerly code name Knights Corner) was officially announced end of life in January 2017.  As part of the end of life process, the support for this family will only be available in the Intel® Parallel Studio XE 2017 version.  Intel® Parallel Studio XE 2017 will be supported for a period of 3 years ending in January 2020 for the Intel® Xeon Phi™ x100 product family.  Support will be provided for those customers with active support.

Product Content

Intel MKL can be installed as a part of the following suite:

Intel MKL is provided as a single package that supports both IA-32 and Intel® 64 architectures, and it is also available through the online installer.

Known Issues

  • Convolution primitives for the forward pass may return incorrect results or crash when the input spatial dimensions are smaller than the kernel spatial dimensions for Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
  • Intel® MKL FFT – complex-to-complex in-place batched 1D FFT with transposed output returns incorrect output
  • Intel® ScaLAPACK may fail with OpenMPI* 1.6.1 and later releases due to known OpenMPI* issue: https://github.com/open-mpi/ompi/issues/3937. As a workaround, please avoid using OpenMPI
  • Intel® MKL VML functions may raise spurious FP exceptions even if VML_ERRMODE_EXCEPT is not set (the default). Recommendation: do not unmask FP exceptions before calling VML functions.
  • When an application uses Vector Math functions with the single dynamic library (SDL) interface combined with TBB threading layer, the application may generate runtime error “Intel MKL FATAL ERROR: Error on loading function mkl_vml_serv_threader_c_1i_2o.”

Intel® MPI Library Release Notes for Windows* OS


Overview

Intel® MPI Library is a multi-fabric message passing library based on ANL* MPICH3* and OSU* MVAPICH2*.

Intel® MPI Library implements the Message Passing Interface, version 3.1 (MPI-3) specification. The library is thread-safe and provides the MPI standard compliant multi-threading support.

To receive technical support and updates, you need to register your product copy. See Technical Support below.

Product Contents

  • The Intel® MPI Library Runtime Environment (RTO) contains the tools you need to run programs, including the scalable process management system (Hydra), supporting utilities, and dynamic libraries.
  • The Intel® MPI Library Development Kit (SDK) includes all of the Runtime Environment components and compilation tools: compiler wrapper scripts (mpicc, mpiicc, etc.), include files and modules, static libraries, debug libraries, and test codes.

What's New

Intel® MPI Library 2018

  • Deprecated support for the IPM statistics format.
  • Hard finalization is now the default.
  • Documentation has been removed from the product and is now available online.

Intel® MPI Library 2017 Update 2

  • Added an environment variable I_MPI_HARD_FINALIZE.

Intel® MPI Library 2017 Update 1

  • Support for topology-aware collective communication algorithms (I_MPI_ADJUST family).
  • Deprecated support for cross-OS launches.

Intel® MPI Library 2017

  • Support for the MPI-3.1 standard.
  • Removed the SMPD process manager.
  • Removed the SSHM support.
  • Deprecated support for the Intel® microarchitectures older than the generation codenamed Sandy Bridge.
  • Bug fixes and performance improvements.
  • Documentation improvements.

Key Features

  • MPI-1, MPI-2.2 and MPI-3.1 specification conformance.
  • MPICH ABI compatibility.
  • Support for any combination of the following network fabrics:
    • RDMA-capable network fabrics through DAPL*, such as InfiniBand* and Myrinet*.
    • Sockets, for example, TCP/IP over Ethernet*, Gigabit Ethernet*, and other interconnects.
  • (SDK only) Support for Intel® 64 architecture clusters using:
    • Intel® C++/Fortran Compiler 14.0 and newer.
    • Microsoft* Visual C++* Compilers.
  • (SDK only) C, C++, Fortran 77, and Fortran 90 language bindings.
  • (SDK only) Dynamic linking.

System Requirements

Hardware Requirements

  • Systems based on the Intel® 64 architecture, in particular:
    • Intel® Core™ processor family
    • Intel® Xeon® E5 v4 processor family recommended
    • Intel® Xeon® E7 v3 processor family recommended
  • 1 GB of RAM per core (2 GB recommended)
  • 1 GB of free hard disk space

Software Requirements

  • Operating systems:
    • Microsoft* Windows Server* 2008, 2008 R2, 2012, 2012 R2, 2016
    • Microsoft* Windows* 7, 8.x, 10
  • (SDK only) Compilers:
    • Intel® C++/Fortran Compiler 15.0 or newer
    • Microsoft* Visual Studio* Compilers 2013, 2015, 2017
  • Batch systems:
    • Microsoft* Job Scheduler
    • Altair* PBS Pro* 9.2 or newer
  • Recommended InfiniBand* software:
    • Windows* OpenFabrics* (WinOF*) 2.0 or newer
    • Windows* OpenFabrics* Enterprise Distribution (winOFED*) 3.2 RC1 or newer for Microsoft* Network Direct support
    • Mellanox* WinOF* Rev 4.40 or newer
  • Additional software:
    • The memory placement functionality for NUMA nodes requires the libnuma.so library and the numactl utility to be installed. The numactl packages should include numactl, numactl-devel, and numactl-libs.

Known Issues and Limitations

  • Cross-OS runs using ssh from a Windows* host fail. Two workarounds exist:
    • Create a symlink on the Linux* host that looks identical to the Windows* path to pmi_proxy.
    • Start hydra_persist on the Linux* host in the background (hydra_persist &) and use -bootstrap service from the Windows* host. This requires that the Hydra service also be installed and started on the Windows* host.
  • Support for Fortran 2008 is not implemented in Intel® MPI Library for Windows*.
  • Enabling statistics gathering may result in increased time in MPI_Finalize.
  • In order to run a mixed OS job (Linux* and Windows*), all binaries must link to the same single- or multithreaded MPI library. The single- and multithreaded libraries are incompatible with each other and should not be mixed. Note that the pre-compiled binaries for the Intel® MPI Benchmarks are inconsistent (the Linux* version links to the multithreaded library, the Windows* version to the single-threaded one), so at least one must be rebuilt to match the other.
  • If communication between two existing MPI applications is established using the process attachment mechanism, the library does not check whether the same fabric has been selected for each application. This situation may cause unexpected application behavior. Set the I_MPI_FABRICS variable to the same values for each application to avoid this issue.
  • If your product redistributes the mpitune utility, provide the msvcr71.dll library to the end user.
  • The Hydra process manager has some known limitations such as:
    • stdin redirection is not supported for the -bootstrap service option.
    • Signal handling support is restricted. Incorrect MPI job termination can leave hanging processes in memory.
    • Cleaning up the environment after an abnormal MPI job termination by means of the mpicleanup utility is not supported.
  • ILP64 is not supported by MPI modules for Fortran 2008.
  • When using the -mapall option, if some of the network drives require a password and it is different from the user password, the application launch may fail.

Technical Support

Every purchase of an Intel® Software Development Product includes a year of support services, which provides priority customer support at our Online Support Service Center web site, http://www.intel.com/supporttickets.

In order to get support you need to register your product in the Intel® Registration Center. If your product is not registered, you will not receive priority support.

Intel® MPI Library Release Notes for Linux* OS


Overview

Intel® MPI Library is a multi-fabric message passing library based on ANL* MPICH3* and OSU* MVAPICH2*.

Intel® MPI Library implements the Message Passing Interface, version 3.1 (MPI-3) specification. The library is thread-safe and provides the MPI standard compliant multi-threading support.

To receive technical support and updates, you need to register your product copy. See Technical Support below.

Product Contents

  • The Intel® MPI Library Runtime Environment (RTO) contains the tools you need to run programs, including the scalable process management system (Hydra), supporting utilities, and shared (.so) libraries.
  • The Intel® MPI Library Development Kit (SDK) includes all of the Runtime Environment components and compilation tools: compiler wrapper scripts (mpicc, mpiicc, etc.), include files and modules, static (.a) libraries, debug libraries, and test codes.

What's New

Intel® MPI Library 2018

  • Improved startup times for Hydra when using shm:ofi or shm:tmi.
  • Hard finalization is now the default.
  • The default fabric list is changed when Intel® Omni-Path Architecture is detected.
  • Added environment variables: I_MPI_OFI_ENABLE_LMT, I_MPI_OFI_MAX_MSG_SIZE, I_MPI_{C,CXX,FC,F}FLAGS, I_MPI_LDFLAGS, I_MPI_FORT_BIND.
  • Removed support for the Intel® Xeon Phi™ coprocessor (code named Knights Corner).
  • Deprecated support for the IPM statistics format.
  • Documentation is now online.

Intel® MPI Library 2017 Update 3

  • Hydra startup improvements (I_MPI_JOB_FAST_STARTUP).
  • Default value change for I_MPI_FABRICS_LIST.

Intel® MPI Library 2017 Update 2

  • Added environment variables I_MPI_HARD_FINALIZE and I_MPI_MEMORY_SWAP_LOCK.

Intel® MPI Library 2017 Update 1

  • PMI-2 support for SLURM*, improved SLURM support by default.
  • Improved mini help and diagnostic messages, man1 pages for mpiexec.hydra, hydra_persist, and hydra_nameserver.
  • Deprecations:
    • Intel® Xeon Phi™ coprocessor (code named Knights Corner) support.
    • Cross-OS launches support.
    • DAPL, TMI, and OFA fabrics support.

Intel® MPI Library 2017

  • Support for the MPI-3.1 standard.
  • New topology-aware collective communication algorithms (I_MPI_ADJUST family).
  • Effective MCDRAM (NUMA memory) support. See the Developer Reference, section Tuning Reference > Memory Placement Policy Control for more information.
  • Controls for asynchronous progress thread pinning (I_MPI_ASYNC_PROGRESS).
  • Direct receive functionality for the OFI* fabric (I_MPI_OFI_DRECV).
  • PMI2 protocol support (I_MPI_PMI2).
  • New process startup method (I_MPI_HYDRA_PREFORK).
  • Startup improvements for the SLURM* job manager (I_MPI_SLURM_EXT).
  • New algorithm for MPI-IO collective read operation on the Lustre* file system (I_MPI_LUSTRE_STRIPE_AWARE).
  • Debian Almquist shell (dash) support in compiler wrapper scripts and mpitune.
  • Performance tuning for processors based on Intel® microarchitecture codenamed Broadwell and for Intel® Omni-Path Architecture (Intel® OPA).
  • Performance tuning for Intel® Xeon Phi™ Processor and Coprocessor (code named Knights Landing) and Intel® OPA.
  • OFI latency and message rate improvements.
  • OFI is now the default fabric for Intel® OPA and Intel® True Scale Fabric.
  • MPD process manager is removed.
  • Dedicated pvfs2 ADIO driver is disabled.
  • SSHM support is removed.
  • Support for the Intel® microarchitectures older than the generation codenamed Sandy Bridge is deprecated.
  • Documentation improvements.

Key Features

  • MPI-1, MPI-2.2 and MPI-3.1 specification conformance.
  • Support for Intel® Xeon Phi™ processors (formerly code named Knights Landing).
  • MPICH ABI compatibility.
  • Support for any combination of the following network fabrics:
    • Network fabrics supporting Intel® Omni-Path Architecture (Intel® OPA) devices, through either Tag Matching Interface (TMI) or OpenFabrics Interface* (OFI*).
    • Network fabrics with tag matching capabilities through Tag Matching Interface (TMI), such as Intel® True Scale Fabric, InfiniBand*, Myrinet* and other interconnects.
    • Native InfiniBand* interface through OFED* verbs provided by Open Fabrics Alliance* (OFA*).
    • Open Fabrics Interface* (OFI*).
    • RDMA-capable network fabrics through DAPL*, such as InfiniBand* and Myrinet*.
    • Sockets, for example, TCP/IP over Ethernet*, Gigabit Ethernet*, and other interconnects.
  • (SDK only) Support for Intel® 64 architecture and Intel® MIC Architecture clusters using:
    • Intel® C++/Fortran Compiler 14.0 and newer.
    • GNU* C, C++ and Fortran 95 compilers.
  • (SDK only) C, C++, Fortran 77, Fortran 90, and Fortran 2008 language bindings.
  • (SDK only) Dynamic or static linking.

System Requirements

Hardware Requirements

  • Systems based on the Intel® 64 architecture, in particular:
    • Intel® Core™ processor family
    • Intel® Xeon® E5 v4 processor family recommended
    • Intel® Xeon® E7 v3 processor family recommended
    • 2nd Generation Intel® Xeon Phi™ Processor (formerly code named Knights Landing)
  • 1 GB of RAM per core (2 GB recommended)
  • 1 GB of free hard disk space

Software Requirements

  • Operating systems:
    • Red Hat* Enterprise Linux* 6, 7
    • Fedora* 23, 24
    • CentOS* 6, 7
    • SUSE* Linux Enterprise Server* 11, 12
    • Ubuntu* LTS 14.04, 16.04
    • Debian* 7, 8
  • (SDK only) Compilers:
    • GNU*: C, C++, Fortran 77 3.3 or newer, Fortran 95 4.4.0 or newer
    • Intel® C++/Fortran Compiler 15.0 or newer
  • Debuggers:
    • Rogue Wave* Software TotalView* 6.8 or newer
    • Allinea* DDT* 1.9.2 or newer
    • GNU* Debuggers 7.4 or newer
  • Batch systems:
    • Platform* LSF* 6.1 or newer
    • Altair* PBS Pro* 7.1 or newer
    • Torque* 1.2.0 or newer
    • Parallelnavi* NQS* V2.0L10 or newer
    • NetBatch* v6.x or newer
    • SLURM* 1.2.21 or newer
    • Univa* Grid Engine* 6.1 or newer
    • IBM* LoadLeveler* 4.1.1.5 or newer
    • Platform* Lava* 1.0
  • Recommended InfiniBand* software:
    • OpenFabrics* Enterprise Distribution (OFED*) 1.5.4.1 or newer
    • Intel® True Scale Fabric Host Channel Adapter Host Drivers & Software (OFED) v7.2.0 or newer
    • Mellanox* OFED* 1.5.3 or newer
  • Virtual environments:
    • Docker* 1.13.0
  • Additional software:
    • The memory placement functionality for NUMA nodes requires the libnuma.so library and the numactl utility to be installed. The numactl installation should include the numactl, numactl-devel, and numactl-libs packages.
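For the NUMA memory placement requirement above, a setup sketch (assuming a yum-based distribution such as Red Hat* Enterprise Linux* or CentOS*; other distributions use their own package manager):

```shell
# Install the packages named in the requirements (yum-based systems).
sudo yum install -y numactl numactl-devel numactl-libs
# Confirm that libnuma.so and the numactl utility are available:
ldconfig -p | grep libnuma
numactl --hardware
```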

Known Issues and Limitations

  • The I_MPI_JOB_FAST_STARTUP variable takes effect only when shm is selected as the intra-node fabric.
  • ILP64 is not supported by MPI modules for Fortran* 2008.
  • In case of abnormal program termination (for example, by a signal), manually remove the leftover files in the /dev/shm/ directory with:
    rm -r /dev/shm/shm-col-space-*
  • If a large number of communicators (more than 10,000 per node) is used simultaneously, it is recommended to increase the maximum number of memory mappings with one of the following methods:
    • echo 1048576 > /proc/sys/vm/max_map_count
    • sysctl -w vm.max_map_count=1048576
    • disable shared memory collectives by setting the variable: I_MPI_COLL_INTRANODE=pt2pt
  • On some Linux* distributions Intel® MPI Library may fail for non-root users due to security limitations. This was observed on Ubuntu* 12.04, and could impact other distributions and versions as well. Two workarounds exist:
    • Enable ptrace for non-root users with:
      echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
    • Revert the Intel® MPI Library to an earlier shared memory mechanism, which is not impacted, by setting: I_MPI_SHM_LMT=shm
  • Ubuntu* does not allow attaching a debugger to a non-child process. In order to use -gdb, this behavior must be disabled by setting the sysctl value in /proc/sys/kernel/yama/ptrace_scope to 0.
  • Cross-OS runs using ssh from a Windows* host fail. Two workarounds exist:
    • Create a symlink on the Linux* host that looks identical to the Windows* path to pmi_proxy.
    • Start hydra_persist on the Linux* host in the background (hydra_persist &) and use -bootstrap service from the Windows* host. This requires that the Hydra service also be installed and started on the Windows* host.
  • The OFA fabric and certain DAPL providers may not work or provide worthwhile performance with the Intel® Omni-Path Fabric. For better performance, try choosing the OFI or TMI fabric.
  • Enabling statistics gathering may result in increased time in MPI_Finalize.
  • In systems where some nodes have only Intel® True Scale Fabric or Intel® Omni-Path Fabric available, while others have both Intel® True Scale and e.g. Mellanox* HCAs, automatic fabric detection will lead to a hang or failure, as the first type of nodes will select ofi/tmi, and the second type will select dapl as the internode fabric. To avoid this, explicitly specify a fabric that is available on all the nodes.
  • In order to run a mixed OS job (Linux* and Windows*), all binaries must link to the same single- or multithreaded MPI library. The single- and multithreaded libraries are incompatible with each other and must not be mixed. Note that the pre-compiled binaries for the Intel® MPI Benchmarks are inconsistent (the Linux* version links to the multithreaded library, the Windows* version to the single-threaded one), so at least one must be rebuilt to match the other.
  • Intel® MPI Library does not support using the OFA fabric over an Intel® Symmetric Communications Interface (Intel® SCI) adapter. If you are using an Intel SCI adapter, such as with Intel® Many Integrated Core Architecture, you will need to select a different fabric.
  • The TMI and OFI fabrics over PSM do not support messages larger than 2^32 - 1 bytes. If you have messages larger than this limit, select a different fabric.
  • If a communication between two existing MPI applications is established using the process attachment mechanism, the library does not control whether the same fabric has been selected for each application. This situation may cause unexpected applications behavior. Set the I_MPI_FABRICS variable to the same values for each application to avoid this issue.
  • Do not load thread-safe libraries through dlopen(3).
  • Certain DAPL providers may not function properly if your application uses the system(3), fork(2), vfork(2), or clone(2) system calls, or functions based upon them. For example, system(3) may fail with the OFED* DAPL provider on Linux* kernels older than the official version 2.6.16. On compatible kernels, set the RDMAV_FORK_SAFE environment variable to enable the OFED workaround.
  • MPI_Mprobe, MPI_Improbe, and MPI_Cancel are not supported by the TMI and OFI fabrics.
  • You may get an error message at the end of a checkpoint-restart enabled application if some of the application processes exit in the middle of taking a checkpoint image. Such an error does not impact the application and can be ignored. To avoid this error, increase the value of the -checkpoint-interval option. The error message may look as follows:
    [proxy:0:0@hostname] HYDT_ckpoint_blcr_checkpoint (./tools/ckpoint/blcr/
    ckpoint_blcr.c:313): cr_poll_checkpoint failed: No such process
    [proxy:0:0@hostname] ckpoint_thread (./tools/ckpoint/ckpoint.c:559):
    blcr checkpoint returned error
    [proxy:0:0@hostname] HYDT_ckpoint_finalize (./tools/ckpoint/ckpoint.c:878)
     : Error in checkpoint thread 0x7
  • Intel® MPI Library requires the presence of the /dev/shm device in the system. To avoid failures related to the inability to create a shared memory segment, make sure the /dev/shm device is set up correctly.
  • Intel® MPI Library uses TCP sockets to pass the stdin stream to the application. If you redirect a large file, the transfer can take a long time and cause the communication to hang on the remote side. To avoid this issue, pass large files to the application as command-line options.
  • DAPL auto provider selection mechanism and improved NUMA support require dapl-2.0.37 or newer.
  • If you set I_MPI_SHM_LMT=direct, the setting has no effect if the Linux* kernel version is lower than 3.2.
  • When using the Linux boot parameter isolcpus with an Intel® Xeon Phi™ processor using default MPI settings, an application launch may fail. If possible, change or remove the isolcpus Linux boot parameter. If it is not possible, you can try setting I_MPI_PIN to off.
  • In some cases, collective calls over the OFA fabric may provide incorrect results. Try setting I_MPI_ADJUST_ALLGATHER to a value between 1 and 4 to resolve the issue.
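Several of the issues above (mixed-fabric clusters, Intel® OPA performance, PSM message-size limits) are avoided by specifying the fabric explicitly rather than relying on automatic detection. A sketch — the fabric values are examples; pick one that is available on every node:

```shell
# Select shared memory intra-node and OFI inter-node on all nodes,
# overriding automatic fabric detection.
export I_MPI_FABRICS=shm:ofi
# The same setting can be passed per run instead:
#   mpirun -genv I_MPI_FABRICS shm:tmi -n 8 ./app
echo "fabric selection: $I_MPI_FABRICS"
```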

Technical Support

Every purchase of an Intel® Software Development Product includes a year of support services, which provides priority customer support at our Online Support Service Center web site, http://www.intel.com/supporttickets.

In order to get support you need to register your product in the Intel® Registration Center. If your product is not registered, you will not receive priority support.

Lab7Systems Helps Manage an Ocean of Information


Finding efficient ways to manage the massive amounts of data generated by new technologies is a key concern for many industries. It’s especially challenging in the world of life sciences, where research breakthroughs are based on an ever-expanding ocean of information. With help from Intel and Intel® Parallel Studio XE, Lab7Systems is optimizing the open-source BioBuilds* tool collection to make life easier for bioinformaticians, scientists, and IT teams.

“Intel® compilers optimize the BioBuilds packages for superior performance on the Intel64 architecture, including auto-vectorization and autoparallelization for additional performance gains on modern, multi-core CPUs,” explained Cheng Lee, principal software architect for Lab7 Systems.

Get the whole story in our new case study.

Gentle Introduction to PyDaal: Vol 1 of 3 Data Structures


The Intel® Data Analytics Acceleration Library (Intel® DAAL) is built on Intel® Architecture optimized building blocks and supports all stages of data analytics. Intel® DAAL empowers data-driven decision making with foundations for data acquisition, preprocessing, transformation, data mining, modeling, and validation. Python users can access these foundations through the Python API for DAAL (named PyDaal), which gives Python machine learning an injection of power via a simple scripting API. Furthermore, PyDaal provides the unique capability to easily extend Python-scripted batch analytics to online (streaming) data acquisition and/or distributed math processing. To achieve the best performance on a range of Intel® processors, Intel® DAAL uses optimized algorithms from the Intel® Math Kernel Library and Intel® Integrated Performance Primitives. Intel® DAAL provides APIs for C++, Java, and Python. In this Gentle Introduction series, we will cover the basics of PyDaal from the ground up. The first installment introduces DAAL's custom data structure, the Numeric Table, and data management in the world of PyDaal.

Intel® Data Analytics Acceleration Library Release Notes and New Features


This page provides the current Release Notes for Intel® Data Analytics Acceleration Library. The notes are categorized by year, from newest to oldest, with individual releases listed within each year.

Click a version to expand it into a summary of new features and changes in that version since the last release, and access the download buttons for the detailed release notes, which include important information, such as pre-requisites, software compatibility, installation instructions, and known issues.

You can copy a link to a specific version's section by clicking the chain icon next to its name.

To get product updates, log in to the Intel® Software Development Products Registration Center.
For questions or technical support, visit Intel® Software Developer Support.

 

2018

Initial Release

Overview:

  • Introduced API modifications to streamline library usage and enable consistency across functionality.
  • Introduced support for Decision Tree for both classification and regression. The feature includes Gini index and Information Gain split criteria for classification, mean squared error (MSE) for regression, and Reduced Error Pruning.
  • Introduced support for Decision Forest for both classification and regression. The feature includes calculation of Gini index for classification, variance for regression split criteria, generalization error, and variable importance measures such as Mean Decrease Impurity and Mean Decrease Accuracy.
  • Introduced support for varying learning rate in the Stochastic Gradient Descent algorithm for neural network training.
  • Introduced support for filtering in the Data Source including loading selected features/columns from CSV data source and binary representation of the categorical features.
  • Extended Neural Network layers with Element Wise Add layer.
  • Introduced new samples that allow easy integration of the library with Spark* MLlib.
  • Introduced a service method for enabling thread pinning.
  • Performance improvements in various algorithms on Intel® Xeon® processors supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server).

Known Issues:

  • Intel DAAL Python API (a.k.a. pyDAAL) is provided as source. When building it on Windows*, users may see warning messages. These warning messages do not indicate critical issues and do not affect the library's functionality.
  • Intel DAAL Python API (a.k.a. pyDAAL) built from the source does not work on OS X* El Capitan (version 10.11). Workaround: Users can get the Intel Distribution of Python as an Anaconda package (http://anaconda.org/intel/), which contains a pre-built pyDAAL that works on OS X* El Capitan.