
Intel® MPI Library 2018 Beta Release Notes


Intel® MPI Library 2018 Beta Release Notes for Linux* OS


Overview

Intel® MPI Library is a multi-fabric message passing library based on ANL* MPICH3* and OSU* MVAPICH2*.

Intel® MPI Library implements the Message Passing Interface, version 3.1 (MPI-3.1) specification. The library is thread-safe and provides multi-threading support compliant with the MPI standard.

To receive technical support and updates, you need to register your product copy. See Technical Support below.

Product Contents

  • The Intel® MPI Library Runtime Environment (RTO) contains the tools you need to run programs, including the scalable Hydra process management system, supporting utilities, and shared (.so) libraries.
  • The Intel® MPI Library Development Kit (SDK) includes all of the Runtime Environment components and compilation tools: compiler wrapper scripts (mpicc, mpiicc, etc.), include files and modules, static (.a) libraries, debug libraries, and test codes.

What's New

Intel® MPI Library 2018 Beta

  • Removed support for the Intel® Xeon Phi™ coprocessor (code named Knights Corner).

Intel® MPI Library 2017 Update 2

  • Added environment variables I_MPI_HARD_FINALIZE and I_MPI_MEMORY_SWAP_LOCK.

Intel® MPI Library 2017 Update 1

  • PMI-2 support for SLURM*; improved SLURM* support by default.
  • Improved mini help and diagnostic messages; added man1 pages for mpiexec.hydra, hydra_persist, and hydra_nameserver.
  • Deprecations:
    • Intel® Xeon Phi™ coprocessor (code named Knights Corner) support.
    • Cross-OS launches support.
    • DAPL, TMI, and OFA fabrics support.

Intel® MPI Library 2017

  • Support for the MPI-3.1 standard.
  • New topology-aware collective communication algorithms (I_MPI_ADJUST family).
  • Effective MCDRAM (NUMA memory) support. See the Developer Reference, section Tuning Reference > Memory Placement Policy Control for more information.
  • Controls for asynchronous progress thread pinning (I_MPI_ASYNC_PROGRESS).
  • Direct receive functionality for the OFI* fabric (I_MPI_OFI_DRECV).
  • PMI2 protocol support (I_MPI_PMI2).
  • New process startup method (I_MPI_HYDRA_PREFORK).
  • Startup improvements for the SLURM* job manager (I_MPI_SLURM_EXT).
  • New algorithm for MPI-IO collective read operation on the Lustre* file system (I_MPI_LUSTRE_STRIPE_AWARE).
  • Debian Almquist shell (dash) support in the compiler wrapper scripts and mpitune.
  • Performance tuning for processors based on Intel® microarchitecture codenamed Broadwell and for Intel® Omni-Path Architecture (Intel® OPA).
  • Performance tuning for Intel® Xeon Phi™ Processor and Coprocessor (code named Knights Landing) and Intel® OPA.
  • OFI latency and message rate improvements.
  • OFI is now the default fabric for Intel® OPA and Intel® True Scale Fabric.
  • MPD process manager is removed.
  • Dedicated pvfs2 ADIO driver is disabled.
  • SSHM support is removed.
  • Support for the Intel® microarchitectures older than the generation codenamed Sandy Bridge is deprecated.
  • Documentation improvements.

Key Features

  • MPI-1, MPI-2.2 and MPI-3.1 specification conformance.
  • Support for Intel® Xeon Phi™ processors (formerly code named Knights Landing).
  • MPICH ABI compatibility.
  • Support for any combination of the following network fabrics:
    • Network fabrics supporting Intel® Omni-Path Architecture (Intel® OPA) devices, through either Tag Matching Interface (TMI) or OpenFabrics Interface* (OFI*).
    • Network fabrics with tag matching capabilities through the Tag Matching Interface (TMI), such as Intel® True Scale Fabric, InfiniBand*, Myrinet*, and other interconnects.
    • Native InfiniBand* interface through OFED* verbs provided by Open Fabrics Alliance* (OFA*).
    • Open Fabrics Interface* (OFI*).
    • RDMA-capable network fabrics through DAPL*, such as InfiniBand* and Myrinet*.
    • Sockets, for example, TCP/IP over Ethernet*, Gigabit Ethernet*, and other interconnects.
  • Support for the following MPI communication modes related to Intel® Xeon Phi™ coprocessor:
    • Communication inside the Intel Xeon Phi coprocessor.
    • Communication between the Intel Xeon Phi coprocessor and the host CPU inside one node.
    • Communication between the Intel Xeon Phi coprocessors inside one node.
    • Communication between the Intel Xeon Phi coprocessors and host CPU between several nodes.
  • (SDK only) Support for Intel® 64 architecture and Intel® MIC Architecture clusters using:
    • Intel® C++/Fortran Compiler 14.0 and newer.
    • GNU* C, C++ and Fortran 95 compilers.
  • (SDK only) C, C++, Fortran 77, Fortran 90, and Fortran 2008 language bindings.
  • (SDK only) Dynamic or static linking.

System Requirements

Hardware Requirements

  • Systems based on the Intel® 64 architecture, in particular:
    • Intel® Core™ processor family
    • Intel® Xeon® E5 v4 processor family recommended
    • Intel® Xeon® E7 v3 processor family recommended
    • 2nd Generation Intel® Xeon Phi™ Processor (formerly code named Knights Landing)
  • 1 GB of RAM per core (2 GB recommended)
  • 1 GB of free hard disk space

Software Requirements

  • Operating systems:
    • Red Hat* Enterprise Linux* 6, 7
    • Fedora* 23, 24
    • CentOS* 6, 7
    • SUSE* Linux Enterprise Server* 11, 12
    • Ubuntu* LTS 14.04, 16.04
    • Debian* 7, 8
  • (SDK only) Compilers:
    • GNU*: C, C++, Fortran 77 3.3 or newer, Fortran 95 4.4.0 or newer
    • Intel® C++/Fortran Compiler 15.0 or newer
  • Debuggers:
    • Rogue Wave* Software TotalView* 6.8 or newer
    • Allinea* DDT* 1.9.2 or newer
    • GNU* Debuggers 7.4 or newer
  • Batch systems:
    • Platform* LSF* 6.1 or newer
    • Altair* PBS Pro* 7.1 or newer
    • Torque* 1.2.0 or newer
    • Parallelnavi* NQS* V2.0L10 or newer
    • NetBatch* v6.x or newer
    • SLURM* 1.2.21 or newer
    • Univa* Grid Engine* 6.1 or newer
    • IBM* LoadLeveler* 4.1.1.5 or newer
    • Platform* Lava* 1.0
  • Recommended InfiniBand* software:
    • OpenFabrics* Enterprise Distribution (OFED*) 1.5.4.1 or newer
    • Intel® True Scale Fabric Host Channel Adapter Host Drivers & Software (OFED) v7.2.0 or newer
    • Mellanox* OFED* 1.5.3 or newer
  • Virtual environments:
    • Docker* 1.13.0
  • Additional software:
    • The memory placement functionality for NUMA nodes requires the libnuma.so library and the numactl utility to be installed. The numactl packages should include numactl, numactl-devel, and numactl-libs.

Known Issues and Limitations

  • The I_MPI_JOB_FAST_STARTUP variable takes effect only when shm is selected as the intra-node fabric.
  • ILP64 is not supported by MPI modules for Fortran* 2008.
  • If a program terminates abnormally (for example, on a signal), manually remove the leftover files in the /dev/shm/ directory with:
    rm -r /dev/shm/shm-col-space-*
  • If a large number of communicators (more than 10,000) is used simultaneously per node, increase the maximum number of memory mappings with one of the following methods:
    • echo 1048576 > /proc/sys/vm/max_map_count
    • sysctl -w vm.max_map_count=1048576
    • disable shared memory collectives by setting the variable: I_MPI_COLL_INTRANODE=pt2pt
  • On some Linux* distributions Intel® MPI Library may fail for non-root users due to security limitations. This was observed on Ubuntu* 12.04, and could impact other distributions and versions as well. Two workarounds exist:
    • Enable ptrace for non-root users with:
      echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
    • Revert the Intel® MPI Library to an earlier shared memory mechanism, which is not impacted, by setting: I_MPI_SHM_LMT=shm
  • Ubuntu* does not allow attaching a debugger to a non-child process. To use -gdb, disable this behavior by setting the sysctl value in /proc/sys/kernel/yama/ptrace_scope to 0.
  • Cross-OS runs using ssh from a Windows* host fail. Two workarounds exist:
    • Create a symlink on the Linux* host that looks identical to the Windows* path to pmi_proxy.
    • Start hydra_persist on the Linux* host in the background (hydra_persist &) and use -bootstrap service from the Windows* host. This requires that the Hydra service also be installed and started on the Windows* host.
  • The OFA fabric and certain DAPL providers may not work or provide worthwhile performance with the Intel® Omni-Path Fabric. For better performance, try choosing the OFI or TMI fabric.
  • Enabling statistics gathering may result in increased time in MPI_Finalize.
  • In systems where some nodes have only Intel® True Scale Fabric or Intel® Omni-Path Fabric available, while others have both Intel® True Scale and, for example, Mellanox* HCAs, automatic fabric detection will lead to a hang or failure, because the first group of nodes will select ofi/tmi and the second group will select dapl as the internode fabric. To avoid this, explicitly specify a fabric that is available on all the nodes.
  • In order to run a mixed OS job (Linux* and Windows*), all binaries must link to the same single- or multithreaded MPI library. The single- and multithreaded libraries are incompatible with each other and should not be mixed. Note that the pre-compiled binaries for the Intel® MPI Benchmarks are inconsistent (the Linux* version links to the multithreaded library, the Windows* version to the single-threaded one), so at least one must be rebuilt to match the other.
  • Intel® MPI Library does not support using the OFA fabric over an Intel® Symmetric Communications Interface (Intel® SCI) adapter. If you are using an Intel SCI adapter, such as with Intel® Many Integrated Core Architecture, you will need to select a different fabric.
  • The TMI and OFI fabrics over PSM do not support messages larger than 2^32 - 1 bytes. If you have messages larger than this limit, select a different fabric.
  • If communication between two existing MPI applications is established using the process attachment mechanism, the library does not check whether the same fabric has been selected for each application. This situation may cause unexpected application behavior. Set the I_MPI_FABRICS variable to the same values for both applications to avoid this issue.
  • Do not load thread-safe libraries through dlopen(3).
  • Certain DAPL providers may not function properly if your application uses the system(3), fork(2), vfork(2), or clone(2) system calls, or functions based upon them; do not use these calls. For example, do not use system(3) with the OFED* DAPL provider on Linux* kernels older than 2.6.16. On compatible kernels, set the RDMAV_FORK_SAFE environment variable to enable the OFED workaround.
  • MPI_Mprobe, MPI_Improbe, and MPI_Cancel are not supported by the TMI and OFI fabrics.
  • You may get an error message at the end of a checkpoint-restart enabled application, if some of the application processes exit in the middle of taking a checkpoint image. Such an error does not impact the application and can be ignored. To avoid this error, set a larger number than before for the -checkpoint-interval option. The error message may look as follows:
    [proxy:0:0@hostname] HYDT_ckpoint_blcr_checkpoint (./tools/ckpoint/blcr/
    ckpoint_blcr.c:313): cr_poll_checkpoint failed: No such process
    [proxy:0:0@hostname] ckpoint_thread (./tools/ckpoint/ckpoint.c:559):
    blcr checkpoint returned error
    [proxy:0:0@hostname] HYDT_ckpoint_finalize (./tools/ckpoint/ckpoint.c:878)
     : Error in checkpoint thread 0x7
  • Intel® MPI Library requires the presence of the /dev/shm device in the system. To avoid failures related to the inability to create a shared memory segment, make sure the /dev/shm device is set up correctly.
  • Intel® MPI Library uses TCP sockets to pass the stdin stream to the application. If you redirect a large file, the transfer can take a long time and cause the communication to hang on the remote side. To avoid this issue, pass large files to the application as command line options.
  • DAPL auto provider selection mechanism and improved NUMA support require dapl-2.0.37 or newer.
  • If you set I_MPI_SHM_LMT=direct, the setting has no effect if the Linux* kernel version is lower than 3.2.
  • When using the Linux boot parameter isolcpus with an Intel® Xeon Phi™ processor using default MPI settings, an application launch may fail. If possible, change or remove the isolcpus Linux boot parameter. If it is not possible, you can try setting I_MPI_PIN to off.
  • In some cases, collective calls over the OFA fabric may provide incorrect results. Try setting I_MPI_ADJUST_ALLGATHER to a value between 1 and 4 to resolve the issue.

Technical Support

Every purchase of an Intel® Software Development Product includes a year of support services, which provides priority customer support at our Online Support Service Center web site, http://www.intel.com/supporttickets.

In order to get support you need to register your product in the Intel® Registration Center. If your product is not registered, you will not receive priority support.

Intel® Data Analytics Acceleration Library – Documentation


Documentation

Getting Started

Developer Guide and Reference

  • Intel® Data Analytics Acceleration Library 2017 Update 2 Developer Guide and Reference
    HTML | ZIP
  • Intel® Data Analytics Acceleration Library 2017 Update 2 Developer Guide
    HTML | PDF

Beta Documentation

Previous Versions of Documentation

Intel® Data Analytics Acceleration Library 2018 (Beta) API Reference


Intel® Data Analytics Acceleration Library (Intel® DAAL) is the library of Intel® architecture optimized building blocks covering all stages of data analytics: data acquisition from a data source, preprocessing, transformation, data mining, modeling, validation, and decision making.

Algorithms implemented in the library include:

  • Moments of low order and quantiles
  • K-Means clustering
  • Classification algorithms, including boosting algorithms and Naïve Bayes, Support Vector Machine (SVM), and multi-class classifiers
  • Neural network algorithms

Intel DAAL provides application programming interfaces (APIs) for C++, Java*, and Python* languages.

C++ API Reference

Download: ZIP (14.06 MB)

Java* API Reference

Download: ZIP (10.32 MB)

Python* API Reference

Download: ZIP (7.75 MB)

For the Developer Guide and previous versions of API reference, see Intel® Data Analytics Acceleration Library - Documentation.

Deep Learning Deployment Toolkit Release Notes


Release Notes include important information, such as:

  1. prerequisites,
  2. software compatibility,
  3. known issues.

Beta - Release Notes

For additional information, such as installation and user guides, visit the Intel® Computer Vision SDK page.

Previous Versions

Initial Beta (Part of the Intel® Deep Learning SDK) - Release Notes

 

All files are in PDF format - Adobe Reader* (or compatible) required.

 

Intel® MPI Library 2018 Beta - Documentation


The section below provides links to the Intel® MPI Library 2018 Beta documentation. You can find other documentation, including user guides and reference manuals for current and earlier Intel software product releases in the Intel® Software Documentation Library.

Visit this page for documentation pertaining to the latest stable Intel MPI Library release.

You can also download an offline version of the documentation from the Intel Registration Center > Product List > Intel® Parallel Studio XE Documentation Beta


Documentation

Intel® MPI Library for Linux*

Title                             Format        Version    Type                 Date
Developer Guide for Linux*        Online | PDF  2018 Beta  Developer Guide      Apr 2017
Developer Reference for Linux*    Online | PDF  2018 Beta  Developer Reference  Apr 2017

Intel® MPI Library for Windows*

Title                               Format        Version    Type                 Date
Developer Guide for Windows*        Online | PDF  2018 Beta  Developer Guide      Apr 2017
Developer Reference for Windows*    Online | PDF  2018 Beta  Developer Reference  Apr 2017

Intel® Trace Analyzer and Collector 2018 Beta - Documentation


The section below provides links to the Intel® Trace Analyzer and Collector 2018 Beta documentation.  You can find other documentation, including user guides and reference manuals for current and earlier Intel software product releases in the Intel® Software Documentation Library.

Visit this page for documentation pertaining to the latest stable Intel Trace Analyzer and Collector release.

You can also download an offline version of the documentation from the Intel Registration Center > Product List > Intel® Parallel Studio XE Documentation Beta


Documentation

Title                                            Format        Version    Type                  Date
Intel® Trace Collector User and Reference Guide  Online | PDF  2018 Beta  User/Reference Guide  Apr 2017
Intel® Trace Analyzer User and Reference Guide   Online | PDF  2018 Beta  User/Reference Guide  Apr 2017

Get Started Installing Intel® Parallel Studio XE 2018 - macOS




Step 1. Select OS (macOS* Selected)

Select another OS: Linux* | Windows*


Step 2. Before You Install

A. System Requirements

Processor Requirements

Intel Parallel Studio XE supports only 64-bit Intel® architecture hosts.

Systems based on Intel® 64 architecture:

  • Intel® Core™ processor family or higher
  • Intel® Xeon® E5 v5 processor families recommended
  • Intel® Xeon® E7 v5 processor families recommended

Disk Space Requirements

12 GB of disk space (minimum) on a standard installation.

During the installation process, the installer may need up to 12 GB of additional temporary disk storage to manage the intermediate installation files.

Operating System Requirements

The operating systems listed below are supported by all components on Intel® 64 Architecture.

  • macOS 10.12

Memory Requirements

2 GB RAM (minimum)

Special Requirements

On macOS*, the Intel® C/C++ Compiler and Intel® Fortran Compiler require a version of Xcode* to be installed. The following versions are currently supported:

  • Xcode* 8

B. Prepare to Configure the Installation

What sort of things can I customize during the installation?

You can customize the items listed below during the installation. We will perform a system check to ensure your configuration will work correctly on your system, and help you solve any issues that may be detected. Here are the configurable items:

  • Target Architecture(s): Choose the targets you develop for: 64-bit, 32-bit, or both
  • Installation Directory: You can choose any that you have write access to, or use the default folder
  • Components: Choose which Components you wish to install. (In order to save space, some users elect to not install components they don’t plan on using.)

Step 3. Download and Install

A. Download Intel® Parallel Studio XE Composer Edition Trial Version

Register & Download Version for C++ Register & Download Version for Fortran

B. Install Intel® Parallel Studio XE Composer Edition Trial Version

Extract the contents of the installation package to a directory of your choice.

You can either install with the GUI or use the command line. You will find files for both methods in the main directory of the extracted files.

GUI Installer

Open a terminal window and run the file named install_GUI.sh.

CLI Installer

Open a terminal window and run the file named install.sh. Follow the prompts in the CLI to continue installation.

FAQs

Why do I need to register?

There are international regulations that require Intel to register users who download an application of this nature. Registering will also allow Intel to keep you up to date on the latest releases.


Step 4. Start Developing

When the installation is complete, you are ready to start developing with the Intel Parallel Studio XE. Start exploring what's available in the studio with the Getting Started Guide.


Steps to register a floating license


How you register your floating license depends on how it was issued. Registration is the process of taking ownership of a particular serial number, while activation assigns the registered serial number to a license server.

Registration

If you have a serial number which has no owner, you may register it by following this process:

  1. Go to the Intel® Registration Center
  2. On the registration screen in the Register a Product section, enter your email address in the Email box.
  3. Enter the same email address in the Confirm Email box.
  4. Enter the serial number in the Serial Number box and then click Register Product.
  5. Follow the instructions on the web page to download and install your product.
  6. After registering, you will receive an email confirming the registration. A license file is not provided at this point.

If you already have a registration center account, you may log in and enter the unregistered serial number in the serial number box at the upper right.

If the serial number is already registered, the above process will automatically add you as a user of the license.  This grants the ability to download the products available with the license.  If you expected to become the license owner, you can contact support to assist with determining the current owner and/or license transfer.

Activation

To activate your floating license, you must provide the host ID and host name of the server running the license manager.  This can be done in one of two ways:

  • If the server has connectivity with the registration center, you can provide the serial number during installation of the Intel Software License Manager and it will automatically provide the host information to generate the license file.
  • If the server cannot submit the host information, or if you want to manually activate the serial number, you must do the following:
  1. Login to the registration center
  2. Click the serial number under the "Serial Numbers" tab - if the serial number is greyed out, you are not the owner or administrator
  3. Enter the host information
  4. Click "Activate Serial Number" 

Serial number activation

After the serial number is activated, you may download the license file. 

The Ports of the Intel(R) Software License Manager


The Intel® Software License Manager uses two ports to serve licenses - one for lmgrd (the main license service) and one for the INTEL vendor daemon.  Both ports must be open and not blocked by a firewall.

lmgrd - FlexNet daemon

This is the main process that controls license management, and is provided by FlexNet Publisher, formerly Flexlm.  The Intel Software License Manager uses port 28518 as a default to avoid conflicts with other vendors.  This can be entered through the Intel Registration Center during activation, or changed for activated licenses by following these steps.

INTEL - vendor daemon

This is the vendor daemon that serves Intel licenses.  When lmgrd is started or restarted, it starts the vendor daemon, which chooses a port to use.  The selected port number is displayed in the startup output, which is written either to a log file or to stdout.  The license manager utilities provide no further reporting on this port, so it is easy to overlook.

As firewalls have become more common, so have reports of issues stemming from the INTEL vendor daemon port being blocked.  Even if the port was not previously blocked, restarting the license manager can cause the port number to change and be subsequently blocked.  To determine the port number, run a command such as netstat and look for the INTEL daemon.
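
For example, on a Linux* host with the net-tools package installed, a command like the following lists listening TCP ports together with the owning process, so you can spot the vendor daemon's port (this assumes the daemon process is named INTEL, as in the license file):

netstat -tlnp | grep INTEL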

The INTEL vendor daemon port can be specified by modifying the license file.  Change the second line as follows:

VENDOR INTEL port=<port>
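
For illustration, the first two lines of a license file with both ports pinned might look as follows; the host name, host ID, and port numbers here are placeholders, not values from a real license:

SERVER myserver 001122AABBCC 28518
VENDOR INTEL port=28519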

Take care when changing the license file, or you may invalidate it.

Be sure to restart the license manager after any license file changes.

Data Science is an Ocean of Information—Stay Focused!


A primer on how to become a data scientist

How do I become a good data scientist? Should I learn R* or Python*? Or both? Do I need to get a PhD? Do I need to take tons of math classes? What soft skills do I need to become successful? What about project management experience? What skills are transferable? Where do I start?


Data science is a popular topic in the tech world today. It is the science that powers many of the trends in this world, from machine learning to artificial intelligence.

In this article, we present data science as a series of steps, so that any product manager or business manager interested in exploring this science can take a first step toward becoming a data scientist, or at least develop a deeper understanding of the field.

Step 1: Define a Problem Statement

We all have heard conversations that go something like this: "Look at the data and tell me what you find." This approach may work when the volume of data is small, structured, and limited. But when we are dealing with gigabytes or terabytes of data, it can lead to an endless, daunting detective hunt, which provides no answers because there were no questions to begin with.

As powerful as science is, it's not magic. Inventions in any field of science solve a problem. Similarly, the first step in using data science is to define a problem statement, a hypothesis to be validated, or a question to be answered. It may also focus on a trend to be discovered, an estimate, a prediction to be made, and so on.

For example, take MyFitnessPal*, which is a mobile app for monitoring health and fitness. A few of my friends and I downloaded it about a year ago, and then used it almost daily for a while. But over the past 6 months, most of us have completely stopped using it. If I were a product manager for MyFitnessPal, a problem I might want to solve would be: how can we drive customer engagement and retention for the app?

Step 2: Get the Data

Today's data scientists access data from several sources. This data may be structured or unstructured. The raw data that we often get is unstructured and/or dirty, and it needs to be cleaned and structured before it can be used for analysis. Most of the common data sources now offer connectors to import the raw data in R or Python (a minimal Python sketch follows the list below).

Common data sources include the following:

  • Databases
  • CSV files
  • Social media feeds like Twitter, Facebook, and so on (unstructured)
  • JSON
  • Web-scraping data (unstructured)
  • Web analytics
  • Sensor data driven by the Internet of Things
  • Hadoop*
  • Spark*
  • Customer interview data
  • Excel* analysis
  • Academic documents 
  • Government research documents and libraries like www.data.gov
  • Financial data; for example, from Yahoo Finance*
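
As a minimal illustration, here is how a few of these sources might be pulled into Python with the pandas library; the file and table names are hypothetical:

import sqlite3

import pandas as pd

# Structured sources: a CSV export and a JSON feed (hypothetical file names)
customers = pd.read_csv("customers.csv")
events = pd.read_json("events.json")

# Databases are another common source; here, a SQLite connection
conn = sqlite3.connect("analytics.db")
orders = pd.read_sql_query("SELECT * FROM orders", conn)

print(customers.shape, events.shape, orders.shape)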

In the data science world, common vocabulary includes:

  • Observations or examples. These are like the rows (horizontal records) of a typical database table. For example: the customer record for Joe Allen.
  • Variables, signals, or characteristics. These equate to the fields or columns in the database world; for example, Joe's height. A variable can be qualitative or quantitative.

Step 3: Cleaning the Data

Several terms are used to refer to data cleaning, such as data munging, data preprocessing, data transformation, and data wrangling. These terms all refer to the process of preparing the raw data to be used for data analysis.

As much as 70–80 percent of the efforts in a data science analysis involve data cleansing.
A data scientist analyzes each variable in the data to evaluate whether it is worthy of being a feature in the model. If including the variable increases the model's predictive power, it is considered a predictor for the model. Such a variable is then considered a feature, and together all the features create a feature vector for the model. This analysis is called feature engineering.

Sometimes a variable may need to be cleaned or transformed to be used as a feature in the model. To do that we write scripts, which are also referred to as munging scripts. Scripts can perform a range of functions like:

  • Rename a variable (which helps with readability and code sharing)
  • Transform text (if variable == "big", set variable = "HUGE")
  • Truncate data
  • Create new variables or transpose data (for example, given the birth date, calculate age)
  • Supplement existing data with additional data (for example, given the zip code, get the city and state)
  • Convert continuous numerical variables into discrete ranges (for example, salary into a salary range, age into an age range)
  • Date and time conversions
  • Convert a categorical variable into multiple binary variables. For example, a categorical variable for region (with possible values being east, west, north, and south) could be converted into four binary variables, east, west, north, and south, with only one of them being true for an observation. This approach helps create easier joins in the data.

Sometimes the data has numerical values that vary widely in magnitude, making it difficult to visualize the information. We can resolve this issue using feature scaling. For example, consider the square footage and the number of rooms in a house. If we scale the square footage to a magnitude similar to the number of bedrooms, our analysis becomes easier.

A series of scripts are applied to the data in an iterative manner until we get data that is clean enough for analysis. To get a continuous supply of data for analysis, the series of data munging scripts need to be rerun on the new raw data. Data pipeline is the term given to this series of processing steps applied to raw data to make it analysis ready.
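
Below is a minimal sketch of such a pipeline in Python with pandas; the column names and transformation rules are hypothetical, but each step mirrors one of the munging functions listed above:

import pandas as pd

def rename_columns(df):
    # Rename a variable for readability and code sharing
    return df.rename(columns={"dob": "birth_date"})

def derive_age(df):
    # Create a new variable: age derived from the birth date
    born = pd.to_datetime(df["birth_date"])
    df["age"] = (pd.Timestamp("2017-04-01") - born).dt.days // 365
    return df

def encode_region(df):
    # Convert the categorical region variable into binary east/west/north/south columns
    return pd.get_dummies(df, columns=["region"])

def scale_sqft(df):
    # Feature scaling: bring square footage to a magnitude comparable to room counts
    df["sqft_scaled"] = (df["sqft"] - df["sqft"].mean()) / df["sqft"].std()
    return df

def pipeline(df):
    # The data pipeline: each munging script is applied in order
    for step in (rename_columns, derive_age, encode_region, scale_sqft):
        df = step(df)
    return df

Rerunning pipeline() on each new batch of raw data keeps a continuous supply of analysis-ready data.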


Step 4: Data Analysis and Model Selection

Now we have clean data and we are ready for analysis. Our next goal is to become familiar with the data using statistical modeling, visualizations, discovery-oriented data analysis, and so on.

For simple problems, we can use simple statistical analysis: the mean, median, mode, min, max, range, quartiles, and so on.
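
In Python, for instance, pandas computes most of these summary statistics in one call; the data here is made up:

import pandas as pd

prices = pd.Series([120, 95, 140, 100, 100, 180])  # hypothetical values
print(prices.describe())  # count, mean, std, min, quartiles, max
print(prices.mode())      # the most frequent value(s)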

Supervised Learning

We could also use supervised learning with data sets that give us access to actual values of the response variable (dependent variable) for a given set of feature variables (independent variables). For example, we could find trends based on the tenure, seniority, and title of employees who have left the company (resigned=true), and then use those trends to predict whether other employees will resign too. Or we could use historic data to correlate a trend between the number of visitors (an independent variable, or predictor) and the revenue generated (a dependent variable, or response variable). This correlation could then be used to predict future revenue for the site based on the number of visitors.

The key requirement for supervised learning is the availability of ACTUAL Values and a clear question that needs to be answered. For example: Will this employee leave? How much revenue can we expect? Data scientists often refer to this as "Response variable is labeled for existing data."

Regression is a common tool used for supervised learning. A one-factor regression uses one variable; a multifactor regression uses many variables.
Linear regression assumes that the unknown relation between the factor and the response variable is a linear relation Y = a + bx, where b is the coefficient of x.

A part of the existing data is used as training data to calculate the value of this coefficient. Data scientists often use 60 percent, 80 percent, or at times 90 percent of the data for training. Once the coefficient is calculated for the trained model, the model is tested with the remaining data, also referred to as the test data, to predict the value of the response variable. The difference between the predicted response value and the actual value is the all-important metric referred to as the test error.
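
A minimal sketch of this workflow in Python with scikit-learn, using synthetic visitor/revenue data and an 80/20 train/test split (the numbers are made up):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic example: visitors (predictor) vs. revenue (response)
rng = np.random.RandomState(0)
visitors = rng.uniform(100, 1000, size=200).reshape(-1, 1)
revenue = 50 + 2.5 * visitors.ravel() + rng.normal(0, 40, size=200)

# Hold out 20 percent of the data as test data; train on the other 80 percent
X_train, X_test, y_train, y_test = train_test_split(
    visitors, revenue, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)  # fits Y = a + b*x
predicted = model.predict(X_test)

# The test error: how far the predictions fall from the actual values
print("coefficient b:", model.coef_[0])
print("test MSE:", mean_squared_error(y_test, predicted))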

Our quest in data science modeling is to minimize the test error metrics in order to increase the predictive power of the model by:

  • Selecting effective factor variables
  • Writing efficient data munging scripts
  • Selecting the appropriate statistical algorithms
  • Selecting the required amount of test and training data

Unsupervised Learning

Unsupervised learning is applied when we are trying to learn the structure of the underlying data itself. There is NO RESPONSE VARIABLE. Data sets are unlabeled and pre-existing insights are unclear. We are not clear about anything ahead of time so we are not trying to predict anything!
This technique is effective for exploratory analysis and can be used to answer questions like

  • Grouping. How many types of customer segments do we have?
  • Anomaly detection. Is this normal?

Analysis of variance (ANOVA) is a common technique used to compare the means of two or more groups. It is named ANOVA because estimates of variance are the main intermediate statistics calculated. The means of the various groups are compared using distance metrics, Euclidean distance being a popular one.

Cluster analysis, by contrast, is used to organize observations into similar groups, called clusters. The observations are assigned to these clusters based on their respective predictors.
http://www.statsdirect.com/help/content/analysis_of_variance/anova.htm
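
A one-way ANOVA takes a few lines in Python with SciPy; the three groups of observations below are made up:

from scipy import stats

# Hypothetical observations for three groups
east = [23, 25, 21, 22, 24]
west = [30, 28, 33, 29, 31]
north = [22, 24, 23, 25, 21]

f_stat, p_value = stats.f_oneway(east, west, north)
print(f_stat, p_value)  # a small p-value suggests the group means differ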

Two common clustering applications are:

  • Hierarchical clustering. A bottom-up approach. We start with individual observations and merge each with the closest one. We then calculate the means of these grouped observations and merge the groups whose means are closest to each other. This is repeated until larger groups are formed. The distance metric is defined ahead of time. This technique is complex and not advisable for high-dimensional data sets.


  • K-means clustering. Uses a partitioning approach:
    a. We assume in advance, based on intuition, that the data has a fixed number of clusters.
    b. We also assume a starting center for each cluster.
    c. Each observation is assigned to the cluster whose mean is closest to the observation.
    d. This step is repeated until every observation has been assigned to a cluster.
    e. We then recalculate the mean of each cluster as the average of all the observations assigned to it.
    f. Observations are reclassified to these new clusters, and steps c, d, and e are repeated until a stable state is reached.

If a stable state is not achieved, we may need to refine the number of clusters (that is, K) we assumed in the beginning, or use a different distance metric.
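
A minimal K-means sketch in Python with scikit-learn, assuming K=3 and using synthetic two-dimensional data:

import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three blobs of points around different centers
rng = np.random.RandomState(0)
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# n_init repeats the assign/recompute loop from several starting centers
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_)  # the final cluster means
print(kmeans.labels_[:10])      # the cluster assigned to each observation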

Step 5: Visualize and Communicate Effectively

The final clusters can be visualized for easy communication using tools like Tableau* or graphing libraries.

Tips from Data Science Practitioners

In my quest to understand data science, I met with practitioners working in companies, including Facebook, eBay, LinkedIn, Uber, and some consulting firms, that are effectively leveraging the power of data. Here are some powerful words of advice I received:

  • Know your data. It's important to fully understand the data and the assumptions behind it. Otherwise, the data may be ineffectively used, which may lead to arriving at the wrong answer, solving the wrong problem, or both.
  • Understand the domain and the problem. The data scientist must have a deep understanding of the business domain and the problem to be solved to be able to extract the appropriate insights from the data.
  • Ethics. Don't compromise data quality to fit a hypothesis. The problem often is not ignorance, but our preconceived notions!
  • It's a myth that a larger data set always offers better insights. Although more data can make results statistically significant, a large data set also comes with more noise. It's common for a larger data set to yield a smaller R-squared than a smaller one.
  • While data science is not a product by itself, it can power brilliant products that solve complex problems. Product managers and data scientists that communicate effectively can become strong partners:
    • The product manager initially brings to the conversation the business problem to be solved, questions to be answered, and constraints to be discovered and/or defined.
    • The data scientist, who brings deep expertise in machine learning and mathematics, focuses on the theoretical aspects of the business problem. Modern data sets are used to perform data analysis, transformations, model selection, and validation to establish the foundations of the theory to be applied to the business problem.
    • The software engineer works to operationalize the theory and the solution. He or she needs a strong understanding of the mechanics of machine learning (Hadoop clusters, data storage hardware, writing production code, and so on).
  • Learn a programming language. Python is easiest to learn; R is considered the most powerful.

Commonly Used Data Science Tools

R

R is a favorite tool of many data scientists and holds a special place in the world of academia, where data science problems are approached from a mathematician's and statistician's perspective. R is an open source, rich language, with about 9,000 additional packages available. The most common tool for programming in R is RStudio*. R has a steep learning curve, though its footprint is steadily increasing in the enterprise world, and it owes some of its popularity to the rich and powerful regular-expression-based algorithms already available.

Python

Python is slowly becoming the most extensively used language in the data science community. Like R, it is also an open source language and is used primarily by software engineers who view data science as a tool to solve real customer-facing business problems using data. Python is easier to learn than R, because the language emphasizes readability and productivity. It is also more flexible and simpler.

SQL

SQL is the basic language used to interact with databases and is required for all tools.

Other Tools

  • Apache Spark* offers Scala* APIs
  • MATLAB* is a mathematical environment that academia has used for a long time; Octave* is an open source alternative
  • Java* is used in Hadoop* environments

What about the Soft Skills?

Below is a list of important soft skills to have, many of which you might already have in your portfolio.

  • Communication. A data scientist doesn't sit in a cube and code Python programs. The data science process requires that you mingle with your team. You need to connect and build rapport with executives, product owners, product managers, developers, big data engineers, NoSQL* experts, and more. Your goal is to understand what they are trying to build and how data science and machine learning can help.
  • Coaching. As a data scientist, your coaching skills will shine. You are not just an individual contributor; you are the CEO's best friend, who can help him or her shape the company, the product, and the strategy based on data science. For example, based on your prescriptive analysis results, you might recommend to the executive team that the company launch dark-green shoes in Brazil, while the same product would fail in Silicon Valley in the United States. Your findings can save the company millions of dollars.
  • Storyteller. A good data scientist is a good storyteller. During your data science project, you will have tons of data, tons of theories, and tons of results. Sometimes you'll feel as if you are lost in an ocean of data. If this happens, step back and think: What are we trying to achieve? For example, if your audience is a CEO and COO, they might need to make an executive decision in a couple of minutes based on your presentation. They aren't interested in learning about your ROC curve or in going through the 4 terabytes of data and 3,000 lines of your Python code.

    Your goal is to give them direct recommendations based on your solid prediction algorithm and accurate results. We recommend that you create four or five slides where you clearly tell this story: storytelling backed by solid data and solid research.

  • Visualization. A good data scientist needs to communicate results and recommendations using visualization. You cannot hand someone a 200-page report to read. You need to present using pictures, images, charts, and graphs.

  • Mindset. A good data scientist has a "hacker" mind—"hacker" being used here in a good way—and is relentlessly looking for patterns in the data set.
  • Love thy data. You need to live with your data and let it tell you the story. Of course, there are many tools you can use to more fully understand the data, but just a superficial glance at it will give you lots of information.

What Can I Become?

Now it's time to decide. What type of data scientist should I become?


  • Understand the pipeline. You need to start somewhere. You can be a Python developer working on a data science project. You can gather input data coming from logs, sensors, CSV files, and so on. You can write scripts to consume and ingest the incoming data, which can be at rest or in motion. You might decide to become a big data engineer working with technologies like Hadoop or Hive*. Or a machine learning algorithm specialist: someone who has mastered the skills and understands which algorithm works best for which problem. You can be a math genius who can take out-of-the-box machine learning algorithms and modify them according to your needs. You might become a data persistence expert, using SQL or NoSQL technologies to persist and serve data. Or you might become a data visualization expert who builds dashboards and data stories using tools like Tableau. So check out the pipeline one more time, from ingestion to visualization, and make an opportunity list. For example: "D3 expert, Python script expert, Spark master," and so on.
  • Check out the job market. Look at the various job portals to get an idea of the current demand. How many jobs are there? What jobs are in highest demand? What is the salary structure? A casual glance at data science jobs in the Bay Area shows promising opportunities.
  • Understand yourself. You have explored the pipeline and type of jobs you can get. Now it's time to think about yourself and your skills. What do you enjoy most and what experience do you have? Do you love project management? Databases? Think about your previous success stories. Do you love writing complex scripts to correlate and manipulate data? Are you a visualization person, who is an expert at creating compelling presentations? Make a "love-to-do-list." For example, "love to code, love scripts, love Python."
  • Create a match. Match your opportunity list with your love-to-do list and get on with the program. Data science is an ocean of information. Stay focused!!


Intel® MKL 2018 Beta is now available


Intel® MKL 2018 Beta is now available as part of the Intel® Parallel Studio XE 2018 Beta.
Check the Join the Intel® Parallel Studio XE 2018 Beta program post to learn how to join the Beta program and provide your feedback.

What's New in Intel® MKL 2018 Beta:

  • DNN:
    • Added initial convolution and inner product optimizations for Intel(R) Xeon Phi(TM) processors based on Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of AVX512_4FMAPS and AVX512_4VNNIW instruction groups.
    • Average pooling now has an option to include padding in the mean value computation
  • BLAS Features:
    • Introduced optimized integer matrix-matrix multiplication routines (GEMM_S16S16S16 and GEMM_S16S16S32) to work with quantized matrices for all architectures.
    • Introduced ?TRSM_BATCH to complement the batched BLAS for all architectures
  • BLAS Optimizations:
    • Optimized SGEMM, GEMM_S16S16S16 and GEMM_S16S16S32 for Intel(R) Xeon Phi(TM) processors based on Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of AVX512_4FMAPS and AVX512_4VNNIW instruction groups
    • Improved ?GEMM_BATCH performance for all architectures
    • Improved single and multi-threaded {D,S}SYMV performance for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and the Intel® Xeon Phi™ processor x200
  • Sparse BLAS:
    • Improved performance of CSRMV/BSRMV functionality for Intel® AVX-512 instruction set in Inspector-Executor mode
  • LAPACK:
    • Introduced factorization and solve routines based on Aasen's algorithm: ?sytrf_aa/?hetrf_aa, ?sytrs_aa/?hetrs_aa
  • Vector Mathematics:
    • Added 24 new functions: v?Fmod, v?Remainder, v?Powr, v?Exp2, v?Exp10, v?Log2, v?Logb, v?Cospi, v?Sinpi, v?Tanpi, v?Acospi, v?Asinpi, v?Atanpi, v?Atan2pi, v?Cosd, v?Sind, v?Tand, v?CopySign, v?NextAfter, v?Fdim, v?Fmax, v?Fmin, v?MaxMag, v?MinMag
  • Library Engineering:
    • Introduced support for Intel(R) Xeon Phi(TM) processors based on Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of AVX512_4FMAPS and AVX512_4VNNIW instruction groups.

Optimizations for these instruction groups are not dispatched unless explicitly enabled with the mkl_enable_instructions function call or the MKL_ENABLE_INSTRUCTIONS environment variable.

  • Documentation: 
    • Starting with this version of Intel MKL, most of the documentation for Parallel Studio XE is only available online at https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation. You can also download it from the Intel Registration Center > Product List > Intel® Parallel Studio XE Documentation Beta.
  • Hardware support for Intel® Xeon Phi™ coprocessors (code named Knights Corner) is removed. Customers who continue to use and develop for Intel® Xeon Phi™ coprocessors are recommended to stay on Intel® MKL 2017.

Intel® XDK FAQs - App Designer [DEPRECATED]


[DEPRECATED] App Designer (UI layout tool) has been deprecated!

IMPORTANT: the Intel XDK App Designer component (aka the UI layout tool) has been deprecated. It will be retired in an upcoming release. Once retired, existing App Designer projects will continue to work, but you will not be able to create new App Designer projects.

No bug fixes will be implemented for the existing App Designer component nor for any of the UI frameworks that were supported by App Designer.

If you have designed your layout by hand or by using an external tool, there will be no changes to your project. This change ONLY affects projects that have been created using the App Designer UI layout tool. If you are just starting with the Intel XDK we recommend that you do NOT use App Designer to create your layout, since the editor will not be maintained and may eventually be discontinued.

There are many UI frameworks and tools available for creating UI layouts; too many to enumerate here. The vast majority of layout tools that generate standard HTML5 code (HTML/CSS/JavaScript) should work with no issue. The Intel XDK creates standard Cordova CLI (aka PhoneGap) applications, so any UI frameworks and tools that work in the Cordova CLI environment will work with your Intel XDK applications.

 

Which App Designer framework should I use? Which Intel XDK layout framework is best?

There is no "best" UI framework for your application. Each UI framework has pros and cons. You should choose that UI framework which serves your application needs the best. Using App Designer to create your UI is not a requirement to building a mobile app with the Intel XDK. You can create your layout by hand or using any UI framework (by hand) that is compatible with the Cordova CLI (aka PhoneGap) webview environment.

  • Twitter Bootstrap 3 -- This UI framework has been deprecated and will be retired from App Designer in a future release of the Intel XDK. You can always use this (or any mobile) framework with the Intel XDK, but you will have to do so manually, without the help of the Intel XDK App Designer UI layout tool. If you wish to continue using Twitter Bootstrap please visit the Twitter Bootstrap website and the Twitter Bootstrap GitHub repo for documentation and help.

  • Framework7 -- This UI framework has been retired from App Designer. You can always use this (or any mobile) framework with the Intel XDK, but you will have to do so manually, without the help of the Intel XDK App Designer UI layout tool. If you wish to continue using Framework7 please visit the Framework7 project page and the Framework7 GitHub repo for documentation and help.

  • Ionic -- This UI framework has been retired from App Designer. You can always use this (or any mobile) framework with the Intel XDK, but you will have to do so manually, without the help of the Intel XDK App Designer UI layout tool. If you wish to continue using Ionic please visit the Ionic project page and the Ionic GitHub repo for documentation and help.

  • App Framework 3 -- This UI framework has been retired from App Designer. You can always use this (or any mobile) framework with the Intel XDK, but you will have to do so manually, without the help of the Intel XDK App Designer UI layout tool. If you wish to continue using App Framework please visit the App Framework project page and the App Framework GitHub repo for documentation and help.

  • Topcoat -- This UI framework has been retired from App Designer. You can always use this (or any mobile) framework with the Intel XDK, but you will have to do so manually, without the help of the Intel XDK App Designer UI layout tool. If you wish to continue using Topcoat please visit the Topcoat project page and the Topcoat GitHub repo for documentation and help.

  • Ratchet -- This UI framework has been retired from App Designer. You can always use this (or any mobile) framework with the Intel XDK, but you will have to do so manually, without the help of the Intel XDK App Designer UI layout tool. If you wish to continue using Ratchet please visit the Ratchet project page and the Ratchet GitHub repo for documentation and help.

  • jQuery Mobile -- This UI framework has been retired from App Designer. You can always use this (or any mobile) framework with the Intel XDK, but you will have to do so manually, without the help of the Intel XDK App Designer UI layout tool. If you wish to continue using jQuery Mobile please visit the jQuery Mobile API page and jQuery Mobile GitHub page for documentation and help.

What do the Google* Map widget's "center type" attribute and its values "Auto calculate," "Address", and "Lat/Long" mean?

The "center type" parameter defines how the map view is centered in your div. It is used to initialize the map as follows:

  • Lat/Long: center the map on a specific latitude and longitude (that you provide on the properties page)
  • Address: center the map on a specific address (that you provide on the properties page)
  • Auto Calculate: center the map on a collection of markers

This is just for initialization of the map widget. Beyond that you must use the standard Google maps APIs to move and/or modify the map. See the "google_maps.js" code for initialization of the widget and some calls to the Google maps APIs. There is also a pointer to the Google maps API at the beginning of the JS file.

To get the current position, you have to use the Geo API, and then push that into the Maps API to display it. The Google Maps API will not give you any device data, it will only display information for you. Please refer to the Intel XDK "Hello, Cordova" sample app for some help with the Geo API. There are a lot of useful comments and console.log messages.

How do I size UI elements in my project?

Trying to implement "pixel perfect" user interfaces with HTML5 apps is not recommended: there is a wide array of device resolutions and aspect ratios, and it is impossible to ensure your layout is sized properly for every device. Instead, use "responsive web design" techniques to build your UI so that it adapts to different sizes automatically. You can also use the CSS media query directive to build CSS rules that are specific to different screen dimensions.

Note: The viewport is sized in CSS pixels (aka virtual pixels or device-independent pixels), so the physical pixel dimensions are not what you will normally be designing for.

How do I create lists, buttons and other UI elements with the Intel XDK?

The Intel XDK provides you with a way to build HTML5 apps that are run in a webview on the target device. This is analogous to running in an embedded browser (refer to this blog for details). Thus, the programming techniques are the same as those you would use inside a browser, when writing a single-page client-side HTML5 app. You can use the Intel XDK App Designer tool to drag and drop UI elements.

Why is the user interface for Chrome on Android* unresponsive?

It could be that you are using an outdated version of the App Framework* files. You can find the recent versions here. You can safely replace any App Framework files that App Designer installed in your project with more recent copies as App Designer will not overwrite the new files.

How do I work with more recent versions of App Framework* since the latest Intel XDK release?

You can replace the App Framework* files that the Intel XDK automatically inserted with more recent versions that can be found here. App Designer will not overwrite your replacement.

Is there a replacement to XPATH in App Framework* for selecting nodes from an XML document?

This FAQ applies only to App Framework 2. App Framework 3 no longer includes a replacement for the jQuery selector library, it expects that you are using standard jQuery.

App Framework is a UI library that implements a subset of the jQuery* selector library. If you wish to use jQuery for XPath manipulation, it is recommended that you use jQuery as your selector library and not App Framework. However, it is also possible to use jQuery with the UI components of App Framework. Please refer to this entry in the App Framework docs.

It would look similar to this:

<script src="lib/jq/jquery.js"></script>
<script src="lib/af/jq.appframework.js"></script>
<script src="lib/af/appframework.ui.js"></script>

Why does my App Framework* app that was previously working suddenly start having issues with Android* 4.4?

Ensure you have upgraded to the latest version of App Framework. If your app was built with the now retired Intel XDK "legacy" build system be sure to set the "Targeted Android Version" to 19 in the Android-Crosswalk build settings. The legacy build targeted Android 4.2.

How do I manually set a theme?

If you want to, for example, change the theme only on Android*, you can add the following lines of code:

  1. $.ui.autoLaunch = false; //Stop the App Framework* auto launch right after you load App Framework*
  2. Detect the underlying platform using navigator.userAgent, intel.xdk.device.platform, or window.device.platform. If the platform detected is Android*, set $.ui.useOSThemes = false; to disable custom themes and set <div id="afui" class="android light">
  3. Otherwise, set $.ui.useOSThemes=true;
  4. When device ready and document ready have been detected, add $.ui.launch();

How does page background color work in App Framework?

In App Framework the BODY is in the background and the page is in the foreground. If you set the background color on the body, you will see the page's background color. If you set the theme to default, App Framework uses a native-like theme based on the device at runtime. Otherwise, it uses the App Framework theme. This is normally done using the following:

<script>
  $(document).ready(function(){ $.ui.useOSThemes = false; });
</script>

Please see Customizing App Framework UI Skin for additional details.

What kind of templates can I use to create App Designer projects?

Currently, you can only create App Designer projects by selecting the blank 'HTML5+Cordova' template with app designer (select the app designer check box at the bottom of the template box) and the blank 'Standard HTML5' template with app designer. 

App Designer versions of the layout and user interface templates were removed in the Intel XDK 3088 version. 

My AJAX calls do not work on Android; I'm getting valid JSON data with an invalid return code.

The jQuery 1 library appears to be incompatible with the latest versions of the cordova-android framework. To fix this issue you can either upgrade your jQuery library to jQuery 2 or use a technique similar to that shown in the following test code fragment to check your AJAX return codes. See this forum thread for more details. 

The jQuery site only tests jQuery 2 against Cordova/PhoneGap apps (the Intel XDK builds Cordova apps). See the How to Use It section of the jQuery 2.0 release post (https://blog.jquery.com/2013/04/18/jquery-2-0-released/) for more information.

If you built your app using App Designer, it may still be using jQuery 1.x rather than jQuery 2.x, in which case you need to replace the version of jQuery in your project. Simply download and replace the existing copy of jQuery 1.x in your project with the equivalent copy of jQuery 2.x.

Note, in particular, the switch case that checks for zero and 200. This test fragment does not cover all possible AJAX return codes, but should help you if you wish to continue to use a jQuery 1 library as part of your Cordova application.

function jqueryAjaxTest() {

     /* button  #botRunAjax */
     $(document).on("click", "#botRunAjax", function (evt) {
         console.log("function started");
         var wpost = "e=132&c=abcdef&s=demoBASICA";
         $.ajax({
             type: "POST",
             crossDomain: true, //;paf; see http://stackoverflow.com/a/25109061/2914328
             url: "http://your.server.url/address",
             data: wpost,
             dataType: 'json',
             timeout: 10000
         })
         .always(function (retorno, textStatus, jqXHR) { //;paf; see http://stackoverflow.com/a/19498463/2914328
             console.log("jQuery version: " + $.fn.jquery) ;
             console.log("arg1:", retorno) ;
             console.log("arg2:", textStatus) ;
             console.log("arg3:", jqXHR) ;
             var jqMajor = parseInt($.fn.jquery, 10) ;  // major version: "1.x.y" -> 1, "2.x.y" -> 2
             if( jqMajor === 1 ) {
                 // jQuery 1: check the status on the first .always() argument
                 switch (retorno.status) {
                    case 0:
                    case 200:
                        console.log("exit OK");
                        console.log(JSON.stringify(retorno.responseJSON));
                        break;
                    case 404:
                        console.log("exit by FAIL");
                        console.log(JSON.stringify(retorno.responseJSON));
                        break;
                    default:
                        console.log("default switch happened") ;
                        console.log(JSON.stringify(retorno.responseJSON));
                        break ;
                 }
             }
             else if( (jqMajor === 2) && (textStatus === "success") ) {
                 // jQuery 2: check the status on the third .always() argument (the jqXHR object)
                 switch (jqXHR.status) {
                    case 0:
                    case 200:
                        console.log("exit OK");
                        console.log(JSON.stringify(jqXHR.responseJSON));
                        break;
                    case 404:
                        console.log("exit by FAIL");
                        console.log(JSON.stringify(jqXHR.responseJSON));
                        break;
                    default:
                        console.log("default switch happened") ;
                        console.log(JSON.stringify(jqXHR.responseJSON));
                        break ;
                 }
             }
             else {
                console.log("unknown") ;
             }
         });
     });
 }

What do the data-uib and data-ver properties do in an App Designer project?

App Designer adds the data-uib and data-ver properties to many of the UI elements it creates. These property names only appear in the index.html file on various UI elements. There are other similar data properties, like data-sm, that are only required when you are using a service method.

The data-uib and data-ver properties are used only by App Designer. They are not needed by the UI frameworks supported by App Designer; they are used by App Designer to correctly display and apply widget properties when you are operating in the "design" view within App Designer. These properties are not critical to the functioning of your app; however, removing them will cause problems with the "design" view of App Designer.

The data-sm property is inserted by App Designer, and it may be used by data_support.js, along with other support libraries. The data-sm property is relevant to the proper functioning of your app.

Unable to select App Designer UI option when I create a new App Designer project.

If you previously created an App Designer project named 'ui-test', deleted it, and then created another App Designer project using the same name, you will not be given the option to select the UI framework for the new project named 'ui-test.' This is because the Intel XDK remembers a framework name for each project name that has been used and does not delete that entry from the global-settings.xdk file when you delete a project (e.g., if you chose "Framework 7" when you first created an App Designer project named 'ui-test', then deleting 'ui-test' and creating a new 'ui-test' will result in another "Framework 7" project).

Because the UI framework name is not removed from the global-settings.xdk file when you delete the project, you must either use a new unique project name or edit the global-settings.xdk file to delete that old UI framework association. This is a bug that has been reported, but has not been fixed. Following is a workaround:

"FILE-/C/Users/xxx/Downloads/pkg/ui-test/www/index.html": {"canvas_width": 320,"canvas_height": 480,"framework": "framework 7"
}
  • Remove the last line ("framework": "framework 7") from the JSON object (remember to remove the comma at the end of the preceding line or you won't have a proper JSON file and your global-settings.xdk file will be considered corrupt).
  • Save and close the global-settings.xdk file.
  • Launch the Intel XDK.
  • Create a new project with the old name you are reusing.

You should now see the list of App Designer framework UI selection options when you create the new project with a previously used project name that you have deleted.


Intel® XDK FAQs


Click the FAQ titles below to view specific FAQ pages.

General FAQs

Getting started as a new user; installing and updating the Intel XDK; questions related to the Brackets editor; differences between mobile platforms, etc.

Cordova FAQs

Using Cordova* APIs; adding and using third-party plugins; selecting plugins for your app; AdMob and in-app purchases; Intel App Security; image capture; camera, etc.

Crosswalk FAQs [RETIRED]

Using the Crosswalk* runtime with Android*; why are Crosswalk app packages so large; controlling audio playback rate; Crosswalk GPU support; Crosswalk options, etc.

Debug & Test FAQs

Enable testing via wifi with Intel App Preview; limitations of the Intel XDK simulator; debugging third-party plugins over USB, etc.

App Designer FAQs [DEPRECATED]

The App Designer layout editor; the App Framework library; creating and sizing UI elements; widget attributes; updating App Framework versions, etc.

IoT FAQs

Developing Internet of Things (IoT) NodeJS* apps using the Intel XDK; updating the MRAA library; connecting the Intel XDK to your IoT device; using the WebService API, etc.


Intel® XDK FAQs - Crosswalk [RETIRED]


[RETIRED] The Crosswalk Project has been retired!

IMPORTANT: In February 2017, the Crosswalk Project was retired. Crosswalk 23 was the last version of the Crosswalk library produced by the Crosswalk team. You can continue to build with the Crosswalk library using Cordova CLI or PhoneGap Build, but no further updates to the Crosswalk library will occur.

No bug fixes will be implemented for Crosswalk components.

You can continue to use Crosswalk in your project, but there will be no new releases of the Crosswalk library and the Intel XDK will not add any new versions of Crosswalk to the build settings. If you are deploying your app to Android 5 or greater there is no reason to use the Crosswalk library, since those versions of Android include an upgradeable native Chromium webview that is up-to-date and is as capable and as performant as the Crosswalk webview. If you are still deploying to Android 4.x devices you may want to continue to use Crosswalk for those devices. Unlike the native webview in Android 5+ devices, the native webview in Android 4.x devices cannot be upgraded and is quite limited.

 

How do I play audio with different playback rates?

Here is a code snippet that allows you to specify playback rate:

var myAudio = new Audio('/path/to/audio.mp3');
myAudio.play();
myAudio.playbackRate = 1.5;

Why are Intel XDK Android Crosswalk build files so large?

When your app is built with Crosswalk it will be a minimum of 15-18MB in size because it includes a complete web browser (the Crosswalk runtime or webview) for rendering your app instead of the built-in webview on the device. Despite the additional size, this is the preferred solution for Android, because the built-in webviews on the majority of Android devices are inconsistent and poorly performing.

See these articles for more information:

Why is the size of my installed app much larger than the apk for a Crosswalk application?

This is because the apk is a compressed image, so when installed it occupies more space due to being decompressed. Also, when your Crosswalk app starts running on your device it will create some data files for caching purposes which will increase the installed size of the application.

Why does my Android Crosswalk build fail with the com.google.playservices plugin?

The Intel XDK Crosswalk build system used with CLI 4.1.2 Crosswalk builds does not support the library project format that was introduced in the "com.google.playservices@21.0.0" plugin. Use "com.google.playservices@19.0.0" instead.

Why does my app fail to run on some devices?

There are some Android devices in which the GPU hardware/software subsystem does not work properly. This is typically due to poor design or improper validation by the manufacturer of that Android device. Your problem Android device probably falls under this category.

How do I stop "pull to refresh" from resetting and restarting my Crosswalk app?

See the code posted in this forum thread for a solution: /en-us/forums/topic/557191#comment-1827376.

An alternate solution is to add the following lines to your intelxdk.config.additions.xml file:

<!-- disable reset on vertical swipe down -->
<intelxdk:crosswalk xwalk-command-line="--disable-pull-to-refresh-effect" />

Which versions of Crosswalk are supported and why do you not support version X, Y or Z?

The specific versions of Crosswalk that are offered via the Intel XDK are based on what the Crosswalk project releases and the timing of those releases relative to Intel XDK build system updates. This is one of the reasons you do not see every version of Crosswalk supported by our Android-Crosswalk build system.

With the September, 2015 release of the Intel XDK, the method used to build embedded Android-Crosswalk versions changed to the "pluggable" webview Cordova build system. This new build system was implemented with the help of the Cordova project and became available with their release of the Android Cordova 4.0 framework (coincident with their Cordova CLI 5 release). With this change to the Android Cordova framework and the Cordova CLI build system, we can now more quickly adapt to new version releases of the Crosswalk project. Support for previous Crosswalk releases required updating a special build system that was forked from the Cordova Android project; with the "pluggable" webview approach, the standard Cordova build system can be used instead, because the Crosswalk library is included as a "pluggable" component.

The "old" method of building Android-Crosswalk APKs relied on a "forked" version of the Cordova Android framework. It is based on the Cordova Android 3.6.3 framework and is used when you select CLI 4.1.2 in the Project tab's build settings page. Only Crosswalk versions 7, 10, 11, 12, and 14 are supported by the Intel XDK when using this build setting.

Selecting CLI 5.1.1 in the build settings will generate a "pluggable" webview built app. A "pluggable" webview app (built with CLI 5.1.1) results in an app built with the Cordova Android 4.1.0 framework. As of the latest update to this FAQ, the CLI 5.1.1 build system supported Crosswalk 15. Future releases of the Intel XDK and the build system will support higher versions of Crosswalk and the Cordova Android framework.

In both cases, above, the net result (when performing an "embedded" build) will be two processor architecture-specific APKs: one for use on an x86 device and one for use on an ARM device. The version codes of those APKs are modified to ensure that both can be uploaded to the Android store under the same app name, ensuring that the appropriate APK is automatically delivered to the matching device (i.e., the x86 APK is delivered to Intel-based Android devices and the ARM APK is delivered to ARM-based Android devices).

For more information regarding Crosswalk and the Intel XDK, please review these documents:

How do I prevent my Crosswalk app from auto-completing passwords?

Use the Ionic Keyboard plugin and set the spellcheck attribute to false.

How can I improve the performance of my Construct2 game build with Crosswalk?

Beginning with the Intel XDK CLI 5.1.1 build system you must add the --ignore-gpu-blacklist option to your intelxdk.config.additions.xml file if you want the additional performance this option provides to blacklisted devices. See this forum post for additional details.

If you are a Construct2 game developer, please read this blog by another Construct2 game developer regarding how to configure your game for proper Crosswalk performance > How to build optimized Intel XDK Crosswalk app properly? <

Also, you can experiment with the CrosswalkAnimatable option in your intelxdk.config.additions.xml file (details regarding the CrosswalkAnimatable option are available in this Crosswalk Project wiki post: Android SurfaceView vs TextureView).

<!-- Controls configuration of Crosswalk-Android "SurfaceView" or "TextureView" -->
<!-- Default is SurfaceView if >= CW15 and TextureView if <= CW14 -->
<!-- Option can only be used with Intel XDK CLI5+ build systems -->
<!-- SurfaceView is preferred, TextureView should only be used in special cases -->
<!-- Enable Crosswalk-Android TextureView by setting this option to true -->
<preference name="CrosswalkAnimatable" value="false" />

See Chromium Command-Line Options for Crosswalk Builds with the Intel XDK for some additional tools that can be used to modify the Crosswalk's webview runtime parameters, especially the --ignore-gpu-blacklist option.

Why does the Google store refuse to publish my Crosswalk app?

For full details, please read Android and Crosswalk Cordova Version Code Issues. For a summary, read this FAQ.

There is a change to the version code handling by the Crosswalk and Android build systems based on Cordova CLI 5.0 and later. This change was implemented by the Apache Cordova project. This new version of Cordova CLI automatically modifies the android:versionCode when building for Crosswalk and Android. Because our CLI 5.1.1 build system is now more compatible with standard Cordova CLI, this change results in a discrepancy in the way your android:versionCode is handled when building for Crosswalk (15) or Android with CLI 5.1.1 when compared to building with CLI 4.1.2.

If you have never published an app to an Android store this change will have little or no impact on you. This change might affect attempts to side-load an app onto a device, in which case the simplest solution is to uninstall the previously side-loaded app before installing the new app.

Here's what Cordova CLI 5.1.1 (Cordova-Android 4.x) is doing with the android:versionCode number (which you specify in the App Version Code field within the Build Settings section of the Projects tab):

Cordova-Android 4.x (Intel XDK CLI 5.1.1 for Crosswalk or Android builds) does this:

  • multiplies your android:versionCode by 10

then, if you are doing a Crosswalk (15) build:

  • adds 2 to the android:versionCode for ARM builds
  • adds 4 to the android:versionCode for x86 builds

otherwise, if you are performing a standard Android build (non-Crosswalk):

  • adds 0 to the android:versionCode if the Minimum Android API is < 14
  • adds 8 to the android:versionCode if the Minimum Android API is 14-19
  • adds 9 to the android:versionCode if the Minimum Android API is > 19 (i.e., >= 20)
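As a worked example, the following small C program applies the rules above to a hypothetical App Version Code of 321 (any value you set in the Projects tab works the same way):

	#include <stdio.h>

	int main( void )
	{
		int appVersionCode = 321;		/* hypothetical App Version Code */
		int base = appVersionCode * 10;	/* CLI 5.1.1 multiplies by 10 -> 3210 */

		printf( "Crosswalk ARM APK:       %d\n", base + 2 );	/* 3212 */
		printf( "Crosswalk x86 APK:       %d\n", base + 4 );	/* 3214 */
		printf( "Android, Min API < 14:   %d\n", base + 0 );	/* 3210 */
		printf( "Android, Min API 14-19:  %d\n", base + 8 );	/* 3218 */
		printf( "Android, Min API >= 20:  %d\n", base + 9 );	/* 3219 */
		return 0;
	}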

If you HAVE PUBLISHED a Crosswalk app to an Android store this change may impact your ability to publish a newer version of your app! In that case, if you are building for Crosswalk, add 6000 (six with three zeroes) to your existing App Version Code field in the Crosswalk Build Settings section of the Projects tab. If you have only published standard Android apps in the past and are still publishing only standard Android apps you should not have to make any changes to the App Version Code field in the Android Builds Settings section of the Projects tab.

The workaround described above only applies to Crosswalk CLI 5.1.1 and later builds!

When you build a Crosswalk app with CLI 4.1.2 (which uses Cordova-Android 3.6) you will get the old Intel XDK behavior where: 60000 and 20000 (six with four zeroes and two with four zeroes) are added to the android:versionCode for Crosswalk builds and no change is made to the android:versionCode for standard Android builds.

NOTE:

  • Android API 14 corresponds to Android 4.0
  • Android API 19 corresponds to Android 4.4
  • Android API 21 corresponds to Android 5.0 (API 20 is Android 4.4W, for wearables)
  • CLI 5.1.1 (Cordova-Android 4.x) does not allow building for Android 2.x or Android 3.x

Why is my Crosswalk app generating errno 12 Out of memory errors on some devices?

If you are using the WebGL 2D canvas APIs and your app crashes on some devices because you added the --ignore-gpu-blacklist flag to your intelxdk.config.additions.xml file, you may need to also add the --disable-accelerated-2d-canvas flag. Using the --ignore-gpu-blacklist flag enables the use of the GPU in some problem devices, but can then result in problems with some GPUs that are not blacklisted. The --disable-accelerated-2d-canvas flag allows those non-blacklisted devices to operate properly in the presence of WebGL 2D canvas APIs and the --ignore-gpu-blacklist flag.

You likely have this problem if your app crashes after running a few seconds with an error like the following:

<gsl_ldd_control:364>: ioctl fd 46 code 0xc00c092f (IOCTL_KGSL_GPMEM_ALLOC) failed: errno 12 Out of memory
<ioctl_kgsl_sharedmem_alloc:1176>: ioctl_kgsl_sharedmem_alloc: FATAL ERROR : (null)

See Chromium Command-Line Options for Crosswalk Builds with the Intel XDK for additional info regarding the --ignore-gpu-blacklist flag and other Chromium option flags.

Construct2 Tutorial: How to use AdMob and IAP plugins with Crosswalk and the Intel XDK.

See this tutorial on the Scirra tutorials site > How to use AdMob and IAP official plugins on Android-Crosswalk/XDK < written by Construct2 developer Kyatric.

Also, see this blog written by a Construct2 game developer regarding how to build a Construct2 app using the Appodeal ad plugin with your Construct2 app and the Intel XDK > How to fix the build error with Intel XDK and Appodeal? <.

What is the correct "Target Android API" value that I should use when building for Crosswalk on Android?

The "Target Android API" value (aka android-targetSdkVersion), found in the Build Settings section of the Projects tab, is the version of Android that your app and the libraries associated with your app are tested against; it DOES NOT represent the maximum level of Android onto which you can install and run your app. When building a Crosswalk app you should set this value to the value recommended by the Crosswalk project.

The recommended "Target Android API" levels for Crosswalk on Android apps are:

  • 18 for Crosswalk 1 thru Crosswalk 4
  • 19 for Crosswalk 5 thru Crosswalk 10
  • 21 for Crosswalk 11 thru Crosswalk 18

As of release 3088 of the Intel XDK, the recommended value for your android-targetSdkVersion is 21. In previous versions of the Intel XDK the recommended value was 19. If you have it set to a higher number (such as 23), we recommend that you change your setting to 21.

Can I build my app with a version of Crosswalk that is not listed in the Intel XDK Build Settings UI?

As of release 3088 of the Intel XDK, it is possible to build your Crosswalk for Android app using versions of the Crosswalk library that are not listed in the Project tab's Build Settings section. You can override the value that is selected in the Build Settings UI by adding a line to the intelxdk.config.additions.xml file.

NOTE: The process described below is for experts only! By using this process you are effectively disabling the Crosswalk version that is selected in the Build Settings UI and you are overriding the version of Crosswalk that will be used when you build a custom debug module with the Debug tab.

When building a Crosswalk for Android application, with CLI 5.x and higher, the Cordova Crosswalk Webview Plugin is used to facilitate adding the Crosswalk webview library to the build package (the APK). That plugin effectively "includes" the specified Crosswalk library when the app is built. The version of the Crosswalk library selected in the Build Settings UI is reflected in a line in the Android build config file, similar to the following:

<intelxdk:crosswalk version="16"/>

The line above is added automatically to the intelxdk.config.android.xml file by the Intel XDK. If you attempt to change lines in the Android build config file they will be overwritten by the Intel XDK each time you use the Build tab (perform a build) or the Test tab. In order to modify (or override) this line in the Android config file you need to add a line to the intelxdk.config.additions.xml file.

The precise line you include in the intelxdk.config.additions.xml file depends on the version of the Crosswalk library you want to include. 

<!-- Set the Crosswalk embedded library to something other than those listed in the UI. -->
<!-- In practice use only one; multiple examples are shown for illustration. -->
<preference name="xwalkVersion" value="17+"/>
<preference name="xwalkVersion" value="14.43.343.24" />
<preference name="xwalkVersion" value="org.xwalk:xwalk_core_library_beta:18+"/>

The first example line in the code snippet above asks the Intel XDK to build with the "last" or "latest" version of the Crosswalk 17 release library (the '+' character means "last available" for the specified version). The second example requests an explicit version of Crosswalk 14 when building the app (e.g., version 14.43.343.24). The third example shows how to request the "latest" version of Crosswalk 18 from the Crosswalk beta Maven repository.

NOTE: only one such "xwalkVersion" preference tag should be used. If you include more than one "xwalkVersion" only the last one specified in the intelxdk.config.additions.xml file will be used.

The specific versions of Crosswalk that you can use can be determined by reviewing the Crosswalk Maven repositories: one for released Crosswalk libraries and one for beta versions of the Crosswalk library.

Not all Crosswalk libraries are guaranteed to work with your built app, especially the beta versions of the Crosswalk library. There may be library dependencies on the specific version of the Cordova Crosswalk Webview Plugin or the Cordova-Android framework. If a library does not work, select a different version.

Detailed instructions on the preference tag being used here are available in the Crosswalk Webview Plugin README.md documentation.

If you are curious when a specific version of Chromium will be supported by Crosswalk, please see the Crosswalk Release Dates wiki published by the Crosswalk Project.

My Construct2 Crosswalk app flashes a white box or white band after the splash screen.

The white box or white bands you see between the ending of the splash screen and the beginning of your app appears to be due to some webview initialization. It also appears in non-Crosswalk apps on Android, but does not show up as white. The white band that does appear can cause an initial "100% image" to bounce up and down momentarily. This issue is not being caused by the splash screen plugin or the Intel XDK; it appears to be interference caused by the Cordova webview initialization.

The following solution appears to work, although there may be some situations that it does not help. As this problem is better understood more information will be provided in this FAQ.

Add the following lines to your intelxdk.config.additions.xml file:

<platform name="android">
    <!-- set Crosswalk default background color -->
    <!-- see http://developer.android.com/reference/android/graphics/Color.html -->
    <preference name="BackgroundColor" value="0x00000000" />
</platform>

The value 0x00000000 configures the webview background color to be "transparent black," according to the Cordova documentation and the Crosswalk webview plugin code. You should be able to set that color to anything you want. However, this color appears to work the best.

You may also want to add the following to your intelxdk.config.additions.xml file:

<platform name="android">
    <!-- following requires the splash screen plugin -->
    <!-- see https://github.com/apache/cordova-plugin-splashscreen for details -->
    <preference name="SplashScreen" value="screen" />
    <preference name="AutoHideSplashScreen" value="false" />
    <!-- <preference name="SplashScreenDelay" value="30000" /> -->
    <preference name="FadeSplashScreen" value="false"/>
    <!-- <preference name="FadeSplashScreenDuration" value="3000"/> -->
    <preference name="ShowSplashScreenSpinner" value="false"/>
    <preference name="SplashMaintainAspectRatio" value="false" />
    <preference name="SplashShowOnlyFirstTime" value="false" />
</platform>

Testing of this fix was done with Crosswalk 17 on an Android 4.4, Android 5.0 and an Android 6.0 device.


DPDK-on-the-Go – Profile DPDK Applications on Your Windows Laptop


Introduction

This article gets you started with hands-on development, execution, and profiling of Data Plane Development Kit (DPDK) applications on your own laptop. This makes DPDK development portable, and makes it easier to share with and teach developers, customers, and students in a scalable way.

About the Author

M Jay

M Jay has worked with the DPDK team since 2009. He joined Intel in 1991 and has been in various roles and divisions: 64-bit CPU front side bus architect and 64-bit HAL developer, among others, before he joined the DPDK team. M Jay holds 21 US patents, both individually and jointly, all issued while working at Intel. M Jay was awarded the Intel Achievement Award in 2016, Intel's highest honor based on innovation and results.

Background

To run and profile DPDK on the Linux* platform, please refer to the article Profiling DPDK Code with Intel® VTune™ Amplifier. If you don’t want to install Linux on your laptop, follow the steps in this article to learn how to configure your Intel® architecture-based Windows* laptop to develop, run, and profile DPDK applications.

Intel® VTune™ Amplifier, a performance profiler, will run natively on the Windows* OS so that it can access all the hardware performance registers. Developing and running DPDK applications will be done on an Oracle VM VirtualBox*.

The instructions in this article were tested on an Intel® Xeon® processor-based desktop, server, and laptop. Here we will use a laptop with the Windows OS.

If you have an Apple* laptop, the appendix provides information about systems based on the Mac OS*.

DPDK Application Building and Profiling – Components

  • Intel® Atom™ or Intel Xeon processor-based system
  • DPDK Applications
  • Oracle VM VirtualBox 
  • Intel VTune Amplifier profiler

Figure 3

The platform can be any Intel® processor-based platform:  desktop, server, laptop, or embedded system.

This article covers the following steps:

  • Install and configure the Oracle VM VirtualBox.
  • Import, build, and run DPDK applications.
  • Install Intel VTune Amplifier and get started profiling.

Install and Configure the Oracle VM VirtualBox*

  • Step 1: Make sure Intel® Virtualization Technology for Directed I/O (Intel® VT-d) and Intel® Virtualization Technology for IA-32, Intel® 64 and Intel® Architecture (Intel® VT-x) are enabled in the UEFI firmware/BIOS.
  • Step 2: Download two images: the VirtualBox image and the extension packs for the same version.
  • Step 3: Install VirtualBox.
  • Step 4: Install the extension packs.
  • Step 5: Verify that 64-bit guest virtualization is enabled.

Step 1: Make sure Intel VT-d and Intel VT-x are enabled in UEFI firmware/BIOS. 

This is needed to ensure 64-bit guests can be run; VT-d and VT-x need to be on.

The Intel VT-d and Intel VT-x controls are usually found under Advanced CPU settings or Advanced Chipset settings, as described below. First we need to get into safe mode and look at the BIOS settings.

  1. Press the Windows button. You will see the startup screen.
  2. Press the Power Switch icon. You will see a drop-down menu.
  3. To get into BIOS, press SHIFT+RESTART.

    If you have a laptop installed with Windows* 8, go to safe mode (SHIFT+RESTART).

    You will see the following settings. Note that depending on your computer, you may see different options.
     

  4. To use advanced tools, choose Troubleshoot.
     

  5. If the following screen displays, choose Enable Safe Mode to access the screen for the BIOS change.

    Once you have selected safe mode, you will be able to access additional options, as shown below.


  6. Select UEFI Firmware Settings

    Note: In your system, it may be referred to as BIOS setting.

    Depending on your vendor and BIOS, you will be able to access the Advanced setting, Advanced Chipset Control, or Advanced CPU Control. What you need to do is verify whether Intel VT is enabled. In certain BIOS models, it may display as VT-d and VT-x.

    Some systems will have both a CPU section (for Intel VT-x) and a chipset section (for Intel VT-d) so you may have to look at both sections to enable virtualization.

    Below are two screens: the CPU screen followed by the chipset screen. In this system, only the chipset screen has virtualization control.

    CPU screen

    Chipset screen

  7. Save and then exit.

    Now the OS and applications come up.

    Step 2: Download two images: the VirtualBox Image and the extension packs for the same version 

    To access the downloads, go to https://www.virtualbox.org/wiki/Downloads

    For Windows:

    1. Select VirtualBox 5.1.8 (or the latest) for Windows hosts.
    2. Download the Extension Pack with the matching version number.

    For OS X*:

    1. Select VirtualBox 5.1.8 (or the latest) for OS X hosts.

    2. Click and download the matching version extension. 

     

    Why install extension packs? What functionality do they provide?

    Extension packs complement the functionality of VirtualBox.

    1. Verify that both images downloaded to your system successfully.

Step 3. Install VirtualBox – Run As Administrator 

  1. To start the install, right-click VirtualBox, and then select Run as administrator.

     
  2. Continue through the following screens.
  3. For the screen above, press the left-arrow “<” to select INSTALL.


Step 4: Install the Extension Packs 

  1. Click File, and then click Preferences. 


  2. Click Extensions.
  3. To select the downloaded extension pack, click the folder icon on the right (shown by the arrow below) and browse to the file.
  4. Select the extension packs to install. 


    You will see the following success message:

Step 5: Verify that 64-bit guest virtualization is enabled. 

  1. Select New (shown by the arrow below) and choose OS Type “Linux”.
  2. Verify in the “Version” sub-menu whether 64-bit Ubuntu* is selectable. If so, the virtualization steps to enable Intel VT-d and Intel VT-x were successful.

Now you are ready to import the VMs.

Note: If you don’t see 64-bit versions and see only 32-bit versions, you’ll need to enable Intel VT-d and Intel VT-x correctly. Return to the BIOS setting steps under "Step 1: Make sure Intel VT-d and Intel VT-x are enabled in UEFI firmware/BIOS."

Import, Build, and Run DPDK Applications

In this article, we assume that you have plugged in a thumb drive with a copy of an exported DPDK application virtual machine that was built on a native Linux platform running DPDK. When you have connected the thumb drive, follow these instructions to import the VM.

  1. Click File, and then click Import Appliance.
  2. Click the folder on the right (as shown by the arrow below) to select the VM to import.
  3. Select the VM.
    For example, as shown below: Ubuntu Nov 7 VTune DPDK.

    The following screenshot shows verification of the virtual appliance getting imported.

  4. Select Import.
    You will see the appliance being imported as shown below.


    You have successfully imported the DPDK virtual appliance, as shown by the arrow in the screenshot below.


  5. Select the Imported DPDK appliance.

  6. To start the imported DPDK appliance, click Start.

    You have successfully launched DPDK running in the Ubuntu guest OS with VirtualBox on your laptop, as shown below.

    DPDK running in the Ubuntu guest OS

Now you can start your own development: write applications, build, and run them. To get started, locate the README_FIRST file, as shown in the above screenshot. Open it and you’ll find instructions for running the DPDK microbenchmarks and other applications.

Quick Profile View using Windows Task Manager

Let’s say you want to know where cycles are being spent in the system. You can use Task Manager to get a bird’s-eye view first. Then you can dig into Intel VTune Amplifier.

The screenshot below shows the CPU cycles and tasks running, with Windows Task Manager showing CPU utilization running the DPDK application as a guest with VirtualBox. 


Install Intel® VTune™ Amplifier

  1. Go to https://software.intel.com/en-us/intel-VTune-amplifier-xe
  2. Under Get Free Downloads & Trials, choose Windows*, and then click Download FREE Trial.
  3. On the form, enter the email address to which the download link will be sent.
  4. Check your email inbox for an email titled “Thank you for evaluating Intel® VTune™ Amplifier XE for Windows*.” (Note: Search your email’s trash folder in case you don’t see it.)
  5. Click Download.

    The Activation Acknowledgement page displays with the Download Now button, as shown below.

  6. Click Download Now.
  7. Print the “What’s New?” document.

    Once the download has completed, you will have the VTune_Amplifier_XE_2017_update1_setup.exe image, as shown below.
  8. Double-click the extracted setup image. You will see the following confirmation.
  9. On the welcome page, click Next.
  10. In the following screen, select Evaluate this product (no serial number required).
  11. Click through the successive screens to complete the install.
  12. On the following screen, leave “How do you want to open this file?” as the default and click OK.
  13. Press the Windows button. You will see the installed Intel VTune Amplifier title, as shown below.

The next step is to open a terminal as an administrator. It is important to access Intel VTune Amplifier as an administrator.

  1. In the Search Windows box, type: cmd (shown below as the first step).
  2. Right-click Command Prompt (shown below as the second step).
  3. On the drop-down menu, choose Run as administrator.
  4. Verify that the terminal opened is titled Administrator.
    The next step is to verify that the system is ready and the install was successful.
  1. cd "C:\Program Files (x86)\IntelSWTools\VTune Amplifier XE 2017\bin32"
  2. To verify whether your system meets the needs of hardware-event-based sampling, type: amplxe-sepreg.exe -c

    The following message screen should display.

    The screen above indicates that you have successfully verified the correct dependency checks required to install the sampling driver:

    • Platform, architecture, and OS environment
    • Availability of the sampling driver binaries: sepdrv4_0.sys, sep3drv.sys, and sepdal.sys
    • Administrative privileges 
    • 32/64-bit installation 
  3. To check whether the sampling driver is loaded, type: amplxe-sepreg.exe -s

    The following message screen should display.

    The screen above indicates that the sampling driver loaded successfully.

    NOTE: If the sampling driver did NOT successfully load, refer to Appendix 3. Do NOT enter the command in Appendix 3 if you see the above success message.

    What’s next?

    The default installation path for the Intel VTune™ Amplifier XE is
    [Program Files (x86)]\IntelSWTools\VTune™ Amplifier XE

  4. cd "\Program Files (x86)\IntelSWTools\VTune Amplifier XE 2017"
  5. amplxe-vars.bat         - run the batch file as shown below

    You have set the needed environment variables successfully. You will get output as shown below.

    The final step is to run Intel VTune Amplifier.

  6. amplxe-gui            - run VTune; the GUI version is shown below

You will see the welcome screen as shown below.

Be sure to print the items circled below: Getting Started and Discover Performance Snapshots.

Start practicing by clicking New Project (also circled).

To get hands-on practice, please refer to the sections after “Starting Intel VTune Amplifier” in the following article: Profiling DPDK Code with Intel® VTune™ Amplifier

Also refer to the resources given in the reference section of the above article for videos and articles.

Next Steps

With the above hands-on exercise, you have successfully completed your “DPDK-On-The-Go” hands-on exercise.

As your first step, please register on the DPDK mailing list http://www.dpdk.org/ml/listinfo/dev

Also, we encourage you to play an active role in our meetups and DPDK community: www.dpdk.org

Please provide your feedback on this article to Muthurajan.Jayakumar@intel.com within 2 weeks after you go through your hands-on experience.

Exercises for the Readers

  1. How do you virtualize network devices in your host to present to VirtualBox appliances?
  2. How do you share your host file system with VirtualBox appliances?
  3. When you configure for sharing, what happens when you export the Virtual Appliance?
  4. What parameters in Intel VTune Amplifier do you look for in case you feel your application is compute bound?
  5. Same question as above, with your application being a) memory bound, b) I/O bound?

Appendix 1: How to Enable Intel® Virtualization Technology in a Mac* Computer

This article’s instructions were tested on a laptop with the Windows OS. Here are some references for the Mac regarding enabling Intel VT.

http://kb.parallels.com/en/5653

https://support.apple.com/en-us/HT203296


Appendix 2 – Potential Items to Watch Out for

  • Event-based sampling analysis: To install the drivers on Windows 7 and Windows* Server* 2008 R2 operating systems, you must enable the SHA-2 code signing support for these systems by applying Microsoft Security update 3033929: https://technet.microsoft.com/en-us/library/security/3033929. If the security update is not installed, event-based sampling analysis types will not work properly on your system.

Appendix 3 – In case the sampling driver is not installed

If the sampling driver is not installed but the system is supported by Intel VTune Amplifier, execute the following command with administrative privileges to install the driver:

amplxe-sepreg.exe -I

Appendix 4 – Intel VTune Amplifier for Mac Computers

While Intel VTune Amplifier 2017 runs on Windows and Linux systems, the profiled results can be viewed on OS X.

So you can run the DPDK applications with VirtualBox on Mac computers. For profiling, you can use the native tools that come with OS X.

You can also use the viewer referenced below to view output that Intel VTune Amplifier generated on Windows or Linux machines.

Please refer to the article How to Download and Evaluate the VTune™ Amplifier OS X* Viewer

References

DPDK

DPDK-in-a-Box uses the MinnowBoard Turbot single board computer.

Profiling DPDK Code with Intel® VTune™ Amplifier

Video: Intel® VTune™ and Performance Optimizations 

DPDK Performance Optimization Guidelines White Paper 

Article "Styles" In Action


The intro paragraph is a wonderful way to highlight the start of your article. This extra large copy helps you focus the developer on your main purpose. Simply select your copy and open the "Styles" dropdown in the WYSIWYG editor. Choose "Intro" and your copy should transform automatically.

This is the default for any copy you enter into the editor. Vestibulum id ligula porta felis euismod semper. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Maecenas sed diam eget risus varius blandit sit amet non magna. Duis mollis, est non commodo luctus, nisi erat porttitor ligula, eget lacinia odio sem nec elit. Morbi leo risus, porta ac consectetur ac, vestibulum at eros.

Did You Want to Highlight Some Content?
Go to the "styles" dropdown and select the "outline" style.  You'll get a simple grey outline around your copy. A few things to note: bullets, header tags, and multiple paragraphs will create a strange multiple box experience and shouldn't be used with this style. Please use a soft return "Shift + Return" to create a break. Use "bold" styling to emulate a header.

Can you use images in this space? Yes, but... If an image is longer than the copy you have inside this box, the image will appear to "break out" of the box. Please ensure you have enough copy, or a short enough image. You will only get the option to adjust the styling of your image when you import it, so ensure you select your alignment, sizing, and position wisely.

We have a number of versions of this highlight box style that are all found in the styles menu. This one is called "Note" and has a yellow highlight band that appears on the right. This one is great for alerts or information you want to draw your developer to. When you hit a hard enter, you'll notice that you've created another box. Simply type in something, select that copy and click the remove styles button above.

Another one is called "Grey Highlight". You can change the alignment of the text (left, center, right) for all three of these styles. You can't restrict the width of this style to a portion of the page and align right. It is required that this style be full content width. Don't use in a table. 

When adding your code snippet, make sure to select "Code Simple" from the styles dropdown to ensure the best viewing experience.

document.getElementById("demo").style.fontSize = "25px";
document.getElementById('demo').style.fontSize = '25px';

Getting Started with Intel® Cluster Checker for Linux*


Intel® Cluster Checker verifies the configuration and performance of Linux based clusters and checks compliance with the Intel® Scalable System Framework architecture specification. If issues are found, Intel® Cluster Checker diagnoses the problems and may provide recommendations on how to repair the cluster.

Intel® Cluster Checker has the following features:

  • Dynamic detection of cluster configuration, operation, and performance issues.
  • Problem diagnoses with severity and confidence levels.
  • On-demand data collection.

Intel® Cluster Checker is installed as part of the following suites:

The following flowchart represents the usage model for working with the Intel® Cluster Checker.

Prerequisites

  1. Install Intel® Cluster Checker using the bundled installer.
  2. We recommend running the tool as a non-root user. Before using Intel® Cluster Checker for the first time, the runtime environment must be set up. Two files are included to set up the runtime environment: clckvars.sh for shells with Bourne syntax and clckvars.csh for shells with C-shell syntax. Source the appropriate file from the command line. For example:

    source /opt/intel/clck/2018.0/bin/clckvars.sh
     
  3. Create a text file that lists the compute nodes in the cluster using one hostname per line. In these examples, this file is named "nodefile". Here is an example for one head node and four compute nodes:

    frontend #role: head
    node1
    node2
    node3
    node4


    For detailed system requirements, see the "System Requirements" section in the Intel® Cluster Checker Release Notes.

Step 1: Collect data

Run the following from a command line. nodefile should be in a shared & writeable location.

clck-collect -a -f nodefile

Step 2: Analyze the data

Run this from a command line:

clck-analyze -f nodefile

Resolve any issues reported in step 2 and repeat steps 1 and 2 until you are satisfied with the results.

By default, diagnosed signs are not included in the analyzer output. If the analyzer reports issues, then it will be beneficial to output diagnosed signs on subsequent runs. More data about signs and diagnoses can be found in the User's Guide. Run this from a command line to print diagnosed signs:

clck-analyze -f nodefile -p diagnosed_signs

There will be occasions where modifications of the default XML configuration file are needed. This can happen when more output is desired, test parameters need to be modified, the log level must be changed, etc. More information can be found in the User's Guide.

Troubleshooting/FAQ

Files will be installed into /opt/intel/clck/2018.0.

  • For help with the collector, run:

    clck-collect --help
     
  • For help with the analyzer, run:

    clck-analyze --help
     
  • To view collected data, use the database query tool.
    For help with the query tool, run:

    clckdb --help
     
  • To customize the analysis behavior:
    Make a copy of the default XML file.

    cp /opt/intel/clck/2018.0/etc/clck.xml ~

    Edit the XML file options.

    To use a custom XML file with the analyzer, run the following (if the custom XML file is named "~/clck.xml"):

    clck-analyze -f nodefile -c ~/clck.xml

Documentation and Resources

All of the following documents can be found at https://software.intel.com/en-us/intel-cluster-checker-support/documentation:

  • Intel® Cluster Checker Developer’s Guide: Contains a breakdown of the following components: the knowledge base, the connector, and the database schema.
  • Intel® Cluster Checker User's Guide: Contains a description of the product, including the following components and processes: the analyzer, knowledge base, connector, data collection, data providers, and the database schema.
  • Intel® Cluster Checker Release Notes: Contains a brief overview of the product, new features, system requirements, installation notes, documentation, known limitations, technical support, and the disclaimer and legal information.

Performance of Classic Matrix Multiplication Algorithm on Intel® Xeon Phi™ Processor System


Contents

Introduction
An Overview of the Classic Matrix Multiplication Algorithm
Total Number of Floating Point Operations
Implementation Complexity
Optimization Techniques
Memory Allocation Schemes
Loop Processing Schemes
Compute Schemes
Error Analysis
Performance on Intel® Xeon Phi™ Processor System
OpenMP* Product Thread Affinity Control
Recommended Intel® C++ Compiler Command-Line Options
Conclusion
References
Downloads
Abbreviations
Appendix A - Technical Specifications of Intel Xeon Phi Processor System
Appendix B - Comparison of Processing Times for MMAs vs. MTA
Appendix C - Error Analysis (Absolute Errors for SP FP Data Type)
Appendix D - Performance of MMAs for Different MASs
About the Author

Introduction

Matrix multiplication (MM) of two matrices is one of the most fundamental operations in linear algebra. The algorithm for MM is very simple; it can be easily implemented in any programming language, and its performance improves significantly when different optimization techniques are applied.

Several versions of the classic matrix multiplication algorithm (CMMA) to compute a product of square dense matrices are evaluated in four test programs. Performance of these CMMAs is compared to a highly optimized 'cblas_sgemm' function of the Intel® Math Kernel Library (Intel® MKL)[7]. Tests are completed on a computer system with an Intel® Xeon Phi™ processor 7210[5] running the Linux Red Hat* operating system in 'All2All' cluster mode and for 'Flat', 'Hybrid 50-50', and 'Cache' MCDRAM modes.

All versions of CMMAs for single and double precision floating point data types described in the article are implemented in the C programming language and compiled with Intel® C++ Compiler versions 17 and 16 for Linux*[6].

The article targets experienced C/C++ software engineers and can be considered as a reference on application optimization techniques, analysis of performance, and accuracy of computations related to MMAs.

If needed, the reader may review the contents of References [1] or [2] for a description of mathematical fundamentals of MM, because theoretical topics related to MM are not covered in this article.

An Overview of the Classic Matrix Multiplication Algorithm

A fundamental property of any algorithm is its asymptotic complexity (AC)[3].

In generic form, AC for MMA can be expressed as follows:

MMA AC = O(N^Omega)

where O stands for operation on a data element, also known in computer science as a Big O; N is one dimension of the matrix, and omega is a matrix exponent which equals 3.0 for CMMA. That is:

CMMA AC = O(N^3)

In order to compute a product of two square matrices using CMMA, a cubic number of floating point (FP) multiplication operations is required. In other words, the CMMA runs in O(N^3) time.

An omega lower than 3.0 is possible, and it means that an MMA computes a product of two matrices faster because an optimization technique, mathematical or programming, is applied and fewer FP multiplication operations are required to compute the product.

A list of several MMAs with different values of omega is as follows:

Algorithm                      Omega      Note
Francois Le Gall               2.3728639  (1)
Virginia Vassilevska Williams  2.3728642
Stothers                       2.3740000
Coppersmith-Winograd           2.3760000
Bini                           2.7790000
Pan                            2.7950000
Strassen                       2.8070000  (2)
Strassen-Winograd              2.8070000
Classic                        3.0000000  (3)

Table 1. Algorithms are sorted by omega in ascending order.

Total Number of Floating Point Operations

Let's assume that:

M x N is a dimension of a matrix A, or A[M,N]
N x P is a dimension of a matrix B, or B[N,P]
M x P is a dimension of a matrix C, or C[M,P]

There are three relations between M, N and P:

Relation #1: A[...,N] = B[N,...]
Relation #2: A[M,...] = C[M,...]
Relation #3: B[...,P] = C[...,P]

If one of these three relations is not met, the product of two matrices cannot be computed.

In this article only square matrices of dimension N, where M = N = P, will be considered. Therefore:

A[N,N] is the same as A[M,N]
B[N,N] is the same as B[N,P]
C[N,N] is the same as C[M,P]

The following table shows how many multiplications are needed to compute a product of two square matrices of different Ns for three algorithms from Table 1 with omega = 2.3728639 (1), omega = 2.807 (2) and omega = 3.0 (3).

N      Omega = 2.3728639 (1)  Omega = 2.807 (2)   Omega = 3.0 (3)
128    100,028                822,126             2,097,152
256    518,114                5,753,466           16,777,216
512    2,683,668              40,264,358          134,217,728
1024   13,900,553             281,781,176         1,073,741,824
2048   72,000,465             1,971,983,042       8,589,934,592
4096   372,939,611            13,800,485,780      68,719,476,736
8192   1,931,709,091          96,579,637,673      549,755,813,888
16384  10,005,641,390         675,891,165,093     4,398,046,511,104
32768  51,826,053,965         4,730,074,351,662   35,184,372,088,832
65536  268,442,548,034        33,102,375,837,652  281,474,976,710,656

Table 2.

For example, to compute a product of two square dense matrices of dimension N equal to 32,768, Francois Le Gall (1) MMA needs ~51,826,053,965 multiplications and Classic (3) MMA needs ~35,184,372,088,832 multiplications.

Imagine the case of the product of two square matrices where N equals 32,768 needs to be computed on a very slow computer system. It means that if the Francois Le Gall MMA completes the processing in one day, then the classic MMA will need ~679 days on the same computer system, or almost two years. This is because the Francois Le Gall MMA needs ~679x fewer multiplications to compute a product:

~35,184,372,088,832 / ~51,826,053,965 = ~678.9

In the case of using a famous Strassen (2) MMA, ~91 days would be needed:

~4,730,074,351,662 / ~51,826,053,965 = ~91.3
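The multiplication counts used above follow directly from N^omega. This small C program (a sketch; 'pow' works in double precision, so the printed values are approximate) reproduces the Table 2 row for N = 32,768:

	#include <stdio.h>
	#include <math.h>

	/* Reproduce the approximate multiplication counts of Table 2 for N = 32768. */
	int main( void )
	{
		double n = 32768.0;
		printf( "Le Gall  ( omega=2.3728639 ): %.0f\n", pow( n, 2.3728639 ) );
		printf( "Strassen ( omega=2.8070000 ): %.0f\n", pow( n, 2.807 ) );
		printf( "Classic  ( omega=3.0000000 ): %.0f\n", pow( n, 3.0 ) );
		return 0;
	}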

In many software benchmarks the performance of an algorithm, or some processing, is measured in floating point operations per second (FLOPS), and not in elapsed time intervals, like days, hours, minutes, or seconds. That is why it is very important to know an exact total number (TN) of FP operations completed to calculate a FLOPS value.

With modern C++ compilers, it is very difficult to estimate an exact TN of FP operations per unit of time completed at run time due to extensive optimizations of generated binary codes. It means that an analysis of binary codes could be required, and this is outside of the scope of this article.

However, an estimate value of the TN of FP operations, multiplications and additions, for CMMA when square matrices are used can be easily calculated. Here are two simple examples:

Example 1: N = 2

	Multiplications	= 8				// 2 * 2 * 2 = 2^3
	Additions	= 4				// 2 * 2 * 1 = 2^2*(2-1)
	TN FP Ops	= 8 + 4 = 12

Example 2: N = 3

	Multiplications	= 27				// 3 * 3 * 3 = 3^3
	Additions	= 18				// 3 * 3 * 2 = 3^2*(3-1)
	TN FP Ops	= 27 + 18 = 45

It is apparent that the TN of FP operations to compute a product of two square matrices can be calculated using a simple formula:

TN FP Ops = (N^3) + ((N^2) * (N-1))

Note: Take into account that in the versions of the MMA used for sparse matrices, no FP operations are performed if the matrix element at position (i,j) is equal to zero.
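As a quick check, the formula can be wrapped in a small C function ('CmmaFpOps' is an illustrative name, not one of the test programs); it reproduces the totals from Examples 1 and 2 above:

	#include <stdio.h>

	/* Total number of FP operations (multiplications plus additions) for CMMA
	   on two square dense matrices of dimension n: n^3 + n^2*(n-1). */
	static long long CmmaFpOps( long long n )
	{
		return ( n * n * n ) + ( n * n * ( n - 1 ) );
	}

	int main( void )
	{
		printf( "N=2: %lld FP ops\n", CmmaFpOps( 2 ) );	/* 12, as in Example 1 */
		printf( "N=3: %lld FP ops\n", CmmaFpOps( 3 ) );	/* 45, as in Example 2 */
		return 0;
	}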

Implementation Complexity

In the C programming language only four lines of code are needed to implement a core part of the CMMA:

for( i = 0; i < N; i += 1 )
		for( j = 0; j < N; j += 1 )
			for( k = 0; k < N; k += 1 )
				C[i][j] += A[i][k] * B[k][j];

Therefore, CMMA's implementation complexity (IC) could be rated as very simple.

Declarations of all intermediate variables, memory allocations, and initialization of matrices are usually not taken into account.

More complex versions of MMA, like Strassen or Strassen-Winograd, could have several thousands of code lines.

Optimization Techniques

In computer programming, matrices could be represented in memory as 1-D or 2-D data structures.

Here is a static declaration of matrices A, B, and C as 1-D data structures of a single precision (SP) FP data type (float):

	float fA[N*N];
	float fB[N*N];
	float fC[N*N];

and this is what a core part of the CMMA looks like:

	for( i = 0; i < N; i += 1 )
		for( j = 0; j < N; j += 1 )
			for( k = 0; k < N; k += 1 )
				fC[N*i+j] += fA[N*i+k] * fB[N*k+j];

Here is a static declaration of matrices A, B, and C as 2-D data structures of a single precision (SP) FP data type (float):

	float fA[N][N];
	float fB[N][N];
	float fC[N][N];

and this is what the core part of CMMA looks like:

	for( i = 0; i < N; i += 1 )
		for( j = 0; j < N; j += 1 )
			for( k = 0; k < N; k += 1 )
				fC[i][j] += fA[i][k] * fB[k][j];

Many other variants of the core part of CMMA are possible and they will be reviewed.

Memory Allocation Schemes

In the previous section of this article, two examples of a static declaration of matrices A, B, and C were given. In the case of dynamic allocation of memory for matrices, explicit calls to memory allocation functions need to be made. In this case, declarations and allocations of memory can look like the following:

Declaration of matrices A, B, and C as 1-D data structures:

	__attribute__( ( aligned( 64 ) ) ) float *fA;
	__attribute__( ( aligned( 64 ) ) ) float *fB;
	__attribute__( ( aligned( 64 ) ) ) float *fC;

and this is how memory needs to be allocated:

	fA = ( float * )_mm_malloc( N * N * sizeof( float ), 64 );
	fB = ( float * )_mm_malloc( N * N * sizeof( float ), 64 );
	fC = ( float * )_mm_malloc( N * N * sizeof( float ), 64 );

Note: Allocated memory blocks are 64-byte aligned, contiguous, and not fragmented by an operating system memory manager; this improves performance of processing.
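
Memory obtained from '_mm_malloc' must be released with the matching '_mm_free' function:

	_mm_free( fA );
	_mm_free( fB );
	_mm_free( fC );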

Declaration of matrices A, B, and C as 2-D data structures:

	__attribute__( ( aligned( 64 ) ) ) float **fA;
	__attribute__( ( aligned( 64 ) ) ) float **fB;
	__attribute__( ( aligned( 64 ) ) ) float **fC;

and this is how memory needs to be allocated:

	fA = ( float ** )calloc( N, sizeof( float * ) );
	fB = ( float ** )calloc( N, sizeof( float * ) );
	fC = ( float ** )calloc( N, sizeof( float * ) );
	for( i = 0; i < N; i += 1 )
	{
		fA[i] = ( float * )calloc( N, sizeof( float ) );
		fB[i] = ( float * )calloc( N, sizeof( float ) );
		fC[i] = ( float * )calloc( N, sizeof( float ) );
	}

Note: Allocated memory blocks are not contiguous and can be fragmented by an operating system memory manager, and fragmentation can degrade performance of processing.
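
Note that deallocation mirrors the two-step allocation; every row must be released before the arrays of row pointers (a sketch):

	for( i = 0; i < N; i += 1 )
	{
		free( fA[i] );
		free( fB[i] );
		free( fC[i] );
	}
	free( fA );
	free( fB );
	free( fC );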

In the previous examples, DDR4-type RAM memory was allocated for matrices. However, on an Intel Xeon Phi processor system5, multichannel DRAM (MCDRAM)-type RAM memory could be allocated as well, using functions from the memkind library11, when the MCDRAM mode is configured to 'Flat' or 'Hybrid'. For example, this is how MCDRAM-type RAM memory can be allocated:

	fA = ( float * )hbw_malloc( N * N * sizeof( float ) );
	fB = ( float * )hbw_malloc( N * N * sizeof( float ) );
	fC = ( float * )hbw_malloc( N * N * sizeof( float ) );

Note: An 'hbw_malloc' function of the memkind library was used instead of an '_mm_malloc' function.
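
Here is a minimal sketch of allocating matrix C in MCDRAM with the memkind library11. 'hbw_check_available', 'hbw_malloc', and 'hbw_free' are declared in the 'hbwmalloc.h' header, and the program needs to be linked against the library (for example, with -lmemkind):

	#include <hbwmalloc.h>

	if( hbw_check_available() == 0 )	// returns 0 when MCDRAM (HBW) memory is available
	{
		fC = ( float * )hbw_malloc( N * N * sizeof( float ) );
		// ... processing ...
		hbw_free( fC );			// memory from 'hbw_malloc' must be released with 'hbw_free'
	}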

On an Intel Xeon Phi processor system, eight variants of memory allocation for matrices A, B, and C are possible:

Matrix A	Matrix B	Matrix C	Note
DDR4	DDR4	DDR4	(1)
DDR4	DDR4	MCDRAM	(2)
DDR4	MCDRAM	DDR4
DDR4	MCDRAM	MCDRAM
MCDRAM	DDR4	DDR4
MCDRAM	DDR4	MCDRAM
MCDRAM	MCDRAM	DDR4
MCDRAM	MCDRAM	MCDRAM

Table 3.

It is recommended to use MCDRAM memory as much as possible because its bandwidth is ~400 GB/s, which is ~5 times higher than the ~80 GB/s bandwidth of DDR4 memory5.

Here is an example of how the 'cblas_sgemm' MMA performs for two memory allocation schemes (MASs), (1) and (2):

	Matrix multiplication C=A*B where matrix A (32768x32768) and matrix B (32768x32768)
	Allocating memory for matrices A, B, C: MAS=DDR4:DDR4:DDR4
	Initializing matrix data
	Matrix multiplication started
	Matrix multiplication completed at 50.918 seconds
	Allocating memory for matrices A, B, C: MAS=DDR4:DDR4:MCDRAM
	Initializing matrix data
	Matrix multiplication started
	Matrix multiplication completed at 47.385 seconds

It is clear that there is a performance improvement of ~7 percent when MCDRAM memory is allocated for matrix C.

Loop Processing Schemes

A loop processing scheme (LPS) describes what optimization techniques are applied to the 'for' statements of the C language of the core part of CMMA. For example, the following code:

	for( i = 0; i < N; i += 1 )						// loop 1
		for( j = 0; j < N; j += 1 )					// loop 2
			for( k = 0; k < N; k += 1 )				// loop 3
				C[i][j] += A[i][k] * B[k][j];

corresponds to an LPS=1:1:1, which means that all three loop counters are incremented by 1.

Table 4 below includes short descriptions of different LPSs:

LPS	Note
1:1:1	Loops not unrolled
1:1:2	3rd loop unrolls to 2-in-1 computations
1:1:4	3rd loop unrolls to 4-in-1 computations
1:1:8	3rd loop unrolls to 8-in-1 computations
1:2:1	2nd loop unrolls to 2-in-1 computations
1:4:1	2nd loop unrolls to 4-in-1 computations
1:8:1	2nd loop unrolls to 8-in-1 computations

Table 4.

For example, the following code corresponds to an LPS=1:1:2, and it means that counters 'i' and 'j' for loops 1 and 2 are incremented by 1, and counter 'k' for loop 3 is incremented by 2:

	for( i = 0; i < N; i += 1 )						// :1
	{
		for( j = 0; j < N; j += 1 )					// :1
		{
			for( k = 0; k < N; k += 2 )				// :2 (unrolled loop)
			{
				C[i][j] += A[i][k  ] * B[k   ][j];
				C[i][j] += A[i][k+1] * B[k+1][j];
			}
		}
	}
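
Similarly, the following code corresponds to an LPS=1:2:1, where counter 'j' for loop 2 is incremented by 2 (like all unrolled variants, it assumes that N is a multiple of the unroll factor):

	for( i = 0; i < N; i += 1 )						// :1
	{
		for( j = 0; j < N; j += 2 )					// :2 (unrolled loop)
		{
			for( k = 0; k < N; k += 1 )				// :1
			{
				C[i][j  ] += A[i][k] * B[k][j  ];
				C[i][j+1] += A[i][k] * B[k][j+1];
			}
		}
	}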

Note: A C++ compiler could also unroll loops when command-line options for unrolling are used. A software engineer should avoid combining compiler-driven unrolling with unrolling in the source code, because the combination can prevent vectorization of inner loops and degrade performance of processing.

Another optimization technique is the loop interchange optimization technique (LIOT). When the LIOT is used, a core part of CMMA looks as follows:

	for( i = 0; i < N; i += 1 )						// loop 1
		for( k = 0; k < N; k += 1 )					// loop 2
			for( j = 0; j < N; j += 1 )				// loop 3
				C[i][j] += A[i][k] * B[k][j];

It is worth noting that counters 'j' and 'k' for loops 2 and 3 were exchanged.

Loop unrolling and the LIOT improve performance of processing because the elements of matrices A and B are accessed more efficiently.

Compute Schemes

A compute scheme (CS) describes the computation of final or intermediate values and how elements of matrices are accessed.

In a CMMA an element (i,j) of the matrix C is calculated as follows:

	C[i][j] += A[i][k] * B[k][j]

and its CS is ij:ik:kj.

However, elements of matrix B are accessed in a very inefficient way: the next element of matrix B that needs to be used in the calculation is located at a distance of (N * sizeof( datatype )) bytes. For very small matrices this is not critical, because they fit into the CPU caches. For larger matrices, however, performance of computations can be significantly degraded due to cache misses.

In order to solve that problem and improve performance of computations, a very simple optimization technique is used. If matrix B is transposed, the next element that needs to be used in the calculation will be located at a distance of (sizeof (datatype)) bytes. Thus, access to the elements of matrix B will be similar to the access to the elements of matrix A.

In a transpose-based CMMA, an element (i,j) of the matrix C is calculated as follows:

	C[i][j] += A[i][k] * B[j][k]

and its CS is ij:ik:jk. Here B[j][k] is used instead of B[k][j].

It is very important to use the fastest possible algorithm for the transposition of matrix B before processing is started. Appendix B gives an example of how much time is needed to transpose a square 32,768 x 32,768 matrix, and how much time is needed to compute the product, on an Intel Xeon Phi processor system.
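
Here is a sketch of the transpose-based CMMA core; 'T' is an assumed temporary matrix that holds the transpose of matrix B (T[j][k] = B[k][j]), computed by an MTA before processing starts:

	for( i = 0; i < N; i += 1 )
		for( j = 0; j < N; j += 1 )
			for( k = 0; k < N; k += 1 )
				C[i][j] += A[i][k] * T[j][k];	// unit-stride access to the transposed B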

Another optimization technique is the loop blocking optimization technique (LBOT), which allows the use of smaller subsets of the A, B, and C matrices to compute the product. When the LBOT is used, a core part of CMMA looks as follows:

	for( i = 0; i < N; i += BlockSize )		// assumes N is a multiple of BlockSize
	{
		for( j = 0; j < N; j += BlockSize )
		{
			for( k = 0; k < N; k += BlockSize )
			{
				for( ii = i; ii < ( i+BlockSize ); ii += 1 )
					for( jj = j; jj < ( j+BlockSize ); jj += 1 )
						for( kk = k; kk < ( k+BlockSize ); kk += 1 )
							C[ii][jj] += A[ii][kk] * B[kk][jj];
			}
		}
	}

Note: A detailed description of LBOT can be found at10.

Table 5 shows four examples of CSs:

CS	Note
ij:ik:kj	Default
ij:ik:jk	Transposed
iijj:iikk:kkjj	Default LBOT
iijj:iikk:jjkk	Transposed LBOT

Table 5.

Error Analysis

In any version of the MMA, many FP operations need to be done in order to compute the values of the elements of matrix C. Since the SP and DP FP data types have limited precision4, rounding errors accumulate very quickly. A common misconception is that rounding errors occur only when large or very large matrices are multiplied. This is not true: in floating point arithmetic (FPA), a rounding error depends on the range of the input values, not just on the size of the input matrices.

However, a very simple optimization technique allows improvement in the accuracy of computations.

If matrices A and B are declared as an SP FP data type, then intermediate values could be stored in a variable of DP FP data type:

	for( i = 0; i < N; i += 1 )
	{
		for( j = 0; j < N; j += 1 )
		{
			double sum = 0.0;
			for( k = 0; k < N; k += 1 )
			{
				// casting an operand before the multiplication computes both
				// the product and the accumulation in DP
				sum += ( double )A[i][k] * B[k][j];
			}
			C[i][j] = ( float )sum;
		}
	}

The accuracy of computations will be improved, but performance of processing can be lower.
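
The effect of the DP accumulator is easy to reproduce with a small stand-alone demo (the element values are illustrative and are not the ones used in mmatest4.c):

	#include <stdio.h>

	int main( void )
	{
		int    n     = 65536;
		float  sumSP = 0.0f;	// SP accumulator: rounding errors accumulate
		double sumDP = 0.0;	// DP accumulator: rounding errors stay small
		for( int k = 0; k < n; k += 1 )
		{
			float product = 1.0f * 1.00001f;	// an SP product of two matrix elements
			sumSP += product;
			sumDP += ( double )product;
		}
		printf( "SP: %f DP: %f Expected: %f\n", sumSP, sumDP, n * 1.00001 );
		return 0;
	}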

An error analysis (EA) is completed using the mmatest4.c test program for different sizes of matrices of SP and DP FP data types (see Table 6 in Appendix C with results).

Performance on the Intel® Xeon Phi™ Processor System

Several versions of the CMMA to compute a product of square dense matrices are evaluated in four test programs. Performance of these CMMAs is compared to a highly optimized 'cblas_sgemm' function of the Intel MKL7. Also see Appendix D for more evaluations.

Figure 1. Performance tests for matrix multiply algorithms on Intel® Xeon Phi™ processor using mmatest1.c with KMP_AFFINITY environment variable set to 'scatter', 'balanced', and 'compact'. A lower bar height means faster processing.

Here are the names of source files with a short description of tests:

mmatest1.c - Performance tests of matrix multiply algorithms on an Intel Xeon Phi processor.
mmatest2.c - Performance tests of matrix multiply algorithms on an Intel Xeon Phi processor in one MCDRAM mode ('Flat') for the DDR4:DDR4:DDR4 and DDR4:DDR4:MCDRAM MASs.
mmatest3.c - Performance tests of matrix multiply algorithms on an Intel Xeon Phi processor in three MCDRAM modes ('All2All', 'Flat', and 'Cache') for the DDR4:DDR4:DDR4 and MCDRAM:MCDRAM:MCDRAM MASs. Note: In the 'Cache' MCDRAM mode, the MCDRAM:MCDRAM:MCDRAM MAS cannot be used.
mmatest4.c - Verification of the accuracy of computations of matrix multiply algorithms on an Intel Xeon Phi processor.

OpenMP* Product Thread Affinity Control

OpenMP* compiler directives can be easily used to parallelize processing and significantly speed it up. However, it is very important to execute OpenMP threads on different logical CPUs of modern multicore processors in order to utilize their internal resources as efficiently as possible.
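
For example, the CMMA core could be parallelized as follows (a sketch; the loop variables are assumed to be declared beforehand):

	#pragma omp parallel for private( j, k )
	for( i = 0; i < N; i += 1 )			// iterations are divided among OpenMP threads
		for( j = 0; j < N; j += 1 )
			for( k = 0; k < N; k += 1 )
				C[i][j] += A[i][k] * B[k][j];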

In the case of using the Intel C++ compiler and Intel OpenMP run-time libraries, the KMP_AFFINITY environment variable provides flexibility and simplifies that task. Here are three simple examples of using the KMP_AFFINITY environment variable:

	export KMP_AFFINITY=scatter
	export KMP_AFFINITY=balanced
	export KMP_AFFINITY=compact

These two screenshots of the Htop* utility12 demonstrate how OpenMP threads are assigned (pinned) to the logical CPUs of an Intel Xeon Phi processor 72105 during processing of an MMA using 64 cores of the processor:

Screenshot 1. KMP_AFFINITY = scatter or balanced. Note: Processing is faster when compared to KMP_AFFINITY = compact.

Screenshot 2. KMP_AFFINITY = compact. Note: Processing is slower when compared to KMP_AFFINITY = scatter or balanced.

Recommended Intel® C++ Compiler Command-Line Options

Here is a list of Intel C++ Compiler command-line options that a software engineer should consider, which can improve performance of processing of CMMAs:

-O3
-fp-model
-parallel
-unroll
-unroll-aggressive
-opt-streaming-stores
-opt-mem-layout-trans

-Os
-openmp
-ansi-alias
-fma
-opt-matmul
-opt-block-factor
-opt-prefetch

The reader can use 'icpc -help' or 'icc -help' to learn more about these command-line options.
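
For example, one of the tests could be compiled with a subset of these options as follows (an illustrative command line, not necessarily the exact one used for the tests):

	icc -O3 -qopenmp -xMIC-AVX512 -unroll mmatest1.c -o mmatest1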

Conclusion

The application of different optimization techniques to the CMMA was reviewed in this article.

Three versions of CMMA to compute a product of square dense matrices were evaluated in four test programs. Performance of these CMMAs was compared to a highly optimized 'cblas_sgemm' function of the Intel MKL7.

Tests were completed on a computer system with an Intel® Xeon Phi™ processor 72105 running the Red Hat* Linux operating system, in the 'All2All' cluster mode and in the 'Flat', 'Hybrid 50-50', and 'Cache' MCDRAM modes.

It was demonstrated that CMMA could be used for cases when matrices of small sizes, up to 1,024 x 1,024, need to be multiplied.

It was demonstrated that performance of MMAs is higher when MCDRAM-type RAM memory is allocated for matrices with sizes up to 16,384 x 16,384 instead of DDR4-type RAM memory.

Advantages of using CMMA to compute the product of two matrices are as follows:

  • Simple to implement in any programming language, to run on CPUs or GPUs9
  • Highly portable source codes when implemented in C, C++, or Java programming languages
  • Simple to integrate with existing software for a wide range of computer platforms
  • Simple to debug and troubleshoot
  • Predictable memory footprint at run time
  • Easy to optimize using parallelization and vectorization techniques
  • Low overheads and very good performance for matrices of sizes ranging from 256 x 256 to 1,024 x 1,024 (see Figures 1 through 5)
  • Very good accuracy of computations for matrices of sizes ranging from 8 x 8 to 2,048 x 2,048 (see Table 6 in Appendix C)

Disadvantages of using CMMA to compute a product of two matrices are as follows:

  • Poor performance for large matrices with sizes greater than 2,048 x 2,048
  • Poor performance when implemented using high-level programming languages due to processing overheads
  • Reduced accuracy of computations for matrices of sizes ranging from 2,048 x 2,048 to 65,536 x 65,536 (see Table 6 in Appendix C)

References

1. Matrix Multiplication on Mathworld

http://mathworld.wolfram.com/MatrixMultiplication.html

2. Matrix Multiplication on Wikipedia

https://en.wikipedia.org/wiki/Matrix_multiplication

3. Asymptotic Complexity of an Algorithm

https://en.wikipedia.org/wiki/Time_complexity

4. The IEEE 754 Standard for Floating Point Arithmetic

http://standards.ieee.org/

5. Intel® Many Integrated Core Architecture

https://software.intel.com/en-us/xeon-phi/x200-processor
http://ark.intel.com/products/94033/Intel-Xeon-Phi-Processor-7210-16GB-1_30-GHz-64-core
https://software.intel.com/en-us/forums/intel-many-integrated-core

6. Intel® C++ Compiler

https://software.intel.com/en-us/c-compilers
https://software.intel.com/en-us/forums/intel-c-compiler

7. Intel® MKL

https://software.intel.com/en-us/intel-mkl
https://software.intel.com/en-us/intel-mkl/benchmarks
https://software.intel.com/en-us/forums/intel-math-kernel-library

8. Intel® Developer Zone Forums

https://software.intel.com/en-us/forum

9. Optimizing Matrix Multiply for Intel® Processor Graphics Architecture Gen 9

https://software.intel.com/en-us/articles/sgemm-ocl-opt

10. Performance Tools for Software Developers Loop Blocking

https://software.intel.com/en-us/articles/performance-tools-for-software-developers-loop-blocking

11. Memkind library

https://github.com/memkind/memkind

12. Htop* monitoring utility

https://sourceforge.net/projects/htop

Downloads

Performance_CMMA_system.zip

List of all files (sources, test reports, and so on):

Performance_CMMA_system.pdf - Copy of this paper.

mmatest1.c - Performance tests for matrix multiply algorithms on Intel® Xeon Phi processors.

dataset1.txt - Results of tests.

mmatest2.c - Performance tests for matrix multiply algorithms on Intel® Xeon Phi processors for DDR4:DDR4:DDR4 and DDR4:DDR4:MCDRAM MASs.

dataset2.txt - Results of tests.

mmatest3.c - Performance tests for matrix multiply algorithms on Intel® Xeon Phi processors in three MCDRAM modes for DDR4:DDR4:DDR4 and MCDRAM:MCDRAM:MCDRAM MASs.

dataset3.txt - Results of tests.

mmatest4.c - Verification of the accuracy of computations of matrix multiply algorithms on Intel® Xeon Phi processors.

dataset4.txt - Results of tests.

Note:   Intel C++ Compiler versions used to compile tests:
17.0.1 Update 132 for Linux*
16.0.3 Update 210 for Linux*

Abbreviations

CPU - Central processing unit
GPU - Graphics processing unit
ISA - Instruction set architecture
MIC - Intel® Many Integrated Core Architecture
RAM - Random access memory
DRAM - Dynamic random access memory
MCDRAM - Multichannel DRAM
HBW - High bandwidth memory
DDR4 - Double data rate (generation) 4
SIMD - Single instruction multiple data
SSE - Streaming SIMD extensions
AVX - Advanced vector extensions
FP - Floating point
FPA - Floating point arithmetic4
SP - Single precision4
DP - Double precision4
FLOPS - Floating point operations per second
MM - Matrix multiplication
MMA - Matrix multiplication algorithm
CMMA - Classic matrix multiplication algorithm
MTA - Matrix transpose algorithm
AC - Asymptotic complexity
IC - Implementation complexity
EA - Error analysis
MAS - Memory allocation scheme
LPS - Loop processing scheme
CS - Compute scheme
LIOT - Loop interchange optimization technique
LBOT - Loop blocking optimization technique
ICC - Intel C++ Compiler6
MKL - Math kernel library7
CBLAS - C basic linear algebra subprograms
IDZ - Intel® Developer Zone8
IEEE - Institute of Electrical and Electronics Engineers4
GB - Gigabytes
TN - Total number

Appendix A - Technical Specifications of the Intel® Xeon Phi™ Processor System

Summary of the Intel Xeon Phi processor system used for testing:

Process technology: 14nm
Processor name: Intel Xeon Phi processor 7210
Frequency: 1.30 GHz
Packages (sockets): 1
Cores: 64
Processors (CPUs): 256
Cores per package: 64
Threads per core: 4
On-Package Memory: 16 GB high bandwidth MCDRAM (bandwidth ~400 GB/s)
DDR4 Memory: 96 GB 6 Channel (Bandwidth ~ 80 GB/s)
ISA: Intel® AVX-512 (Vector length 512-bit)

Detailed processor specifications:

http://ark.intel.com/products/94033/Intel-Xeon-Phi-Processor-7210-16GB-1_30-GHz-64-core

Summary of a Linux operating system:

[guest@... ~]$ uname -a

Linux c002-n002 3.10.0-327.13.1.el7.xppsl_1.4.0.3211.x86_64 #1 SMP
Fri Jul 8 11:44:24 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

[guest@... ~]$ cat /proc/version

Linux version 3.10.0-327.13.1.el7.xppsl_1.4.0.3211.x86_64 (qb_user@89829b4f89a5)
(gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)) #1 SMP Fri Jul 8 11:44:24 UTC 2016

Appendix B - Comparison of Processing Times for MMAs versus MTA

Comparison of processing times for Intel MKL 'cblas_sgemm' and CMMA vs. MTA:

[Intel MKL & CMMA]

Matrix A [32768 x 32768] Matrix B [32768 x 32768]
Number of OpenMP threads: 64
MKL - Completed in: 51.2515874 seconds
CMMA - Completed in: 866.5838490 seconds

[MTA]

Matrix size: 32768 x 32768
Transpose Classic - Completed in: 1.730 secs
Transpose Diagonal - Completed in: 1.080 secs
Transpose Eklundh - Completed in: 0.910 secs

When the processing time of the MTA is compared to that of:
MKL 'cblas_sgemm', the transposition takes ~2.42 percent of the processing time.
CMMA, the transposition takes ~0.14 percent of the processing time.

Appendix C - Error Analysis (Absolute Errors for SP FP Data Type)

N	MMA	Calculated SP Value	Absolute Error
8	MKL	8.000080	0.000000
8	CMMA	8.000080	0.000000
16	MKL	16.000160	0.000000
16	CMMA	16.000160	0.000000
32	MKL	32.000309	-0.000011
32	CMMA	32.000320	0.000000
64	MKL	64.000671	0.000031
64	CMMA	64.000641	0.000001
128	MKL	128.001160	-0.000120
128	CMMA	128.001282	0.000002
256	MKL	256.002319	-0.000241
256	CMMA	256.002563	0.000003
512	MKL	512.004639	-0.000481
512	CMMA	512.005005	-0.000115
1024	MKL	1024.009521	-0.000719
1024	CMMA	1024.009888	-0.000352
2048	MKL	2048.019043	-0.001437
2048	CMMA	2048.021484	0.001004
4096	MKL	4096.038574	-0.002386
4096	CMMA	4096.037109	-0.003851
8192	MKL	8192.074219	-0.007701
8192	CMMA	8192.099609	0.017689
16384	MKL	16384.14648	-0.017356
16384	CMMA	16384.09961	-0.064231
32768	MKL	32768.33594	0.008258
32768	CMMA	32768.10156	-0.226118
65536	MKL	65536.71875	0.063390
65536	CMMA	65536.10156	-0.553798

Table 6.

Appendix D - Performance of MMAs for Different MASs

Figure 2. Performance of Intel® MKL 'cblas_sgemm'. KMP_AFFINITY environment variable set to 'scatter'. Cluster mode: 'All2All'. MCDRAM mode: 'Flat'. Test program mmatest2.c. A lower bar height means faster processing.

Figure 3. Performance of Intel® MKL 'cblas_sgemm' vs. CMMA. KMP_AFFINITY environment variable set to 'scatter'. Cluster mode: 'All2All'. MCDRAM mode: 'Flat'. Test program mmatest3.c. A lower bar height means faster processing.

Figure 4. Performance of Intel® MKL 'cblas_sgemm' vs. CMMA. KMP_AFFINITY environment variable set to 'scatter'. Cluster mode: 'All2All'. MCDRAM mode: 'Hybrid 50-50'. Test program mmatest3.c. A lower bar height means faster processing.

Figure 5. Performance of Intel® MKL 'cblas_sgemm' vs. CMMA. KMP_AFFINITY environment variable set to 'scatter'. Cluster mode: 'All2All'. MCDRAM mode: 'Cache'. Test program mmatest3.c. A lower bar height means faster processing.

About the Author

Sergey Kostrov is a highly experienced C/C++ software engineer and Intel® Black Belt Developer. He is an expert in design and implementation of highly portable C/C++ software for embedded and desktop platforms, scientific algorithms, and high performance computing of big data sets.
