
Classical Molecular Dynamics Simulations with LAMMPS Optimized for Knights Landing


LAMMPS is an open-source software package that simulates classical molecular dynamics. Its support for many energy models and simulation options has made it a versatile and popular choice. It was first developed at Sandia National Laboratories for large-scale parallel computation. Because multi-core processors are now ubiquitous, in contrast to when LAMMPS was first developed 20 years ago, LAMMPS is an excellent candidate for optimization. The book (Intel® Xeon Phi™ Processor High Performance Programming, 2nd Edition – Knights Landing Edition) describes, among other things, how to optimize LAMMPS to take advantage of the Intel® Xeon Phi™ Processor x200 (code-named Knights Landing) as well as recent generations of multicore Intel® Xeon® processors.

The LAMMPS code now exceeds half a million lines. For manageability it is organized into packages, and the core codebase has limited functionality. Each package is installed separately, as needed for a given simulation.

At its core, LAMMPS is parallelized using a spatial decomposition with the Message Passing Interface (MPI). Additional hybrid parallelization options are available in packages that combine shared-memory parallelization with MPI using OpenMP* or CUDA*/OpenCL™. As the number of cores sharing the memory subsystem increased, more developers exploited hybrid parallelism with the MPI+X programming model (where X represents a shared-memory parallelization using OpenMP, POSIX threads, or similar), as sketched below. LAMMPS supports hybrid parallelism with OpenMP for the most important routines, and there is active development to improve shared-memory performance. However, one MPI task per core typically performs best at the time of writing.
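As a concrete illustration of the MPI+X pattern, here is a minimal MPI+OpenMP hybrid sketch in C. This is illustrative only, not LAMMPS source; the one-rank-per-subdomain layout is an assumption for demonstration.

/* Minimal MPI+OpenMP hybrid sketch: MPI ranks across nodes, OpenMP
 * threads sharing each rank's memory (illustrative, not LAMMPS code). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    /* Request thread support so OpenMP regions can coexist with MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        /* Each rank would own a spatial subdomain; threads share its work. */
        printf("rank %d: thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}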

Several workloads were evaluated for performance, including protein, water, and liquid-crystal benchmarks, along with performance results from production simulations studying molecular alignment in organic solar cells and the thermodynamic and transport properties of complex hydrocarbons. The best performance that can be obtained in LAMMPS without the Intel package was used as the baseline for comparison.

With several optimizations to LAMMPS routines, there were significant performance improvements on Intel Xeon processors and on the many-core Knights Landing processor. Optimizations on Knights Landing supporting vectorization and improving data layout resulted in much faster simulations, ranging from 1.82x to 7.62x improvement over the un-optimized code. The same optimizations on Intel Xeon processors improved performance 1.19x to 4.35x. Simulations now run over 8x faster on Knights Landing compared to the best that could be achieved a year ago on Haswell processors.

The optimizations described in this book chapter are available in LAMMPS as an optional package. This approach gives scientists access to improved performance now, while still allowing the developers to experiment with code modernization strategies, converge on the models and algorithms that perform best, and eventually adopt them as the default in LAMMPS. Developers should maximize the overlap of internode communication with computation and avoid collective synchronization. These optimizations do not necessarily require changing the programming model, libraries, or directives used for parallelization, but they do demand careful attention to synchronization, data sharing, and communication between software processes and threads.

Intel® Xeon Phi™ Processor High Performance Programming, 2nd Edition – Knights Landing Edition

 


Enabling IBM* Bluemix* on Intel® Edison using MongoDB* by Compose


This article explains how to establish a connection with IBM* Bluemix* cloud services using the Node.js* API. This includes creating a Bluemix application, adding a MongoDB* connection, and storing and retrieving data.

Create a Bluemix application

  1. Log in to the Bluemix console, select DASHBOARD, and click CREATE APP.
  2. Click on WEB and select SDK for Node.js. Click CONTINUE.
  3. Give a name for the app and click FINISH.
  4. At the top, you can see the status “Your app is staging”.
  5. Once the staging is done, click Overview on the left panel to view the dashboard.
  6. Now from the application dashboard click ADD A SERVICE OR API.
  7. In the services page, click on the MongoDB by Compose service in the Data and Analytics section.
  8. On the right side, you can see options to enter values for Username, Password, Host and Port.
  9. If you don’t have an account with Compose, you may need to create one. Click Register at Compose.
  10. Once registered, log in to Compose.io and create a MongoDB deployment.
  11. Using the default values, click Create Deployment.

    It will take a few minutes for the deployment to be created; you can watch the status on the page.

    Once the deployment is finished, you will be redirected to the getting started page where you can create a database.

  12. Create a database by clicking Add Database at the top right corner. Give a name for the database and click Run.
  13. Add a user for the database in order to gain access to it using a connection string.
  14. Click on Admin Settings to obtain the hostname and port details.
  15. On your Bluemix add service page, enter the details for hostname, port, username and password.
  16. Once you click CREATE, click RESTAGE in the popup window that appears.

    After the restaging is finished, you should see a status that reads "Your app is running" in the top-right corner.

Setting up the Development Environment

Install the mongodb npm module into your project.

npm install mongodb

Set up the mongodb connection

Create a node reference variable for the module and client object for establishing a database connection.

var mongodb = require('mongodb');
var MongoClient = mongodb.MongoClient;

Create a mongodb connection

The connect function provides a db object through its callback; the db object exposes the collection object, which is used to insert and retrieve data from the cloud.

The connection URL can be obtained from the Bluemix console. Select the MongoDB by Compose service from the application dashboard and click Show Credentials.

You can create the connection URI using these credentials. Form the URI as shown below for use in the Node application:

mongodb://<user>:<password>@<uri>:<port>/iot-compose?ssl=true

Example:

var uri = 'mongodb://iot-kona:intel123@aws-us-east-1-portal.11.dblayer.com:27832/iot-compose?ssl=true';

Copy the URI under credentials and pass it to the connect function:

MongoClient.connect(uri, function(err, db) { /* use db here */ });

Store Data

Data can be stored as JSON objects or an array of JSON objects.

var data = {'sensor-id': 'sens341', 'value': 65.5};
MongoClient.connect(config.url, function(err, db) {
    var collection = db.collection(config.db);
    collection.insert(data, function(err, result) {});
});

Query Data

Timestamp-based query

 

dataQuery = { "timestamp": { $gt: readQuery.timestamp } };

Sensor ID-based query

dataQuery = { "sensor_id": { $eq: readQuery.sensor_id } };

Run query

collection = db.collection(self.config.db);
collection.find(dataQuery).toArray(function(err, items) {
    if (!err)
        console.log(JSON.stringify(items, null, ''));
});


Floating License Upgrade


2016 License Upgrade

We have made changes to our licensing model and feature codes. In order to install current and future releases and take advantage of new features, you will need to upgrade your product and use a new serial number.

Required Product License Upgrade for Intel® Parallel Studio XE 2016

You will need to update the Intel® Software License Manager on the server(s) with the new 2016 license. Make sure you have the latest supported version of the Intel® Software License Manager. If the license was set up correctly, no change is necessary on the clients. However, if client licenses haven't been set up correctly, the user may need to apply some manual adjustments.

How to upgrade to 2016 Floating License

For additional information check out the Licensing FAQ


Named-User System-Locked License Upgrade


2016 License Upgrade

We have made changes to our licensing model and feature codes. In order to install current and future releases and take advantage of new features, you will need to upgrade your product and use a new serial number.

Required Product License Upgrade for Intel® Parallel Studio XE 2016

Named-user system-locked licenses are no longer shipped with a generic license file. A license file is created during installation and is unique to the system it was installed on; it will not work on another system. See our End User License Agreement for the maximum number of activations.

If you install the product on a system with an Internet connection, use the serial number during installation. If you need to install the product on a system without an Internet connection, you will need to generate the license file manually.

How to get a license file for an offline installation of Intel Parallel Studio XE 2016

For additional information check out the Licensing FAQ

Intel® Math Kernel Library (Intel® MKL) 11.3 Update 3 for OS X*


Intel® Math Kernel Library (Intel® MKL) is a highly optimized, extensively threaded, and thread-safe library of mathematical functions for engineering, scientific, and financial applications that require maximum performance. The Intel MKL 11.3 Update 3 packages are now ready for download. Intel MKL is available as part of Intel® Parallel Studio XE and Intel® System Studio. Please visit the Intel® Math Kernel Library Product Page.

Intel® MKL 11.3 Update 3 Bug fixes

New Features in MKL 11.3 Update 3

  • Improved Intel Optimized MP LINPACK Benchmark performance for Clusters on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 
  • BLAS:
    • Improved small matrix [S,D]GEMM performance on Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Xeon® product family and Intel® AVX-512
    • Improved threading (OpenMP) performance of xGEMMT, xHEMM, xHERK, xHER2K, xSYMM, xSYRK and xSYR2K on Intel® AVX-512
    • Improved [C,Z]GEMV, [C,Z]TRMV, and [C,Z]TRSV performance on Intel® AVX2, Intel® AVX-512, and the Intel® Xeon® product family
    • Fixed CBLAS_?GEMMT interfaces to correctly call underlying Fortran interface for row-major storage
  • LAPACK:
    • Updated Intel MKL LAPACK functionality to the latest Netlib version 3.6. New features introduced in this version are:
      • SVD by Jacobi ([CZ]GESVJ) and preconditioned Jacobi ([CZ]GEJSV) algorithms
      • SVD via EVD allowing computation of a subset of singular values and vectors (?GESVDX)
      • Level 3 BLAS versions of generalized Schur (?GGES3), generalized EVD (?GGEV3), generalized SVD (?GGSVD3) and reduction to generalized upper Hessenberg form (?GGHD3)
      • Multiplication of general matrix by a unitary/orthogonal matrix possessing 2x2 structure ( [DS]ORM22/[CZ]UNM22)
    • Improved performance of LU (?GETRF) and QR(?GEQRF) on Intel® AVX-512 
    • Improved check of parameters for correctness in all LAPACK routines to enhance security
  • SCALAPACK:
    • Improved hybrid (MPI + OpenMP) performance of ScaLAPACK/PBLAS by increasing default block size returned by pilaenv
  • SparseBlas:
    • Added examples that cover spmm and spmmd functionality
    • Improved performance of parallel mkl_sparse_d_mv for general BSR matrices on Intel® AVX2
  • Parallel Direct Sparse Solver for Clusters:
    • Improved performance of solving step for small matrices (less than 10000 elements)
    • Added mkl_progress support in Parallel Direct sparse solver for Clusters and fixed mkl_progress in Intel MKL PARDISO
  • Vector Mathematical Functions:
    • Improved implementation of Thread Local Storage (TLS) allocation/de-allocation, which helps with thread safety for DLLs in Windows when they are custom-made from static libraries

Check out the latest Release Notes for more updates

Contents

  • File: m_mkl_online_11.3.3.170.dmg

    Online Installer for OS X*

  • File: m_mkl_11.3.3.170.dmg

    A file containing the complete product installation for OS X* (32-bit/x86-64 development)

Box Blur Filter Using Intel Subgroup Extensions in OpenCL™



Abstract

This paper highlights an OpenCL™ application for the Box Blur filter, an image processing and filtering algorithm, and describes how to optimize and accelerate the performance of a naïve OpenCL application using the Intel OpenCL Subgroup extensions. The paper focuses on the concept of block read and write calls. The Intel Subgroup extensions offer built-in APIs that allow an OpenCL application to perform bulk reads and writes, thereby reducing the overall number of read/write calls. By taking advantage of hardware capabilities, OpenCL application developers can read/write blocks of data and process more work items in a workgroup by creating subgroups. The work items within a subgroup can share data without the use of shared local memory and barriers. This paper also provides the performance observed on 5th generation Intel® Core™ processors with Intel® Graphics. Using the Intel® VTune™ Amplifier tool to analyze profiles of the workload, developers can observe better GPU utilization.

OpenCL Overview

OpenCL is an open industry standard maintained by the Khronos Group and is a framework for parallel programming across heterogeneous systems for faster and more efficient processing. OpenCL is widely used in applications such as image processing, video processing, gaming, and more, and its portability allows applications to run across multiple platforms and on multiple devices within a platform. With the help of the OpenCL™ standard, optimization techniques, heterogeneous compute concepts, and the set of extensions offered by Intel, developers can significantly improve the performance of their applications 1.

Intel Subgroup Extensions in OpenCL

The concept of subgroups was introduced in OpenCL™ 2.0, where a workgroup consists of one or more subgroups. Two sets of subgroup extensions are offered: the Khronos Subgroup extensions and the Intel Subgroup extensions, each with a different set of APIs. Please refer to the reference link for the detailed specification 2. Note that the Intel subgroups extension can also be used with OpenCL™ 1.2.

In this article, we focus on the cl_intel_subgroups extension. The motivation of this extension is to enhance OpenCL applications by benefiting from the fact that work items execute together in a subgroup. The work items in a subgroup can take advantage of hardware features that enable them to share data without shared local memory or barriers, an advantage that is not available to work items across a full work group.

The Intel subgroup extension adds a set of subgroup "block read and write" functions to take advantage of specialized hardware to read and write blocks of data from/to buffers or images. In this article, we optimize the OpenCL application for the Box Blur filter using the block read/write APIs offered by the cl_intel_subgroups extension.

Block read API calls for buffers: Reads 1, 2, 4, or 8 unsigned integers (uints - 32 bits each) of data for each work item in the subgroup from the specified pointer as a block operation:

uint  intel_sub_group_block_read(const __global uint* p)
uint2 intel_sub_group_block_read2(const __global uint* p)
uint4 intel_sub_group_block_read4(const __global uint* p)
uint8 intel_sub_group_block_read8(const __global uint* p)

Block write API calls for buffers: Writes 1, 2, 4, or 8 uints of data for each work item in the subgroup to the specified pointer as a block operation:

void  intel_sub_group_block_write(__global uint* p, uint data)
void  intel_sub_group_block_write2(__global uint* p, uint2 data)
void  intel_sub_group_block_write4(__global uint* p, uint4 data)
void  intel_sub_group_block_write8(__global uint* p, uint8 data)

Box Blur Filter Algorithm

Box Blur is an image processing and filtering algorithm 5. It is a simple filter in which each pixel in the output image is the average of the neighboring pixels in the input image. The input pixels are unpacked to get the RGB components, the filter is applied on each component, and the components are packed back into the pixel. A diagram of the algorithm is shown in Figure 1, and the mathematical representation of the algorithm is shown in Figure 2.

Figure 1: Box Blur filter for a diameter of 3; the output at (1,1) is computed using the pixel value at (1,1) and its 8 neighboring pixels.

For example, to calculate the Box Blur of pixel (1,1) for a diameter of 3, the value of the current pixel and all 8 neighboring pixels (the shaded pixels in the diagram) are used to compute the output of pixel (1,1).

Figure 2: Mathematical formula of a Box Blur filter.

The radius is derived from the Box Blur size. For example, a box of size 3x3 has a diameter of 3 and a radius of floor(3/2) = 1. The factor is 1 / (diameter × diameter); for example, for a diameter of 3, the factor is 1/9 (this takes the average). If the values of x and y go out of bounds, clamp them between 0 and the image size (not shown in the formula).

OpenCL Application For Box Blur Filter

OpenCL™ kernel development for the Box Blur filter was done using the Intel® Code Builder for OpenCL™ API tool 4. The Box Blur filter was implemented using OpenCL 1.2 with buffer memory objects. Zero-copy buffers were created using CL_MEM_USE_HOST_PTR. Input and output buffers were created using the "unsigned char" datatype, and the size of each buffer is (width × height × 4). For test cases, two image sizes were used: 1920x1080 and 4256x2832 resolution. The test cases included applying a Box Blur filter for various diameter sizes: 3, 5, 7, 9, and 11. The host code steps included zero-copy buffer creation and allocation for both input and output buffers. The global workgroup size was assigned as {width, height}.

After setting up the arguments and kernel dispatch, the output is mapped to the buffers. The kernel code reads “uint data” and extracts the Red, Green, and Blue (RGB) byte components of each pixel. To apply the Box Blur filter, each color component is averaged with the corresponding color components of the neighboring pixels. The resulting RGB component bytes are packed into the uint pixel value again before writing it to the output buffer (refer to Section 5 for more details on the Box Blur Filter algorithm). This kernel implementation processes one pixel per work item. 

Figure 3: Diagram showing computation of one pixel at a time. The neighboring pixel values (orange squares) are also read to compute the output pixel (green square).

Host code: Global workgroup size:

For input buffer and output buffer of type unsigned char and size (width × height × 4)
Global_size[] = {width, height};

Kernel pseudo code: Processing Box Blur for one pixel per work item (a concrete kernel sketch follows the steps):

  1. Get x and y using get_global_id(0) and get_global_id(1).
  2. Declare temporary variables Temp_R, Temp_G, Temp_B as float and initialize to 0.
  3. Create a for loop to read the value of the main pixel and the neighboring pixels based on the radius of the Box Blur (see the formula in Section 5 for reference).
    for (int i = -radius; i <= radius; i++)
    {
        for (int j = -radius; j <= radius; j++)
        {
            a. Using the values of i and j, calculate the offset and the index of a pixel.
            b. Read one pixel value of type uint using the index value.
            c. Unpack each pixel to get the R, G, B byte components.
            d. Apply the Box Blur filter on each RGB component (see the formula in Section 5 for reference):
               i.   Temp_R += R * Factor;
               ii.  Temp_G += G * Factor;
               iii. Temp_B += B * Factor;
        }
    }
  4. Pack the Temp_R, Temp_G, Temp_B components into a uint pixel value and write it to the output buffer.
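The pseudocode above might be realized as the following OpenCL C kernel. This is a minimal sketch, assuming the R component sits in the low byte of each uint pixel; the kernel and variable names (boxblur_naive, src, dst) are illustrative, not taken from the original sample.

/* Minimal sketch of the one-pixel-per-work-item Box Blur kernel.
 * Assumes R in the low byte; the sample's exact layout may differ. */
__kernel void boxblur_naive(__global const uint *src,
                            __global uint *dst,
                            int width, int height, int radius)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    float factor = 1.0f / ((2 * radius + 1) * (2 * radius + 1));
    float tempR = 0.0f, tempG = 0.0f, tempB = 0.0f;

    for (int i = -radius; i <= radius; i++) {
        for (int j = -radius; j <= radius; j++) {
            /* Clamp neighbor coordinates at the image borders. */
            int nx = clamp(x + j, 0, width - 1);
            int ny = clamp(y + i, 0, height - 1);
            uint pixel = src[ny * width + nx];
            /* Unpack the R, G, B byte components and accumulate the average. */
            tempR += (float)(pixel & 0xFFu) * factor;
            tempG += (float)((pixel >> 8) & 0xFFu) * factor;
            tempB += (float)((pixel >> 16) & 0xFFu) * factor;
        }
    }
    /* Pack the averaged components back into a uint pixel. */
    dst[y * width + x] = ((uint)tempB << 16) | ((uint)tempG << 8) | (uint)tempR;
}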

OpenCL Application For Box Blur Filter Using Intel Subgroup Extensions

The naïve OpenCL application for the Box Blur filter is improved using Intel Subgroup extensions; here they are used for the block read and write functions. The test case chosen to showcase the feature implements a kernel that computes 16 pixels per work item. In this example, we read a block of 4 uints at once as a block read operation and similarly write a block of 4 uints to the output buffer as a block write operation. The new global workgroup size to compute 16 pixels is {width/4, height/4}. The rest of the host code remains the same, and the kernel code is modified to calculate the output for the entire block of data, that is, for 16 pixels. The kernel is dispatched fewer times, and each work item handles more work because the kernel now computes 16 pixels.

Figure 4: Diagram showing computation of 16 pixels in a work item. The extra pixel values (orange squares) are also read to compute the output of the 16 pixels (green squares).

Host code: Global workgroup size:

For input buffer and output buffer of type unsigned char and size (width × height × 4)
Global_size[] = {width/4, height/4};

Kernel pseudo code: Processing Box Blur for 16 pixels per work item (a structural kernel skeleton follows the steps):

  1. Get x and y using get_global_id(0) and get_global_id(1).
    1. int x = 4 * get_global_id(0);
    2. int y = 4 * get_global_id(1);
  2. Declare temporary vector variables TempR_rt, TempG_rt, TempB_rt as float4 and initialize them to 0, where t ϵ {1, 2, 3, 4}.
  3. Create a for loop to read the value of the main pixel and neighboring pixel based on the radius of Box Blur (see the formula in Section 5 for reference).
    for (int i = -radius; i <= radius; i++)
    {
         for (int j = -radius; j <= radius; j++)
         {
          a. Using the values of i and j, calculate the offset and the index.
          b. Read blocks of data – read 4 uints:
             // Reading for each row
             uint4 r1 = intel_sub_group_block_read4(src + index);
             uint4 r2 = intel_sub_group_block_read4(src + index + width);
             uint4 r3 = intel_sub_group_block_read4(src + index + 2*width);
             uint4 r4 = intel_sub_group_block_read4(src + index + 3*width);
          c. Unpack rt to get the R, G, B components for each row, where t ϵ {1, 2, 3, 4}.
          d. Apply the Box Blur filter on the RGB components for each row (see the formula in Section 5 for reference):
             i.   TempR_rt += Rt * Factor;
             ii.  TempG_rt += Gt * Factor;
             iii. TempB_rt += Bt * Factor;
             where t ϵ {1, 2, 3, 4}
         }
    }
  4. Pack the TempR_rt, TempG_rt, TempB_rt components for each row into variable Outputt, where t ϵ {1, 2, 3, 4} and Outputt is of type uint4.
  5. Write 16 pixels to the output buffer:
    intel_sub_group_block_write4(dst + out_index, Output1);
    intel_sub_group_block_write4(dst + out_index + width, Output2);
    intel_sub_group_block_write4(dst + out_index + width*2, Output3);
    intel_sub_group_block_write4(dst + out_index + width*3, Output4);
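Putting the pieces together, a structural OpenCL C skeleton of this subgroup kernel might look like the following. This is a sketch under stated assumptions: it assumes the cl_intel_subgroups extension is supported, elides the offset arithmetic and the unpack-filter-pack steps (3a, 3c, 3d, and 4 above), and uses illustrative names (boxblur_subgroup, out1..out4) that are not from the original sample.

#pragma OPENCL EXTENSION cl_intel_subgroups : enable

/* Structural skeleton only; filtering logic is elided. */
__kernel void boxblur_subgroup(__global const uint *src,
                               __global uint *dst,
                               int width, int radius)
{
    /* Each work item covers a 4x4 block of pixels (step 1 above). */
    int x = 4 * get_global_id(0);
    int y = 4 * get_global_id(1);
    int index = y * width + x;  /* illustrative; real code folds in the blur offsets */

    /* Step 3b: one block read per row fetches 4 uints for each work item
     * in the subgroup, replacing many scalar loads. */
    uint4 r1 = intel_sub_group_block_read4(src + index);
    uint4 r2 = intel_sub_group_block_read4(src + index + width);
    uint4 r3 = intel_sub_group_block_read4(src + index + 2 * width);
    uint4 r4 = intel_sub_group_block_read4(src + index + 3 * width);

    /* Steps 3c-4 (unpack, filter, pack) elided; pass-through placeholder. */
    uint4 out1 = r1, out2 = r2, out3 = r3, out4 = r4;

    /* Step 5: one block write per row stores 4 uints per work item. */
    intel_sub_group_block_write4(dst + index, out1);
    intel_sub_group_block_write4(dst + index + width, out2);
    intel_sub_group_block_write4(dst + index + 2 * width, out3);
    intel_sub_group_block_write4(dst + index + 3 * width, out4);
}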

     

Performance Data And Graph

The performance of OpenCL buffers and OpenCL buffers using Intel® Subgroup extensions was measured on a Broadwell (BDW) Lenovo Yoga* system with four cores and Intel® Graphics GT2. The performance numbers were collected for two different image resolutions: 1920x1080 and 4256x2832 bitmap images. Box Blur filters of diameters 3, 5, 7, 9, and 11 were used for the performance collection. The graphs below show kernel times in ms (Figures 5 and 7) and total time (kernel time + host time) in ms (Figures 6 and 8). The lower the time, the better the performance. The average speed-up in kernel time is 1.52x, and the average speed-up in total time is 1.36x.

Figure 5: Box Blur filter performance comparison. Kernel time of naïve OpenCL™ application versus Intel Subgroup Extensions for an image size of 1920x1080 on 5th generation Intel® Core™ processors with Intel® Processor Graphics.

Figure 6: Box Blur filter performance comparison. Total time of naïve OpenCL™ application versus Intel Subgroup Extensions for an image size of 1920x1080 on 5th generation Intel® Core™ processors with Intel® Processor Graphics.

Figure 7: Box Blur filter performance comparison. Kernel time of naïve OpenCL™ application versus Intel Subgroup Extensions for image size of 4256x2832 on 5th generation Intel® Core™ processors with Intel® Processor Graphics.

Figure 8: Box Blur filter performance comparison. Total time of naïve OpenCL™ application versus Intel Subgroup Extensions for an image size of 4256x2832 on 5th generation Intel® Core™ processors with Intel® Processor Graphics.

Intel® VTune™ Amplifier Tool Profiles

The Intel VTune Amplifier performance tool was used to profile the workloads. Intel VTune Amplifier profiles of the Box Blur application using an OpenCL buffer, and using an OpenCL buffer with Intel Subgroup extensions, were collected for 4K images and a Box Blur diameter of 11. The profiles from both implementations were analyzed to track GPU usage and EU utilization: EU Active%, EU Stall%, and EU Idle%. The GPU metrics were used to compare the performance of the two implementations on the hardware.

The Graphics/Platform tab showed the EU utilization. Figure 9 shows the Intel VTune Amplifier profile of the naïve OpenCL application for the Box Blur filter, and Figure 10 shows the subgroup implementation. For the naïve OpenCL application, EU Active% is 90.6 percent and EU Stall% is 9.4 percent. For the subgroup implementation, EU Active% is 99.7 percent and EU Stall% is 0.3 percent. EU Active% for the OpenCL buffer with Intel Subgroup extensions thus increased by about 10 percent. Overall, the EU utilization for the kernel using the subgroup extension is better.

Figure 9: Intel® VTune™ Amplifier tool profile of naïve OpenCL™ application for a Box Blur filter with an image size of 4256x2832 and a diameter of 11.

Figure 10: Intel® VTune™ Amplifier tool profile of a Box Blur filter using Intel Subgroup extensions, an image size of 4256x2832, and a diameter of 11.

Conclusion

The paper presented a basic Box Blur filter OpenCL application and an optimization technique using the OpenCL Intel subgroup extensions. The test case showed how to optimize an OpenCL application and enhance its performance. The subgroup example used for experimentation was 4x4, that is, computing 16 pixels per work item. It showed the benefit of using subgroups to increase the workload per work item for better EU utilization. The performance graphs showed a speedup of 1.52x in kernel time and 1.36x in total time. Profiles from the Intel VTune Amplifier tool showed better EU utilization when using subgroups. OpenCL application developers can experiment with Intel subgroup extensions using different subgroup sizes and optimize their applications based on their system specifications.

About The Author

Sonal Sharma is a software application engineer working at Intel in California. Her work responsibility includes OpenCL enabling for applications running on Intel® platforms. She does performance profiling and GPU optimization for media applications and is well acquainted with Intel® performance tools like Intel VTune Amplifier, Intel Code Builder for OpenCL API, and Intel® Graphics Performance Analyzers. 

References

1 OpenCL™

https://www.khronos.org/opencl/

2 OpenCL™ Intel Subgroups Extensions

https://www.khronos.org/registry/cl/extensions/intel/cl_intel_subgroups.txt

https://software.intel.com/en-us/articles/sgemm-for-intel-processor-graphics

3 Intel® VTune™ Amplifier Tool

https://software.intel.com/en-us/intel-vtune-amplifier-xe

4 Intel Code Builder for OpenCL API

https://software.intel.com/en-us/code-builder-user-manual

5 Box Blur filter

https://en.wikipedia.org/wiki/Box_blur

Intel® Parallel Computing Center at Princeton University's Institute


Princeton University’s Institute for Computational Science & Engineering

Principal Investigators:

Professor William Tang

Prof. William Tang, the PI of this project, began the Princeton Gyrokinetic Toroidal Code (GTC-P) project in 2008 with the goal of producing a modern HPC application code capable of delivering discovery science at increasing problem sizes through effective utilization of the most advanced supercomputing platforms. He was also the U.S. PI for the National Science Foundation-supported G8 Exascale Computing for Global Scale Projects Program in Fusion Energy, which successfully ported GTC-P to leading HPC systems in Europe and Japan as well as in the US. This activity has currently been extended to top supercomputing systems worldwide to carry out comparative performance studies with "time to solution" and "energy to solution" as the relevant metrics.

Dr. Bei Wang is the current lead developer of the GTC-P code and has extensive experience in porting and optimizing the code on a variety of multi-core and many-core systems worldwide. Most recently, she has successfully ported the code to Stampede's Intel® Xeon Phi™ coprocessor system at NSF's Texas Advanced Computing Center and to the world-leading Tianhe-2 system in China. Significant results operating in symmetric mode have been obtained, and active development of a more efficient offload-mode implementation is currently in progress. More recently, she has actively collaborated on GTC-P performance studies with the Intel® PCC at ETH Zurich to significantly advance progress in this key area.

Dr. Khaled Ibrahim, a computer science expert in performance modeling and simulation acceleration in the Computer Science division of Lawrence Berkeley National Laboratory (LBNL), has been the lead member of the CS team there engaged specifically in active collaborations with Princeton on modernizing the GTC-P code. In particular, he has led the R&D efforts that have enabled GTC-P to exploit the optimization of "scatter" and "gather" operations on modern multi-core and many-core systems. He will also explore the best way to effectively use the cache and memory hierarchy in the Intel Xeon Phi architectures.

Dr. Carlos Rosales is Co-Director of the Advanced Computing Evaluation Laboratory at TACC, where his main responsibility is the evaluation of new computer architectures relevant to high performance computing. His areas of expertise are benchmarking, code optimization, and computational fluid dynamics. Dr. Rosales has worked on code optimization for the Intel® Xeon Phi™ coprocessor since its pre-production days and works closely with Intel engineers in several areas related to the performance and stability of codes deployed on Intel® architectures.

Description:

The Intel® PCC at Princeton University's Institute for Computational Science & Engineering, in partnership with TACC and LBNL, will focus on conducting a systematic collaborative case study, on the Intel® Xeon Phi™ coprocessor, of a discovery-science-capable particle-in-cell (PIC) production code named Gyrokinetic Toroidal Code - Princeton (GTC-P). This work will involve exploiting vectorization and determining the best strategy for dealing with the last level of the cache used in Intel® Xeon Phi™ coprocessors. In particular, the associated R&D will explore the best ways to use the memory hierarchy in the Knights Landing (KNL) architecture. Additionally, improved efficiency of the offload programming model on the Knights Corner (KNC) architecture will also be addressed. Overall, the aim is to produce a successful case study that demonstrates the performance of advanced PIC algorithms on Intel® architectures.

In order to more efficiently utilize the full power of Intel® Xeon Phi™ coprocessors, it is important that applications utilize all cores and vector units effectively. This will accordingly involve investigating optimization opportunities for data parallelism in two key GTC-P kernels featuring algorithmic-level "scatter" and "gather" operations. Specifically, the optimizations will include careful examination of data layouts (Array of Structures and Structure of Arrays), data alignment, data prefetching, intrinsics, and auto-vectorization. In addition, the R&D will involve exploring the best strategy for dealing with the last level of the cache hierarchy used in the Intel® Xeon Phi™ coprocessor series. Since the KNL architecture, soon to be accessible on "Cori" at NERSC/LBNL, on "Theta" at ALCF/ANL, and on "Stampede II" at TACC, will feature a hierarchy of dynamic memory capabilities, this Intel® PCC has a special interest in analyzing the access patterns of different data structures to guide their allocation to the various dynamic memories. For the current-generation KNC architecture featured on "Stampede," we plan to add an "offload pragma" with the goal of improving offloading of the loops in these key kernels while keeping nearly the same performance as the native version. Deploying an efficient offload programming model is necessary for properly performing application production runs on leadership-class computing facilities (such as Stampede and TH-2), where supporting direct MPI communication involving Intel® Xeon Phi™ coprocessors is quite challenging.

Related websites:

http://extremescaleglobalpic.princeton.edu


Support of Visual Studio 2015* RTM, Update 1 or Higher


Visual Studio 2015* updates have introduced many changes in the Visual C++ compiler. Because these changes arrived late, it is challenging for the Intel C++ compiler to be 100% compatible. Some incompatibility issues have been reported, and they are addressed in subsequent updates of the Intel C++ compiler. However, not all issues can be addressed in version 15.0 of the Intel C++ compiler because of the complexity involved.

See the following table for details on support of the Visual Studio 2015* (VS2015) RTM, Update 1, and Update 2 releases:

 

                    Intel C++ Compiler 15.0   Intel C++ Compiler 16.0             Known issues
VS2015 RTM support  Yes                       Yes                                 None
VS2015 U1 support   No                        Yes, with 16.0 Update 1 or higher   limits(1120): error: identifier "__builtin_nanf" is undefined
VS2015 U2 support   No                        Yes, with 16.0 Update 2 or higher   "__declspec(allocator)" not supported, as reported on this forum thread (DPD200382118)

 

Intel® RealSense™ Gesture Playground


The Intel® RealSense™ Gesture Playground tool enables easy development and customization of gestures with the Intel® RealSense™ SDK, which involves a series of tasks including sample capture, data analysis, algorithm design/implementation, testing/debugging, code integration, and so on.

The tool provides the following key features for gesture customization and development based on the Intel® RealSense™ SDK:

  1. Depth video record & replay
  2. Frame-by-frame navigation and tagging
  3. Data manipulation with scripts
  4. Code generation
  5. Debug & Test

Debugging Intel® XDK Cordova Apps Built for iOS* Using a Mac* and Safari*


There are some debugging situations that simply cannot be satisfied by the features found on the Intel® XDK Debug tab. In those cases, if you own a Mac* and an Apple* developer account, you can utilize Web Inspector in Safari* to remotely debug an iOS* Cordova app in a way that is analogous to using Remote Chrome* DevTools with an Android* device running a Cordova app. This process requires that you use a Mac and that your iOS device is attached to the Mac via a USB cable. Some configuration of Safari* on your Mac* and on your iOS* device is required to make this work, and is described below.

Additional instructions regarding the use of Web Inspector with a USB-attached iOS device can be found in the Apple document titled Safari Web Inspector Guide. The instructions below will only get you connected and started; they do not attempt to explain how to use Web Inspector to debug your application.

Enable "web inspector" on your iOS device

  • open "Settings" on the iOS device
  • choose "Safari"
  • choose "Advanced"
  • set the "Web Inspector" button to "on" (make it green)

Enable "device debug" in Safari on your Mac

  • select "Preferences" from the Safari menu
  • choose the "Advanced" tab
  • check the "Show Develop menu in menu bar" option (as shown in the image below)

Build your app using a development provisioning file and install the IPA onto your iOS device

It is important that you build your app with a "Development" certificate and a matching "Development" mobile provisioning file. You need to import your "Development" p12 certificate into the Intel XDK certificate management tool. Make sure you then select that certificate from the pulldown in the iOS tab in the Build Settings (on the Projects tab) before you build your app. Also, be sure to select the matching mobile provisioning file in the same Build Settings section for your project.

An example of a "Development" mobile provisioning file is shown below. Note that in this example the provisioning file is a "wildcard" provisioning file, which means that it can be used with any App ID (because the provisioning file's App ID is '*'). The app you build with this "Development" provisioning file (and the matching "Development" certificate) can only be installed onto those devices that are part of the authorized "Device" list (there are seven authorized devices for the certificate shown below).

Alternatively, you can re-sign a previously built app with a development provisioning file

If you choose this route, the easiest way to re-sign your app is with a free open-source app named iReSign. You can download it directly from its GitHub repo and move the iReSign.app folder to your /Applications folder (or wherever you like to store such utilities). See the README.md in the iReSign GitHub repo for basic instructions. This app does require that Xcode is installed on your Mac.

Install the Built App onto Your iOS Device

If you imported a "Development" certificate into your Intel XDK certificate management console and used that certificate and the matching "Development" mobile provisioning file to build your app, you can most easily install your app by sharing the app via the build tile. Using the share feature on the iOS build tile from the Build tab allows you to email your built app to your device. From the iOS device, open the shared email with the native email app on your iOS device and select the OTA link to download and install the built app onto your device.

If you downloaded the IPA file directly to your Mac, use iTunes and the following instructions to install the app onto your device.

Attach the iOS device to your Mac via a USB cable

Open iTunes* and select the USB-attached iOS device so you can see the apps that are on the device.

Drag the IPA file you re-signed to the iTunes icon on your Mac's "Dock."

Make sure you select "install" next to the name of the app you dragged to the iTunes Dock icon. It should appear in the list of apps shown for the iOS device in iTunes. Once you select "install" the button should change so it says "will install." In the image below we are going to install the app named "HelloCordova" for remote debugging over USB.

Sync iTunes with your iOS device

This is necessary to install the built app onto your device with iTunes. You "sync" iTunes with your device by pushing the "Apply" button in the lower-right corner of iTunes. Do not remove the USB cable from your Mac; the next step requires that your iOS device remain connected to your Mac via the USB cable.

Start Debugging the App on Your iOS Device

First, start the app on your iOS device; an icon for the app to be debugged should have appeared if you used the sync step above or if you successfully installed using the OTA link sent via the share link on the Build tab.

To start debugging with Safari on the Mac:

  • select "Develop" from the Safari menu
  • select the "name of the attached device"
  • select the "name of the app to be debugged"
  • select "index.html" (see the image below)

Safari Web Inspector will start up on your Mac, and you will have a full debug environment (similar to Chrome DevTools) for the app that is running on your USB-connected iOS device. For difficult debugging situations, especially those where the app crashes immediately on start, you may have to change the logic in your app to pause any action until you, for example, start a function manually from the JavaScript* console.

Notice in the Web Inspector image above that the <p> tag is highlighted. In the screenshot below you can see the highlighted <p> element; it is similar to debugging a browser window with Safari Web Inspector. The screenshot below was taken from an iPhone* attached via USB to a Mac running the Web Inspector session shown above.

Recognizing and Measuring Vectorization Performance


Vectorization promises to deliver as much as 16 times faster performance by operating on more data with each instruction issued. The code modernization effort aims to get all software running faster by scaling software to use more threads and processes and, just as importantly, by vectorizing effectively, that is, by making effective use of single instruction, multiple data (SIMD) execution units. This article provides some background on vectorization and discusses techniques to evaluate its effectiveness.

Introduction

For the purposes of this article, SIMD and vectorization are used synonymously. Today's computers are doing more and solving more problems than ever before. Each new generation of computers changes significantly; today's computers look very different from those delivered 20 years ago. As computers continue to change, software needs to change as well. Just as modern Intel® processors build on top of the core instruction set architecture (ISA), software can continue to build on top of traditional C/C++ and Fortran*. Intel's instruction set expanded to add Intel® Streaming SIMD Extensions 4 (Intel® SSE4), Intel® Advanced Vector Extensions (Intel® AVX), and Intel® Advanced Vector Extensions 512 (Intel® AVX-512); software needs to use this vectorization capability efficiently.

Modern processors continue to add more cores per processor and to widen the SIMD extensions. Software is threaded to take advantage of the numerous cores on Intel® platforms. Just as the number of cores in a processor increased from two full cores to 16, and to over 50 cores per processor in the Intel® Xeon Phi™ processor product family, the width of the SIMD data registers increased. The original Intel SSE instructions operated on registers 128 bits wide. The Intel AVX instructions introduced 256-bit registers, and Intel AVX-512 brings 512-bit registers to Intel platforms. With 512-bit registers, the processor can operate on 8 to 16 times the amount of data with a single instruction compared to the original 64-bit registers.

This is the importance of vectorization. Software that doesn't embrace vectorization is not keeping up with the new instructions available, which is somewhat like running on an old Intel platform. Table 1 shows an Intel AVX-512 instruction (VADDPD) performing eight data operations with one instruction.

zmm2   a[0]        a[1]        a[2]        a[3]        a[4]        a[5]        a[6]        a[7]
zmm3   b[0]        b[1]        b[2]        b[3]        b[4]        b[5]        b[6]        b[7]
zmm1   a[0]+b[0]   a[1]+b[1]   a[2]+b[2]   a[3]+b[3]   a[4]+b[4]   a[5]+b[5]   a[6]+b[6]   a[7]+b[7]

Table 1: An example of Intel® Advanced Vector Extensions 512 (VADDPD) and data operations.

The SIMD registers may be fully packed, as shown above, or an instruction may operate on only a single lane (packed versus scalar instructions). Table 1 shows a packed operation: all eight lanes are full and operated on simultaneously. When only one lane of a SIMD register is used, the instruction is a scalar operation. The goal of vectorization is to move software from scalar mode to full-width, packed vectorization mode.
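To make the packed-versus-scalar distinction concrete, here is a small C sketch using AVX-512 intrinsics; the function names are illustrative, and an AVX-512 capable processor and appropriate compiler flags are assumed.

#include <immintrin.h>

/* Packed: eight double-precision adds with a single VADDPD instruction,
 * matching Table 1 (assumes an AVX-512 capable processor). */
void add8_packed(const double *a, const double *b, double *c)
{
    __m512d va = _mm512_loadu_pd(a);   /* load 8 doubles */
    __m512d vb = _mm512_loadu_pd(b);
    _mm512_storeu_pd(c, _mm512_add_pd(va, vb));
}

/* Scalar: the same work, one lane at a time. */
void add8_scalar(const double *a, const double *b, double *c)
{
    for (int i = 0; i < 8; i++)
        c[i] = a[i] + b[i];
}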

So how does a software developer go from scalar code to vector mode? Consider these vectorization opportunities:

  • Use vectorized libraries
  • Think SIMD
  • Performance analysis and compiler reports

Use Vectorized Libraries

This is the simplest way to benefit from vectorization or SIMD operations. There are many optimized libraries that take advantage of the wide SIMD registers. The most popular of these is the Intel® Math Kernel Library (Intel® MKL). Intel MKL includes all of the BLAS (Basic Linear Algebra Subprograms) routines as well as many other mathematical operations, including LAPACK solvers, FFTs, and more. These packages are built to take advantage of the SIMD execution units, as are many other commercial libraries. If you use other third-party software, ask whether the vendor supports the wider SIMD instructions and how much gain it gets from using them. Performance gains vary based on workloads and data sets, but the vendor should have some data showing what is achievable for a particular data set.

You should also check whether you are using the best library interface to deliver performance. In at least one case, a software vendor discovered that a different interface to a library it was using provided far better performance. The vendor was making hundreds of thousands of calls into the library; it modified its code to use the API that passed in blocks of work, reducing the number of calls. The performance gains were significant and well worth the changes. So check that you are using the best interface to third-party libraries.
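As a brief illustration of the library approach, the following hedged C sketch calls Intel MKL's DGEMM through the standard CBLAS interface, computing C = alpha*A*B + beta*C; the matrix size and fill values are assumptions for demonstration.

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    int n = 1024;  /* illustrative size */
    double *A = mkl_malloc(n * n * sizeof(double), 64);  /* 64-byte aligned */
    double *B = mkl_malloc(n * n * sizeof(double), 64);
    double *C = mkl_malloc(n * n * sizeof(double), 64);
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

    /* One library call replaces a hand-written triple loop and uses
     * packed SIMD instructions internally. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);

    printf("C[0] = %f\n", C[0]);
    mkl_free(A); mkl_free(B); mkl_free(C);
    return 0;
}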

Think SIMD

As you write code, always consider what can be done simultaneously. While compilers do a great job of recognizing opportunities for parallelism, the developer still needs to step in and help. When you can think of how operations can be done in a SIMD fashion, it becomes easier to express them so that the compiler will generate SIMD code. So when you think about your code's computation, ask, "How could more of this be done simultaneously?"

Taking the time to write ideas on a whiteboard can help. Don’t instantly change your code—the compiler might already be vectorizing several parts of it now. Think about what operations are repeated across different sets of data and keep them in mind as you work on software performance. Compilers and tools give you great tips, but they only go so far. The expert knowledge the developer has about the operations can be critical for the big breakthroughs. Consider what can be done to express the code to expose SIMD or vector operations.

Unfortunately, proper expression for the compiler is part skill, part art. The compilers weigh multiple possible code generation alternatives and have better knowledge of instruction latency and throughput than most developers. If a developer tries to do too much and breaks things down excessively, it may obscure opportunities or prevent the compiler from producing the best possible code. Other times the developer needs to break things down to make the compiler's job easier. Although there is no perfect delineation of what the developer should do and what the compiler should do, there are specific things the developer can do. Future articles will provide more guidelines.
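As a small example of expressing independence to the compiler, consider this C sketch; the function name is illustrative. The restrict qualifiers assert that x and y do not alias, removing the kind of assumed dependency that often blocks vectorization.

/* Each iteration is independent once the compiler knows x and y cannot
 * alias, so it is free to process 4, 8, or 16 lanes per instruction
 * depending on the target ISA. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}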

Performance Analysis

Performance analysis has several steps. One of the first is to determine where the compute time is spent and whether the sections consuming it are threaded and vectorized well. This article covers only vectorization. Before spending time modifying code, make sure that the areas modified will impact performance.

Several popular performance analysis tools include Intel® VTune™ Amplifier XE, TAU Performance System*, Gprof*, and ThreadSpotter*. Each of these reveals important information about which sections of code consume the most computing resources and time. It is just as important to understand what the compiler is doing. A computationally expensive section of code may already be vectorized—it may just be expensive—so you need to know where time is spent and whether it is justified.

The compiler optimization reports provide a great deal of important and helpful information about the optimizations applied. If you are starting on new code, they may be a good place to begin. The compiler optimization reports for the Intel compiler are invoked with the -qopt-report compiler option (I usually use -qopt-report=3; see the compiler documentation to understand the different report levels). As a developer, you recognize the need to understand performance and compiler decisions and to know how to act on this information. When working on the performance of existing code, it is often better to begin with analysis tools rather than sort through the thousands of lines of compiler optimization reports that may be generated. You can then match or correlate the hotspots in the performance analysis data to the compiler optimization reports and begin performance tuning.

This article first demonstrates performance analysis using Intel® VTune™ Amplifier XE, focusing on hardware events. The set of hardware events you can collect varies by processor; historically, each Intel® processor had a different set of hardware event counters, though Intel processor hardware events are moving toward fewer variations.

This example uses matrix multiply code available at https://github.com/drmackay/vectorization-exercises-1.git.

Alternatively, the Intel VTune Amplifier XE sample code shows similar results for three of the seven cases. On Windows*, the default installation location of the Intel VTune Amplifier XE sample code is c:/Program Files(x86)/IntelSWTools/VtuneAmplifierXE/samples/en/C++/matrix_vtune_amp_xe, as a compressed zip file. The same sample is available on Linux* installations. If you use the Intel VTune Amplifier XE sample code, edit multiply.h so that MAXTHREADS is set to 1.

  • In Windows, select the release build, and then modify the project properties. Under C/C++/ All Options set Optimization Diagnostic Level to Level 3 (/Qopt-report:3).
  • On Linux* modify the makefile by adding -qopt-report=3 to the ICFLAGS definition line (to build enter make icc).

The data collected here comes from a 6th generation Intel® Core™ processor (code-named Skylake). This is advantageous because Skylake-generation cores have excellent floating-point hardware event counters. The counters on Skylake distinguish between packed and scalar operations as well as between Intel SSE and Intel AVX floating-point operations. The Intel® Xeon Phi™ coprocessor (code-named Knights Corner, or KNC) has floating-point counters, but they are not as meaningful: they are prone to overcounting the number of floating-point operations. This overcount does not mean the data is worthless; the counters may overcount, but they do not undercount. This means that low floating-point counts on KNC are a definitive indicator of poor vectorization or poor floating-point performance. On the other hand, high floating-point counts on KNC indicate that the code is probably pretty good, but that is not definitive, since other effects may be inflating the reported floating-point operations. The 4th generation Intel® Xeon® processor family (code-named Haswell) included no floating-point events. Skylake has clean counters that are easier to interpret.

The sample code available from GitHub* runs seven variations of matrix multiply. The performance of the software is analyzed using the preconfigured general exploration hardware event collection in Intel VTune Amplifier XE. The resulting data is shown in the Event Count tab of the Intel VTune Amplifier XE data collection area (see Figure 1). The columns were reordered by dragging and dropping to place the double-precision floating-point events adjacent to the cycles per instruction (CPI) rate.

Figure 1: Event Count tab of the Intel® VTune™ Amplifier XE data collection

The clock ticks are largely in subroutine abasicmm1 (this correlates to multiply1 using the Intel VTune Amplifier XE sample code). Gains in performance will come from reducing the runtime of abasicmm1. Second, the FP_ARITH_INST_RETIRED.SCALAR_DOUBLE event has numerous counts, while the corresponding 128B_PACKED and the 256B_PACKED events are zero. This indicates that there is no vectorization achieved for this routine. Notice the high number for the CPI.

Each processor core has multiple execution units that can be active on each cycle. Ideally there would be at least three instructions issued and retired every clock cycle, and the CPI would be a low number. It is not realistic to expect to fill every available execution unit with productive work every cycle, but it is also unrealistic to accept a cycles-per-instruction ratio significantly greater than 1. Here, the data shows that abasicmm1 consumes the most time and is not vectorized, even though it performs significant double-precision floating-point work. Now let's look at the optimization reports for subroutine abasicmm1.

Figure 2: Compiler optimization reports within Visual Studio*.

Figure 2 shows a screen capture of the compiler optimization reports within Visual Studio*. On Linux, the submm.optrpt file has the same information (make sure you used -qopt-report=3). The compiler reports show that the loop is not vectorized and further explain that the compiler did not vectorize the inner loop because it could not determine the independence of all variables being operated on. The report also suggests that the developer consider adding a directive or pragma to inform the compiler that there are no dependencies.

The process followed above was to use performance analysis data and compiler reports to determine vectorization. There is another method that does all of the above using one utility: Intel® Advisor XE.

Intel Advisor XE combines runtime performance data and the compiler optimization reports, correlates the data, and then sorts it. This combines hotspot analysis with runtime information, which can include the number of loop iterations, the number of subroutine calls, and memory access stride lengths, in addition to the compiler's report of its actions on the code, such as inverting loops, vectorizing sections, and more. Some codes have specific regions or hotspots to focus on, and focusing developer effort in these areas yields significant performance improvements. Other codes have a flatter profile, and effort must be spread across many files to get large performance gains. Some changes can be embedded into common files or modules that are included, so that the changes propagate to many files and their subroutines and functions, but the work is not trivial. Intel Advisor XE can help track progress as well as help determine where to begin the code modernization work. Figure 3 shows the survey results of applying Intel Advisor XE to the sample code.

Intel Advisor XE initially sorts the data by self time (time actually spent in the routine, excluding calls to other routines from within it), and abasicmm1 is at the top. The compiler report's information about an assumed dependency preventing vectorization is listed, along with information showing that the loop was not vectorized. Additional helpful information can be collected, such as loop trip counts. Intel Advisor XE conveniently combines much of the information from compiler reports as well as runtime and performance analysis in one location.

Figure 3: Results of matrix.icc in Intel® Advisor XE.

Intel Advisor XE shows that the loop was unrolled by a factor of two by the compiler, so there are only 1536 loop iterations instead of 3072 (the matrix size). Intel Advisor XE's organization simplifies the tuning and vectorization process.

Vectorizing Routines Overview

You can add the OpenMP* simd construct (#pragma omp simd) above the innermost loop to tell the compiler it is safe to vectorize the loop. This is done in routine abasicmm2 and improves performance by 9.3 percent. Notice that the Intel VTune Amplifier XE report still shows a high CPI for routine abasicmm2 (see Figure 1): while the loop is vectorized, the vectorization is not efficient. Routines abettermm1 and abettermm2 swap the order of the second and third nested loops. This changes the operations from a dot-product orientation to a daxpy orientation, eliminating the reduction. Routine abettermm1 is more than 10x faster than abasicmm1.
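The two changes might look like the following sketches, continuing the assumed layout from the earlier fragment:

// abasicmm2-style fix: #pragma omp simd asserts the iterations are
// independent, so the compiler vectorizes the reduction.
void basicmm2(const double *a, const double *b, double *c, int n)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            #pragma omp simd reduction(+:sum)
            for (int k = 0; k < n; ++k)
                sum += a[i*n + k] * b[k*n + j];   // b is read with stride n
            c[i*n + j] = sum;
        }
}

// abettermm1-style fix: swapping the j and k loops turns the inner loop
// into a daxpy over unit-stride rows of b and c, eliminating the
// reduction. Assumes c was zeroed before the call.
void bettermm(const double *a, const double *b, double *c, int n)
{
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k) {
            const double aik = a[i*n + k];
            for (int j = 0; j < n; ++j)
                c[i*n + j] += aik * b[k*n + j];
        }
}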

Both Intel Advisor XE and Intel VTune Amplifier XE report that abettermm1 is not vectorized; notice that the Intel VTune Amplifier XE report shows all of its floating-point operations as scalar operations. The routine abettermm1 does not use the OpenMP simd pragma. Routine abettermm2 adds #pragma omp simd, and the compiler vectorizes the loop. Advisor XE shows abettermm2 is vectorized using Intel® AVX operations, and Intel VTune Amplifier XE shows the use of packed 128-bit floating-point operations. This improves performance as well as vectorization (routine abettermm2 is similar to routine multiply2 in the Intel VTune Amplifier XE sample code).

This article previously mentioned that a high CPI indicates inefficiency. As with the floating-point counters on KNC, CPI is an indicator that provides a hint, not a definitive notification that something is wrong; the order of magnitude is still important. Notice that the CPI for abasicmm1 and abasicmm2 is greater than three for both routines. This is very high and, in this case, indicative of poor performance. Routine abettermm1 shows a CPI of 0.415 while routine abettermm2 shows a CPI of 0.708, yet abettermm2 is vectorized and delivers better performance, while abettermm1 is not vectorized. It is better to use fewer instructions that each do more work (packed SIMD operations) than to use more instructions that retire quickly but complete less work per cycle, even though the latter yields a lower CPI. CPI as a metric does not capture this principle, which is why many criticize its use as a performance metric; the comparison between abettermm1 and abettermm2 illustrates the limitation. Even so, a CPI in the three-to-four range for a performance-critical routine is a clear signal to investigate.

Once you are aware of this limitation, CPI can be used to indicate places to explore for optimizations. Table 2 lists the performance of the different matrix multiply examples.

Routine            Execution time (secs)
abasicmm1          276.9
abasicmm2          253.2
abettermm1          25.9
abettermm2          21.4
ablockmm1           14.9
ablockmm2           10.0
MKL cblas_dgemm      2.0

Table 2: Matrix routine performance.

The routine abettermm1 is more than 10 times faster than the original abasicmm1 routine. It is well known that blocking algorithms improve matrix multiply performance (see http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/lecture-3-multiplication-and-inverse-matrices/ and http://networks.cs.ucdavis.edu/~amitabha/optimizingMatrixMultiplication.pdf).
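A blocked version of the daxpy-oriented loops might look like this sketch (the block size BS is an assumption; the sample's actual blocking factor is not stated here):

#include <algorithm>

// Process the matrices in BS x BS tiles so each tile of b and c stays hot
// in cache while it is reused, instead of streaming whole rows of length n.
void blockmm(const double *a, const double *b, double *c, int n)
{
    const int BS = 64;  // assumed block size; tune to the cache hierarchy
    for (int kk = 0; kk < n; kk += BS)
        for (int jj = 0; jj < n; jj += BS)
            for (int i = 0; i < n; ++i)
                for (int k = kk; k < std::min(kk + BS, n); ++k) {
                    const double aik = a[i*n + k];
                    for (int j = jj; j < std::min(jj + BS, n); ++j)
                        c[i*n + j] += aik * b[k*n + j];
                }
}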

When blocking is added to the nested loops in ablockmm1, performance improves further, dropping to 14.9 seconds. Figure 3 shows Intel Advisor XE reporting that ablockmm1 is now vectorized, along with the instruction set used (Intel SSE2). This is certainly good news, but this is a Skylake processor: Intel AVX 256-bit packed SIMD operations are available, yet the 256B_PACKED counter is still zero. Both Intel VTune Amplifier XE and Intel Advisor XE show this.

The code is still using vector operations half the width of the registers available on this platform. The next step is to instruct the compiler to generate Intel AVX2 code, which can be done with the compiler option /QxCORE-AVX2. Because we wanted to show the difference between instruction sets, a pragma was used to apply the option to only the ablockmm2 routine instead of compiling the full file that way.

This pragma is: #pragma intel optimization_parameter target_arch=CORE-AVX2. Looking at the Intel VTune Amplifier XE clock-tick data, or simply at the measured time, doubling the width of the SIMD operations did not cut execution time in half. There are still many other operations going on: loop control, loads and stores, and data movement. Do not always expect the performance improvement to match the ratio of SIMD register widths.
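Applied to the sketches above, the per-routine targeting looks like this (the pragma is the Intel compiler feature named in the text; the surrounding function is assumed):

// The pragma applies CORE-AVX2 code generation to the next routine only,
// leaving the rest of the file at the default instruction set.
#pragma intel optimization_parameter target_arch=CORE-AVX2
void blockmm2(const double *a, const double *b, double *c, int n)
{
    /* blocked loops identical to the sketch above; with the pragma in
       effect the compiler may emit 256-bit AVX2 packed instructions */
}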

The Intel VTune Amplifier XE results now show 256B_PACKED_DOUBLE instructions being executed for routine ablockmm2. It is worth noting that the preconfigured analysis type "general exploration" was used to collect this data. General exploration is excellent for qualitative comparison, but it uses multiplexing to sample a wide range of hardware events, so the numbers are not valid quantitative results; data collection for this article showed a wide variation in floating-point operations retired under general exploration. If you need to measure the amount of floating-point work being done, or a ratio of floating-point work, create a custom analysis type in Intel VTune Amplifier XE and explicitly collect the desired events, or use a package like TAU, which builds on top of PAPI* (an open-source processor event counter library). A custom analysis project in Intel VTune Amplifier XE collecting counts for FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE does not multiplex and produces consistent results from run to run. You can collect these events and divide by clock ticks to obtain a flops-per-cycle rate, which is the best metric for measuring vectorization efficiency.
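As a concrete illustration of that arithmetic, a post-processing helper might look like the following minimal sketch (the event weights reflect how many double-precision operations each instruction class retires; treating the collected clock ticks as the cycle count is the assumption here):

// Flops per cycle from raw event counts: a scalar instruction retires one
// DP operation, a 128-bit packed instruction two, a 256-bit packed four.
double flops_per_cycle(double scalar_double,  // FP_ARITH_INST_RETIRED.SCALAR_DOUBLE
                       double packed_128b,    // FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE
                       double packed_256b,    // FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE
                       double clock_ticks)    // unhalted clock ticks for the routine
{
    return (scalar_double + 2.0 * packed_128b + 4.0 * packed_256b) / clock_ticks;
}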

Vectorizing Results with the Intel® Xeon Phi™ Coprocessor

Great performance gains await those willing to improve the vectorization of their code. KNC is particularly sensitive to good vectorization, which has motivated many to improve vectorization and optimization for it, and developers typically saw the same software changes provide performance gains on the more common Intel® Xeon® processor platforms. Regardless of the platform you target, performance will improve with better vectorization. The KNC generation is unique in the simplicity of its cores combined with wide SIMD registers. It is designed with four-way multithreading, and its performance shines when three or four of the hardware threads per core are utilized by code that also uses the 512-bit-wide SIMD operations. The sample code tested in this example is sequential, that is, not threaded. The KNC cores do not include the many prefetch advances found in other Intel® Xeon® processors, so the compiler makes extensive use of explicit prefetch instructions to deliver good performance; the prefetch instructions the compiler generates for KNC greatly affect performance. When this test code was run, abettermm2 outperformed the block matrix multiplies, probably as a result of better prefetch.

Unexpectedly, abasicmm2 (the SIMD dot-product-oriented matrix multiply) performed an order of magnitude worse than abasicmm1 (the scalar dot-product-oriented matrix multiply). This is currently unexplained. Since the compiler reported this loop as vectorized, it reinforces the admonition not to rely purely on compiler reports, but to combine performance data and compiler reports. The timings are shown in the clock ticks column of Figure 5. The KNC data was collected with both the TAU Performance System and Intel VTune Amplifier XE; the TAU data is displayed graphically using ParaProf (see Figure 4).

Figure 4: ParaProf displays of vector intensity and time (hotspots).

ParaProf allows the user to define a derived metric based on the events collected by TAU. I created the derived metric VPU_ACTIVE_ELEMENTS/VPU_INSTRUCTIONS_EXECUTED, as recommended in the KNC tuning guide (see https://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-2-understanding), and ParaProf created the associated bar chart. Clock time for the routines is shown in the top window of the ParaProf display in Figure 4; vector intensity is in the lower window.

As expected, abasicmm1 and abettermm1 have low vector intensity ratios of about 1.0. Routine abasicmm2 shows a mediocre ratio of 3.7, yet it had the worst execution time, indicating that this is one of the cases where Vector Processing Unit (VPU) operations were overcounted. Routine abettermm2 and both block matrix multiplies, ablockmm1 and ablockmm2, report a ratio of eight, masking the performance differences among the three routines. Remember this is for double precision; for single-precision floats you would expect a ratio close to 16. Ratios in the range of one to three of VPU active/VPU instructions retired correspond to poor performance. The Intel VTune Amplifier XE data for KNC are also shown in Figure 5.

To collect data for vector intensity (the ratio of VPU active elements to VPU instructions executed), use a command line that includes the options:

-collect general-exploration -knob enable-vpu-metrics=true

The full command used for this collection was

amplxe-cl -collect general-exploration -knob enable-vpu-metrics=true -target-system=mic-native:mic0 -- /home/drm/runscript

where runscript set the environment and invoked the binary. The first column in the Vectorization Usage region is vector intensity (VPU active/VPU executed).

Figure 5: Intel® VTune™ Amplifier XE general exploration viewpoint.

Intel VTune Amplifier XE automatically captured the call to dgemm in the Intel MKL; the TAU Performance System instrumentation was not configured to capture metrics for Intel MKL. The Intel VTune Amplifier XE screen capture shown here was taken after using its filtering capabilities to remove rows showing events collected for the OS and other services running on the system. Notice the same low vector intensity for abasicmm1 and abettermm1 (see Figure 5).

The high vector intensity for the call to dgemm in Intel MKL (over eight) shows the impact of counting all vector instructions, not just the floating-point operations. Notice that the CPI for all the routines on KNC is greater than one (see Figure 5).

It is important to remember that, unlike other Intel® processors and future Intel® Xeon Phi™ processors, the KNC generation lacks out-of-order execution. This leads to a higher CPI (greater than one). Regardless, a CPI of 4.9, let alone 60, is a reason to explore further. KNC relies on multithreading, in which multiple hardware threads share a core's execution units; when one thread stalls, another can use them. These samples are single-threaded to keep the focus purely on vectorization/SIMD.

Summary

Vectorization is important for performance; top performance was obtained only with vectorization. A compiler report that code was vectorized is only one indicator: the compiler always reported routine abasicmm2 as vectorized, yet that routine did not deliver good performance. A CPI greater than 1.0 is an indicator of possible low efficiency, and a CPI on the order of three to four is low efficiency. Examining the ratio of packed SIMD instructions provided a much better calibration of how well code vectorizes. The Skylake generation and newer processors provide the best platforms for evaluating floating-point vectorization, because they allow an exact count of floating-point operations per cycle (or second) and therefore a precise calculation of efficiency. Other metrics are helpful, and combined they let a developer judge whether the vectorization is efficient.

Intel® XDK FAQs - Cordova


How do I set app orientation?

You set the orientation under the Build Settings section of the Projects tab.

To control the orientation of an iPad you may need to create a simple plugin that contains a single plugin.xml file like the following:

<config-file target="*-Info.plist" parent="UISupportedInterfaceOrientations~ipad" overwrite="true">
    <string></string>
</config-file>
<config-file target="*-Info.plist" parent="UISupportedInterfaceOrientations~ipad" overwrite="true">
    <array>
        <string>UIInterfaceOrientationPortrait</string>
    </array>
</config-file>

Then add the plugin as a local plugin using the plugin manager on the Projects tab.

HINT: to import the plugin.xml file you created above, you must select the folder that contains the plugin.xml file; you cannot select the plugin.xml file itself using the import dialog, because a typical plugin consists of many files, not a single plugin.xml. The plugin you created based on the instructions above requires only a single file; it is an atypical plugin.

Alternatively, you can use this plugin: https://github.com/yoik/cordova-yoik-screenorientation. Import it as a third-party Cordova* plugin using the plugin manager with the following information:

  • cordova-plugin-screen-orientation
  • specify a version (e.g. 1.4.0) or leave blank for the "latest" version

Or, you can reference it directly from its GitHub repo.

To use the screen orientation plugin referenced above you must add some JavaScript code to your app that calls the additional JavaScript API this plugin provides. Simply adding the plugin will not automatically fix your orientation; you must add code to your app that takes care of this. See the plugin's GitHub repo for details on how to use that API.

Is it possible to create a background service using Intel XDK?

Background services require the use of specialized Cordova* plugins that need to be created specifically for your needs. Intel XDK does not support development or debug of plugins, only their use as "black boxes" with your HTML5 app. Background services can be accomplished using Java on Android or Objective-C on iOS. If a plugin already exists that performs the required functions in the background (for example, this plugin for background geo tracking), Intel XDK's build system will work with it.

How do I send an email from my App?

You can use the Cordova* email plugin or use web intent - PhoneGap* and Cordova* 3.X.

How do you create an offline application?

You can use the technique described here by creating an offline.appcache file and then setting it up to store the files that are needed to run the program offline. Note that offline applications need to be built using the Cordova* or Legacy Hybrid build options.

How do I work with alarms and timed notifications?

Unfortunately, alarms and notifications are advanced subjects that require a background service. This cannot be implemented in HTML5 and can only be done in native code by using a plugin. Background services require the use of specialized Cordova* plugins that need to be created specifically for your needs. Intel XDK does not support the development or debug of plugins, only their use as "black boxes" with your HTML5 app. Background services can be accomplished using Java on Android or Objective-C on iOS. If a plugin already exists that performs the required functions in the background (for example, this plugin for background geo tracking), the Intel XDK's build system will work with it.

How do I get a reliable device ID?

You can use the Phonegap/Cordova* Unique Device ID (UUID) plugin for Android*, iOS* and Windows* Phone 8.

How do I implement In-App purchasing in my app?

There is a Cordova* plugin for this. A tutorial on its implementation can be found here. There is also a sample in Intel XDK called 'In App Purchase' which can be downloaded here.

How do I install custom fonts on devices?

Fonts can be treated as assets that are included with your app, just like images and CSS files; they are private to the app and not shared with other apps on the device. (It is possible to share some files between apps using, for example, the SD card space on an Android* device.) If you include the font files as assets in your application then there is no download time to consider; they are part of your app and already exist on the device after installation.

How do I access the device's file storage?

You can use HTML5 local storage, and this is a good article to get started with. Alternatively, there is a Cordova* file plugin for that.

Why isn't AppMobi* push notification services working?

This seems to be an issue on AppMobi's end and can only be addressed by them. PushMobi is only available in the "legacy" container. AppMobi* has not developed a Cordova* plugin, so it cannot be used in the Cordova* build containers. Thus, it is not available with the default build system. We recommend that you consider using the Cordova* push notification plugin instead.

How do I configure an app to run as a service when it is closed?

If you want a service to run in the background you'll have to write a service, either by creating a custom plugin or writing a separate service using standard Android* development tools. The Cordova* system does not facilitate writing services.

How do I dynamically play videos in my app?

  1. Download the Javascript and CSS files from https://github.com/videojs and include them in your project file.
  2. Add references to them into your index.html file.
  3. Add a panel 'main1' that will be playing the video. This panel will be launched when the user clicks on the video in the main panel.

     
    <div class="panel" id="main1" data-appbuilder-object="panel" style="">
        <video id="example_video_1" class="video-js vjs-default-skin" controls="controls" preload="auto" width="200" poster="camera.png" data-setup="{}">
            <source src="JAIL.mp4" type="video/mp4">
            <p class="vjs-no-js">To view this video please enable JavaScript*, and consider upgrading to a web browser that <a href=http://videojs.com/html5-video-support/ target="_blank">supports HTML5 video</a></p>
        </video>
        <a onclick="runVid3()" href="#" class="button" data-appbuilder-object="button">Back</a>
    </div>
  4. When the user clicks on the video, the click event sets the 'src' attribute of the video element to what the user wants to watch.

     
    function runVid2(){
          document.getElementsByTagName("video")[0].setAttribute("src","appdes.mp4");
          $.ui.loadContent("#main1",true,false,"pop");
    }
  5. The 'main1' panel opens waiting for the user to click the play button.

NOTE: The video does not play in the emulator and so you will have to test using a real device. The user also has to stop the video using the video controls. Clicking on the back button results in the video playing in the background.

How do I design my Cordova* built Android* app for tablets?

This page lists a set of guidelines to follow to make your app of tablet quality. If your app fulfills the criteria for tablet app quality, it can be featured in Google* Play's "Designed for tablets" section.

How do I resolve icon related issues with Cordova* CLI build system?

Ensure icon sizes are properly specified in the intelxdk.config.additions.xml file. For example, if you are targeting iOS 6, you need to manually specify the icon sizes that iOS* 6 uses.

<icon platform="ios" src="images/ios/72x72.icon.png" width="72" height="72" />
<icon platform="ios" src="images/ios/57x57.icon.png" width="57" height="57" />

These icons are not included automatically by the build system, so you will have to include them in the additions file.

For more information on adding build options using intelxdk.config.additions.xml, visit: /en-us/html5/articles/adding-special-build-options-to-your-xdk-cordova-app-with-the-intelxdk-config-additions-xml-file

Is there a plugin I can use in my App to share content on social media?

Yes, you can use the PhoneGap Social Sharing plugin for Android*, iOS* and Windows* Phone.

Iframe does not load in my app. Is there an alternative?

Yes, you can use the inAppBrowser plugin instead.

Why are intel.xdk.istablet and intel.xdk.isphone not working?

Those properties are quite old and are based on the legacy AppMobi* system. An alternative is to detect the viewport size instead. You can get the user's screen size using the screen.width and screen.height properties (refer to this article for more information) and control the actual view of the webview by using the viewport meta tag (this page has several examples). You can also look through this forum thread for a detailed discussion.

How do I enable security in my app?

We recommend using the App Security API, a collection of JavaScript APIs for hybrid HTML5 application developers. It enables developers, even those who are not security experts, to take advantage of the security properties and capabilities supported by the platform. The API collection is available to developers in the form of a Cordova plugin (JavaScript API and middleware), supported on the following operating systems: Windows, Android and iOS.
For more details please visit: https://software.intel.com/en-us/app-security-api.

To enable it, select the App Security plugin in the plugins list of the Project tab and build your app as a Cordova Hybrid app. After adding the plugin, you can start using it simply by calling its API. For details about how to get started with the App Security API plugin, see the relevant sample app articles at: https://software.intel.com/en-us/xdk/article/my-private-photos-sample and https://software.intel.com/en-us/xdk/article/my-private-notes-sample.

Why does my build fail with Admob plugins? Is there an alternative?

Intel XDK does not support the library project newly introduced in the com.google.playservices@21.0.0 plugin. Admob plugins depend on "com.google.playservices", which adds the Google* Play services JAR to the project. The "com.google.playservices@19.0.0" is a simple JAR file that works quite well, but "com.google.playservices@21.0.0" uses a new feature to include a whole library project. It works if built locally with the Cordova CLI, but fails when using Intel XDK.

To remain compatible with Intel XDK, change the Admob plugin's dependency to "com.google.playservices@19.0.0".

Why does the intel.xdk.camera plugin fail? Is there an alternative?

There seem to be some general issues with the camera plugin on iOS*. An alternative is to use the Cordova camera plugin instead, and change the version to 0.3.3.

How do I resolve Geolocation issues with Cordova?

Give this app a try; it contains lots of useful comments and console log messages. However, use the Cordova 0.3.10 version of the geo plugin instead of the Intel XDK geo plugin. The Intel XDK buttons in the sample app will not work in a built app because the Intel XDK geo plugin is not included; however, they will partially work in the Emulator and Debug tab. If you test it on a real device without the Intel XDK geo plugin selected, you should be able to see what is and is not working on your device. There is a problem with the Intel XDK geo plugin: it cannot be used in the same build with the Cordova geo plugin. Do not use the Intel XDK geo plugin, as it will be discontinued.

Geo fine might not work because of the following reasons:

  1. Your device does not have a GPS chip
  2. It is taking a long time to get a GPS lock (if you are indoors)
  3. The GPS on your device has been disabled in the settings

Geo coarse is the safest bet to quickly get an initial reading. It will get a reading based on a variety of inputs; it is usually not as accurate as geo fine, but generally accurate enough to know what town you are located in and your approximate location in that town. Geo coarse will also prime the geo cache so there is something to read when you try to get a geo fine reading. Ensure your code can handle situations where you might not be getting any geo data, as there is no guarantee you will get a geo fine reading at all, or in a reasonable period of time; success with geo fine is highly dependent on many parameters that are typically outside of your control.

Is there an equivalent Cordova* plugin for intel.xdk.player.playPodcast? If so, how can I use it?

Yes, there is and you can find the one that best fits the bill from the Cordova* plugin registry.

To make this work you will need to do the following:

  • Detect your platform (you can use uaparser.js or you can do it yourself by inspecting the user agent string)
  • Include the plugin only on the Android* platform and use <video> on iOS*.
  • Create conditional code to do what is appropriate for the platform detected

You can force a plugin to be part of an Android* build by adding it manually into the additions file. To see what the basic directives are to include a plugin manually:

  1. Include it using the "import plugin" dialog, perform a build and inspect the resulting intelxdk.config.android.xml file.
  2. Then remove it from your Project tab settings, copy the directive from that config file and paste it into the intelxdk.config.additions.xml file. Prefix that directive with <!-- +Android* -->.

More information is available here and this is what an additions file can look like:

<preference name="debuggable" value="true" />
<preference name="StatusBarOverlaysWebView" value="false" />
<preference name="StatusBarBackgroundColor" value="#000000" />
<preference name="StatusBarStyle" value="lightcontent" />
<!-- -iOS* --><intelxdk:plugin intelxdk:value="nl.nielsad.cordova.wifiscanner" />
<!-- -Windows*8 --><intelxdk:plugin intelxdk:value="nl.nielsad.cordova.wifiscanner" />
<!-- -Windows*8 --><intelxdk:plugin intelxdk:value="org.apache.cordova.statusbar" />
<!-- -Windows*8 --><intelxdk:plugin intelxdk:value="https://github.com/EddyVerbruggen/Flashlight-PhoneGap-Plugin" />

This sample forces a plugin included with the "import plugin" dialog to be excluded from the platforms shown. You can include it only in the Android* platform by using conditional code and one or more appropriate plugins.

How do I display a webpage in my app without leaving my app?

The most effective way to do so is by using inAppBrowser.

Does Cordova* media have callbacks in the emulator?

While Cordova* media objects have proper callbacks when using the debug tab on a device, the emulator doesn't report state changes back to the Media object. This functionality has not been implemented yet. Under emulation, the Media object is implemented by creating an <audio> tag in the program under test. The <audio> tag emits a bunch of events, and these could be captured and turned into status callbacks on the Media object.

Why does the Cordova version number not match the Projects tab's Build Settings CLI version number, the Emulate tab, App Preview and my built app?

This is due to the difficulty in keeping different components in sync and is compounded by the version numbering convention that the Cordova project uses to distinguish build tool versions (the CLI version) from platform framework versions (the Cordova framework version) and plugin versions.

The CLI version you specify in the Projects tab's Build Settings section is the "Cordova CLI" version that the build system uses to build your app. Each version of the Cordova CLI tools comes with a set of "pinned" Cordova platform framework versions, which are tied to the target platform.

NOTE: the specific Cordova platform framework versions shown below are subject to change without notice.

Our Cordova CLI 4.1.2 build system was "pinned" to: 

  • cordova-android@3.6.4 (Android Cordova framework version 3.6.4)
  • cordova-ios@3.7.0 (iOS Cordova framework version 3.7.0)
  • cordova-windows@3.7.0 (Cordova Windows framework version 3.7.0)

Our Cordova CLI 5.1.1 build system is "pinned" to:

  • cordova-android@4.1.1 (as of March 23, 2016)
  • cordova-ios@3.8.0
  • cordova-windows@4.0.0

Our Cordova CLI 5.4.1 build system is "pinned" to: 

  • cordova-android@5.0.0
  • cordova-ios@4.0.1
  • cordova-windows@4.3.1

Our CLI 5.4.1 build system really should be called "CLI 5.4.1+" because the platform framework versions it uses are closer to the "pinned" versions in the Cordova CLI 6.0.0 release than those "pinned" in the original CLI 5.4.1 release.

The Cordova platform framework version you get when you build an app does not equal the CLI version number in the Build Settings section of the Projects tab; it equals the Cordova platform framework version that is "pinned" to our build system's CLI version (see the list of pinned versions, above).

Technically, the target-specific Cordova frameworks can be updated [independently] for a given version of CLI tools. In some cases, our build system may use a Cordova platform framework version that is later than the version that was "pinned" to the CLI when it was originally released by the Cordova project (that is, the Cordova framework versions originally specified by the Cordova CLI x.y.z links above).

The reasons you may see Cordova framework version differences between the Emulate tab, App Preview and your built app are:

  • The Emulate tab has one specific Cordova framework version built into it. We try to make sure that version of the Cordova framework closely matches the default Intel XDK version of Cordova CLI.

  • App Preview is released independently of the Intel XDK and, therefore, may use a different version than what you will see reported by the Emulate tab or your built app. Again, we try to release App Preview so it matches the version of the Cordova framework that is considered the default version for the Intel XDK at the time App Preview is released; but since the various tools are not always released in perfect sync, that is not always possible.

  • Your app is built with a "pinned" Cordova platform framework version, which is determined by the Cordova CLI version you specified in the Projects tab's Build Settings section. There are always at least two different CLI versions available in the Intel XDK build system.

  • For those versions of Crosswalk that were built with the Intel XDK CLI 4.1.2 build system, the cordova-android framework version was determined by the Crosswalk project, not by the Intel XDK build system.

  • For those versions of Crosswalk that are built with Intel XDK CLI 5.1.1 and later, the cordova-android framework version equals the "pinned" cordova-android platform version for that CLI version (see lists above).

Do these Cordova framework version numbers matter? Occasionally, yes, but normally, not that much. There are some issues that come up that are related to the Cordova framework version, but they tend to be rare. The majority of the bugs and compatibility issues you will experience in your app have more to do with the versions and mix of Cordova plugins you choose to use and the HTML5 webview runtime on your test devices. See this blog for more details about what a webview is and why the webview matters to your app: When is an HTML5 Web App a WebView App?.

The "default version" of the CLI that the Intel XDK uses is rarely the most recent version of the Cordova CLI tools distributed by the Cordova project. There is always a lag between Cordova project releases and our ability to incorporate those releases into our build system and the various Intel XDK components. We are not able to provide every release that is made by the Cordova project.

How do I add a third party plugin?

Please follow the instructions on this doc page to add a third-party plugin: Adding Plugins to Your Intel® XDK Cordova* App. A third-party plugin is not included as part of your app until you add it this way; you will see it in the build log if it was successfully added to your build.

How do I make an AJAX call that works in my browser work in my app?

Please follow the instructions in this article: Cordova CLI 4.1.2 Domain Whitelisting with Intel XDK for AJAX and Launching External Apps.

I get an "intel is not defined" error, but my app works in Test tab, App Preview and Debug tab. What's wrong?

When your app runs in the Test tab, App Preview or the Debug tab the intel.xdk and core Cordova functions are automatically included for easy debug. That is, the plugins required to implement those APIs on a real device are already included in the corresponding debug modules.

When you build your app you must include the plugins that correspond to the APIs you are using in your build settings. This means you must enable the Cordova and/or XDK plugins that correspond to the APIs you are using. Go to the Projects tab and ensure that the plugins you need are selected in your project's plugin settings. See Adding Plugins to Your Intel® XDK Cordova* App for additional details.

How do I target my app for use only on an iPad or only on an iPhone?

There is an undocumented feature in Cordova that should help you (the Cordova project provided this feature but failed to document it for the rest of the world). If you use the appropriate preference in the intelxdk.config.additions.xml file you should get what you need:

<preference name="target-device" value="tablet" />     <!-- Installs on iPad, not on iPhone -->
<preference name="target-device" value="handset" />    <!-- Installs on iPhone; iPad installs in a zoomed view and doesn't fill the entire screen -->
<preference name="target-device" value="universal" />  <!-- Installs on iPhone and iPad correctly -->

If you need info regarding the additions.xml file, see the blank template or this doc file: Adding Intel® XDK Cordova Build Options Using the Additions File.

Why does my build fail when I try to use the Cordova* Capture Plugin?

The Cordova* Capture plugin has a dependency on the File plugin. Please make sure you have both plugins selected on the Projects tab.

How can I pinch and zoom in my Cordova* app?

For now, using the viewport meta tag is the only option to enable pinch and zoom. However, its behavior is unpredictable in different webviews. Testing a few sample apps has led us to believe that this feature works better on Crosswalk for Android. You can test this by building the Hello Cordova sample app for both Android and Crosswalk for Android. Pinch and zoom will work only on the latter, even though both include:

<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=yes, minimum-scale=1, maximum-scale=2">.

Please visit the following pages to get a better understanding of when to build with Crosswalk for Android:

http://blogs.intel.com/evangelists/2014/09/02/html5-web-app-webview-app/

https://software.intel.com/en-us/xdk/docs/why-use-crosswalk-for-android-builds

Another device-oriented approach is to enable it by turning on Android accessibility gestures.

How do I make my Android application use the fullscreen so that the status and navigation bars disappear?

The Cordova* fullscreen plugin can be used to do this. For example, in your initialization code, include this function call: AndroidFullScreen.immersiveMode(null, null);

You can get this third-party plugin from here https://github.com/mesmotronic/cordova-fullscreen-plugin

How do I add XXHDPI and XXXHDPI icons to my Android or Crosswalk application?

The Cordova CLI 4.1.2 build system can support this feature, but our 4.1.2 build system (and the 2170 version of the Intel XDK) does not handle the XX and XXX sizes directly. Use this workaround until these sizes are supported directly:

  • copy your XX and XXX icons into your source directory (usually named www)
  • add the following lines to your intelxdk.config.additions.xml file
  • see this Cordova doc page for some more details

Assuming your icons and splash screen images are stored in the "pkg" directory inside your source directory (your source directory is usually named www), add lines similar to these into your intelxdk.config.additions.xml file (the precise names of your png files may differ from what is shown here):

<!-- for adding xxhdpi and xxxhdpi icons on Android -->
<icon platform="android" src="pkg/xxhdpi.png" density="xxhdpi" />
<icon platform="android" src="pkg/xxxhdpi.png" density="xxxhdpi" />
<splash platform="android" src="pkg/splash-port-xhdpi.png" density="port-xhdpi"/>
<splash platform="android" src="pkg/splash-land-xhdpi.png" density="land-xhdpi"/>

The precise names of your PNG files are not important, but the "density" designations are very important and, of course, the respective resolutions of your PNG files must be consistent with Android requirements. Those density parameters specify the respective "res-drawable-*dpi" directories that will be created in your APK for use by the Android system. NOTE: splash screen references have been added for reference, you do not need to use this technique for splash screens.

You can continue to insert the other icons into your app using the Intel XDK Projects tab.

Which plugin is the best to use with my app?

We are not able to track all the plugins out there, so we generally cannot give you a "this is better than that" evaluation of plugins. Check the Cordova plugin registry to see which plugins are most popular and check Stack Overflow to see which are best supported; also, check the individual plugin repos to see how well the plugin is supported and how frequently it is updated. Since the Cordova platform and the mobile platforms continue to evolve, those that are well-supported are likely to be those that have good activity in their repo.

Keep in mind that the XDK builds Cordova apps, so whichever plugins you find being supported and working best with other Cordova (or PhoneGap) apps would likely be your "best" choice.

See Adding Plugins to Your Intel® XDK Cordova* App for instructions on how to include third-party plugins with your app.

What are the rules for my App ID?

The precise App ID naming rules vary as a function of the target platform (e.g., Android, iOS, Windows, etc.). Unfortunately, the App ID naming rules are further restricted by the Apache Cordova project and sometimes change with updates to the Cordova project. The Cordova project is the underlying technology that your Intel XDK app is based upon; when you build an Intel XDK app you are building an Apache Cordova app.

CLI 5.1.1 has more restrictive App ID requirements than previous versions of Apache Cordova (the CLI version refers to Apache Cordova CLI release versions). In this case, the Apache Cordova project decided to set limits on acceptable App IDs to equal the minimum set for all platforms. We hope to eliminate this restriction in a future release of the build system, but for now (as of the 2496 release of the Intel XDK), the current requirements for CLI 5.1.1 are:

  • Each section of the App ID must start with a letter
  • Each section can only consist of letters, numbers, and the underscore character
  • Each section cannot be a Java keyword
  • The App ID must consist of at least 2 sections (each section separated by a period ".").

iOS /usr/bin/codesign error: certificate issue for iOS app?

If you are getting an iOS build fail message in your detailed build log that includes a reference to a signing identity error you probably have a bad or inconsistent provisioning file. The "no identity found" message in the build log excerpt, below, means that the provisioning profile does not match the distribution certificate that was uploaded with your application during the build phase.

Signing Identity:     "iPhone Distribution: XXXXXXXXXX LTD (Z2xxxxxx45)"
Provisioning Profile: "MyProvisioningFile"
                      (b5xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxe1)

    /usr/bin/codesign --force --sign 9AxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxA6 --resource-rules=.../MyApp/platforms/ios/build/device/MyApp.app/ResourceRules.plist --entitlements .../MyApp/platforms/ios/build/MyApp.build/Release-iphoneos/MyApp.build/MyApp.app.xcent .../MyApp/platforms/ios/build/device/MyApp.app
9AxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxA6: no identity found
Command /usr/bin/codesign failed with exit code 1

** BUILD FAILED **


The following build commands failed:
    CodeSign build/device/MyApp.app
(1 failure)

The excerpt shown above will appear near the very end of the detailed build log. The unique number patterns in this example have been replaced with "xxxx" strings for security reasons. Your actual build log will contain hexadecimal strings.

iOS Code Sign error: bundle ID does not match app ID?

If you are getting an iOS build fail message in your detailed build log that includes a reference to a "Code Sign error" you may have a bad or inconsistent provisioning file. The "Code Sign" message in the build log excerpt, below, means that the bundle ID you specified in your Apple provisioning profile does not match the app ID you provided to the Intel XDK to upload with your application during the build phase.

Code Sign error: Provisioning profile does not match bundle identifier: The provisioning profile specified in your build settings (MyBuildSettings) has an AppID of my.app.id which does not match your bundle identifier my.bundleidentifier.
CodeSign error: code signing is required for product type 'Application' in SDK 'iOS 8.0'

** BUILD FAILED **

The following build commands failed:
    Check dependencies
(1 failure)
Error code 65 for command: xcodebuild with args: -xcconfig,...

The message above translates into "the bundle ID you entered in the project settings of the XDK does not match the bundle ID (app ID) that you created on Apple's developer portal and then used to create a provisioning profile."

What are plugin variables used for? Why do I need to supply plugin variables?

Some plugins require details that are specific to your app or your developer account; for example, to authorize your app as one that belongs to you, the developer, so services can be properly routed to the service provider. The precise reasons depend on the specific plugin and its function.

What happened to the Intel XDK "legacy" build options?

On December 14, 2015 the Intel XDK legacy build options were retired and are no longer available to build apps. The legacy build option is based on three-year-old technology that predates the current Cordova project. All Intel XDK development efforts for the past two years have been directed at building standard Apache Cordova apps.

Many of the intel.xdk legacy APIs that were supported by the legacy build options have been migrated to standard Apache Cordova plugins and published as open source plugins. The API details for these plugins are available in the README.md files in the respective 01.org GitHub repos. Additional details regarding the new Cordova implementations of the intel.xdk legacy APIs are available in the doc page titled Intel XDK Legacy APIs.

Standard Cordova builds do not require the use of the "intelxdk.js" and "xhr.js" phantom scripts. Only the "cordova.js" phantom script is required to successfully build Cordova apps. If you have been including "intelxdk.js" and "xhr.js" in your Cordova builds they have been quietly ignored. You should remove references to these files from your "index.html" file; leaving them in will do no harm, but it results in a warning that the respective script file cannot be found at runtime.

The Emulate tab will continue to support some legacy intel.xdk APIs that are NOT supported in the Cordova builds (only those intel.xdk APIs that are supported by the open source plugins are available to a Cordova built app, and only if you have included the respective intel.xdk plugins). This Emulate tab discrepancy will be addressed in a future release of the Intel XDK.

More information can be found in this forum post > https://software.intel.com/en-us/forums/intel-xdk/topic/601436.

Which build files do I submit to the Windows Store and which do I use for testing my app on a device?

There are two things you can do with the build files generated by the Intel XDK Windows build options: side-load your app onto a real device (for testing) or publish your app in the Windows Store (for distribution). Microsoft has changed the files you use for these purposes with each release of a new platform. As of December, 2015, the packages you might see in a build, and their uses, are:

  • appx works best for side-loading, and can also be used to publish your app.
  • appxupload is preferred for publishing your app, it will not work for side-loading.
  • appxbundle will work for both publishing and side-loading, but is not preferred.
  • xap is for legacy Windows Phone; works for both publishing and side-loading.

In essence: XAP (WP7) was superseded by APPXBUNDLE (Win8 and WP8.0), which was superseded by APPX (Win8/WP8.1/UAP), which has been supplemented with APPXUPLOAD. APPX and APPXUPLOAD are the preferred formats. For more information regarding these file formats, see Upload app packages on the Microsoft developer site.

Side-loading a Windows Phone app onto a real device, over USB, requires a Windows 8+ development system (see Side-Loading Windows* Phone Apps for complete instructions). If you do not have a physical Windows development machine you can use a virtual Windows machine or use the Window Store Beta testing and targeted distribution technique to get your app onto real test devices.

Side-loading a Windows tablet app onto a Windows 8 or Windows 10 laptop or tablet is simpler. Extract the contents of the ZIP file that you downloaded from the Intel XDK build system, open the "*_Test" folder inside the extracted folder, and run the PowerShell script (ps1 file) contained within that folder on the test machine (the machine that will run your app). The ps1 script file may need to request a "developer certificate" from Microsoft before it will install your test app onto your Windows test system, so your test machine may require a network connection to successfully side-load your Windows app.

The side-loading process may not overwrite an existing side-loaded app with the same ID. To be sure your test app side-loads properly, it is best to uninstall the old version of your app before side-loading a new version on your test system.

How do I implement local storage or SQL in my app?

See this summary of local storage options for Cordova apps written by Josh Morony, A Summary of Local Storage Options for PhoneGap Applications.

How do I prevent my app from auto-completing passwords?

Use the Ionic Keyboard plugin and set the spellcheck attribute to false.

Why does my PHP script not run in my Intel XDK Cordova app?

Your XDK app is not a page on a web server; you cannot use dynamic web server techniques because there is no web server associated with your app to which you can pass off PHP scripts and similar actions. When you build an Intel XDK app you are building a standalone Cordova client web app, not a dynamic server web app. You need to create a RESTful API on your server that you can then call from your client (the Intel XDK Cordova app) and pass and return data between the client and server through that RESTful API (usually in the form of a JSON payload).

Please see this StackOverflow post and this article by Ray Camden, a longtime developer of the Cordova development environment and Cordova apps, for some useful background.

Following is a lightly edited recommendation from an Intel XDK user:

I came from php+mysql web development. My first attempt at an Intel XDK Cordova app was to create a set of php files to query the database and give me the JSON. It was a simple job, but totally insecure.

Then I found dreamfactory.com, an open source software that automatically creates the REST API functions from several databases, SQL and NoSQL. I use it a lot. You can start with a free account to develop and test and then install it in your server. Another possibility is phprestsql.sourceforge.net, this is a library that does what I tried to develop by myself. I did not try it, but perhaps it will help you.

And finally, I'm using PouchDB and CouchDB, "a database for the web." It is not SQL, but it is very useful and easy if you need to develop a mobile app with only a few tables. It will also work with a lot of tables, but for a simple database it is an easy place to start.

I strongly recommend that you start to learn these new ways to interact with databases; you will need to invest some time, but it is the way to go. Do not try to use MySQL and PHP the old-fashioned way. You can get it to work, but at some point you may get stuck.

Why doesn’t my Cocos2D game work on iOS?

This is an issue with Cocos2D and is not a reflection of our build system. As an interim solution, we have modified the CCBoot.js file for compatibility with iOS and App Preview. You can view an example of this modification in this CCBoot.js file from the Cocos2d-js 3.1 Scene GUI sample. The update has been applied to all cocos2D templates and samples that ship with Intel XDK. 

The fix involves two line changes (for the generic cocos2D fix) and one additional line (for it to work in App Preview on iOS devices):

Generic cocos2D fix -

1. Inside the loadTxt function, xhr.onload should be defined as

xhr.onload = function () {
    if (xhr.readyState == 4)
        xhr.responseText != "" ? cb(null, xhr.responseText) : cb(errInfo);
};

instead of

xhr.onload = function () {
    if (xhr.readyState == 4)
        xhr.status == 200 ? cb(null, xhr.responseText) : cb(errInfo);
};

2. The condition inside _loadTxtSync function should be changed to 

if (!xhr.readyState == 4 || (xhr.status != 200 || xhr.responseText != "")) {

instead of 

if (!xhr.readyState == 4 || xhr.status != 200) {

 

App Preview fix -

Add this line inside of loadTxtSync after _xhr.open:

xhr.setRequestHeader("iap_isSyncXHR", "true");

How do I change the alias of my Intel XDK Android keystore certificate?

You cannot change the alias name of your Android keystore within the Intel XDK, but you can download the existing keystore, change the alias on that keystore and upload a new copy of the same keystore with a new alias.

Use the following procedure:

  • Download the converted legacy keystore from the Intel XDK (the one with the bad alias).

  • Locate the keytool app on your system (this assumes that you have a Java runtime installed on your system). On Windows, this is likely to be located at %ProgramFiles%\Java\jre8\bin (you might have to adjust the value of jre8 in the path to match the version of Java installed on your system). On Mac and Linux systems it is probably located in your path (in /usr/bin).

  • Change the alias of the keystore using this command (see the keytool -changealias -help command for additional details):

keytool -changealias -alias "existing-alias" -destalias "new-alias" -keypass keypass -keystore /path/to/keystore -storepass storepass
  • Import this new keystore into the Intel XDK using the "Import Existing Keystore" option in the "Developer Certificates" section of the "person icon" located in the upper right corner of the Intel XDK.

What causes "The connection to the server was unsuccessful. (file:///android_asset/www/index.html)" error?

See this forum thread for some help with this issue. This error is most likely due to errors retrieving assets over the network or long delays associated with retrieving those assets.

How do I manually sign my Android or Crosswalk APK file with the Intel XDK?

To sign an app manually, you must build your app by "deselecting" the "Signed" box in the Build Settings section of the Android tab on the Projects tab:

Follow these Android developer instructions to manually sign your app. The instructions assume you have Java installed on your system (for the jarsigner and keytool utilities). You may have to locate and install the zipalign tool separately (it is not part of Java) or download and install Android Studio.

These two sections of the Android developer Signing Your Applications article are also worth reading.

Why should I avoid using the additions.xml file? Why should I use the Plugin Management Tool in the Intel XDK?

Intel XDK (2496 and up) now includes a Plugin Management Tool that simplifies adding and managing Cordova plugins. We urge all users to manage their plugins from existing or upgraded projects using this tool. If you were using intelxdk.config.additions.xml file to manage plugins in the past, you should remove them and use the Plugin Management Tool to add all plugins instead.

Why you should be using the Plugin Management Tool:

  • It can now manage plugins from all sources. Popular plugins have been added to the Featured plugins list. Third-party plugins can be added from the Cordova Plugin Registry, a Git repo, or your file system.

  • Consistency: Unlike previous versions of the Intel XDK, plugins you add are now stored as a part of your project on your development system after they are retrieved by the Intel XDK and copied to your plugins directory. These plugin files are delivered, along with your source code files, to the Intel XDK cloud-based build server. This change ensures greater consistency between builds, because you always build with the plugin version that was retrieved by the Intel XDK into your project. It also provides better documentation of the components that make up your Cordova app, because the plugins are now part of your project directory. This is also more consistent with the way a standard Cordova CLI project works.

  • Convenience: In the past, the only way to add a third-party plugin that required parameters was to include it in the intelxdk.config.additions.xml file, and the plugin would then be added to your project by the build system. This is no longer recommended. The new Plugin Management Tool automatically parses the plugin.xml file and prompts for any plugin variables from within the XDK.

    When a plugin is added via the Plugin Management Tool, a plugin entry is added to the project file and the plugin source is downloaded to the plugins directory making a more stable project. After a build, the build system automatically generates config xml files in your project directory that includes a complete summary of plugins and variable values.

  • Correctness of Debug Module: Intel XDK now provides remote on-device debugging for projects with third-party plugins by building a custom debug module from your project's plugins directory. It does not write to or read from the intelxdk.config.additions.xml file; the only time that file is used is during a build. This means the debug module is not aware of plugins added via the intelxdk.config.additions.xml file, which is another reason adding plugins that way should be avoided. Here is a useful article for understanding Intel XDK Build Files.

  • Editing Plugin Sources: There are a few cases where you may want to modify plugin code to fix a bug in a plugin, or add console.log messages to a plugin's sources to help debug your application's interaction with the plugin. To accomplish these goals you can edit the plugin sources in the plugins directory. Your modifications will be uploaded along with your app sources when you build your app using the Intel XDK build server and when a custom debug module is created by the Debug tab.

How do I fix this "unknown error: cannot find plugin.xml" when I try to remove or change a plugin?

Sometimes, removing or changing a plugin in your project generates an "unknown error: cannot find plugin.xml" message. This is not a common problem, but if it does happen it means a file in your plugin directory is probably corrupt (usually one of the json files found inside the plugins folder at the root of your project folder).

The simplest fix is to:

  • make a list of ALL of your plugins (esp. the plugin ID and version number, see image below)
  • exit the Intel XDK
  • delete the entire plugins directory inside your project
  • restart the Intel XDK

The XDK should detect that all of your plugins are missing and attempt to reinstall them. If it does not automatically re-install all or some of your plugins, then reinstall them manually from the list you saved in step one (see the image below for the important data that documents your plugins).

NOTE: if you re-install your plugins manually, you can use the third-party plugin add feature of the plugin management system to specify the plugin id to get your plugins from the Cordova plugin registry. If you leave the version number blank the latest version of the plugin that is available in the registry will be retrieved by the Intel XDK.

Why do I get a "build failed: the plugin contains gradle scripts" error message?

You will see this error message in your Android build log summary whenever you include a Cordova plugin that includes a gradle script in your project. Gradle scripts add extra Android build instructions that are needed by the plugin.

The current Intel XDK build system does not allow the use of plugins that contain gradle scripts because they present a security risk to the build system and your Intel XDK account; an unscrupulous user could use a gradle-enabled plugin to do harmful things with the build server. We are working on a build system that will ensure the necessary level of security to allow gradle scripts in plugins, but until that time, we cannot support plugins that include gradle scripts.

The error message in your build summary log will look like the following:

In some cases the plugin gradle script can be removed, but only if you manually modify the plugin to implement whatever the gradle script was doing automatically. In some cases this can be done easily (for example, the gradle script may be building a JAR library file for the plugin), but sometimes the plugin is not easily modified to remove the need for the gradle script. Exactly what needs to be done to the plugin depends on the plugin and the gradle script.

You can find out more about Cordova plugins and gradle scripts by reading this section of the Cordova documentation. In essence, if a Cordova plugin includes a build-extras.gradle file in the plugin's root folder, or if it contains one or more lines similar to the following, inside the plugin.xml file:

<framework src="some.gradle" custom="true" type="gradleReference" />

it means that the plugin contains gradle scripts and will be rejected by the Intel XDK build system.

How does one remove gradle dependencies for plugins that use Google Play Services (esp. push plugins)?

Our Android (and Crosswalk) CLI 5.1.1 and CLI 5.4.1 build systems include a fix for an issue in the standard Cordova build system that allows some Cordova plugins to be used with the Intel XDK build system without their included gradle script!

This fix only works with those Cordova plugins that include a gradle script for one and only one purpose: to set the value of applicationID in the Android build project files (such a gradle script copies the value of the App ID from your project's Build Settings, on the Projects tab, to this special project build variable).

Using the phonegap-plugin-push as an example, this Cordova plugin contains a gradle script named push.gradle, that has been added to the plugin and looks like this:

import java.util.regex.Pattern

def doExtractStringFromManifest(name) {
    def manifestFile = file(android.sourceSets.main.manifest.srcFile)
    def pattern = Pattern.compile(name + "=\"(.*?)\"")
    def matcher = pattern.matcher(manifestFile.getText())
    matcher.find()
    return matcher.group(1)
}

android {
    sourceSets {
        main {
            manifest.srcFile 'AndroidManifest.xml'
        }
    }

    defaultConfig {
        applicationId = doExtractStringFromManifest("package")
    }
}

All this gradle script is doing is inserting your app's "package ID" (the "App ID" in your app's Build Settings) into a variable called applicationID for use by the build system. It is needed, in this example, by the Google Play Services library to ensure that calls through the Google Play Services API can be matched to your app. Without the proper App ID, the Google Play Services library cannot distinguish between multiple apps on an end user's device that are using the Google Play Services library.

The phonegap-plugin-push is being used as an example for this article. Other Cordova plugins exist that can also be used by applying the same technique (e.g., the pushwoosh-phonegap-plugin will also work using this technique). It is important that you first determine that only one gradle script is being used by the plugin of interest and that this one gradle script is used for only one purpose: to set the applicationID variable.

How does this help you and what do you do?

To use a plugin with the Intel XDK build system that includes a single gradle script designed to set the applicationID variable:

  • Download a ZIP of the plugin version you want to use (e.g. version 1.6.3) from that plugin's git repo.

    IMPORTANT: be sure to download a released version of the plugin; the "head" of the git repo may be "under construction." Some plugin authors make it easy to identify a specific version, some do not, so be aware and careful when choosing what you clone from a git repo!

  • Unzip that plugin onto your local hard drive.

  • Remove the <framework> line that references the gradle script from the plugin.xml file (see the sketch after this list).

  • Add the modified plugin into your project as a "local" plugin (see the image below).
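
For reference, here is a minimal sketch of the relevant fragment of a plugin.xml before the edit. Everything except the <framework> line is illustrative rather than taken from any specific plugin; the <framework> line is the one to delete:

<platform name="android">
    <!-- other plugin elements ... -->
    <!-- Delete the following line before adding the plugin as a "local" plugin: -->
    <framework src="push.gradle" custom="true" type="gradleReference" />
</platform>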

In this example, you will be prompted to define a variable that the plugin also needs. If you know that variable's name (it's called SENDER_ID for this plugin), you can add it using the "+" icon in the image above, and avoid the prompt. If the plugin add was successful, you'll find something like this in the Projects tab:

If you are curious, you can inspect the AndroidManifest.xml file that is included inside your built APK file (you'll have to use a tool like apktool to extract and reconstruct it from your APK file). You should see something like the following highlighted line, which should match your App ID. In this example, the App ID was io.cordova.hellocordova:

If you see the following App ID, it means something went wrong. This is the default App ID for the Google Play Services library that will cause collisions on end-user devices when multiple apps that are using Google Play Services use this same default App ID:


Reach: Intel® Edison based device is making highly precise GPS affordable for everyone


We’re used to having GPS receivers integrated in consumer devices all around us, but highly accurate GPS has, until now, been out of the reach of most people. For in-car navigation systems or smartphone applications, accuracy of 10 to 15 meters is good enough to enable people to find their way around. Highly accurate GPS receivers, with tolerances measured in centimeters, have been restricted to professional applications including surveying, construction, precision agriculture and drones because the systems cost tens of thousands of dollars.

Now, that’s changing, thanks to Emlid, the company behind Reach, a high-precision GPS receiver available for $235, based on the Intel® Edison microcomputer platform. The device weighs just 12g, so it can be easily mounted on a drone, and measures 26mm x 45mm.

Reach makes high-precision GPS available in a small form factor suitable for drones and other applications

“For drone hobbyists, Reach enables repeatable precision landings,” said Igor Vereninov, co-founder at Emlid. “We can georeference the images taken from drones really well, which is important for building precise 3D models of terrain. Reach is being used in the Alps to monitor glacier movement in real time, there’s a project that’s tracking race horses, and another that is mapping the sea floor to help with applications such as flood modelling.”

He adds: “We are working in partnership with Sigro Pilot to deliver a solution for machinery guidance. Sigro is working on the navigation software that installs on an Android tablet and uses precise coordinates from RTK GPS like Reach to help the driver follow the path. If farmers can reduce the overlap between runs when seeding or fertilizing a field, they can save up to 10% of their resources by only sowing and fertilizing where they need to.”

The machinery guidance solution on an Android tablet

With surveying equipment often being prohibitively expensive for the developing world, Reach also makes professional measurement affordable for construction and surveying applications there.

One Reach user compared the accuracy of Reach with that of professional surveying equipment by placing 20 control markers over a site that was 2km by 5km and surveying the points using both devices. He found the coordinates varied by just 5cm, while Reach costs a fraction of the price of the professional device.

All about that base

Like existing professional devices, Reach uses Real-Time Kinematic (RTK) algorithms and differential global positioning system (DGPS) technology to increase the precision of the measured position. The method uses the phase of the satellite signal’s carrier wave, rather than the information in the signal, because the carrier measurements have a lower margin of error. However, the challenge is to align the signals correctly and ensure that calculations are not off by one or more whole carrier wavelengths, which would introduce inaccuracies in the calculated positions. RTK uses a base station at a known coordinate. The base station broadcasts its known location together with the code and carrier measurements for all the satellites in view, so the mobile clients can align the carrier wave phases correctly. This enables the mobile devices to calculate their position relative to the base station with a high degree of precision. Base stations can be temporary or permanent, and many devices, up to 20km away, can use the same base station for their corrections.
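
As a rough illustration of the core idea, the sketch below (in C, with hypothetical measurement values) forms the double-differenced carrier-phase observable that RTK solvers build from base and rover measurements: differencing across the two receivers cancels the satellite clock error, and differencing across two satellites cancels the receiver clock error. A real solver such as RTKLIB also handles integer ambiguity resolution, cycle slips, and atmospheric effects.

#include <stdio.h>

/* Carrier-phase measurements in cycles; the values below are hypothetical. */
typedef struct {
    double rover;  /* phase measured by the mobile receiver */
    double base;   /* phase measured by the base station    */
} phase_pair;

/* Double difference across two receivers and two satellites. */
double double_difference(phase_pair sat_a, phase_pair sat_b)
{
    double single_a = sat_a.rover - sat_a.base;  /* cancels satellite clock error */
    double single_b = sat_b.rover - sat_b.base;
    return single_a - single_b;                  /* cancels receiver clock error  */
}

int main(void)
{
    phase_pair sat_a = { 124567890.25, 124567873.75 };
    phase_pair sat_b = { 118234501.50, 118234487.25 };
    printf("double-difference observable: %.2f cycles\n",
           double_difference(sat_a, sat_b));
    return 0;
}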

A Reach device can be used as a mobile client, or as a base station if it is stationary and its coordinates are known. This way, a system of two receivers connected to one another using wi-fi Internet access or radio communications can calculate precise coordinates even if no other DGPS base stations are available.

The position of base stations can be measured accurately using commercially or freely available correction services in many countries, including the Continuously Operating Reference Stations (CORS) network. If the precise location of the base station is not known, the mobile clients can still consistently and accurately calculate their position relative to the base station.

PylonGPS, developed by Charles West, a researcher at NC State University, is a service designed to make it simple to stream GPS correction data from Reach RTK base stations to anyone who wants it over the Internet. Emlid hopes to incorporate this in a future software release so that users can share the corrections from their base stations, making accurate positioning more accessible to everyone.

Developing Reach

The founders of Emlid previously integrated survey-grade GPS receivers into autopilots and mapping systems, costing many thousands of dollars, for a major commercial manufacturer. “We can’t completely replace those professional systems,” said Vereninov. “They perform better in challenging environments and can have larger distances from rover to base. But we are bringing a similar precision at a drastically lower price point.”

The idea for Reach came when Emlid sampled the RTKLIB toolkit developed by Tomoji Takasu and supported by the open source community (the toolkit has a BSD License). “We were using it to postprocess data from flights, but we quickly saw its potential,” said Vereninov. “If we wanted to do real-time positioning, though, we needed to have the receiver connected to a PC at all times. For surveying work, you use a laptop, for example, but that’s too heavy for a drone or other small device. We had the idea to use an Intel Edison single board computer, which can replace the laptop in this application, so RTKLIB runs on the GPS receiver itself.”

Positioning accuracy with the help of Reach. Red dots show normal GPS precision in good conditions. Green dots show positions provided by Reach in RTK mode. The receiver was kept stationary so in an ideal situation there would be no deviation. The graph shows that Reach provides positions with less deviation, because the green dots are much closer together. The numbered dots are the most recently plotted points in this real-time graph. Please note the scale.

A look inside Reach

The Reach hardware platform consists of three components: an Intel Edison computer running a custom, Yocto-built, Linux operating system and RTKLIB; a U-blox GPS-receiver; and an external Tallysman antenna, which plays an important role in enabling accurate positioning. The antenna weighs approximately 20g and measures approximately 30mm x 30mm, so it can be easily fitted to a drone or anywhere else where space is limited. Users can choose to fit an alternative antenna if it is better suited to their application. For example, a car or harvester might use a rugged unit that suppresses interference from the machinery. “The antenna quality and placement are essential for RTK,” said Vereninov. “It should be the first thing to plan on the device to ensure the best sky visibility and least interference. Everything else flows from that.”

The receiver supports the GPS satellite constellation of the US, as well as Russia’s Glonass, China’s Beidou, and Japan’s QZSS. In the future, Emlid plans to add support for the EU’s Galileo satellite constellation to improve signal availability. Using more satellites enables Reach to improve the precision of the calculated coordinates and the solution availability.

In addition to the satellite receiver, Reach has a number of other sensors: a tri-axial gyroscope, an accelerometer and a magnetometer. In the future, the device software will be upgraded to enable it to use these sensors to extrapolate the position when the signal is blocked (by the device going through a tunnel, for example) and to measure the tilt or roll of the device.

The device can also record timing markers from any pulse clock source. A common problem in 3D mapping is that you need to know the exact location of where the aerial photos were taken. When an aircraft is moving at 20m/s, a buffering delay on the camera of 1 second can result in an image being 20m off from the coordinate recorded when it was triggered. Reach can be connected to the hot shoe of the camera to precisely timestamp when the flash was fired, which happens at the same time as the photo is taken. By analyzing the GPS data after the flight, the photos can be accurately matched with their position.
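
The post-flight matching step can be as simple as interpolating the logged positions at each hot-shoe timestamp. The sketch below (with hypothetical fixes and illustrative names) linearly interpolates between the two GPS fixes that bracket the flash time:

#include <stdio.h>

typedef struct {
    double t;    /* time, seconds      */
    double lat;  /* latitude, degrees  */
    double lon;  /* longitude, degrees */
} gps_fix;

/* Interpolate the camera position at the hot-shoe timestamp t_flash,
 * given the two logged fixes that bracket it. */
gps_fix position_at(gps_fix before, gps_fix after, double t_flash)
{
    double w = (t_flash - before.t) / (after.t - before.t);
    gps_fix p = { t_flash,
                  before.lat + w * (after.lat - before.lat),
                  before.lon + w * (after.lon - before.lon) };
    return p;
}

int main(void)
{
    gps_fix a = { 100.00, 45.0001000, 6.0001000 };  /* hypothetical fixes */
    gps_fix b = { 100.20, 45.0001200, 6.0001400 };
    gps_fix photo = position_at(a, b, 100.07);      /* hot-shoe timestamp */
    printf("photo taken at %.7f, %.7f\n", photo.lat, photo.lon);
    return 0;
}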

Reach comes with the connectors needed to integrate with major autopilots on the market, and Emlid has recently released integrations with Pixhawk and its own Navio autopilot. The integration with the autopilot enables GPS data to be tunneled through the radio signal used for flight control and mission planning.

Precision landings: A video shows the precision of a drone landing using Reach

Building on Intel Edison

Why did the developers select Edison? “The size and form factor were important, especially for drone applications,” said Vereninov. “We were able to build a device that is not larger than the Intel Edison board itself. The best thing about Intel Edison is the price-performance ratio. It’s a really powerful platform, using x86 architecture, with wi-fi and Bluetooth integrated. We use the built-in Bluetooth to connect Android tablets to Reach, so Reach can replace the GPS on Android devices for more accurate geographic information systems (GIS) applications. Wi-fi is important for connecting to the Internet to receive corrections from the base station. When you’re trying to integrate radio frequency equipment, you usually need to certify it, but with Intel Edison everything is preintegrated and precertified.”

The Intel Edison platform has 4 GB of on-board storage for logging data, so that it can be analyzed with greater precision after collection. USB can be used to connect radios for communication between Reach devices, and in the future 3G communications and flash-drive logging could be added through software updates.

The device has a number of interfaces. UART can be used to integrate with autopilots, and GPIO pins can be used to control a camera or trigger another device. I2C is often used in drones to communicate to external devices and is also supported on Reach.

Reach can be powered through the USB port, powered from the autopilot in a drone, or powered using a power bank typically used for charging a mobile phone.

Satellite receiver: U-blox – 72 channels, output rate up to 18 Hz, supports GPS/QZSS L1 C/A, GLONASS L1OF, BeiDou B1, SBAS L1 C/A (WAAS, EGNOS, MSAS), Galileo-ready E1B/C
Computer platform: Intel Edison – dual-core 500 MHz
Interfaces: I2C, UART, GPIO, TimeStamp, OTG USB, Bluetooth, wi-fi, GNSS
Size: 26mm × 45mm
Weight: 12g

Managing Reach in the browser

When you switch on the Reach device, it automatically runs the wi-fi hotspot mode, so a user can connect to it with a smartphone and change the settings to connect it to a wi-fi network for software updates.

The ReachView software enables any device with a web browser to connect to Reach to set up RTKLIB, view device logs, assess the quality of satellite observations, change streaming settings, configure connections and check the device position.  The ReachView software is open source and is distributed under the GNU GPLv3 License. It was developed using Python and WebSockets for a lightweight, real-time interface. “ReachView provides access to all the features of RTKLIB,” said Vereninov. “At the same time, we’ve simplified it by explaining what each feature does and using understandable names for the parameters. We’ve designed the interface so you don’t have to be a surveyor or RTK specialist to set up the devices.”

To support users, Emlid has launched a forum, and provided detailed documentation on the website.

ReachView: Using the ReachView software, any device with a browser can be used to access a Reach device.

Crowdfunding success

In 2015 the makers of Reach launched a crowdfunding campaign on Indiegogo to raise the funds required to manufacture the receiver. The developers had some crowdfunding experience: they had previously raised about $30,000 to make Navio, a Raspberry Pi-based autopilot. As of December 2015, the third generation of Navio has been manufactured.

The results of the Reach campaign exceeded all the team’s expectations: the target was $27,000, but the campaign raised $81,960 in preorders, with promotion being mainly word of mouth, showing there is strong demand for high-precision positioning systems. The most popular option was a set of two receivers, suitable for RTK applications.

Thanks to close cooperation with Intel, Emlid was able to acquire the required number of Intel Edison boards at an affordable price. Five months after the completion of the crowd-funding campaign, the first 400 pre-ordered Reach modules were dispatched to the customers.

Reach continues to be developed and extended with software updates as often as every two weeks. The development team has grown from four people to eleven, based in an office with roof access. “That’s been the biggest time saver,” said Vereninov. “The big issue with developing GPS is that you can’t test indoors. To test every little thing you need to go outside. It’s really hard to find an open space in a city where you can enjoy a good view. Having that roof and a permanently placed antenna really helps us save time developing Reach.”

Reach is now in regular production and available for order through Emlid’s website.

Intel Fuels Innovation at NASA Space Apps Challenge


Not everyone can become a rocket scientist, or can they? For many, software development can seem like rocket science; however, the Intel® IoT Developer Program supports events like the recent 2016 NASA Space Apps Challenge to demonstrate the ease of development on IoT platforms.

At the April 22-24 NASA Space Apps Challenge, held at venues worldwide, attendees at some locations were provided with the Intel® IoT Developer Kit, containing everything needed to create innovative IoT projects and become highly engaged in the global developer community.

A key focus in the Space Apps Challenge was Open Data. As a founding member of the Open Interconnect Consortium, Intel supports open ecosystems with a broad range of IoT products and solutions. This broader choice for developers within an open foundation adds unique value to the collective ecosystem.

Some key highlights about the 2016 NASA International Space Apps Challenge:

  • Roughly 20,000 participants at nearly 200 innovation sites around the globe.
  • NASA presented a variety of challenges in six futuristic areas: Solar System, Earth, Technology, Space Station, Aeronautics, and Journey to Mars.
  • Some sites hosted Women in Data Bootcamps.
  • Intel sponsored the Pasadena, CA. event together with Microsoft and CDW.
  • Intel supported NASA Space Apps Challenge events in Boston, San Francisco, Seattle, Austin, and Pasadena by donating Intel® IoT Developer Kits and providing on-site technical mentors.
  • At every location, the participating teams got a chance to demo their projects to a panel of judges and the attendees. Each panel crowned two winning demos for a chance to compete in the Global Challenge across all NASA Space Apps Challenge sites. In addition, the attendees at every site voted for their favorite demo for a chance to win the People’s Choice Award.
  • Intel provided prizes and recognition at the supported locations for the “Best Use” of the Intel® Edison Board in their demo.
  • Strong spirit of innovation, learning, inclusiveness and diversity, and use of open data.
  • 48-hour hackathon event.

Each NASA Space Apps Challenge site was unique and the event in Pasadena was no exception. Not only did this location have a 48-hour hackathon, but they also hosted a meetup on April 12 and a Women in Data Bootcamp on April 21. As part of the sponsorship, Intel’s Software and Services Group contributed 60 Intel® Edison development boards with Grove* sensor kits.

Ajay Mungara, an Intel Senior Product Manager, participated in the meetup introducing the attendees to the Intel® IoT Developer Kit, and inspired them by describing some innovative and interesting projects developed over the course of two days at other Intel sponsored hackathons conducted around the globe.

The Women in Data Bootcamp, which preceded the hackathon, included a talk by Amber Huffman, newly appointed Intel Fellow, on “Delivering Technology While Life Happens”. Amber spoke with over 100 young women as part of an effort to build the confidence and skillsets needed to pursue careers in technology, encouraging them to sharpen their skills at hackathons. Notable speakers included Renee Wynn, NASA CIO, Debra Diaz, NASA CTO, and Kiki Wolfkill, Head of Microsoft’s Halo Media Division.

As part of Intel’s efforts within this event, Grace Metri, Community Evangelist for Intel’s IoT Developer Program, coached makers on building their projects using the Intel® Edison board, sensor kits and harnessing the power of the Cloud. Grace’s presentation titled “Developing for the Earth and Beyond Using Intel® IoT Developer Kit” was well received.

In addition to supporting hackers during the two-day NASA Space Apps challenge with the dev kits, Intel brought in an on-site crew as part of the developer initiation process with a focus on inspiring, supporting and encouraging teams to adopt the Intel® Edison boards in development of their projects. In addition to Grace Metri, the crew included a “Rock Star Coder” – Ron Evans, Intel® Software Innovator. Also lending support was Cheston Contaoi, Product Manager on Intel’s IoT Developer Program team.

The hackathon was truly inspiring to everyone involved, including attendees, participants, and mentors alike. Seventeen projects were demonstrated, all displaying a high level of creativity and innovation. Intel’s Sales and Marketing Group NASA Account Executive, Rob Lalumondier, was on the NASA judging panel along with Renee Wynn, NASA CIO, and Tom Soderstrom, NASA’s Jet Propulsion Lab CTO. Both of the winning demos that will be taking part in the Global Challenge, plus the People’s Choice winning demo, incorporated the Intel® IoT Developer Kit in their projects.

The first place winner at this year’s NASA Space Apps Challenge in Pasadena was a project called “Scintilla”. This project was designed by Chelsea Graf, Chris Del Guercio, Eric Gustafson, Konrad Ludwig and Kyle Spitznagel with the idea of “democratizing air quality data”. By using the Intel® Edison board and air quality sensors, they could build a network of monitoring stations and use social media and fitness-tracker delivery methods to warn citizens of poor air quality. The designers hoped that a change of lifestyle could positively impact personal health and create additional synergy between humans and the environment.

The second place winner was Team Stardust with their Jarvis Sensor System, built using the Intel® Edison board. Their goal was to monitor dust levels in real time and, through Microsoft* Azure, monitor the data to help prevent contamination of items going into space.

The “People’s Choice” winner was Team HoloCube. This Intel® Edison-based toy is used to teach kids about space science. Using LEDs, sensors, and lasers, it acts as a miniature planetarium. The project was overwhelmingly popular, and even received the stamp of approval from the kids present on site.

Intel’s support for the developer community is not limited to the 2016 NASA Space Apps Challenge. Intel also sponsors roadshows, hackathons, and many other events all over the world. It is easy to get involved, and you don’t even need to know how to code, as there are many ways to be productive as part of a team with a common goal. Check out the Intel® Software Developer Zone for IoT to find out more about Intel’s efforts in IoT around the world.

Intel’s IoT Developer Program is a comprehensive program for makers, hobbyists, and professional developers offering knowledge, tools, dev kits and a community of experts to easily turn your innovative ideas into IoT solutions.


3.5M Players and Counting: Torn Banner Studios Shares the Secrets of its Success


When Ontario, Canada-based Torn Banner Studios released the PC version of their medieval combat game, Chivalry: Medieval Warfare, in 2012, they expected to sell about 100,000 copies. They already had a big following from their Half-Life 2 mod, Age of Chivalry, so this target seemed ambitious but reasonable. Chivalry blew the top off these initial expectations. Four years later, 3.5 million copies have been sold and the game continues to have a strong following.

We caught up with Alex Hayter, Senior Brand Manager at Torn Banner, to talk about the secrets of the game’s success. Read on to discover how they kept the momentum going, how they’re incorporating what they learned into their next product, and more—all while staying true to their mission of creating games they like to play.

Torn Banner’s Story

In many ways, Torn Banner’s path to success is a game developer’s dream come true. What started as a collaboration between students and hobbyists all over the world evolved into a game that everyone in the company was truly passionate about.

“We wanted to do justice to the awesome medieval combat that we’ve seen in Hollywood movies, like Braveheart or Gladiator,” Alex explained.

There were other medieval games on the market, but none of them offered the kind of intense control that they were imagining, akin to what they were seeing in multiplayer shooter games with futuristic or modern storylines.

Like many new game developers, Torn Banner didn’t create a marketing plan or do market validation early in the process. From their experience creating the Half-Life 2 mod, they knew there were people out there just like them, who loved these movies and would love to be able to jump into the melee with realistic weaponry and intense control. They had high internal standards for what this game should look and feel like—and operated under the assumption that if they felt the game was awesome, their customers would too. It was a risk that paid off.
 

Pricing

Torn Banner priced Chivalry at USD 25 on Steam, which was in the same range as other games of the same quality on the market at that time. It was a price that felt like a fair reflection of the game’s value—but also enabled them to offer deep discounts during Steam’s many sales.
 

Marketing Path

Although Torn Banner did not do a lot of pre-release market validation, after launch they paid close attention to what was helping Chivalry reach a broader audience and were nimble about building on these successes. Here are some of the things that helped the game really take off:

Made to Be Seen

YouTube was just emerging as a gaming video platform around the time that Chivalry came out. With its highly visual, historically accurate world, over-the-top violence, and the ability of players to get creative with swords, bows and arrows, and other medieval weaponry in melee combat, the game lent itself particularly well to the new medium. Add in player commentary (in period-perfect lingo, of course) and the ability for players to shout custom voice commands during battle, and you had gaming videos that were as entertaining to watch as the game was to play. This had a huge viral effect and generated tons of new interest in the game.

5 Seconds of Gore

Another tactic that has worked well for Chivalry was the 5-second GIF. In fact, Hayter explained that the 5-second GIF became a sort of litmus test for how communicable a game is. “We knew what we were doing from the beginning in terms of tone and player fantasy,” Alex said. “In five seconds with Chivalry, it can be someone chopping off someone’s head with a sword and then the blood spurting out. Just by watching it you get the adrenaline rush and you can see how appealing that can be to a lot of different game players.”

Extend the World, Extend the Fun

One of the great things about Chivalry is that its popularity has continued long after its release. Torn Banner has been able to extend the fun and gain new customers through cross-promotions with other games on the same platform. For instance, they might include an item in Chivalry that players will recognize from another game, and vice versa. The catch? Players need to have both games to unlock the item.

Chivalrous Community

Community is more than a buzzword for Torn Banner team. They know that the Chivalry community is full of dedicated players who like medieval warfare as much as they do. Torn Banner has worked to actively nurture this community since the release by holding contests and community initiatives, and participating in discussions on social media, within the Steam community, and on their own website. These discussions have allowed them to get to know their customers better so they can make sure to keep creating games the community will love.

Intel Engagement

Nurturing organic marketing, such as gaming videos and discussion boards, is a great way for a small studio to reach a lot of people with limited resources. For Torn Banner, one limit is that they only have one dedicated marketing person—it’s Alex! While larger organizations might be able to produce new media assets regularly, Torn Banner wasn’t able to do so.

Enter Torn Banner’s engagement with Intel. Intel took notice of the excitement around Chivalry and approached Torn Banner with a partnership opportunity.

“Torn Banner Studios’ game Chivalry: Medieval Warfare grabbed our attention: they were an indie company developing and publishing a game with an inspired take on first-person, multiplayer gameplay in a very competitive genre—yet doing well,” said Patrick DeFreitas, Software Partner Marketing Manager at Intel.

“This engagement was an opportunity to enable their title for Intel® Iris™ graphics on the latest Intel® Core based platforms and help them continue to expand both their brand and game to new players and more millennials.”

One of the things that attracted Torn Banner to the Intel partnership was that Intel made it really easy to work with them. “It’s been nice that Intel hasn’t required us to put a ton of effort into creating new assets,” Hayter said.

Torn Banner engaged with Intel on two campaigns in 2015:

  • Holiday Contest. During the holiday season, Torn Banner partnered with Intel on a fun contest that helped increase Torn Banner’s exposure and drive traffic to the Chivalry page on the Intel® App Showcase. The holiday contest was a huge success, with Torn Banner receiving more than 4,300 entries (219% above goal!); more than 7M impressions; and 1,353 new Facebook followers in just six and a half days!
     
  • Twitch Trailer Promo. In this promotion, an Intel gaming specialist helped create a video trailer for Twitch TV that highlighted a few awesome gaming apps that run on Intel® architecture, including Chivalry. Another big success, this campaign drew 205,000 video views and resulted in a more than 2.25% click-through rate to Torn Banner’s product page!

Future Plans

In keeping with their mission to make games they love to play and the high value they place on nurturing creativity, Torn Banner decided to focus next on a new game, rather than a sequel. Mirage: Arcane Warfare comes out this fall, and the team will be using many of the same go-to-market strategies that worked well for Chivalry, a little more formally this time. Specific plans include:

  • Getting early feedback through play testing
  • Releasing a beta to get even more feedback
  • Focusing on highly visual marketing assets, such as trailers, as well as making behind-the-scenes videos
  • Working to create hype for the game at events before launch
  • Listening to customer feedback but still staying focused on making the game they want to play

Ingredients for Success

Torn Banner has grown from a collaboration of mod developers into a thriving studio on the brink of launching its second game. They credit their success to four key ingredients:

  1. Make a really great game. The developers never faltered in their mission to create the game they wanted to play. The result was a highly visual, super fun game unlike any other on the market.
  2. Embrace ambassadors. YouTube and other video content creators were instrumental in spreading the word about Chivalry. Chivalry was a great candidate for this then-new form of entertainment—it’s highly visual, and the gameplay and medieval world lent themselves well to amusing commentary.
  3. Continue to improve. Once the game was released, Torn Banner actively supported it with 46 patches to date, tons of free post-launch content, and an active presence in the community.
  4. Value creativity within the organization. A critical part of the culture at Torn Banner is that everyone’s opinion matters. All team members—from marketing, to QA, to programming, to art—are valued for their creativity and have a voice in the games they make.

Final Words

“Don’t try to latch onto a successful game idea. Truly commit to your own thing, follow your passion.”
 

Download a summary of the secrets to Torn Banner’s success below.

 

Helios Headgear Uses Intel® RealSense™ Technology to Empower the Visually Impaired


Our Work and Motivation

The HELIOS* project is focused on enhancing and complementing human sensory functions with cutting-edge vision technologies.

A study published by the World Health Organization reveals an estimated 285 million visually impaired people worldwide: 39 million blind and 246 million with low vision.

We believe it is very important to improve mobility, safety, and access to knowledge for people with sight deficiencies.

Using Computer Vision, Artificial Intelligence, and Intel® RealSense™ technology, we are working on innovative solutions to help visually impaired individuals overcome several challenges they face on a daily basis. Our approach centers on the development of smart headgear to assist with partial or complete vision loss.

HELIOS Headgear Models and Features

The HELIOS headgear provides a series of accessibility features for visually impaired individuals, empowering them to perform actions and tasks with more ease and confidence.

HELIOS Touch

HELIOS Touch addresses individuals suffering from severe or complete blindness. It uses our HTI interface to translate visual data into haptic signals, providing nearby-environment localization and obstacle-avoidance capabilities.

HELIOS Touch 3D representation

HELIOS Light

HELIOS Light addresses individuals suffering from low vision, using AR/VR technology to enhance the user’s level of visual perception, capitalizing on the Intel RealSense RGB and depth data streams to provide an adaptable vision aid for performing a variety of daily tasks.

HELIOS Light 3D representation

A key feature of HELIOS is to give the user better awareness of the nearby environment, substantially improving freedom of movement and safety.

Understanding non-Braille text is another important functionality. HELIOS can read the content of books, magazines, or other printed material, such as restaurant menus.

Furthermore, HELIOS delivers a new layer of context for interpersonal interaction by recognizing friendly faces and social cues.

Hardware breakdown

Intel® RealSense™ Technology

Intel® RealSense™ cameras have RGB-D capability and versatile sensors that provide HELIOS with high-quality depth and RGB streams. Their features, performance, and small form factor make them optimal for integration into the HELIOS headgear.

Intel® RealSense™ camera R200. Learn more about it in this article.

The Razer* Stargazer, which is a third-party version of the Intel® RealSense™ camera SR300.

HTI* Haptic interface

HTI is a hardware component of HELIOS Touch, developed by our team. It is designed to translate visual data into haptic feedback, providing the user with an extra layer of information, in a precise, non-intrusive form.

HTI test board

Open Source Virtual Reality

The Razer OSVR Hacker Development Kit is a highly customizable platform for Virtual and Augmented Reality. It is an ideal off-the-shelf component for HELIOS Light given its open source, extendable nature, and compelling hardware design.

Razer OSVR HDK

Intel Next Unit of Computing

Intel’s latest generation of small form factor PCs provides a robust platform for running HELIOS software components in real-time, with emphasis on performance, power efficiency, and portability.

Intel® NUC

Software: Intel® RealSense™ SDK

The Intel RealSense SDK is a central piece of HELIOS’s software stack. Out of the box, it facilitates access to high-frame-rate RGB, depth, and IR streams and provides a comprehensive set of computer vision algorithms for tasks such as person tracking, facial recognition, and 3D mapping. The SDK ships with a great set of sample projects and extensive online documentation.

The following code sample reveals the key components for developing a Text-to-Speech module with RealSense and UWP (Universal Windows Platform):

public async void StartRealSenseStreaming()
{
    Status streamingStatus;

    // Set RealSense sample reader and bind the SetOcrFrame event
    SampleReader sampleReader = SampleReader.Activate(senseManager);
    sampleReader.SampleArrived += SetOcrFrame;

    // Set the RGB stream profile and device info filter
    Dictionary<StreamType, PerceptionVideoProfile> profiles = new Dictionary<StreamType, PerceptionVideoProfile>();
    profiles[StreamType.STREAM_TYPE_COLOR] = ColorProfile;
    sampleReader.EnableStreams(profiles);
    readers.Add(sampleReader);
    if (currentRealSenseDevice != null)
        senseManager.CaptureManager.FilterByDeviceInfo(currentRealSenseDevice.DeviceInfo);

    // Initialize the camera, start streaming, and set the status message
    if ((streamingStatus = await senseManager.InitAsync()) == Intel.RealSense.Status.STATUS_NO_ERROR)
    {
        if ((streamingStatus = senseManager.StreamFrames()) == Intel.RealSense.Status.STATUS_NO_ERROR)
        {
            StatusMessage = "Streaming started";
        }
        else
        {
            StatusMessage = "Failed to stream: " + streamingStatus.ToString();
        }
    }
    else
    {
        StatusMessage = "Initialization failed: " + streamingStatus.ToString();
    }

    IsStreaming = true;
}

private void SetOcrFrame(Object module, SampleArrivedEventArgs args)
{
    // Store the current color frame for OCR processing
    Sample sample = args.Sample;
    if (sample == null) return;

    var localOcrFrame = sample.Color;
    if (localOcrFrame == null) return;

    lock (sample)
    {
        ocrFrame = localOcrFrame.SoftwareBitmap;
    }
}

private async void TextToSpeech()
{
    // Set up the OCR engine for English
    OcrEngine ocrEngine = OcrEngine.TryCreateFromLanguage(new Language("en"));

    // Recognize text from the RealSense OCR frame
    var ocrResult = await ocrEngine.RecognizeAsync(RealSense.OcrFrame);

    if (!String.IsNullOrEmpty(ocrResult.Text))
    {
        // Set up the speech synthesizer and pick a voice
        var voices = SpeechSynthesizer.AllVoices;
        using (SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer())
        {
            speechSynthesizer.Voice = voices.First(v => v.Gender == 0);
            var voiceStream = await speechSynthesizer.SynthesizeTextToStreamAsync(ocrResult.Text);

            // Play back the synthesized speech
            PlaybackVoice(voiceStream);
        }
    }
}

Testing and validation

Mihai Leoveanu was born with severe sight deficiency, but that didn’t stop him from becoming the outstanding person he is today.

Mihai has a strong can-do attitude and is one of the best students in his graduating class. He is currently working on his Master’s thesis, which proposes a series of accessibility improvements for the Targoviste Royal Court historical site. These improvements would allow visually impaired tourists to have a richer visiting experience.

Mihai is the first person to review the capabilities of our headgear.

Mihai field testing HELIOS

Mihai reading with HELIOS

During the experimentation process, Mihai provided observations for each feature of HELIOS he was using. He seamlessly adopted the new information sources, and in a matter of minutes he was successfully using the headgear to gain a more accurate description of his surroundings.

Conclusion

The results from development and testing are very positive. Functionalities like environment perception and non-Braille reading are much easier for the user. With further development, HELIOS has the potential to make a real difference for individuals with sight impairment, becoming a true complement to their senses.

About the Authors

Silviu-Tudor Serban, Cristian Dragomir, and Andrei Nistor are Intel RealSense technology experts with backgrounds in computer vision, artificial intelligence, software development, and IoT. Find out more at Helios Vision and Intel DevMesh.

Intel® VTune™ Amplifier XE 2016 Update 3 Fixes List


NOTE: Defects and feature requests described below represent specific issues with specific test cases. It is difficult to succinctly describe an issue and how it impacted the specific test case. Some of the issues listed may impact multiple architectures, operating systems, and/or languages. If you have any questions about the issues discussed in this report, please post on the user forums or submit an issue to Intel® Premier Support.


DPD200254200  Better visualization for bandwidth data
DPD200363058  BSOD on Windows* 7 using VTune Amplifier 2015 Update 1
DPD200374547  Crash in VTune Amplifier version 2015 Update 3
DPD200381055  VTune Amplifier crashing machine if collection ends before program
DPD200381096  VTune Amplifier causes BSOD while running
DPD200408498  VTune Amplifier assert failure when attempting to view analysis
DPD200408522  "Collection failed. The data cannot be displayed" message if special characters in path to application
DPD200409392  VTune Amplifier crashes after user-mode collection
DPD200575463  VTune Amplifier crash report in libamplxe_dbinterface_sqlite_1.99.so
DPD200577525  VTune Amplifier does not start due to licensing issue - PerfAnl: Cannot connect to license server system.

 

Migrating Applications from Knights Corner to Knights Landing Self-Boot Platforms


While there are many different programming models for the Intel® Xeon Phi™ coprocessor (code-named Knights Corner (KNC)), this paper lists the more prevalent KNC programming models and further discusses some of the necessary changes to port and optimize KNC models for the Intel® Xeon Phi™ processor x200 (code-named Knights Landing (KNL)) self-boot (SB) platform.

Instruction Set Compatibility

Virtually all applications running today on an Intel® Xeon® processor-based platform will run on a KNL SB platform without modification. But it is recommended that you recompile your application for KNL to achieve best performance.

KNL supports the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set architecture (ISA), a set of 512-bit vector extensions to the 256-bit Intel® Advanced Vector Extensions 2 (Intel® AVX2) SIMD instructions supported on current Intel Xeon processors. So all applications that currently run on Intel Xeon processors can run on KNL. However, performance may be lower than if you were exploiting the KNL ISA. You should use the KNL Intel AVX-512 ISA, which includes the foundation instructions, conflict detection instructions (CDI), exponential and reciprocal instructions (ERI), and prefetch instructions (PFI). Note: KNC supported a different 512-bit ISA, so all KNC applications must be recompiled and ported to KNL.

The Intel® Advanced Vector Extensions instructions on the current Intel® Xeon® processor family and the Intel® Xeon Phi™ processor x200 (KNL).
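
As an example of the recompile path, a simple loop like the one below needs no source changes to target KNL; with the Intel compiler, the -xMIC-AVX512 option generates Intel AVX-512 code (the loop itself is illustrative):

/* saxpy.c -- compile for KNL with, for example:
 *     icc -O3 -xMIC-AVX512 -c saxpy.c
 * The compiler can vectorize this loop with 512-bit Intel AVX-512
 * instructions, processing 16 single-precision elements at a time. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}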

Function-Specific Optimizations

As mentioned, nearly everything that runs on an Intel® Xeon® E5-2600 v4 product family (code-named Broadwell-EP) platform will run on KNL. Even legacy binaries from several generations ago will run out-of-the-box with few exceptions. How well code will run mostly depends on how well optimized or efficient the workload is in terms of core scaling, vector scaling, and memory bandwidth.

Optimizations that improve core scaling or parallel efficiency will benefit the application on both Intel Xeon processors and KNL, but KNL to a much greater degree since it has many more cores and threads.

Optimizations that improve vector scaling or SIMD efficiency will also benefit the application on both Intel Xeon processors and KNL. If you recompile using KNL Intel AVX-512, the application gains much more on KNL by exploiting the many benefits of Intel AVX-512, such as masking and more, larger vector registers.

If your workload is memory-bandwidth sensitive, KNL’s MCDRAM (high-bandwidth memory) may offer high value, perhaps with little effort. If your workload’s total memory size is less than 16 GB, you can load the entire workload in MCDRAM and see much higher effective memory bandwidth capability (over 4x DDR). If your required memory size is larger than 16 GB, you can exploit KNL’s cache configuration, where MCDRAM is a memory-side cache to DDR4 memory, or you can exploit the memkind library, now available on GitHub*.

Integrated On-Package Memory Usage Models on the Intel® Xeon Phi™ Processor x200.
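
As a minimal sketch of the library route, the memkind package’s hbwmalloc interface lets you place individual allocations in MCDRAM while the rest of the application keeps using DDR4:

#include <stdio.h>
#include <hbwmalloc.h>  /* from the memkind library */

int main(void)
{
    size_t n = 1 << 20;

    /* hbw_check_available() returns 0 when high-bandwidth memory exists;
     * with the default policy, hbw_malloc() falls back to DDR4 otherwise. */
    if (hbw_check_available() != 0)
        fprintf(stderr, "MCDRAM not detected; allocation will use DDR4.\n");

    /* Place a bandwidth-critical buffer in MCDRAM. */
    double *buf = hbw_malloc(n * sizeof(double));
    if (buf == NULL)
        return 1;

    for (size_t i = 0; i < n; i++)
        buf[i] = (double)i;

    hbw_free(buf);
    return 0;
}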

Migration Considerations

There are two issues that must be considered in migrating KNC applications to KNL Self-Boot.

  • The implementation type
  • The level of intrinsics and assembly code used

If you used the Intel® tools (compilers and performance libraries) for KNC and did not add assembly code or intrinsics, you need only recompile for KNL. Some of the optimizations that were needed to get good performance on KNC are tolerated on KNL but may no longer be necessary. One example is data alignment: on KNC an unaligned instruction on aligned data carried performance penalties, but on KNL there is no penalty for an unaligned instruction processing aligned data. This does not mean you must remove such code when migrating to KNL. It simply means that the alignment requirements for KNC were stringent, while those for KNL are much more flexible; the old optimizations come at little or no cost and do no harm.
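
For illustration, alignment code written for KNC can simply be kept. With the Intel compiler, a 64-byte aligned allocation and an alignment assertion look like the following; on KNL they cost little or nothing and can still help the compiler generate better code:

#include <immintrin.h>

/* 64-byte aligned allocation, as was common practice for KNC;
 * release the buffer with _mm_free(). */
float *make_buffer(int n)
{
    return _mm_malloc(n * sizeof(float), 64);
}

void scale(float *restrict v, float a, int n)
{
    __assume_aligned(v, 64);  /* Intel compiler alignment hint */
    for (int i = 0; i < n; i++)
        v[i] *= a;
}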

If you wrote key portions of your application using assembly or KNC intrinsics, these will have to be rewritten for KNL. Since both are 512-bit SIMD with masking, most of the intrinsics porting should be easy. White papers on adapting KNC intrinsics to KNL are available at the Intel® Developer Zone.

Working with KNC Implementation Methods

Most KNC applications could be implemented as one of the following:

  • Native
  • Symmetric
  • Offload

Native is the simplest form, and a simple recompile for KNL SB will go a long way to creating a KNL SB binary. Most cases that ran well on KNC will run quite well on KNL SB. With a symmetric model, you can run some ranks on an Intel Xeon processor and some on KNC, and for this case, a simple recompile should be sufficient to get it running and in most cases running quite well on KNL.

An offload usage model runs part of the workload on the host and part of it on the KNC coprocessor. This code ports easily to the KNL coprocessor, but this paper is focused on the self-boot platform. You can run on the self-boot platform by using the best host version of the workload or taking advantage of the coprocessor version, which will revert back to running on the host when it sees no coprocessor. If you had vectorization and threading optimizations that were done for KNC, you will want to reuse them for KNL SB platforms.

KNC had some peculiar uarch/compiler deficiencies, which forced some developers to resort to intrinsics for their code (for example, gather/scatter, prefetch, alignment, and so on). KNL has made significant microarchitectural enhancements over KNC, so it is highly recommended to recompile the original reference code and use this as your starting point for KNL.

You must make an independent assessment of whether you should revert to intrinsics or assembly on KNL, but in many cases where this was needed on KNC, the need is most likely eliminated on KNL. The main reasons are the maturity and increased capabilities of the Intel compiler, the Intel AVX-512 ISA, and uarch improvements. The vectorization reports within the Intel compilers have been greatly improved to help you assess and improve vectorization, and Intel® Advisor XE, Intel’s vectorization tool, provides interactive assistance in identifying and exploiting unrealized vectorization opportunities.

Summary

The KNL-based platform is transformative in its compatibility with legacy binaries, adherence to open industry-standard development tools and methodologies, and its ability to reveal more value from the most scalable applications. The more you improve the scalability of your software, the better performance you can achieve on the KNL SB-based platform. Please look for more information at the Intel Developer Zone.

Using Intel® VTune™ Amplifier XE to Tune Software on the Intel® Xeon® Processor E5 v4 Family


Download this guide (see Article Attachments, below) to learn how to identify performance issues on software running on the Intel® Xeon® Processor E5 v4 Family (based on Intel® Microarchitecture Codename Broadwell). The guide explains the General Exploration Analysis viewpoint available in Intel® VTune™ Amplifier XE. It also walks through some of the most common performance issues that the VTune Amplifier XE interface highlights, what each issue means, and some suggested ways to fix them.

For other tuning guides, please visit our Processor-specific Performance Analysis web page.
