Intel® Advisor XE 2016 Update 4 - What’s new
We’re pleased to announce a new version of the vectorization assistant tool, Intel® Advisor XE 2016 Update 4.
Below are the highlights of the new functionality in Intel Advisor XE 2016 Update 4.
Full support for all analysis types on the second-generation Intel® Xeon Phi™ processor (code named Knights Landing)
FLOPS and mask utilization
Tech Preview feature! An accurate, hardware-independent FLOPS measurement tool (AVX-512 only). It is mask-aware, taking AVX-512 masking into account when counting operations, and it offers the unique capability to correlate FLOPS with performance data.
Workflow
Batch mode lets you automate the collection of multiple analysis types at once. You can collect Survey and Trip Counts data in a single run: Advisor runs the application twice, but automatically, without any user action. For Memory Access Patterns (MAP) and Dependencies analyses, there are predefined auto-selection criteria; for example, check Dependencies only for loops with the “Assumed dependencies” issue.
The improved MPI workflow lets you create snapshots of MPI results, so you can collect data with the command-line interface (CLI) and transfer a self-contained, packed result to a workstation with the GUI for analysis. We also fixed some GUI and CLI interoperability issues.
Memory Access Patterns
MAP analysis now detects gather instruction usage, revealing more complex access patterns. A SIMD loop with gather instructions generally runs faster than its scalar equivalent, but slower than a SIMD loop without gather operations. If a loop has the “Gather stride” category, check the new “Details” tab in the Refinement report for information about the strides and mask shape of the gather operation. When gather instructions are not actually necessary, one possible solution is to inform the compiler about your data access patterns using OpenMP 4.x directives and clauses.
For AVX-512, MAP analysis also detects gather/scatter instruction usage. These instructions allow more code to vectorize, but you can obtain greater performance by avoiding them.
The MAP report is enriched with a Memory Footprint metric: the size of the address range touched by a given instruction. The value represents the maximum footprint across all loop instances.
Variable names are now reported for memory accesses, in addition to the source line and assembly instruction, so you can identify the data structure of interest more accurately. Advisor can detect global, static, stack, and heap-allocated variables.
We added a new recommendation to use the Intel® SIMD Data Layout Templates (SDLT) library for loops with an “Ineffective memory access” issue.
Survey and Loop Analytics
The Loop Analytics tab now includes trip counts and an extended instruction mix, so you can see the distribution of compute versus memory instructions, scalar versus vector instructions, ISA details, and more.
We improved the usability of the non-executed code paths analysis: you can now see the ISA and traits of virtual loops, and sort and find AVX-512 code paths more easily.
Loops with vector intrinsics are now shown as vectorized in the Survey grid.
Get Intel Advisor and more information
Visit the product site to find videos and tutorials.