Optimization Steps
The key to performance measurement is twofold: know exactly what you are measuring, and collect baseline data before you change anything. Next, profile your application and set a specific, realistic performance goal based on the profiling data. Then follow these steps to optimize your software.
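As a minimal sketch of collecting baseline data, assuming a C++ code base, the following times a hypothetical SAXPY kernel with `std::chrono::steady_clock` and keeps the fastest of several runs. The names `saxpy` and `time_kernel` are illustrative only and are not part of any Intel tool.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel under study: y = a*x + y (SAXPY).
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Runs `kernel` `reps` times and returns the fastest wall-clock time in seconds.
// Keeping the minimum of several runs reduces noise from other processes.
template <typename F>
double time_kernel(F kernel, int reps) {
    assert(reps > 0);
    double best = 0.0;
    for (int r = 0; r < reps; ++r) {
        auto start = std::chrono::steady_clock::now();
        kernel();
        auto stop = std::chrono::steady_clock::now();
        double secs = std::chrono::duration<double>(stop - start).count();
        best = (r == 0) ? secs : std::min(best, secs);
    }
    return best;
}
```

Record the value returned for the unmodified code as the baseline, then compare every later change (compiler options or source edits) against that same measurement.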
Fundamental Concepts
The Intel Compilers provide a number of features for generating vectorized code. Auto-vectorization is the method the Intel Compilers use to generate vectorized code for an application without requiring any source changes. Developers can also make simple changes to the source code to guide or enforce vectorization.
Intel Compiler Auto-vectorization (C++ | Fortran)
Performance Essentials with OpenMP 4.0 Vectorization
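To contrast the two approaches described above, here is a minimal sketch with hypothetical function names. The first loop is the kind a compiler can usually auto-vectorize unchanged (unit stride, no calls, independent iterations); `__restrict` is a widely supported compiler extension, not standard C++, used here as an assumption to rule out pointer aliasing. The second loop uses the OpenMP 4.0 `#pragma omp simd` directive to request vectorization explicitly; the pragma only takes effect when OpenMP or OpenMP SIMD support is enabled at compile time, and the loop remains correct when it is ignored.

```cpp
#include <cassert>
#include <cstddef>

// Candidate for auto-vectorization without source changes.
// __restrict promises that c, a and b do not overlap, which removes
// the aliasing doubt that can otherwise block vectorization.
void add_arrays(float* __restrict c, const float* __restrict a,
                const float* __restrict b, std::size_t n) {
    assert(n == 0 || (c != nullptr && a != nullptr && b != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

// Explicit vectorization request with an OpenMP 4.0 directive.
// The reduction clause tells the compiler how to combine partial sums
// held in separate vector lanes.
float dot(const float* a, const float* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    float sum = 0.0f;
#pragma omp simd reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Note that a vectorized reduction may add floating-point values in a different order than the scalar loop, so results can differ slightly for inputs that are not exactly representable.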
Intermediate Techniques
Proven code optimization techniques and recommended source changes are listed here. Note that whether each recommendation helps depends entirely on the application.
Fortran Array Data and Arguments and Vectorization
Data Alignment to Assist Vectorization
Program Optimization through Loop Vectorization
Random Number Function Vectorization
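As one illustration of data alignment, a minimal sketch assuming 64-byte alignment (a cache line, and wide enough for 512-bit vectors): `alignas` places the arrays on a 64-byte boundary, and the OpenMP 4.0 `aligned` clause tells the compiler it may rely on that. The names `g_a`, `g_b`, `is_aligned` and `scale_add` are hypothetical. Passing unaligned pointers to a loop declared `aligned` is undefined behavior, so the sketch checks the precondition with `assert`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kAlign = 64;  // assumed alignment in bytes
constexpr std::size_t kN = 1024;

// alignas places each array on a 64-byte boundary.
alignas(kAlign) float g_a[kN];
alignas(kAlign) float g_b[kN];

// Returns true if p is a multiple of `align` bytes.
bool is_aligned(const void* p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

// b[i] = s * a[i] + b[i]. The aligned clause lets the compiler use aligned
// vector loads and stores and skip a peel loop for the misaligned head.
void scale_add(const float* a, float* b, float s, std::size_t n) {
    assert(is_aligned(a, kAlign) && is_aligned(b, kAlign));
#pragma omp simd aligned(a, b : 64)
    for (std::size_t i = 0; i < n; ++i) {
        b[i] = s * a[i] + b[i];
    }
}
```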
Optimization Reports
Further code changes may be required to enable vectorization. Once a developer has changed the code, how can one verify that the changes produce the expected behavior? Use the compiler's optimization reports to guide source code changes and to confirm that the code does indeed vectorize.
Vectorization and Optimization Reports
Overview of Vectorization Reports and the -vec-report6 Option
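To show what a vectorization report helps distinguish, here are two hypothetical loops. When compiled with a vectorization report enabled (for example the -vec-report6 option named above, on compiler versions that accept it), the first loop would typically be reported as not vectorized because each iteration depends on the previous one, while the second would typically be reported as vectorized. The exact report wording varies by compiler version and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Prefix sum: x[i] reads x[i - 1], which the previous iteration just wrote.
// This loop-carried flow dependence usually prevents vectorization.
void prefix_sum(float* x, std::size_t n) {
    assert(n == 0 || x != nullptr);
    for (std::size_t i = 1; i < n; ++i) {
        x[i] = x[i] + x[i - 1];
    }
}

// Element-wise square: iterations are independent, and __restrict
// (a compiler extension) rules out overlap between out and in.
void square(float* __restrict out, const float* __restrict in, std::size_t n) {
    assert(n == 0 || (out != nullptr && in != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * in[i];
    }
}
```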
Advanced Methods
The techniques offering the most control require deeper knowledge of the application and the skill to know where to apply them. When used properly, however, these more intensive techniques, such as intrinsics, can deliver greater performance.
Getting Started with Intel® Cilk™ Plus SIMD Vectorization and SIMD-enabled Functions
Outer Loop Vectorization via Intel Cilk Plus Array Notations
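As a small illustration of intrinsics, a minimal sketch assuming an x86 target where the compiler defines `__SSE__` (GCC and Clang do on x86-64); on other targets the preprocessor guard falls back to the plain loop. `_mm_loadu_ps`, `_mm_add_ps` and `_mm_storeu_ps` are standard SSE intrinsics declared in `<xmmintrin.h>`; the function name `add_intrinsics` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#endif

// c[i] = a[i] + b[i], with explicit 128-bit SSE operations when available.
void add_intrinsics(float* c, const float* a, const float* b, std::size_t n) {
    assert(n == 0 || (c != nullptr && a != nullptr && b != nullptr));
    std::size_t i = 0;
#if defined(__SSE__)
    // Main loop: four floats per iteration in one 128-bit register.
    // Unaligned load/store forms are used, so no alignment is assumed.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
#endif
    // Remainder loop; on targets without SSE it handles every element.
    for (; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

The trade-off is visible even in this small case: the intrinsic version fixes the vector width at four floats, so moving to wider registers means rewriting it, whereas the compiler-directed loops earlier in this section can be retargeted by changing compiler options.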