Abstract
This is part 2 of a 3-part educational series of publications introducing select topics in the optimization of applications for Intel’s multi-core and manycore architectures (Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors).
In this paper we discuss data parallelism. Our focus is automatic vectorization and exposing vectorization opportunities to the compiler. For a practical illustration, we construct and optimize a micro-kernel for binning particles.
Similar workloads occur in Monte Carlo simulations, particle physics software, and statistical analysis.
The optimization technique discussed in this paper leads to code vectorization, which results in an order-of-magnitude performance improvement on an Intel Xeon processor. Performance on an Intel Xeon Phi coprocessor, compared to that on a high-end Intel Xeon processor, is 1.4x greater in single precision and 1.6x greater in double precision.
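To make the idea of exposing vectorization opportunities concrete, here is a minimal sketch in C of one common way to restructure a binning loop. It is not necessarily the kernel used in the paper. The function name `bin_particles`, the constants `N_BINS` and `STRIP`, and the assumption that every input value lies in [0, 1) are all hypothetical choices made for illustration. The loop is split into strips: the bin-index computation has independent iterations and is a candidate for automatic vectorization, while the histogram update stays scalar because two particles may fall into the same bin.

```c
#define N_BINS 8   /* hypothetical number of equal-width bins on [0, 1) */
#define STRIP  16  /* hypothetical strip length for the index buffer */

/* Count how many of the n values in x fall into each bin.
   Assumes every x[i] lies in [0, 1); counts must hold N_BINS ints. */
void bin_particles(const float *x, int n, int *counts) {
    const float bins_per_unit = (float)N_BINS;

    for (int start = 0; start < n; start += STRIP) {
        /* The last strip may be shorter than STRIP. */
        const int len = (n - start < STRIP) ? (n - start) : STRIP;
        int idx[STRIP];

        /* Independent iterations: the compiler may vectorize this loop. */
        for (int i = 0; i < len; i++) {
            idx[i] = (int)(x[start + i] * bins_per_unit);
        }

        /* Kept scalar: two entries of idx may name the same bin. */
        for (int i = 0; i < len; i++) {
            counts[idx[i]]++;
        }
    }
}
```

Calling `bin_particles` on the six values 0.05, 0.10, 0.30, 0.55, 0.60, 0.95 with a zero-initialized `counts` array should give two entries in bin 0, one in bin 2, two in bin 4, and one in bin 7. Whether the first inner loop is actually vectorized depends on the compiler and its flags, so the compiler's vectorization report is the place to confirm it.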