Abstract
This is part 3 of a 3-part educational series of publications introducing select topics on optimization of applications for Intel’s multi-core and manycore architectures (Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors).
In this paper we discuss false sharing, highlight the situations in which it may occur, and show how to eliminate it by padding data containers.
For a practical illustration, we construct and optimize a micro-kernel for binning particles based on their coordinates. Similar workloads occur in Monte Carlo simulations, particle physics software, and statistical analysis.
Results show that false sharing can cost a parallel application as much as an order of magnitude in performance. On Intel Xeon processors, the padding required to eliminate false sharing is greater than on Intel Xeon Phi coprocessors, so target-specific padding values may be used in real-life applications.
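To make the idea of container padding concrete, here is a minimal sketch in C++ of a binning kernel with one private histogram per thread. It is not the code from the article: the names `bin_particles`, `padded_stride`, and `kCacheLineBytes` are hypothetical, the 64-byte cache line is an assumption, and standard threads stand in for whatever threading framework the full paper uses. Coordinates are assumed to lie in [0, 1).

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (typical for x86; not taken from the article).
constexpr std::size_t kCacheLineBytes = 64;

// Elements between the starts of consecutive per-thread histograms.
// Rounds n_bins up to whole cache lines and adds one spare line, so two
// threads never write to the same line even if the buffer is unaligned.
std::size_t padded_stride(std::size_t n_bins) {
    const std::size_t per_line = kCacheLineBytes / sizeof(int);
    return ((n_bins + per_line - 1) / per_line + 1) * per_line;
}

// Hypothetical binning kernel: counts coordinates from [0, 1) in n_bins
// equal-width bins, using one private, padded histogram per thread.
std::vector<int> bin_particles(const std::vector<float>& x,
                               std::size_t n_bins, std::size_t n_threads) {
    assert(n_bins > 0 && n_threads > 0);
    const std::size_t stride = padded_stride(n_bins);
    std::vector<int> local(n_threads * stride, 0);

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            int* my_bins = &local[t * stride];  // this thread's private histogram
            const std::size_t begin = x.size() * t / n_threads;
            const std::size_t end = x.size() * (t + 1) / n_threads;
            for (std::size_t i = begin; i < end; ++i) {
                std::size_t b = static_cast<std::size_t>(x[i] * n_bins);
                if (b >= n_bins) b = n_bins - 1;  // guard against rounding at the upper edge
                ++my_bins[b];
            }
        });
    }
    for (auto& w : workers) w.join();

    // Reduce the private histograms into the final result.
    std::vector<int> bins(n_bins, 0);
    for (std::size_t t = 0; t < n_threads; ++t)
        for (std::size_t b = 0; b < n_bins; ++b)
            bins[b] += local[t * stride + b];
    return bins;
}
```

Setting the stride to exactly `n_bins` would pack the per-thread histograms back to back, which is the layout in which neighboring threads can end up writing to the same cache line. Because the required padding differs between targets, the cache line constant would likely be chosen per platform in practice.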
Download the full article (PDF)
In the second publication of this series, we demonstrated optimization of this workload, focusing on vectorization. See Optimization Techniques for the Intel® MIC Architecture: Part 2 of 3.