This article demonstrates the performance benefits of the MPI-3 nonblocking collective operations supported by the Intel® MPI Library 5.0 and the Intel® MPI Benchmarks (IMB) 4.0 products. We’ll show how to measure the overlap of communication and computation, and demonstrate how an MPI application can benefit from nonblocking collective communication.
The Message Passing Interface (MPI) standard is a widely used programming interface for distributed memory systems. The latest MPI-3 standard contains major new features such as nonblocking and neighbor collective operations, extensions to the Remote Memory Access (RMA) interface, large count support, and new tool interfaces. Large count support lets developers seamlessly operate on large amounts of data, the fast one-sided operations are designed to speed up RMA-based applications, and the nonblocking collectives enable developers to better overlap the computation and communication parts of their applications and exploit potential performance gains.
Here, we concentrate on two major MPI-3 features: nonblocking collective operations (NBC) and the new RMA interface. We evaluate the effect of communication and computation overlap with NBC, and describe potential benefits of the new RMA functionality.
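To illustrate the overlap pattern discussed above, here is a minimal sketch that starts a nonblocking MPI_Iallreduce, performs unrelated computation while the collective progresses, and only waits for completion when the reduced data is actually needed. The buffer size and the dummy compute kernel are illustrative assumptions, not the IMB configuration used in the measurements.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative compute kernel that does not depend on the reduction result. */
static void independent_work(double *local, int n)
{
    for (int i = 0; i < n; i++)
        local[i] = local[i] * 0.5 + 1.0;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                      /* illustrative buffer size */
    double *sendbuf = malloc(n * sizeof(double));
    double *recvbuf = malloc(n * sizeof(double));
    double *work    = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) {
        sendbuf[i] = (double)rank;
        work[i]    = (double)i;
    }

    /* Start the collective without blocking ... */
    MPI_Request req;
    MPI_Iallreduce(sendbuf, recvbuf, n, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ... and overlap it with computation that does not touch recvbuf. */
    independent_work(work, n);

    /* Block only when the reduced data is required. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("recvbuf[0] = %f\n", recvbuf[0]);

    free(sendbuf);
    free(recvbuf);
    free(work);
    MPI_Finalize();
    return 0;
}
```

How much of the MPI_Iallreduce actually progresses during independent_work() depends on the MPI implementation and whether asynchronous progress is available; this is exactly the overlap that the measurements below quantify.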
Download the complete article (PDF).