Channel: Intel Developer Zone Articles

Intel® MPI Library: Supporting the Hadoop* Ecosystem

For decades, MPI has been the dominant programming model for distributed computation. However, as high-performance computing (HPC) has taken on workloads that must process huge volumes of input data, new approaches and frameworks have appeared. The most popular are the Apache Hadoop* MapReduce paradigm in general and the Hadoop software stack (including all the tools and frameworks that run on top of it) in particular.

Vanilla Hadoop is composed of several modules and imposes certain constraints on the scope of problems it can solve efficiently. For example, MapReduce is a poor fit for iterative algorithms because it must write intermediate results to storage between iterations. These shortcomings triggered the development of other frameworks on top of Hadoop, such as Apache Spark* and Apache Storm*, which allow efficient in-memory caching of data between iterations. Moreover, YARN*, as a cluster management framework, supports arbitrary paradigms, not only MapReduce, which extends Hadoop's applicability to fields where previously mostly MPI was used.
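To make the cost of iterating through storage concrete, here is a minimal sketch in plain Python. It does not use Hadoop or Spark APIs: a local JSON file stands in for distributed storage, and the names `step`, `iterate_via_storage`, and `iterate_in_memory` are hypothetical, chosen only for illustration. Both loops compute the same result, but the first one pays for a storage read and a storage write on every iteration.

```python
import json
import os
import tempfile


def step(values):
    """One iteration of a toy algorithm: move each value halfway toward the mean."""
    mean = sum(values) / len(values)
    return [(v + mean) / 2 for v in values]


def iterate_via_storage(values, iterations, workdir):
    """MapReduce-style loop: each iteration reads its input from storage
    and writes its output back before the next iteration can start."""
    path = os.path.join(workdir, "state.json")  # stand-in for distributed storage
    with open(path, "w") as f:
        json.dump(values, f)
    writes = 1
    for _ in range(iterations):
        with open(path) as f:
            current = json.load(f)
        with open(path, "w") as f:
            json.dump(step(current), f)
        writes += 1
    with open(path) as f:
        return json.load(f), writes


def iterate_in_memory(values, iterations):
    """In-memory loop: the working set stays cached between iterations."""
    current = list(values)
    for _ in range(iterations):
        current = step(current)
    return current


if __name__ == "__main__":
    data = [1, 2, 3, 10]
    with tempfile.TemporaryDirectory() as workdir:
        stored, writes = iterate_via_storage(data, 5, workdir)
    cached = iterate_in_memory(data, 5)
    print("storage writes:", writes)
    print("same result:", stored == cached)
```

On a real cluster the per-iteration round trip goes to a distributed file system rather than a local file, so the gap between the two loops is likely far larger than this toy version suggests.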

View complete article from Parallel Universe issue #24

