Principal Investigators:
David Keyes is a founding professor of Applied Mathematics and Computational Science at KAUST, where he focuses on high-performance implementations of implicit methods for PDEs. He received a BSE from Princeton and a PhD from Harvard. He has held faculty positions at Yale, Old Dominion, and Columbia Universities and research positions at NASA and DOE laboratories, and has led the scalable solvers initiative of the DOE SciDAC program. He is a Fellow of AMS and SIAM, and recipient of the IEEE Sidney Fernbach Award, the ACM Gordon Bell Prize, and the SIAM Prize for Distinguished Service to the Profession.
Hatem Ltaief is a Senior Research Scientist in the Extreme Computing Research Center at KAUST, where he directs the KBLAS software project for dense and sparse linear algebraic operations on emerging architectures. He received an MS in computational science from the University of Lyon and an MS in applied mathematics and a PhD in computer science from the University of Houston. He has been a Research Scientist at the Innovative Computing Laboratory of the University of Tennessee and a Computational Scientist in the KAUST Supercomputing Laboratory. He is a member of the European Exascale Software Initiative (EESI2).
Rio Yokota is an associate professor in the Global Scientific Information and Computing Center at the Tokyo Institute of Technology and a consultant at KAUST, where he researches fast multipole methods, their implementation on emerging architectures, and their applications in PDEs, BEMs, molecular dynamics, and particle methods. He received his undergraduate and doctoral degrees in Mechanical Engineering from Keio University, and held postdoctoral appointments at the University of Bristol and Boston University and a Research Scientist appointment at KAUST. He is a recipient of the ACM Gordon Bell Prize.
Description:
The Intel® Parallel Computing Center (Intel® PCC) at King Abdullah University of Science and Technology (KAUST) aims to provide scalable software kernels common to scientific simulation codes that will adapt well to future architectures, including a scheduled upgrade of KAUST’s Intel-based Cray XC40 system, currently ranked in the global top ten. In the spirit of co-design, the Intel® PCC at KAUST will also provide feedback that could influence architectural design trade-offs. The Intel® PCC at KAUST is hosted in KAUST’s Extreme Computing Research Center (ECRC), directed by co-PI Keyes, which aims to smooth the architectural transition of KAUST’s simulation-intensive science and engineering code base. Rather than taking a specific application code and optimizing it, the ECRC adopts the strategy of optimizing algorithmic kernels that are shared among many application codes and of providing the results in open source libraries. Chief among such kernels are Poisson solvers and dense symmetric generalized eigensolvers.
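As a concrete instance of the second kernel class, a dense symmetric generalized eigenproblem takes the form Ax = λBx, with A symmetric and B symmetric positive definite. The sketch below is illustrative only, assuming NumPy and SciPy rather than any ECRC library; it shows the computation that such eigensolver kernels accelerate.

    # Minimal sketch of a dense symmetric generalized eigenproblem A x = lambda B x.
    # Illustrative only: uses SciPy's LAPACK-backed routine, not an ECRC/KBLAS kernel.
    import numpy as np
    from scipy.linalg import eigh

    n = 1000
    rng = np.random.default_rng(0)
    M = rng.standard_normal((n, n))
    A = (M + M.T) / 2                  # symmetric A
    B = M @ M.T + n * np.eye(n)        # symmetric positive definite B

    w, V = eigh(A, B)                  # generalized eigenpairs: A V = B V diag(w)

    # Check the residual of the smallest eigenpair.
    r = A @ V[:, 0] - w[0] * (B @ V[:, 0])
    print(np.linalg.norm(r))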
We focus on optimizing two types of scalable hierarchical algorithms – fast multipole methods (FMM) and hierarchical matrices – on next-generation Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. These algorithms have the potential to replace workhorse kernels of molecular dynamics codes (drug/material design), sparse matrix preconditioners (structural/fluid dynamics), and covariance matrix calculations (statistics/big data). Co-PI Yokota is the architect of the open source fast multipole library ExaFMM, which attempts to integrate the best solutions offered by FMM algorithms, including the ability to control the expansion order and the octree decomposition strategy independently, so as to create the fastest approximate inverse that meets a given accuracy requirement for a solver or a preconditioner on manycore and heterogeneous architectures. Co-PI Ltaief is the architect of the KBLAS library, which promotes the directed acyclic graph-based dataflow execution model to create NUMA-aware, work-stealing tile algorithms of high concurrency, with an innermost SIMD structure well suited to floating-point accelerators. The overall software framework of this Intel® PCC at KAUST, Hierarchical Computations on Manycore Architectures (HiCMA), is built upon these linear solvers and the philosophy that dense matrix blocks of numerically low rank, wherever they arise, should be replaced with compressed hierarchical representations. Hierarchical matrices are natural algebraic generalizations of fast multipole methods, and they can be implemented in data structures similar to those that have made FMM successful on distributed nodes of shared-memory cores.
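To make the hierarchical-matrix philosophy above concrete, the sketch below compresses a numerically low-rank off-diagonal block of a kernel matrix with a truncated SVD. This is an illustration of the idea only, assuming NumPy; it does not reflect the HiCMA API, and production H-matrix codes typically use cheaper compressors such as adaptive cross approximation.

    # Illustrative sketch: replace a numerically low-rank off-diagonal block
    # with a rank-k factorization, the core idea behind hierarchical matrices.
    # Not the HiCMA API; truncated SVD stands in for cheaper compressors.
    import numpy as np

    n, tol = 512, 1e-8
    x = np.linspace(0.0, 1.0, n)       # source points
    y = np.linspace(2.0, 3.0, n)       # well-separated target points

    # Off-diagonal block of the kernel matrix K[i, j] = 1 / |y_i - x_j|;
    # well-separated point clusters make this block numerically low rank.
    K = 1.0 / np.abs(y[:, None] - x[None, :])

    U, s, Vt = np.linalg.svd(K)
    k = int(np.sum(s > tol * s[0]))    # numerical rank at relative tolerance tol
    Uk = U[:, :k] * s[:k]              # absorb singular values: K ~ Uk @ Vk
    Vk = Vt[:k, :]

    v = np.random.default_rng(1).standard_normal(n)
    err = np.linalg.norm(K @ v - Uk @ (Vk @ v)) / np.linalg.norm(K @ v)
    print(f"rank {k} of {n}, matvec relative error {err:.1e}")

Storage for the block drops from n² to 2nk entries, and a matrix-vector product with it from O(n²) to O(nk) operations; applying this recursively over a cluster tree is what yields the near-linear complexity discussed below.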
FMM and hierarchical matrix algorithms share a rare combination of O(N) arithmetic complexity and high arithmetic intensity (flops/byte). This is in contrast to traditional algorithms that have either low arithmetic complexity with low arithmetic intensity (FFT, sparse linear algebra, and stencil application) or high arithmetic complexity with high arithmetic intensity (dense linear algebra, direct N-body summation). In short, FMM and hierarchical matrices are efficient algorithms that will remain compute-bound on future architectures. Furthermore, these methods have a communication complexity of O(log P) for P processors and permit a high degree of asynchrony in their communication. They are therefore amenable to the asynchronous programming models that are gaining popularity as architectures approach exascale.
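The contrast in arithmetic intensity can be made concrete with a back-of-envelope count. The flop and byte figures in the sketch below are rough, conventional estimates chosen for illustration, not measurements of any particular code.

    # Back-of-envelope arithmetic intensity (flops per byte) for two kernels.
    # Operation and byte counts are rough, conventional estimates.
    n = 10**6          # particles, or matrix dimension
    nnz = 5 * n        # sparse-matrix nonzeros (assumed ~5 per row)

    # Sparse matrix-vector product: ~2 flops per nonzero, ~12 bytes per nonzero
    # (8-byte value + 4-byte column index): intensity is constant and low.
    spmv = (2 * nnz) / (12 * nnz)

    # Direct N-body summation: ~20 flops per pair interaction, while each
    # particle's 24 bytes of coordinates are reused across n interactions.
    nbody = (20 * n * n) / (24 * n)

    print(f"SpMV  : {spmv:.2f} flops/byte (memory-bound)")
    print(f"N-body: {nbody:.1e} flops/byte (compute-bound, but O(N^2) work)")

FMM and hierarchical matrices retain the dense, high-intensity inner kernels of direct summation while reducing the total work to O(N), which is the sense in which they remain compute-bound.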