Abstract
In 2013, using an updated Intel® Math Kernel Library (Intel® MKL), ANSYS added Intel® Xeon Phi™ coprocessor support with automatic offload (AO) to its ANSYS Mechanical software. The result was a 1.72X performance improvement [1,2] over two CPU cores alone. For ANSYS developers, the changes to the code were minimal, because support for their routines was built into Intel MKL. This paper summarizes the work that led to the ANSYS Mechanical performance benefits and the benchmarking of the coprocessor.
Contents
- ANSYS* – Design and Simulation Software Leader
- Intel – Leader in High-performance and Technical Computing
- Supporting Intel® Xeon Phi™ Coprocessor in ANSYS* Mechanical
- Intel® Math Kernel Library: Simplifying Support for Intel® Xeon Phi™ Coprocessor
- Automatic Offload
- ANSYS* Mechanical with Intel® Xeon Phi™ Coprocessor Support Benchmark
- Key Findings
- Conclusion
- Additional Resources
- Acknowledgements
- About the Authors
ANSYS* – Design and Simulation Software Leader
ANSYS is a leader in engineering simulation software, enabling Simulation-driven Product Development. With ANSYS software, companies can foresee how product designs will behave in real-world environments. ANSYS products help companies refine and validate designs at a stage where the cost of making changes is minimal, while improving time to market.
ANSYS Mechanical software is a comprehensive Finite Element Analysis (FEA) tool for structural analysis, including linear, nonlinear, and dynamic studies. The engineering simulation product provides a complete set of element behaviors, material models, and equation solvers for a wide range of mechanical design problems. In addition, ANSYS Mechanical offers thermal analysis and coupled-physics capabilities involving acoustic, piezoelectric, thermal–structural, and thermo-electric analysis.
Intel – Leader in High-performance and Technical Computing
Intel, with its Intel® Xeon® processor families, Intel® Xeon Phi™ coprocessors, Intel® workstation and server boards, Intel® chipsets, and many other products, is a leading technology innovator in a range of components for computing, storage, networking, and software development. Intel innovation enables balanced, high-performance, and energy-efficient systems.
The Intel Xeon Phi coprocessor, based on Intel Many Integrated Core (MIC) architecture, complements the Intel® Xeon® processor E5 family. With up to 1.2 teraflops per coprocessor, the Intel Xeon Phi coprocessor enables new levels of performance for highly parallelized workloads, accelerating time-to-solution for today's most demanding computing applications, such as structural analysis using ANSYS Mechanical.
"ANSYS is excited to be supporting Intel® Xeon Phi™ coprocessors in our 15.0 release of ANSYS Mechanical. We are especially proud to be the first commercially available major engineering simulation software solution to support Xeon Phi coprocessors. Intel's approach to highly-parallel computing has assisted us in delivering commercial performance in the very first release. Intel's latest addition to supporting highly-parallel workloads is impressive, posting over 1 TFlops peak performance, which can be utilized by our direct sparse solver. We have been collaborating with Intel on Intel® Xeon® processors for years, and we see Intel Xeon Phi as a natural migration. We look forward to continuing to maximize value to our customers for both Intel Xeon processors and Intel Xeon Phi coprocessors."
Barbara Hutchings, Director, Strategic Alliances and High-performance computing strategy, ANSYS, Inc.
Supporting Intel® Xeon Phi™ Coprocessor in ANSYS* Mechanical
In 2013, working with the Intel developers of Intel® Math Kernel Library (Intel® MKL), ANSYS added optimizations to the ANSYS Mechanical code to support the Intel Xeon Phi coprocessor. With these optimizations, ANSYS Mechanical can branch to the appropriate code path depending on the hardware configuration. This enables equivalent solutions to be executed across multiple code paths:
- Parallelization based on threading within a single execution rank by OpenMP-style multi-threading or Intel MKL-threaded kernels (symmetric multi-processing, SMP).
- Parallelization based on distributed MPI domains using the licensed cores across multiple sockets in the platform and MPI calls (distributed multi-processing, DMP).
- Parallelization based on Intel MKL Automatic Offload, which offloads parallelized kernels to the Intel Xeon Phi coprocessor.
Future additions to Intel MKL will also enable automatic offloading to the Intel Xeon Phi coprocessor within MPI-style DMP execution. These code modifications help ensure ANSYS Mechanical can automatically deliver the performance offered by the platform and any available Intel Xeon Phi coprocessors.
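The branching among these code paths can be pictured with a small sketch. This is not ANSYS code: the `Platform` fields, the `CodePath` names, and the selection order are hypothetical, chosen only to mirror the three paths listed above and the stated limitation that Automatic Offload is not yet available within DMP execution.

```python
from dataclasses import dataclass
from enum import Enum


class CodePath(Enum):
    SMP = "threaded within a single rank (OpenMP or Intel MKL threads)"
    DMP = "distributed MPI domains"
    SMP_AO = "SMP plus Intel MKL Automatic Offload"


@dataclass
class Platform:
    # Hypothetical description of a run; not an ANSYS data structure.
    licensed_cores: int
    coprocessors: int   # number of coprocessor cards detected
    use_mpi: bool       # user asked for distributed (DMP) execution


def select_code_path(p: Platform) -> CodePath:
    """Pick one of the three paths named in the text (illustrative order)."""
    if p.use_mpi:
        # Per the text, offload inside MPI-style DMP runs is a future addition.
        return CodePath.DMP
    if p.coprocessors > 0:
        return CodePath.SMP_AO
    return CodePath.SMP


if __name__ == "__main__":
    for p in (Platform(2, 1, False), Platform(24, 0, False), Platform(24, 1, True)):
        print(p, "->", select_code_path(p).name)
```

The point of the sketch is only that the decision is made once, from the detected hardware and the requested execution mode, so that each path solves the same problem.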
Intel® Math Kernel Library: Simplifying Support for Intel® Xeon Phi™ Coprocessor
Automatic Offload
Automatic offload (AO) library routines in Intel MKL allow developers to quickly add Intel Xeon Phi coprocessor support to their applications with minimal investment. Intel MKL library routines enable AO of parallelized code to the coprocessor, handling all buffer management and data transfers from the host to the coprocessor and back during execution. Intel MKL also handles threading of computationally intensive kernels for parallel performance. For ANSYS Mechanical, the ANSYS engineers simply added calls within the code that permitted Intel MKL to offload work as appropriate to the coprocessor. Those calls included detecting the existence of one or more coprocessor cards and initializing them.
Developers wanting to add Intel Xeon Phi coprocessor support to their code should check with Intel to find out if Intel MKL currently supports routines they execute. Intel continues to expand Intel Xeon Phi coprocessor support in Intel MKL to simplify the work independent software developers need to do to provide significant performance boosts for their customers.
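Besides in-code calls of the kind ANSYS added, Intel MKL documentation for Intel Xeon Phi coprocessors also describes environment variables that control Automatic Offload, such as `MKL_MIC_ENABLE` and `MKL_MIC_WORKDIVISION`. The sketch below only builds such an environment for a child process. The helper name `ao_environment` and the example fraction are assumptions for illustration, and whether a given routine is actually offloaded depends on the Intel MKL version and the problem size.

```python
import os
from typing import Dict, Optional


def ao_environment(enable: bool = True,
                   mic_workdivision: Optional[float] = None,
                   base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with Automatic Offload settings.

    ao_environment is a hypothetical helper; only the variable names are
    taken from Intel MKL documentation for Intel Xeon Phi coprocessors.
    """
    env = dict(os.environ if base is None else base)
    if not enable:
        # Leave Automatic Offload off by not setting the variable at all.
        env.pop("MKL_MIC_ENABLE", None)
        return env
    env["MKL_MIC_ENABLE"] = "1"
    if mic_workdivision is not None:
        if not 0.0 <= mic_workdivision <= 1.0:
            raise ValueError("work division must be a fraction in [0, 1]")
        # Fraction of the work Intel MKL should try to place on the coprocessors.
        env["MKL_MIC_WORKDIVISION"] = str(mic_workdivision)
    return env


if __name__ == "__main__":
    env = ao_environment(mic_workdivision=0.5, base={})
    print(env)
    # Pass env=env to subprocess.run() when launching an MKL-linked program.
```

Keeping the settings in a dictionary rather than mutating `os.environ` makes it easy to launch the same program with and without offload for a side-by-side comparison.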
ANSYS* Mechanical with Intel Xeon Phi Coprocessor Support Benchmark
ANSYS benchmarked performance [3,4,5] of ANSYS Mechanical on a typical workstation to understand the benefits of using an Intel Xeon Phi coprocessor for ANSYS Mechanical workloads running on SMP Linux. ANSYS engineers used a typical ANSYS workload, V145sp-5, for the benchmark. Engineers ran multiple tests on the configurations listed in Table 1. The baseline and performance tests comprised the following:
- 2, 4, 8, 16, and 24 cores without Intel Xeon Phi coprocessor using OpenMP (SMP) for baselines.
- 2, 4, 8, 16, and 24 cores with Intel Xeon Phi coprocessor using OpenMP (SMP) for comparison to the baselines.
- 2, 4, 8, 16, and 24 cores without Intel Xeon Phi coprocessor using MPI (DMP) calls (Intel MKL does not yet support MPI in the Intel Xeon Phi coprocessor; this work is still underway).
Table 1. Benchmark Platform Configuration
Component | Host | Coprocessor |
---|---|---|
CPU | Intel® Xeon® processor E5-2697 v2 | Intel® Xeon Phi™ coprocessor 7120P |
Cores/Threads | 12/24 | 61/244 |
Clock Speed | 2.7 GHz / Turbo 3.5 GHz | 1.238 GHz |
Memory | 64 GB DDR3-1600, 8.0 GT/s | 16 GB GDDR5-5500, 5.5 GT/s |
Firmware/Software | Linux* Kernel 2.6.32-279.el6.x86_64 | Flash version 2.1.03.0386 |
Application Software Support | Intel® Composer XE 12.1 | |
Key Findings
The ANSYS Mechanical V145sp-5 workload on SMP Linux performed best with 24 CPU cores and no coprocessor support, as shown in Figure 1. Raw performance was better still when the workload ran with distributed MPI domains on two sockets and 24 cores without Intel Xeon Phi coprocessor support (not shown). However, this benchmark evaluates the benefits of the Intel Xeon Phi coprocessor, which currently supports only SMP in ANSYS Mechanical; DMP support is part of the Intel MKL roadmap.
Figure 1. Best run times for ANSYS Mechanical on V145sp-5
With two cores, the Intel Xeon Phi coprocessor provided a 1.72X performance improvement over the baseline. At 24 cores, the Intel Xeon Phi coprocessor did not increase performance over the 24-core baseline. This was likely due to the benefit of data locality, fast memory-to-CPU communications, and 24 multi-threaded cores versus the communications delay from data transfers to the 61 cores in the coprocessor. However, a significant improvement over baseline is obtained with far fewer licensed cores. Considering the cost of ANSYS Mechanical core licenses for 24 cores (US$38,000), an Intel Xeon Phi coprocessor with only two licensed CPU cores (US$9,900) offers an important performance benefit for price/performance-minded users.
Table 2. ANSYS Mechanical licensing fees (at time of publication)
Configuration/Code Path (software licensing cost, US$) | 2 cores | 4 cores | 8 cores | 16 cores | 24 cores |
---|---|---|---|---|---|
SMP: Xeon E5-2697v2 Host | 6600 | 13200 | 19000 | 38000 | 38000 |
SMP: Xeon E5-2697v2 + Phi7120P | 9900 | 16500 | 19000 | 38000 | 38000 |
DMP: Xeon E5-2697v2 Host | 6600 | 13200 | 19000 | 38000 | 38000 |
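As a worked reading of Table 2, the following sketch computes the extra license cost of enabling the coprocessor at each core count. The dictionaries simply restate the SMP rows of the table; the names `HOST_ONLY`, `HOST_PLUS_PHI`, and `coprocessor_premium` are illustrative.

```python
# License costs in US$ copied from Table 2, keyed by number of system CPU cores.
HOST_ONLY = {2: 6600, 4: 13200, 8: 19000, 16: 38000, 24: 38000}
HOST_PLUS_PHI = {2: 9900, 4: 16500, 8: 19000, 16: 38000, 24: 38000}


def coprocessor_premium(cores: int) -> int:
    """Extra license cost of enabling the coprocessor at a given core count."""
    return HOST_PLUS_PHI[cores] - HOST_ONLY[cores]


if __name__ == "__main__":
    for cores in sorted(HOST_ONLY):
        print(f"{cores:>2} cores: +US${coprocessor_premium(cores)}")
```

On these figures the coprocessor adds US$3,300 at 2 and 4 cores and nothing at 8 cores and above.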
As shown in Figure 2, with only two licensed CPU cores plus the Intel Xeon Phi coprocessor (which counts as a third core for license purposes), the benchmark ran 1.72X faster than on two cores alone. That cuts run time by about 42 percent for an additional one-core license fee (US$6,600 for two cores; US$9,900 for three cores).
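The relationship between the quoted speedup, run time, and license cost can be checked with a few lines of arithmetic. The sketch uses only figures quoted in the text and Table 2; the function and constant names are illustrative.

```python
def runtime_reduction(speedup: float) -> float:
    """Fraction of run time saved for a given speedup (1 - 1/speedup)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Figures quoted in the text and Table 2.
SPEEDUP = 1.72          # two cores + coprocessor vs. two cores alone
COST_TWO_CORES = 6600   # US$
COST_WITH_PHI = 9900    # US$, two cores plus coprocessor
COST_24_CORES = 38000   # US$

if __name__ == "__main__":
    print(f"run time saved:       {runtime_reduction(SPEEDUP):.1%}")
    print(f"license cost ratio:   {COST_WITH_PHI / COST_TWO_CORES:.2f}x")
    print(f"share of 24-core fee: {COST_WITH_PHI / COST_24_CORES:.1%}")
```

The two-core-plus-coprocessor license costs 1.5 times the two-core license and roughly a quarter of the 24-core license.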
Figure 2. SMP Linux performance improvements over baseline with Intel Xeon Phi coprocessor
Purchasers weighing many additional ANSYS Mechanical core licenses against an Intel Xeon Phi coprocessor should understand the performance the coprocessor delivers with fewer licenses. The capability to achieve a significant performance improvement with minimal licensing costs makes the investment in an Intel Xeon Phi coprocessor quite viable. In addition, as Intel continues to enhance Intel MKL, Intel Xeon Phi coprocessor Automatic Offload support within parallel MPI ranks will become easy for software companies to enable.
Conclusion
With 61 cores and 244 threads, the Intel Xeon Phi coprocessor can deliver significant performance improvements on highly parallel codes. Intel MKL enables software developers, such as ANSYS, to easily add Intel Xeon Phi coprocessor support to their codes. Intel MKL handles all buffer management and data transfers from the host to the coprocessor and back during execution. It also handles threading of computationally intensive kernels for parallel performance, minimizing the investment companies have to make to achieve considerable jumps in performance.
ANSYS added Intel Xeon Phi coprocessor support to ANSYS Mechanical in 2013. The result was a 1.72X speedup in their V145sp-5 workload on just two licensed CPU cores plus the coprocessor license. ANSYS developers made minimal changes to their code to achieve this benefit.
Software companies wanting to add Intel Xeon Phi coprocessor support into their code should check with Intel about how Intel MKL can help their efforts.
Additional Resources
For further reading:
http://software.intel.com/en-us/articles/math-kernel-library-automatic-offload-for-intel-xeon-phi-coprocessor
http://software.intel.com/sites/default/files/article/335818/intel-xeon-phi-coprocessor-quick-start-developers-guide.pdf (page 28)
https://www.tacc.utexas.edu/documents/13601/137150/Advanced+Offloading.pdf
https://www.tacc.utexas.edu/documents/13601/901837/offload_slides_DJ2013-3.pdf/4b27de31-e8c4-4848-b100-c4b670b48148
https://www.tacc.utexas.edu/c/document_library/get_file?uuid=308bde8f-a34e-422d-a05c-e1dbe2a06231&groupId=13601
Acknowledgements
Jeff Beisheim and Ray Browell of ANSYS, along with Bob Larson and Paul Besl of Intel, provided data and input to this paper. Special acknowledgement goes to the Intel MKL team for quick implementation of support for the Intel Xeon Phi coprocessor.
About the Authors
Ken Strandberg, principal at Catlow Communications, writes technical articles, white papers, seminars, web-based training, technical marketing content, and interactive collateral. His clients include emerging technology companies, Fortune 100 enterprises, and multi-national corporations. Mr. Strandberg contributes to a variety of industries, including Software, Industrial Technologies, Design Automation, Networking, Medical Technologies, Semiconductor, and Telecom. Mr. Strandberg can be reached at ken@catlowcommunications.com.
Dr. Paul Besl is a software engineering manager in the Manufacturing (Vertical) Engineering Team (MET), which works closely with the ISVs LSTC, ANSYS, and SIMULIA. He has been with Intel for almost 8 years and has been involved with the Intel Xeon Phi program since January 2010. In the past, he has held various technical and managerial positions at General Motors, Alias|wavefront (now Autodesk), SDRC (now Siemens PLM), Bendix Aerospace (now Honeywell), and Arius3D. He received a distinguished dissertation award for his Ph.D. work in the Computer, Information, and Control Engineering department at the University of Michigan and graduated summa cum laude with an AB degree in physics from Princeton University. He has authored numerous papers, book chapters, and a book.