Accelerating ANSYS* Mechanical Structural Analysis with Intel® Xeon Phi™ Coprocessors

Abstract

In 2013, using an updated Intel® Math Kernel Library (Intel® MKL), ANSYS added Intel® Xeon Phi™ coprocessor support with automatic offload (AO) to their ANSYS Mechanical software. The result was a 1.72X performance improvement1,2 over two CPU cores alone. For ANSYS developers, the changes to the code were minimal, because support for their routines was built into Intel MKL. This paper summarizes the work that led to the ANSYS Mechanical performance benefits and the benchmarking of the coprocessor.

ANSYS* – Design and Simulation Software Leader

ANSYS is a leader in engineering simulation software, enabling Simulation-driven Product Development. With ANSYS software, companies can foresee how product designs will behave in real-world environments. ANSYS products help companies refine and validate designs at a stage where the cost of making changes is minimal, while improving time to market.

ANSYS Mechanical software is a comprehensive Finite Element Analysis (FEA) tool for structural analysis, including linear, nonlinear, and dynamic studies. The engineering simulation product provides a complete set of element behaviors, material models, and equation solvers for a wide range of mechanical design problems. In addition, ANSYS Mechanical offers thermal analysis and coupled-physics capabilities involving acoustic, piezoelectric, thermal–structural, and thermo-electric analysis.

Intel – Leader in High-performance and Technical Computing

Intel, with its Intel® Xeon® processor families, Intel® Xeon Phi™ coprocessors, Intel® workstation and server boards, Intel® chipsets, and many other products, is a leading technology innovator in a range of components for computing, storage, networking, and software development. Intel innovation enables balanced, high-performance, and energy-efficient systems.

The Intel Xeon Phi coprocessor, based on Intel Many Integrated Core (MIC) architecture, complements the Intel® Xeon® processor E5 family. With up to 1.2 teraflops per coprocessor, the Intel Xeon Phi coprocessor enables new levels of performance for highly parallelized workloads, accelerating time-to-solution for today's most demanding computing applications, such as structural analysis using ANSYS Mechanical.

"ANSYS is excited to be supporting Intel® Xeon Phi™ coprocessors in our 15.0 release of ANSYS Mechanical. We are especially proud to be the first commercially available major engineering simulation software solution to support Xeon Phi coprocessors. Intel's approach to highly-parallel computing has assisted us in delivering commercial performance in the very first release. Intel's latest addition to supporting highly-parallel workloads is impressive, posting over 1 TFlops peak performance, which can be utilized by our direct sparse solver. We have been collaborating with Intel on Intel® Xeon® processors for years, and we see Intel Xeon Phi as a natural migration. We look forward to continuing to maximize value to our customers for both Intel Xeon processors and Intel Xeon Phi coprocessors."

Barbara Hutchings, Director, Strategic Alliances and High-performance computing strategy, ANSYS, Inc.

Supporting Intel® Xeon Phi™ Coprocessor in ANSYS* Mechanical

In 2013, working with the Intel developers of Intel® Math Kernel Library (Intel® MKL), ANSYS added optimizations to the ANSYS Mechanical code to support the Intel Xeon Phi coprocessor. With these optimizations, ANSYS Mechanical can select a code path at run time based on the hardware configuration and produce equivalent solutions through any of the following:

Parallelization based on threading within a single execution rank by OpenMP-style multi-threading or Intel MKL-threaded kernels (symmetric multi-processing, SMP).
Parallelization based on distributed MPI domains using the licensed cores across multiple sockets in the platform and MPI calls (distributed multi-processing, DMP).
Parallelization based upon Intel MKL Automatic Offload, which offloads parallelized kernels to the Intel Xeon Phi coprocessor.

Future additions to Intel MKL will also enable automatic offloading to the Intel Xeon Phi coprocessor within MPI-style DMP execution. These code modifications help ensure that ANSYS Mechanical automatically delivers the performance offered by the platform and any available Intel Xeon Phi coprocessors.
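The choice among the three code paths above can be pictured with a small sketch. It is illustrative only and not ANSYS's implementation: the `Platform` class, the `select_code_path` function, and the path labels are hypothetical names introduced here, and the rule that a distributed run stays on the host simply restates the current limitation described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical description of what a run sees at start-up."""
    mpi_ranks: int      # number of distributed MPI domains requested
    coprocessors: int   # number of Intel Xeon Phi cards detected


def select_code_path(platform: Platform) -> str:
    """Return a label for the parallelization path (illustrative only)."""
    if platform.mpi_ranks > 1:
        # Per the text, Automatic Offload inside MPI ranks is not yet
        # supported, so a distributed run stays on the host CPUs.
        return "DMP"
    if platform.coprocessors > 0:
        # Single rank with a coprocessor present: threaded host code
        # plus Intel MKL Automatic Offload.
        return "SMP+AO"
    return "SMP"


print(select_code_path(Platform(mpi_ranks=1, coprocessors=1)))
```

Keeping the decision in one place means the solver code that follows does not need to know which hardware it is running on.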

Intel® Math Kernel Library: Simplifying Support for Intel Xeon Phi Coprocessor Automatic Offload

Automatic offload (AO) library routines in Intel MKL allow developers to quickly add Intel Xeon Phi coprocessor support to their applications with minimal investment. Intel MKL library routines enable AO of parallelized code to the coprocessor, handling all buffer management and data transfers from the host to the coprocessor and back during execution. Intel MKL also handles threading of computationally intensive kernels for parallel performance. For ANSYS Mechanical, the ANSYS engineers simply added calls within the code that permitted Intel MKL to offload work as appropriate to the coprocessor. Those calls included detecting the existence of one or more coprocessor cards and initializing them.
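The paper does not show the calls ANSYS added. For orientation, Intel's Automatic Offload documentation (listed under Additional Resources) describes two ways to request offload: calling `mkl_mic_enable()` from the application, or setting the `MKL_MIC_ENABLE` environment variable before launch. The sketch below illustrates only the environment-variable route, assuming a launcher script prepares the environment for a solver process; the helper name `automatic_offload_env` is hypothetical, and `OFFLOAD_REPORT` is the offload runtime's reporting variable.

```python
import os


def automatic_offload_env(base_env=None, report_level=0):
    """Return a copy of the environment with MKL Automatic Offload requested.

    base_env defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_MIC_ENABLE"] = "1"  # ask Intel MKL to offload eligible routines
    if report_level > 0:
        # Optional: have the offload runtime print transfer/timing reports.
        env["OFFLOAD_REPORT"] = str(report_level)
    return env


env = automatic_offload_env({"PATH": "/usr/bin"}, report_level=2)
print(env["MKL_MIC_ENABLE"], env["OFFLOAD_REPORT"])
```

The resulting dictionary could be passed as the `env` argument of `subprocess.run` when starting the application. Whether a given routine is actually offloaded remains Intel MKL's decision at run time.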

Developers wanting to add Intel Xeon Phi coprocessor support to their code should check with Intel to find out whether Intel MKL currently supports the routines they call. Intel continues to expand Intel Xeon Phi coprocessor support in Intel MKL to simplify the work independent software developers need to do to provide significant performance boosts for their customers.

ANSYS* Mechanical with Intel Xeon Phi Coprocessor Support Benchmark

ANSYS benchmarked performance3,4,5 of ANSYS Mechanical on a typical workstation to understand the benefits of using an Intel Xeon Phi coprocessor for ANSYS Mechanical workloads running on SMP Linux. ANSYS engineers used a typical ANSYS workload, V145sp-5, for the benchmark. Engineers ran multiple tests on the configurations listed in Table 1. The baseline and performance tests comprised the following:

2, 4, 8, 16, and 24 cores without Intel Xeon Phi coprocessor using OpenMP (SMP) for baselines.
2, 4, 8, 16, and 24 cores with Intel Xeon Phi coprocessor using OpenMP (SMP) for comparison to the baselines.
2, 4, 8, 16, and 24 cores without Intel Xeon Phi coprocessor using MPI (DMP) calls (Intel MKL does not yet support MPI in the Intel Xeon Phi coprocessor; this work is still underway).

Table 1. Benchmark Platform Configuration

Component	Host	Coprocessor
CPU	Intel® Xeon® processor E5-2697 v2	Intel® Xeon Phi™ coprocessor 7120P
Cores/Threads	12/24	61/244
Clock Speed	2.7 GHz / Turbo 3.5 GHz	1.238 GHz
Memory	64 GB DDR3-1600, 8.0 GT/s	16 GB GDDR5-5500, 5.5 GT/s
Firmware/Software	Linux* Kernel 2.6.32-279.el6.x86_64; Intel® Turbo Boost technology6 enabled; Intel® Hyper-Threading technology7 enabled	Flash version 2.1.03.0386; SMC Firmware 1.15.4830; SMC Boot Loader Version :f1.8.4326; MPSS 2.1.6720-15; uOS version: 2.6.38.8-g2593b11; ECC enabled; Turbo disabled
Application Software Support	Intel® Composer XE 12.1; Intel® MPI Library 4.1.1.036 (Update 1)

Key Findings

The ANSYS Mechanical V145sp-5 workload on SMP Linux performed best with 24 CPU cores and no coprocessor support, as shown in Figure 1. Raw performance (not shown) was better still when the workload ran with distributed MPI domains on two sockets and 24 cores without Intel Xeon Phi coprocessor support. However, this benchmark evaluates the benefits of the Intel Xeon Phi coprocessor, which currently supports only SMP in ANSYS Mechanical; DMP support is part of the Intel MKL roadmap.

Figure 1. Best run times for ANSYS Mechanical on V145sp-5

With two cores, the Intel Xeon Phi coprocessor provided a 1.72X performance improvement over the baseline. At 24 cores, the Intel Xeon Phi coprocessor did not increase performance over the 24-core baseline. The likely reason is that data locality, fast memory-to-CPU communication, and 24 multi-threaded cores on the host outweigh the offload benefit once the delay of transferring data to the 61 coprocessor cores is included. However, a significant improvement over baseline is obtained with many fewer licensed cores. Given the cost of ANSYS Mechanical core licenses for 24 cores (US$38,000), an Intel Xeon Phi coprocessor with only two licensed CPU cores (US$9,900) offers an important benefit for price/performance-minded users.

Table 2. ANSYS Mechanical licensing fees (at time of publication)

Software Licensing Costs US$ (# system CPU cores)
Configuration/Code Path	2 cores	4 cores	8 cores	16 cores	24 cores
SMP: Xeon E5-2697v2 Host	6600	13200	19000	38000	38000
SMP: Xeon E5-2697v2 + Phi7120P	9900	16500	19000	38000	38000
DMP: Xeon E5-2697v2 Host	6600	13200	19000	38000	38000

As shown in Figure 2, with only two licensed CPU cores and the Intel Xeon Phi coprocessor (which counts as a third core for license purposes), the benchmark ran 1.72X faster than with two cores alone. That shortens run time by roughly 42 percent for an additional one-core license fee (US$6,600 for two cores; US$9,900 for three cores).
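The price/performance argument can be checked with a few lines of arithmetic. The sketch below uses only the 1.72X speedup and the two-core license fees from Table 2; the "speedup per unit of license cost" figure is an illustrative metric introduced here, not one reported by ANSYS or Intel.

```python
SPEEDUP = 1.72          # two cores + coprocessor vs. two cores alone
COST_BASELINE = 6600    # US$, two licensed CPU cores (Table 2)
COST_WITH_PHI = 9900    # US$, two CPU cores plus coprocessor (Table 2)

runtime_fraction = 1 / SPEEDUP                 # new run time / old run time
runtime_reduction = 1 - runtime_fraction       # share of run time saved
cost_ratio = COST_WITH_PHI / COST_BASELINE     # relative license cost
speedup_per_cost = SPEEDUP / cost_ratio        # illustrative metric

print(f"Run time relative to baseline: {runtime_fraction:.1%}")   # 58.1%
print(f"Run time saved:                {runtime_reduction:.1%}")  # 41.9%
print(f"License cost ratio:            {cost_ratio:.2f}")         # 1.50
print(f"Speedup per unit license cost: {speedup_per_cost:.2f}")   # 1.15
```

A value above 1.0 on the last line means the speedup grows faster than the license cost, which is the sense in which the two-core configuration with a coprocessor is price/performance-favorable.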

Figure 2. SMP Linux performance improvements over baseline with Intel Xeon Phi coprocessor

Purchasers weighing additional ANSYS Mechanical core licenses against an Intel Xeon Phi coprocessor should understand that the coprocessor delivers its performance benefit with fewer licenses. The capability to achieve a significant performance improvement with minimal licensing costs makes the investment in an Intel Xeon Phi coprocessor quite viable. In addition, as Intel continues to enhance Intel MKL, Intel Xeon Phi coprocessor Automatic Offload support within parallel MPI ranks will become easy for software companies to enable.

Conclusion

With 61 cores and 244 threads, the Intel Xeon Phi coprocessor can deliver significant performance improvements on highly parallel codes. Intel MKL enables software developers, such as ANSYS, to easily add Intel Xeon Phi coprocessor support to their codes. Intel MKL handles all buffer management and data transfers from the host to the coprocessor and back during execution. It also handles threading of computationally intensive kernels for parallel performance, minimizing the investment companies have to make to achieve considerable jumps in performance.

ANSYS added Intel Xeon Phi coprocessor support to ANSYS Mechanical in 2013. The result was a 1.72X speedup in their V145sp-5 workload on just two licensed CPU cores plus the coprocessor license. ANSYS developers made minimal changes to their code to achieve this benefit.
Software companies wanting to add Intel Xeon Phi coprocessor support into their code should check with Intel about how Intel MKL can help their efforts.

Additional Resources

For further reading:
http://software.intel.com/en-us/articles/math-kernel-library-automatic-offload-for-intel-xeon-phi-coprocessor
http://software.intel.com/sites/default/files/article/335818/intel-xeon-phi-coprocessor-quick-start-developers-guide.pdf (page 28)
https://www.tacc.utexas.edu/documents/13601/137150/Advanced+Offloading.pdf
https://www.tacc.utexas.edu/documents/13601/901837/offload_slides_DJ2013-3.pdf/4b27de31-e8c4-4848-b100-c4b670b48148
https://www.tacc.utexas.edu/c/document_library/get_file?uuid=308bde8f-a34e-422d-a05c-e1dbe2a06231&groupId=13601

Acknowledgements

Jeff Beisheim and Ray Browell of ANSYS, along with Bob Larson and Paul Besl of Intel, provided data and input to this paper. Special acknowledgement goes to the Intel MKL team for quick implementation of support for the Intel Xeon Phi coprocessor.

About the Authors

Ken Strandberg, principal at Catlow Communications, writes technical articles, white papers, seminars, web-based training, technical marketing content, and interactive collateral. His clients include emerging technology companies, Fortune 100 enterprises, and multi-national corporations. Mr. Strandberg contributes to a variety of industries, including Software, Industrial Technologies, Design Automation, Networking, Medical Technologies, Semiconductor, and Telecom. Mr. Strandberg can be reached at ken@catlowcommunications.com.

Dr. Paul Besl is a software engineering manager in the Manufacturing (Vertical) Engineering Team (MET), which works closely with the ISVs LSTC, ANSYS, and SIMULIA. He has been with Intel for almost 8 years and has been involved with the Intel Xeon Phi program since January 2010. In the past, he has held various technical and managerial positions at General Motors, Alias|wavefront (now Autodesk), SDRC (now Siemens PLM), Bendix Aerospace (now Honeywell), and Arius3D. He received a distinguished dissertation award for his Ph.D. work in the Computer, Information, and Control Engineering department at the University of Michigan and graduated summa cum laude with an AB degree in Physics from Princeton University. He has authored numerous papers, book chapters, and a book.

