Introduction
This document gives a brief overview of the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and shows different ways to build an application for the Intel® Xeon Phi™ processor x200 using the Intel® compiler.
Intel® AVX-512 Family of Instructions
Intel AVX-512 instructions offer a more comprehensive set of functionality and higher performance than the Intel® AVX and Intel® AVX2 families of instructions (see section 1.2 here for a complete description). The Intel AVX-512 instruction set architecture (ISA) consists of the following groups:
Intel AVX-512 Foundation instructions (AVX-512F) are the base of Intel AVX-512. They include extensions of the Intel AVX and Intel AVX2 families of SIMD instructions, but are encoded using the EVEX encoding scheme, which adds support for 512-bit vector registers, up to 32 vector registers in 64-bit mode, and conditional processing using opmask registers.
Intel AVX-512 Conflict Detection instructions (AVX-512CD) provide efficient conflict detection to allow more loops to be vectorized.
Intel AVX-512 Exponential and Reciprocal instructions (AVX-512ER) are designed to provide building blocks for accelerating certain transcendental math computations.
Intel AVX-512 Prefetch instructions (AVX-512PF) are new instructions that can help reduce the exposed latency of memory operations that involve gather/scatter instructions.
Intel AVX-512 Byte and Word instructions (AVX-512BW) extend the AVX-512 instruction set to cover 8-bit and 16-bit integer operations.
Intel AVX-512 Doubleword and Quadword instructions (AVX-512DQ) are new 32-bit and 64-bit AVX-512 instructions that enhance integer and floating-point operations.
Intel AVX-512 Vector Length Extensions (AVX-512VL) extend most AVX-512 operations to also operate on XMM (128-bit) and YMM (256-bit) registers, instead of only ZMM (512-bit) registers.
AVX-512F, AVX-512CD, AVX-512ER, and AVX-512PF are implemented in the Intel Xeon Phi processor x200 (code named Knights Landing) while AVX-512F, AVX-512CD, AVX-512BW, AVX-512DQ, and AVX-512VL are implemented in the Intel® Xeon® processor.
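As an illustration of the kind of loop that conflict detection (AVX-512CD) is aimed at, consider a histogram update. This is a minimal sketch, not taken from the original article: the function name histogram and its parameters are hypothetical. Because idx[] may contain duplicate values, two iterations can update the same hist[] element, which is the dependence that normally prevents vectorization. Whether a particular compiler actually vectorizes this loop depends on its version and options.

```c
#include <stddef.h>

/* Count how often each bucket index occurs.
 * When idx[] holds duplicates, two iterations may update the same
 * hist[] element, so the iterations are not independent. */
void histogram(const int *idx, size_t n, unsigned *hist)
{
    for (size_t i = 0; i < n; ++i) {
        hist[idx[i]] += 1;  /* indirect update: possible conflict */
    }
}
```

The caller is assumed to provide a zero-initialized hist[] array large enough for every value that appears in idx[].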
Compiling for the Intel® Xeon Phi™ processor x200
The instruction groups common to both the Intel Xeon Phi processor x200 and the Intel Xeon processor are AVX-512F and AVX-512CD. The AVX-512ER and AVX-512PF groups are implemented in the Intel Xeon Phi processor x200 only. The AVX-512BW, AVX-512DQ and AVX-512VL groups are implemented only in the Intel Xeon processor.
You can use the following Intel compiler options to build an executable for Intel AVX-512 code generation:
The compiler option -xcode
The general form is -xcode on Linux* and /Qxcode on Windows*, where code is the argument. This option tells the compiler which processor features it may target. To generate Intel AVX-512 instructions, you can use one of three arguments, each of which targets a different set of instruction groups:
-xCOMMON-AVX512: use this option to generate AVX-512F and AVX-512CD.
-xMIC-AVX512: use this option to generate AVX-512F, AVX-512CD, AVX-512ER and AVX-512PF.
-xCORE-AVX512: use this option to generate AVX-512F, AVX-512CD, AVX-512BW, AVX-512DQ and AVX-512VL.
For example, to generate Intel AVX-512 instructions for the Intel Xeon Phi processor x200, use the option -xMIC-AVX512. On a Linux system:
$ icc -xMIC-AVX512 application.c
This compiler option is useful when you want to build a large binary for the Intel Xeon Phi processor x200. Instead of building it on the Intel Xeon Phi processor x200 itself, where the build takes more time, build it on an Intel Xeon processor-based machine.
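One way to check which instruction groups a given argument enabled is to inspect the compiler's predefined feature macros. The sketch below assumes the compiler defines macros such as __AVX512F__ and __AVX512ER__ when the corresponding group is enabled, as GCC and Clang do; consult the Intel compiler documentation to confirm this for your version. The enum and function names are hypothetical.

```c
#include <stdio.h>

/* One bit per AVX-512 group (illustrative names, not a compiler API). */
enum avx512_group {
    GRP_F  = 1u << 0,
    GRP_CD = 1u << 1,
    GRP_ER = 1u << 2,
    GRP_PF = 1u << 3,
    GRP_BW = 1u << 4,
    GRP_DQ = 1u << 5,
    GRP_VL = 1u << 6
};

/* Return a bit mask of the groups the compiler was allowed to target,
 * based on the feature macros defined at compile time. */
unsigned avx512_compiled_groups(void)
{
    unsigned g = 0;
#ifdef __AVX512F__
    g |= GRP_F;
#endif
#ifdef __AVX512CD__
    g |= GRP_CD;
#endif
#ifdef __AVX512ER__
    g |= GRP_ER;
#endif
#ifdef __AVX512PF__
    g |= GRP_PF;
#endif
#ifdef __AVX512BW__
    g |= GRP_BW;
#endif
#ifdef __AVX512DQ__
    g |= GRP_DQ;
#endif
#ifdef __AVX512VL__
    g |= GRP_VL;
#endif
    return g;
}

/* Print one line per group, in the same bit order as the enum above. */
void print_avx512_compiled_groups(void)
{
    static const char *names[] = { "F", "CD", "ER", "PF", "BW", "DQ", "VL" };
    unsigned g = avx512_compiled_groups();
    for (int i = 0; i < 7; ++i) {
        printf("AVX-512%-2s : %s\n", names[i],
               ((g >> i) & 1u) ? "enabled" : "not enabled");
    }
}
```

Calling print_avx512_compiled_groups() from main in application.c and rebuilding with each of the three arguments should show which groups each argument turns on. The result reflects what the compiler was permitted to generate, not what the processor running the program supports.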
The compiler option -axcode
The general form is -axcode on Linux and /Qaxcode on Windows, where code is the argument. This option generates multiple feature-specific, auto-dispatch code paths for Intel® processors, together with a baseline code path (generic IA code path). The baseline code path is used when the hardware platform does not support the specific ISA. You can use one of three arguments to target different sets of Intel AVX-512 instruction groups:
-axCOMMON-AVX512: use this option to generate AVX-512F and AVX-512CD; a baseline-code path is also generated.
-axMIC-AVX512: use this option to generate AVX-512F, AVX-512CD, AVX-512ER and AVX-512PF; a baseline-code path is also generated.
-axCORE-AVX512: use this option to generate AVX-512F, AVX-512CD, AVX-512BW, AVX-512DQ and AVX-512VL; a baseline-code path is also generated.
This compiler option is useful when you want to build a single binary that can run on multiple platforms.
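Conceptually, auto-dispatch means that a run-time check selects between a feature-specific path and the baseline path. The following is only a hand-written sketch of that idea, not what the Intel compiler actually emits. It assumes the GCC/Clang builtin __builtin_cpu_supports for the run-time check (falling back to the baseline on other compilers or architectures), and every function name is hypothetical.

```c
#include <stddef.h>

/* Baseline path: plain C that runs on any processor. */
static void scale_baseline(float *x, size_t n, float a)
{
    for (size_t i = 0; i < n; ++i)
        x[i] *= a;
}

/* Stand-in for the feature-specific path. In a real -ax build the
 * compiler would emit Intel AVX-512 code here; this sketch reuses the
 * same loop so that it stays portable. */
static void scale_avx512(float *x, size_t n, float a)
{
    for (size_t i = 0; i < n; ++i)
        x[i] *= a;
}

/* Run-time check (assumption: GCC/Clang builtin on x86 targets). */
static int cpu_has_avx512f(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx512f");
#else
    return 0;  /* unknown platform: always use the baseline path */
#endif
}

/* Dispatcher: pick the path that the current hardware supports. */
void scale(float *x, size_t n, float a)
{
    if (cpu_has_avx512f())
        scale_avx512(x, n, a);
    else
        scale_baseline(x, n, a);
}
```

With -axcode the compiler produces the equivalent of both paths and the dispatch logic from a single source function, so this pattern does not have to be written by hand.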
The compiler option -xHost
The general form is -xHost on Linux and /QxHost on Windows. This option tells the compiler to generate instructions for the highest instruction set available on the compilation host processor.
For example, on an Intel® Xeon® v3 system running Linux, use the following command to generate Intel AVX2 code:
$ icc -xHost application.c
However, if you run the same command on the Intel Xeon Phi processor x200, it generates Intel AVX-512 code for that architecture (equivalent to -xMIC-AVX512).
If you run the same command on future Intel Xeon processors that support Intel AVX-512, it generates Intel AVX-512 code for that architecture (equivalent to -xCORE-AVX512).
Note that none of the options above are available for the Intel® Xeon Phi™ x100 coprocessor.
Using the Intel® Software Development Emulator to test your executable
You can also use the Intel® Software Development Emulator (Intel® SDE) to test your executable for the Intel Xeon Phi processor x200. Intel SDE is a software emulator intended mainly for checking that code using future instructions runs correctly, not for measuring performance. This article here provides detailed information on how to run the Intel SDE for the Intel Xeon Phi processor x200.
Conclusion
This document briefly summarized the different Intel AVX-512 instruction groups and the Intel compiler options you can use to build an executable for the Intel Xeon Phi processor x200 that takes advantage of the Intel AVX-512 ISA.
References