Reference Implementations for Intel® Architecture Approximation Instructions VRCP14, VRSQRT14, VRCP28, VRSQRT28, and VEXP2

Author: Marius Cornea, Intel Corporation

We are providing two files, RECIP14.c and RECIP28EXP2.c, containing reference implementations for the scalar versions of 10 approximation instructions introduced in the "Intel® Architecture Instruction Set Extensions Programming Reference" document (see https://software.intel.com/en-us/isa-extensions). The files can be downloaded from the links provided above.

RECIP14.c contains emulation routines for the underlying algorithms of:

VRCP14PD - Compute Approximate Reciprocals of Packed Float64 Values with relative error of less than 2^-14
VRCP14SD - Compute Approximate Reciprocal of Scalar Float64 Value with relative error of less than 2^-14
VRCP14PS - Compute Approximate Reciprocals of Packed Float32 Values with relative error of less than 2^-14
VRCP14SS - Compute Approximate Reciprocal of Scalar Float32 Value with relative error of less than 2^-14
VRSQRT14PD - Compute Approximate Reciprocals of Square Roots of Packed Float64 Values with relative error of less than 2^-14
VRSQRT14SD - Compute Approximate Reciprocal of Square Root of Scalar Float64 Value with relative error of less than 2^-14
VRSQRT14PS - Compute Approximate Reciprocals of Square Roots of Packed Float32 Values with relative error of less than 2^-14
VRSQRT14SS - Compute Approximate Reciprocal of Square Root of Scalar Float32 Value with relative error of less than 2^-14

The corresponding emulation routines (only scalar versions) are:

RCP14S - reciprocal approximation for Float32
RCP14D - reciprocal approximation for Float64
RSQRT14S - reciprocal square root approximation for Float32
RSQRT14D - reciprocal square root approximation for Float64

RECIP28EXP2.c contains emulation routines for the underlying algorithms of:

VRCP28PD - Approximation to the Reciprocal of Packed Double Precision Floating-Point Values with Less Than 2^-28 Relative Error
VRCP28SD - Approximation to the Reciprocal of Scalar Double Precision Floating-Point Value with Less Than 2^-28 Relative Error
VRCP28PS - Approximation to the Reciprocal of Packed Single Precision Floating-Point Values with Less Than 2^-28 Relative Error
VRCP28SS - Approximation to the Reciprocal of Scalar Single Precision Floating-Point Value with Less Than 2^-28 Relative Error
VRSQRT28PD - Approximation to the Reciprocal Square Root of Packed Double Precision Floating-Point Values with Less Than 2^-28 Relative Error
VRSQRT28SD - Approximation to the Reciprocal Square Root of Scalar Double Precision Floating-Point Value with Less Than 2^-28 Relative Error
VRSQRT28PS - Approximation to the Reciprocal Square Root of Packed Single Precision Floating-Point Values with Less Than 2^-28 Relative Error
VRSQRT28SS - Approximation to the Reciprocal Square Root of Scalar Single Precision Floating-Point Value with Less Than 2^-28 Relative Error
VEXP2PD - Approximation to the Exponential 2^x of Packed Double Precision Floating-Point Values with Less Than 2^-23 Relative Error
VEXP2PS - Approximation to the Exponential 2^x of Packed Single Precision Floating-Point Values with Less Than 2^-23 Relative Error

The corresponding emulation routines (only scalar versions) are:

RCP28S - reciprocal approximation for Float32
RCP28D - reciprocal approximation for Float64
RSQRT28S - reciprocal square root approximation for Float32
RSQRT28D - reciprocal square root approximation for Float64
EXP2S - Base-2 exponential approximation for Float32
EXP2D - Base-2 exponential approximation for Float64
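
The relative-error bounds quoted in the instruction lists above (2^-14, 2^-28 and 2^-23) can be checked for a single finite, nonzero result with a few lines of C. The following is only a minimal sketch: the helper name within_rel_error is hypothetical and is not part of the provided files, and the approximate value is assumed to come from whichever routine is being tested.

```c
#include <math.h>   /* fabs */

/* Hypothetical helper (not part of the reference files).
 * Returns 1 if |approx - exact| / |exact| is strictly below 'bound',
 * otherwise 0. 'exact' is assumed to be finite and nonzero. */
static int within_rel_error(double approx, double exact, double bound)
{
    double rel = fabs(approx - exact) / fabs(exact);
    return rel < bound;
}
```

For example, within_rel_error(y, 1.0 / x, 0x1p-14) tests a reciprocal approximation y of x against the 2^-14 bound; the C99 hexadecimal constants 0x1p-28 and 0x1p-23 express the other two bounds.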

The reference functions must be compiled with the DAZ (denormals-are-zero) and FTZ (flush-to-zero) modes turned off (e.g. with the Intel compiler for Linux, using the -no-ftz option). They must be run with the rounding mode set to round-to-nearest and with floating-point exceptions masked.
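
The run-time part of these requirements can also be established explicitly before calling the reference functions. The following is a minimal sketch, assuming an x86 target where floating-point arithmetic goes through SSE and is therefore governed by the MXCSR register; the function and macro names are hypothetical and do not come from the provided files. It does not touch the x87 control word; on other setups, fesetround(FE_TONEAREST) from <fenv.h> is the portable way to select the rounding mode.

```c
#include <assert.h>
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr (x86 SSE only) */

/* MXCSR bit fields; the macro names are made up for this sketch. */
#define MXCSR_DAZ       0x0040u  /* denormals-are-zero (inputs) */
#define MXCSR_EXC_MASK  0x1F80u  /* the six exception mask bits */
#define MXCSR_RC        0x6000u  /* rounding control, 00 = round-to-nearest */
#define MXCSR_FTZ       0x8000u  /* flush-to-zero (results) */

/* Returns 1 if MXCSR has DAZ and FTZ off, round-to-nearest selected,
 * and all floating-point exceptions masked; otherwise 0. */
static int reference_fp_env_ok(void)
{
    unsigned int csr = _mm_getcsr();
    return (csr & (MXCSR_DAZ | MXCSR_FTZ | MXCSR_RC)) == 0
        && (csr & MXCSR_EXC_MASK) == MXCSR_EXC_MASK;
}

/* Puts MXCSR into the state described above for the calling thread. */
static void setup_reference_fp_env(void)
{
    unsigned int csr = _mm_getcsr();
    csr &= ~(MXCSR_DAZ | MXCSR_FTZ | MXCSR_RC); /* clear DAZ, FTZ, RC */
    csr |= MXCSR_EXC_MASK;                      /* mask all exceptions */
    _mm_setcsr(csr);
    assert(reference_fp_env_ok());              /* confirm it took effect */
}
```

Calling setup_reference_fp_env() once at the start of the test program (and in each thread that runs the reference functions, since MXCSR is per-thread state) should be enough for this sketch.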

Usage example for RCP14S and RCP14D