Product Version: Intel® Fortran Compiler 15.0 and later versions
Cause:
A vectorizable loop contains loads from memory locations that are not contiguous in memory (sometimes known as a “gather”). These may be indexed loads, as in the example below, or loads with non-unit stride. The compiler has issued a hardware gather instruction for these loads.
(Note that for compiler versions 16.0.1 and earlier, the compiler may also emit this message when gather operations are emulated in software).
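The indexed case is shown in the example further down. For the other case named above, a load with non-unit stride, a minimal sketch might look like the following. The subroutine name strided and the stride of 2 are hypothetical and chosen only for illustration. Whether the compiler actually emits a gather instruction for such a loop, or uses contiguous loads and shuffles instead, depends on the compiler version and target, so check the optimization report rather than assuming this diagnostic.

```fortran
! Illustrative sketch only: "strided" and the stride of 2 are assumptions,
! not taken from the original example.
subroutine strided(n, a, b)
  implicit none
  integer, intent(in)                :: n
  real, dimension(2*n), intent(in)   :: a
  real, dimension(n),   intent(out)  :: b
  integer                            :: i

  do i = 1, n
     ! Consecutive iterations read a(2), a(4), a(6), ...:
     ! the loads are regularly spaced but not contiguous in memory.
     b(i) = 1.0 + 0.1*a(2*i)
  enddo

end subroutine strided
```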
The vectorization report is generated using the Intel® Fortran Compiler's optimization and vectorization report options:
Windows* OS: /O2 /Qopt-report:2 /Qopt-report-phase:vec
Linux* OS or OS X*: -O2 -qopt-report=2 -qopt-report-phase=vec
Example:
The example below generates the following remark in the optimization report:
subroutine gathr(n, a, b, index)
  implicit none
  integer,                intent(in)  :: n
  integer,  dimension(n), intent(in)  :: index
  real(RT), dimension(n), intent(in)  :: a
  real(RT), dimension(n), intent(out) :: b
  integer                             :: i

  do i=1,n
     b(i) = 1.0_RT + 0.1_RT*a(index(i))
  enddo

end subroutine gathr
$ ifort -c -xcore-avx2 -qopt-report=4 -qopt-report-file=stdout gathr.F90 -DRT=4 -S | egrep 'gather|VECTORIZED'
remark #15415: vectorization support: gather was generated for the variable a: indirect access [ gathr.F90(10,29) ]
remark #15300: LOOP WAS VECTORIZED
remark #15458: masked indexed (or gather) loads: 1
remark #15301: REMAINDER LOOP WAS VECTORIZED
$ egrep gather gathr.s
vgatherdps %ymm4, -4(%r8,%ymm3,4), %ymm5 #10.29
vgatherdps %ymm7, -4(%r8,%ymm6,4), %ymm8 #10.29
vgatherdps %ymm3, -4(%r8,%ymm2,4), %ymm4 #10.29
$
The compiler has vectorized the loop using a “gather” instruction from Intel® Advanced Vector Extensions 2 (Intel® AVX2).
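As a rough illustration of what one such gather does, the scalar sketch below mimics a single 8-lane, single-precision gather: each lane whose mask entry is set loads the element of a selected by the corresponding index. The program name, array sizes, and index values are hypothetical. This models only the load semantics, not the instruction's timing or the code the compiler actually generates.

```fortran
! Scalar model of one 8-lane gather; all names and values are illustrative.
program gather_semantics
  implicit none
  integer, parameter :: vl = 8        ! 8 single-precision lanes in a 256-bit register
  real               :: a(16), dest(vl)
  integer            :: idx(vl), k
  logical            :: mask(vl)

  a    = [(real(k), k = 1, 16)]       ! source array in memory
  idx  = [16, 1, 8, 3, 3, 12, 5, 10]  ! arbitrary, non-contiguous indices
  mask = .true.                       ! all lanes active, as in a full vector iteration
  dest = 0.0

  do k = 1, vl
     ! Each active lane loads independently from its own address.
     if (mask(k)) dest(k) = a(idx(k))
  enddo

  print '(8f6.1)', dest
end program gather_semantics
```

In a remainder loop some mask entries would be false, so the corresponding lanes would keep their previous contents instead of loading.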
Compare to the behavior when compiling with -DRT=8 as described in the article for diagnostic #15328.
Resolution:
See also:
Requirements for Vectorizable Loops
Vectorization and Optimization Reports