Diagnostic 15037: loop was not vectorized: vectorization possible but seems inefficient (Fortran)

This diagnostic can also occur in the form:

    remainder loop was not vectorized: vectorization possible but seems inefficient

Cause:

The compiler only auto-vectorizes a loop if its internal heuristics indicate that a speed-up is likely. If a speed-up seems unlikely or is too uncertain, the compiler emits the message "vectorization possible but seems inefficient" and does not vectorize the loop. Common reasons for this include:

1. non-unit stride memory access;
2. indirect memory access;
3. low iteration count.

These scenarios do not always prevent vectorization. The compiler also takes into account the total amount of work in the loop, among other factors, when deciding whether vectorization is likely to be beneficial. Each of the three scenarios is illustrated below, in the order listed above.

Examples:

subroutine d_15037_1(a, b, n, istride)
  real,    intent(inout), dimension(n) :: a
  real,    intent(in   ), dimension(n) :: b  
  integer, intent(in   )               :: n, istride
 
  do i=1,n,istride
     a(i) = a(i) + b(i)
  enddo
end subroutine d_15037_1

> ifort -c -vec-report2 d_15037_1.f90
d_15037_1.f90(6): (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient

subroutine d_15037_2(a, b, c, ind, n)
  real,    intent(out), dimension(n) :: a
  real,    intent(in ), dimension(n) :: b, c
  integer, intent(in ), dimension(n) :: ind
  integer, intent(in )               :: n

  do i=1,n
     a(i) = b(ind(i)) + c(i)
  enddo
end subroutine d_15037_2

> ifort -c -vec-report2 d_15037_2.f90
d_15037_2.f90(7): (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient

subroutine d_15037_3(a, b, n)
  real,    intent(inout), dimension(n) :: a
  real,    intent(in   ), dimension(n) :: b  
  integer, intent(in   )               :: n

  do i=1,3
     a(i) = a(i) + b(i)
  enddo
end subroutine d_15037_3

> ifort -c -vec-report2 d_15037_3.f90
d_15037_3.f90(6): (col. 3) remark: loop was not vectorized: vectorization possible but seems inefficient

 

Resolution:

If you believe that vectorization might nevertheless result in a speed-up, you can override the compiler's cost model by inserting the directive

    !DIR$ VECTOR ALWAYS

before the loop, as a hint to the compiler. The compiler will still test for dependencies and will not vectorize the loop unless it is safe.

You may instead require vectorization by using either of the directives

    !DIR$ SIMD
    !$OMP SIMD

(the latter from OpenMP 4.0).

In this case, the compiler will not perform dependency analysis, and it is the programmer's responsibility to ensure that vectorization is safe.
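
As a minimal sketch of the first option, the indirect-access example d_15037_2 could carry the hint as shown below. The names d_15037_2_always and demo_vector_always, and the small driver program, are hypothetical and added only for illustration. Compilers that do not recognize the !DIR$ line treat it as an ordinary comment. Whether the override actually pays off depends on the hardware and the data, so it is worth timing both versions.

```fortran
! Sketch only: d_15037_2 with a request to override the cost model.
subroutine d_15037_2_always(a, b, c, ind, n)
  implicit none
  integer, intent(in )               :: n
  real,    intent(out), dimension(n) :: a
  real,    intent(in ), dimension(n) :: b, c
  integer, intent(in ), dimension(n) :: ind
  integer :: i

  ! Ask the compiler to vectorize even if its heuristics predict no gain.
  ! Dependency checks are still performed. The directive must directly
  ! precede the loop it applies to.
  !DIR$ VECTOR ALWAYS
  do i=1,n
     a(i) = b(ind(i)) + c(i)
  enddo
end subroutine d_15037_2_always

! Hypothetical driver, only to make the sketch runnable.
program demo_vector_always
  implicit none
  integer, parameter :: n = 8
  real    :: a(n), b(n), c(n)
  integer :: ind(n), i

  do i=1,n
     b(i)   = real(i)
     c(i)   = 10.0
     ind(i) = n + 1 - i      ! reversed index: an indirect access pattern
  enddo

  call d_15037_2_always(a, b, c, ind, n)
  print *, a                 ! a(i) = b(n+1-i) + 10.0
end program demo_vector_always
```

The same placement, immediately before the do statement, applies to !DIR$ SIMD and !$OMP SIMD. With those directives the programmer, not the compiler, vouches for the absence of dependencies.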

Remainder loops:

After the main loop kernel has been vectorized, any remaining iterations at the end of the loop are known as the "remainder loop". Usually, the number of remaining iterations is less than that needed to completely fill a vector register. The compiler may still try to vectorize such a remainder loop. However, because the number of iterations is normally small, it is quite common for the compiler to decide that vectorization is not worthwhile and to emit the diagnostic:

     remainder loop was not vectorized: vectorization possible but seems inefficient
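
To make the split concrete, the sketch below writes out by hand roughly what happens when a loop is vectorized: a main loop that processes full groups of vl elements, followed by a remainder loop for the leftover iterations. The vector length vl = 8 (eight single-precision elements, as would fit a 256-bit register), the trip count n = 19, and the names remainder_sketch and nmain are assumptions chosen for illustration. The code the compiler actually generates is more involved.

```fortran
program remainder_sketch
  implicit none
  integer, parameter :: n  = 19   ! total iterations (hypothetical)
  integer, parameter :: vl = 8    ! assumed vector length in elements
  real    :: a(n), b(n)
  integer :: i, nmain

  a = 1.0
  b = 2.0

  ! Largest multiple of vl that does not exceed n.
  nmain = (n / vl) * vl

  ! Main kernel: each step stands for one full-width vector operation.
  do i = 1, nmain, vl
     a(i:i+vl-1) = a(i:i+vl-1) + b(i:i+vl-1)
  enddo

  ! Remainder loop: fewer than vl iterations are left over.
  do i = nmain + 1, n
     a(i) = a(i) + b(i)
  enddo

  print *, 'iterations in main kernel:   ', nmain
  print *, 'iterations in remainder loop:', n - nmain

  ! Every element should now be 1.0 + 2.0.
  if (any(a /= 3.0)) error stop 'unexpected result'
end program remainder_sketch
```

With these values, 16 iterations fall into the main kernel and 3 into the remainder loop. Three iterations cannot fill the assumed 8-element vector, which is the situation in which the compiler commonly reports the remainder-loop form of the diagnostic.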

