This document provides information about asynchronous data transfer, asynchronous computation and memory management without data transfer. This document includes code examples of common usage scenarios. The examples in this article are in Fortran only.
Introduction
Two stand-alone Fortran directives are used to transfer data and to wait for completion of asynchronous activity.
The directive that only transfers data, optionally asynchronously, is:
!dir$ offload_transfer <clauses> [ signal(<tag>) ]
The directive that waits for completion of asynchronous activity is:
!dir$ offload_wait <clauses> wait(<tag>)
The offload directive also takes optional signal and wait clauses:
!dir$ offload <clauses> [ signal(<tag>) ] [ wait(<tag>) ]
<statement>
The offload_transfer and offload_wait directives are stand-alone and do not apply to the subsequent code block.
Data Transfer
The offload_transfer directive is a stand-alone directive: it does not apply to a following statement. It contains a target clause and either only in clauses or only out clauses. Without a signal clause, offload_transfer initiates and completes a synchronous data transfer. With a signal clause, it only initiates the transfer, and execution continues with the next statement. The offload_transfer directive can also take a wait clause. A later directive with a wait clause naming the same tag is used to wait for the transfer to complete.
Expressions in the signal and wait clauses are address-sized values that serve as tags on the asynchronous operation. A wait clause names the same tag as the signal clause of the activity it waits for.
```fortran
! Example 1
! Synchronous data transfer CPU -> MIC
! Next statement executed after data transfer is completed
!dir$ offload_transfer target(mic:0) in(a,b,c)

! Example 2
! Initiate asynchronous data transfer CPU -> MIC
!dir$ offload_transfer target(mic:0) in(a,b,c) signal(s)
```
The offload_wait directive is also a stand-alone directive and does not apply to a following statement. It contains a target clause and a wait clause; execution does not proceed past the directive until the asynchronous activity associated with the tag has completed.
```fortran
! Example 3
! Wait for the activity signaled with tag s to complete. Variable s is the tag.
!dir$ offload_wait target(mic:0) wait(s)
```
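Because each asynchronous operation is identified by its tag, several transfers can be in flight at once and waited on individually. The following is a minimal sketch of that, assuming the Intel Fortran Compiler with offload (LEO) support and coprocessor 0. The module, routine and tag names are hypothetical, and declaring the tags as address-sized integers (integer(c_intptr_t)) with distinct values is an assumption, not a documented requirement.

```fortran
! Sketch only: hypothetical names; the !dir$ lines need Intel offload (LEO) support.
! Compilers without offload support treat the !dir$ lines as comments.
module transfer_tags_demo
  use, intrinsic :: iso_c_binding, only: c_intptr_t
  implicit none
contains
  subroutine transfer_then_wait(a, b, host_work)
    real, intent(in)  :: a(:), b(:)
    real, intent(out) :: host_work
    integer(c_intptr_t) :: sig_a, sig_b   ! one tag per transfer (assumed address-sized)
    integer :: i

    sig_a = 1
    sig_b = 2

    ! Start both transfers; control returns to the host immediately.
    ! alloc_if(.true.) free_if(.false.) leaves the copies allocated on the coprocessor.
    !dir$ offload_transfer target(mic:0) in(a: alloc_if(.true.) free_if(.false.)) signal(sig_a)
    !dir$ offload_transfer target(mic:0) in(b: alloc_if(.true.) free_if(.false.)) signal(sig_b)

    ! Host work that does not modify a or b can overlap with the transfers.
    host_work = 0.0
    do i = 1, 1000
      host_work = host_work + real(i)
    end do

    ! Each wait blocks only until the transfer carrying that tag has completed.
    !dir$ offload_wait target(mic:0) wait(sig_a)
    !dir$ offload_wait target(mic:0) wait(sig_b)

    ! Release both coprocessor copies without transferring any data.
    !dir$ offload_transfer target(mic:0) nocopy(a, b: alloc_if(.false.) free_if(.true.))
  end subroutine transfer_then_wait
end module transfer_tags_demo
```

Called with two small arrays, transfer_then_wait returns host_work = 500500.0 and leaves both arrays unchanged. A compiler without offload support, such as gfortran, treats the !dir$ lines as comments, so the same code then runs entirely on the host.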
Memory Management
The offload_transfer directive can also allocate and free coprocessor memory without transferring data: use the nocopy clause with the alloc_if and free_if modifiers. This is typically done outside a loop to amortize the cost of allocation.
```fortran
! Example 4
#define ALLOC alloc_if(.TRUE.) free_if(.FALSE.)
#define FREE alloc_if(.FALSE.) free_if(.TRUE.)
#define REUSE alloc_if(.FALSE.) free_if(.FALSE.)

! Allocate memory on the coprocessor (without also transferring data)
!dir$ offload_transfer target(mic:0) nocopy(p,q: ALLOC)
do …
  ! Use of allocated memory on the coprocessor for offloads
  !dir$ offload target(mic:0) in(p: REUSE) out(q: REUSE)
  ! computation using p and q
enddo
…
! Free memory on the coprocessor (without also transferring data)
!dir$ offload_transfer target(mic:0) nocopy(p,q: FREE)
```
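The same allocate, reuse and free pattern can be written without preprocessor macros by spelling out the alloc_if and free_if modifiers. The sketch below assumes Intel offload (LEO) support and coprocessor 0; scale_kernel, run_steps and the doubling computation are hypothetical stand-ins for real work.

```fortran
! Sketch only: hypothetical names; the !dir$ lines need Intel offload (LEO) support.
! Compilers without offload support treat the !dir$ lines as comments.
module persistent_buffers_demo
  implicit none
contains
  subroutine scale_kernel(n, p, q)
    !dir$ attributes offload : mic :: scale_kernel
    integer, intent(in) :: n
    real, intent(in)    :: p(n)
    real, intent(out)   :: q(n)
    q = 2.0 * p
  end subroutine scale_kernel

  subroutine run_steps(n, nsteps, p, q)
    integer, intent(in) :: n, nsteps
    real, intent(inout) :: p(n)
    real, intent(out)   :: q(n)
    integer :: step

    ! Allocate p and q on the coprocessor once, without transferring any data.
    !dir$ offload_transfer target(mic:0) nocopy(p, q: alloc_if(.true.) free_if(.false.))

    do step = 1, nsteps
      ! Reuse the existing allocations: copy p in, copy q out, keep both allocated.
      !dir$ offload target(mic:0) in(n) in(p: alloc_if(.false.) free_if(.false.)) out(q: alloc_if(.false.) free_if(.false.))
      call scale_kernel(n, p, q)
      p = q   ! host-side update; the next offload copies the new p in
    end do

    ! Free the coprocessor memory once, without transferring any data.
    !dir$ offload_transfer target(mic:0) nocopy(p, q: alloc_if(.false.) free_if(.true.))
  end subroutine run_steps
end module persistent_buffers_demo
```

Starting from p = [1, 2, 3, 4] with three steps, q ends as [8, 16, 24, 32]. Only the per-iteration copies of p and q happen inside the loop; allocation and deallocation happen once each.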
Send Input Data Asynchronously
The most common pattern initiates the data transfer, performs some CPU work, and then starts the offload computation that uses the transferred data. On the coprocessor, the data is placed in the same variables listed in the transfer. Those variables must be accessible at the point where the offload directive executes.
```fortran
! Example 5
! Initiate asynchronous data transfer CPU -> MIC
!dir$ offload_transfer target(mic:0) in(p,q,r) signal(s)
…
…
! Do the offload only after data has arrived
!dir$ offload target(mic:0) wait(s)
! offload computation
… = p
```
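A fuller sketch of this pattern is shown here, assuming Intel offload (LEO) support and coprocessor 0; sum_kernel, send_then_compute and the tag sig_p are hypothetical. The transfer allocates p on the coprocessor and keeps it there (free_if(.false.)), so the later offload can refer to it with nocopy instead of copying it again, and frees it afterwards.

```fortran
! Sketch only: hypothetical names; the !dir$ lines need Intel offload (LEO) support.
! Compilers without offload support treat the !dir$ lines as comments.
module async_input_demo
  use, intrinsic :: iso_c_binding, only: c_intptr_t
  implicit none
contains
  subroutine sum_kernel(n, p, total)
    !dir$ attributes offload : mic :: sum_kernel
    integer, intent(in) :: n
    real, intent(in)    :: p(n)
    real, intent(out)   :: total
    total = sum(p)
  end subroutine sum_kernel

  subroutine send_then_compute(n, p, total)
    integer, intent(in) :: n
    real, intent(in)    :: p(n)
    real, intent(out)   :: total
    integer(c_intptr_t) :: sig_p

    sig_p = 1
    ! Start copying p to the coprocessor and keep it allocated there.
    !dir$ offload_transfer target(mic:0) in(p: alloc_if(.true.) free_if(.false.)) signal(sig_p)

    ! Host work that does not modify p can run here while the transfer is in flight.

    ! The offload starts only after the transfer tagged sig_p has completed.
    ! p is already on the coprocessor, so it is not copied again; it is freed afterwards.
    !dir$ offload target(mic:0) wait(sig_p) in(n) nocopy(p: alloc_if(.false.) free_if(.true.)) out(total)
    call sum_kernel(n, p, total)
  end subroutine send_then_compute
end module async_input_demo
```

For p = [1, 2, 3, 4], total is 10.0.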
Receive Output Asynchronously
In this pattern, an offload computation produces results that are transferred back to the host later. The offload directive performs the computation but does not copy the data back immediately. Instead, an asynchronous offload_transfer initiates the copy. Later, when the results are needed, an offload_wait waits for the data to arrive.
```fortran
! Example 6a
! Perform the offload computation but don’t copy back results immediately
!dir$ offload target(mic:0) nocopy(p)
! offload computation
…
! Initiate asynchronous data transfer MIC -> CPU
!dir$ offload_transfer target(mic:0) out(p) signal(s)
…
…
! Wait for data to arrive
!dir$ offload_wait target(mic:0) wait(s)
```
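Here is a self-contained sketch of the same flow with the allocation modifiers made explicit, assuming Intel offload (LEO) support and coprocessor 0; square_kernel, compute_then_receive and the tag sig_q are hypothetical. The offload leaves q allocated on the coprocessor without copying it back, and the asynchronous transfer later copies q to the host and frees the coprocessor copy.

```fortran
! Sketch only: hypothetical names; the !dir$ lines need Intel offload (LEO) support.
! Compilers without offload support treat the !dir$ lines as comments.
module async_output_demo
  use, intrinsic :: iso_c_binding, only: c_intptr_t
  implicit none
contains
  subroutine square_kernel(n, p, q)
    !dir$ attributes offload : mic :: square_kernel
    integer, intent(in) :: n
    real, intent(in)    :: p(n)
    real, intent(out)   :: q(n)
    q = p * p
  end subroutine square_kernel

  subroutine compute_then_receive(n, p, q)
    integer, intent(in) :: n
    real, intent(in)    :: p(n)
    real, intent(out)   :: q(n)
    integer(c_intptr_t) :: sig_q

    sig_q = 1
    ! Compute on the coprocessor; q stays there, allocated but not copied back yet.
    !dir$ offload target(mic:0) in(n) in(p) nocopy(q: alloc_if(.true.) free_if(.false.))
    call square_kernel(n, p, q)

    ! Start copying q back to the host; the coprocessor copy is freed afterwards.
    !dir$ offload_transfer target(mic:0) out(q: alloc_if(.false.) free_if(.true.)) signal(sig_q)

    ! Host work that does not read q can run here.

    ! Block until q has arrived; only after this is it safe to read q on the host.
    !dir$ offload_wait target(mic:0) wait(sig_q)
  end subroutine compute_then_receive
end module async_output_demo
```

For p = [1, 2, 3], q is [1, 4, 9] once the wait returns.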
Asynchronous Computation
The host initiates an offload that runs asynchronously and proceeds to the next statement as soon as the computation has started. Later in the code, an offload_wait directive waits for the offload to complete.
```fortran
! Example 6b
character :: signal_var
integer, allocatable, dimension(:) :: p
do …
  ! Initiate asynchronous computation
  !dir$ offload … in(p) signal(signal_var)
  call mic_compute()
  call concurrent_cpu_activity()
  !dir$ offload_wait target(mic:0) wait(signal_var)
enddo
```
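The overlap of coprocessor and host computation can be sketched as follows, assuming Intel offload (LEO) support and coprocessor 0; dot_kernel, overlap, the tag sig_r and both computations are hypothetical. The host must not read mic_r until the offload_wait has returned, because the result is copied back only when the offload completes.

```fortran
! Sketch only: hypothetical names; the !dir$ lines need Intel offload (LEO) support.
! Compilers without offload support treat the !dir$ lines as comments.
module async_compute_demo
  use, intrinsic :: iso_c_binding, only: c_intptr_t
  implicit none
contains
  subroutine dot_kernel(n, x, y, r)
    !dir$ attributes offload : mic :: dot_kernel
    integer, intent(in) :: n
    real, intent(in)    :: x(n), y(n)
    real, intent(out)   :: r
    r = dot_product(x, y)
  end subroutine dot_kernel

  subroutine overlap(n, x, y, mic_r, cpu_r)
    integer, intent(in) :: n
    real, intent(in)    :: x(n), y(n)
    real, intent(out)   :: mic_r, cpu_r
    integer(c_intptr_t) :: sig_r

    sig_r = 1
    ! Launch the offload asynchronously; the host continues immediately.
    !dir$ offload target(mic:0) in(n, x, y) out(mic_r) signal(sig_r)
    call dot_kernel(n, x, y, mic_r)

    ! Concurrent host computation; it must not read mic_r yet.
    cpu_r = sum(x) + sum(y)

    ! Wait for the offload, including the copy-back of mic_r, to finish.
    !dir$ offload_wait target(mic:0) wait(sig_r)
  end subroutine overlap
end module async_compute_demo
```

With x = [1, 2, 3] and y = [4, 5, 6], mic_r is 32.0 and cpu_r is 21.0.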
Testing Signals
Some scenarios need to check, without blocking, whether the activity associated with a given tag has finished. Use the Offload_signaled function, available through the mic_lib module, for this non-blocking test; a nonzero return value indicates that the offload has completed.
```fortran
! Example 7
program prog
  use mic_lib
  implicit none
  integer :: c
  integer :: mic_no

  mic_no = 0   ! coprocessor number (0 assumed here)
  ! Initiate asynchronous computation
  !dir$ offload target(mic:mic_no) signal(c)
  ! offload computation statement ...

  ! Test if computation has been completed
  if (Offload_signaled(mic_no, c) /= 0) then
    …
  endif
end program prog
```
Double-buffering
Use the offload, offload_transfer and offload_wait directives to implement a double-buffering algorithm. The examples below show reuse of memory allocated on the target device, asynchronous data transfers, and the use of signal and wait clauses to control asynchronous offloads.
```fortran
! Assumed: in1, in2, out1, out2, sig1, sig2 and iter are module variables
! accessible here, and REUSE is the macro defined in Example 4.

! Example 8 - Double-buffering Input
subroutine do_async_in()
  integer :: i
  !dir$ offload_transfer target(mic:0) in(in1: REUSE) signal(sig1)
  do i = 1, iter
    if (MOD(i, 2) == 1) then
      ! Odd numbered iterations
      !dir$ offload_transfer target(mic:0) if(i /= iter) in(in2: REUSE) signal(sig2)
      !dir$ offload target(mic:0) nocopy(in1) wait(sig1) out(out1: REUSE)
      call compute(in1, out1)
    else
      ! Even numbered iterations
      !dir$ offload_transfer target(mic:0) if(i /= iter) in(in1: REUSE) signal(sig1)
      !dir$ offload target(mic:0) nocopy(in2) wait(sig2) out(out2: REUSE)
      call compute(in2, out2)
    endif
  enddo
end subroutine

! Example 8 - Double-buffering Output
subroutine do_async_out()
  integer :: i
  do i = 1, iter
    if (MOD(i, 2) == 1) then
      ! Odd numbered iterations
      if (i < iter) then
        ! All iterations except the last
        !dir$ offload target(mic:0) in(in1: REUSE) nocopy(out1)
        call compute(in1, out1)
        !dir$ offload_transfer target(mic:0) out(out1: REUSE) signal(sig1)
      endif
      if (i > 1) then
        ! All iterations except the first
        !dir$ offload_wait target(mic:0) wait(sig2)
        call use_result(out2)
      endif
    else
      ! Even numbered iterations
      if (i < iter) then
        ! All iterations except the last
        !dir$ offload target(mic:0) in(in2: REUSE) nocopy(out2)
        call compute(in2, out2)
        !dir$ offload_transfer target(mic:0) out(out2: REUSE) signal(sig2)
      endif
      if (i > 1) then
        ! All iterations except the first
        !dir$ offload_wait target(mic:0) wait(sig1)
        call use_result(out1)
      endif
    endif
  enddo
end subroutine
```
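A compact, self-contained variant of the input side of Example 8 can help check the buffer alternation. This sketch assumes Intel offload (LEO) support and coprocessor 0; process_blocks, block_sum and the per-block sum are hypothetical. It declares its own buffers in1 and in2, copies each block of a host array into the idle buffer before sending it, and guards the prefetch with an ordinary host if statement instead of the directive's if clause, because the host-side copy into the buffer must also be skipped on the last iteration. Each buffer is overwritten only after the synchronous offload that consumed it has returned.

```fortran
! Sketch only: hypothetical names; the !dir$ lines need Intel offload (LEO) support.
! Compilers without offload support treat the !dir$ lines as comments.
module double_buffer_demo
  use, intrinsic :: iso_c_binding, only: c_intptr_t
  implicit none
contains
  subroutine block_sum(n, buf, r)
    !dir$ attributes offload : mic :: block_sum
    integer, intent(in) :: n
    real, intent(in)    :: buf(n)
    real, intent(out)   :: r
    r = sum(buf)
  end subroutine block_sum

  ! blocks(:, k) is block k; res(k) receives the sum of block k.
  subroutine process_blocks(n, nblocks, blocks, res)
    integer, intent(in) :: n, nblocks
    real, intent(in)    :: blocks(n, nblocks)
    real, intent(out)   :: res(nblocks)
    real :: in1(n), in2(n), r
    integer(c_intptr_t) :: sig1, sig2
    integer :: i

    sig1 = 1
    sig2 = 2
    ! Allocate both buffers on the coprocessor once, without copying data.
    !dir$ offload_transfer target(mic:0) nocopy(in1, in2: alloc_if(.true.) free_if(.false.))

    ! Prime the pipeline: start sending block 1 into buffer 1.
    in1 = blocks(:, 1)
    !dir$ offload_transfer target(mic:0) in(in1: alloc_if(.false.) free_if(.false.)) signal(sig1)

    do i = 1, nblocks
      if (mod(i, 2) == 1) then
        ! Odd block: prefetch block i+1 into buffer 2, then compute on buffer 1.
        if (i < nblocks) then
          in2 = blocks(:, i + 1)
          !dir$ offload_transfer target(mic:0) in(in2: alloc_if(.false.) free_if(.false.)) signal(sig2)
        end if
        !dir$ offload target(mic:0) wait(sig1) in(n) nocopy(in1: alloc_if(.false.) free_if(.false.)) out(r)
        call block_sum(n, in1, r)
      else
        ! Even block: prefetch block i+1 into buffer 1, then compute on buffer 2.
        if (i < nblocks) then
          in1 = blocks(:, i + 1)
          !dir$ offload_transfer target(mic:0) in(in1: alloc_if(.false.) free_if(.false.)) signal(sig1)
        end if
        !dir$ offload target(mic:0) wait(sig2) in(n) nocopy(in2: alloc_if(.false.) free_if(.false.)) out(r)
        call block_sum(n, in2, r)
      end if
      res(i) = r
    end do

    ! Free both coprocessor buffers without copying data.
    !dir$ offload_transfer target(mic:0) nocopy(in1, in2: alloc_if(.false.) free_if(.true.))
  end subroutine process_blocks
end module double_buffer_demo
```

With four blocks [1, 2], [3, 4], [5, 6] and [7, 8], res is [3, 7, 11, 15]. The odd and even branches mirror each other, so the only state that alternates is which buffer is being filled and which tag is being waited on.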
Summary
Asynchronous offload allows data transfer and computation to overlap. This method does not require the use of additional threads on the host and is useful for pipelined operations. Refer to the following sample code installed with the Intel® Fortran Compiler for more details (default installation directory):
- Linux*: /opt/intel/composer_xe_2015/Samples/en_US/Fortran/mic_samples/LEO_Fortran_intro
- Windows*: C:\Program Files (x86)\Intel\Composer XE 2015\Samples\en_US\Fortran