Typical reductions in OpenMP* use an associative operator op to compute a local reduction on each thread, and a reduction clause to combine those local results. For example, the following code computes a dot product by computing a local sum on each thread and then summing those sums.
// Returns dot product of two vectors
int dot( int x[], int y[], int n ) {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for( int i=0; i<n; ++i ) {
        sum += x[i]*y[i];
    }
    return sum;
}
The reduction clause specifies two things:
- When control enters the parallel region, each thread in the region gets a thread-private copy of sum, initialized to the identity element for +.
- When control leaves the parallel region, the original sum is updated by combining its value with the final values of the thread-private copies, using +. Because + is associative (or nearly so for floating-point), the final sum has the same value (or nearly so) as it would have for serial execution of the code.
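Those two steps can be written out by hand. The sketch below (my own wording, with an assumed name dot_by_hand) does explicitly what reduction(+:sum) arranges automatically: each thread accumulates into a private copy initialized to the identity for +, and the private copies are combined into the shared sum one thread at a time.

```c
// Hand-written equivalent of reduction(+:sum).
int dot_by_hand( int x[], int y[], int n ) {
    int sum = 0;
    #pragma omp parallel
    {
        int local = 0;                  // thread-private copy, identity for +
        #pragma omp for
        for( int i=0; i<n; ++i ) {
            local += x[i]*y[i];
        }
        #pragma omp critical            // combine private copies into sum with +
        sum += local;
    }
    return sum;
}
```

Compiled without OpenMP support, the pragmas are ignored and the function degenerates into the serial loop, with the same result. Now consider a more obscure way to compute the same dot product, still using only + in the reduction clause: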
// Returns dot product of two vectors
int dot( int x[], int y[], int n ) {
    int c = 0;
    int s = 0;
    #pragma omp parallel for reduction(+:c,s)
    for( int i=0; i<n; ++i ) {
        int p = x[i]*y[i];
        int q = c^p;
        c &= p|s;
        c |= p&s;
        c <<= 1;
        s ^= q;
    }
    return c+s;
}
Consider it a puzzle, not recommended practice. If you need a hint for why it works correctly, look at this datasheet for a classic TTL part.
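Before peeking at the hint, you can at least check empirically that the trick works. Here is the loop body from above, copied into a serial function (the assumed name dot_puzzle is mine) so it runs without OpenMP at all, to be compared against an ordinary dot product:

```c
// Serial re-check of the puzzle's loop body: accumulate into c and s,
// then return c+s.
int dot_puzzle( int x[], int y[], int n ) {
    int c = 0;
    int s = 0;
    for( int i=0; i<n; ++i ) {
        int p = x[i]*y[i];
        int q = c^p;
        c &= p|s;
        c |= p&s;
        c <<= 1;
        s ^= q;
    }
    return c+s;
}
```

For example, dot_puzzle on x = {1,2,3,4} and y = {5,6,7,8} returns 70, the same as 1*5 + 2*6 + 3*7 + 4*8.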
*Other names and brands may be claimed as the property of others.