How to spawn a statement in Cilk Plus

This blog discusses how to "spawn a statement" in Cilk Plus and some pragmatic considerations for doing so.

Cilk Plus has two ways to create parallelism:

spawn a function or functor with cilk_spawn
execute iterations of a loop in parallel with cilk_for

A common question is "can I spawn a statement"? The direct answer is no. But with C++11 lambda functions you can fake it by wrapping the statement in a lambda expression. For example, the string-copy idiom "while(*s++=*t++);" can be spawned by writing:

cilk_spawn [&]{while(*s++=*t++);}();

The code is an instance of the syntax "cilk_spawn functor();", where the functor is created by a C++11 lambda expression. See Lambda Functions in C++11 - The Definitive Guide for an introduction to C++ lambda expressions. Lambda expressions are already available in recent releases of the Intel, Microsoft, GNU, and LLVM compilers for C++.

Bigger Example

I had difficulty coming up with a good motivating example of why you would want to spawn a statement, because a function call is a natural boundary for spawning work. For exposition, I conjured the following example, which requires spawning a compound statement. The code applies a routine expensive_routine to each line from stdin.

    const size_t BUF_SIZE = 1024;
    char* buf;
    for(;;) {
        buf = new char[BUF_SIZE];
        if( !fgets(buf,BUF_SIZE,stdin) )
            break;
        cilk_spawn [=]{
            expensive_routine(buf);
            delete[] buf;
        }();
    }
    cilk_sync;
    delete[] buf;

It's not enough to spawn just expensive_routine, because buf must be deleted afterwards. Thus the code spawns both actions as a single task, by wrapping them in a C++11 lambda function. The next section explains why I used [=] instead of [&].

Even in C you can do the equivalent, since you can wrap any statement in a function, as long as you create the right arguments. The C++11 lambda feature just provides a convenience of automatic wrapping.

Correctness Issues

Functions make nice task boundaries because their parameters can be used to capture the values upon which the task operates. With lambda expressions, the capture is implicit, so so you need to think about what you are capturing, and whether to capture by value ([=]) or capture by reference ([&]). In general:

Use capture by reference for read-only values, unless capture by value is cheaper.
Use capture by value for values that might be modified by the caller (or on behalf of the caller) while the spawnee is running.

The earlier file processing example demonstrates the last point. Suppose that it was written with [&] instead of [=]. Then a race condition would exist for pointer buf that would almost surely cause corruption problems. The root problem would be that a running instance of the lambda body would be looking at the original pointer buf in main, but its value is overwritten by each iteration of the loop. With [=], the lambda body has its own copy of pointer buf.

Performance Issues

Lambdas expressions can hurt performance, which you can often fix by understanding how lambdas are implemented. The problem is that lambda expressions introduce another level of indirection, which might not only require more instructions, but might also hide optimization opportunies from the compiler. For sake of argument, consider my earlier example:

 cilk_spawn [&]{while(*s++=*t++);}();

Because I specified capture reference, the code compiles as if it was written:

    struct anon {
        char*& s;
        char*& t;
        anon( char*& s_, char*& t_ ) : s(s_), t(t_) {}
        void operator()() {while(*this->s++=*this->t++);}
    };
    cilk_spawn anon(s,t)();

Each access to s and t now requires two levels of indirection: one for the explicit this-> and one for the implicit indirection for a reference type. A compiler can sometimes hoist the indirections out of the loop. Alas for this example the implicit indirection is hard to hoist, because the C and C++ standards allow type char to alias with any other type, so the compiler must assume that each write to *this->s might modify the location referenced by this->s or this->t.

A fix is to hoist the values into local temporaries by hand, by writing:

    cilk_spawn [&]{
        auto ss=s, tt=t;
        while(*ss++=*tt++);
        s=ss; t=tt;
    }();

If I really do not need the final values of s and t, I would leave out the final two assignments and change [&] to [=].

By the way, to be worth spawning, the strings are going to have to be really long, probably at least a megabyte, and memory bandwidth is likely to be the limiting resource. However, I was able to eke out a little speedup when I tried two of these copy operations (with manual hoisting) in parallel on megabyte strings. Without the manual hoisting, the parallel code was significantly slower than the original serial version for the platform I tried.

Summary

You can use lambda expressions to "spawn a statement", but the need for such is rare. When performance is important, consider the extra level(s) of indirection implied by functors .

For more information about Intel Cilk Plus, see the website http://cilkplus.org. For questions and discussions about Intel Cilk Plus, see the forum http://software.intel.com/en-us/forums/intel-cilk-plus.

Multithread development