One of the pitfalls of parallel programming is the need to consider whether your code accesses a memory location in two parallel strands when at least one of those accesses is a write. Doing so is called a race because whether you get the correct answer depends on the timing of the executed instructions. Parallel reads are OK, since the value doesn’t change. But two writes, or a read and a write, in parallel are errors. Consider the following code, which increments a global location:
mov MEM, %rax ;; Load the value of MEM into rax
add $1, %rax  ;; Increment rax
mov %rax, MEM ;; Store rax into MEM
Now consider what can happen if this sequence is executed in parallel by two threads. Assume “MEM” starts with a value of 7.
Time | Thread A | Thread B
t0 | Load the value 7 from MEM to %rax |
t1 | Add 1 to %rax | Load the value 7 from MEM to %rax
t2 | Store %rax to MEM | Add 1 to %rax
t3 | | Store %rax to MEM
At the end of the sequence, we expect MEM to have a value of 9. But if the two threads run at the same time (or are interleaved) as shown, MEM ends up with the value 8: Thread B loaded the stale value 7 before Thread A stored its result, so Thread B’s store overwrites Thread A’s update. Finding this sort of bad behavior by inspection is difficult. Fortunately, this is the type of detail-oriented problem that computers are good at.
The Cilk Screen Race Detector monitors your application for all reads from and writes to memory and reports on any read/write or write/write conflicts it finds. A race is reported if any possible schedule of the program can produce results that may be different from the serial execution of the program. Note that Cilk screen only reports on races within the Cilk parallel regions of your program. It ignores any other threads created by the program using other threading packages or the operating system’s native calls.
Getting Cilk Screen
Cilk screen is provided as part of the Intel Cilk Plus SDK, which is available as a free download at http://cilkplus.org/download. Download the version appropriate for your operating system and install it on your system. The Cilk Plus SDK includes:
- The Cilk screen race detector
- The Cilk view scalability analyzer
- Documentation on how to run the race detector and scalability analyzer
- Sample applications
Using Cilk Screen
Cilk screen runs against your application’s binary image. While debugging information is not required to find races, debugging information may be required to translate program addresses into symbolic names. Cilk screen can be run against both optimized and unoptimized code, but you may wish to disable inlining to make it easier to understand the callstack. We recommend testing with unoptimized builds during development and verifying that optimized builds are correct before shipping your code.
Monitoring every memory read and write imposes a large performance penalty on your program. You should use smaller data sets when running your application under Cilk screen. However, keep in mind that Cilk screen will only analyze code that is executed as part of a test. We recommend that you run your application under Cilk screen with a variety of input data sets with the aim of maximizing the code coverage within your application. Because the resolution of one race may expose or create a previously unreported race, you should run Cilk screen after any program change until your program is race-free.
Here’s a simple program to demonstrate using Cilk screen:
#include <cilk/cilk.h>
#include <stdio.h>

int sum = 0;

void add_to_sum(int i)
{
    sum += i;
}

int main(int argc, char **argv)
{
    cilk_spawn add_to_sum(1);
    add_to_sum(2);
    cilk_sync;

    printf("The sum is: %d\n", sum);
    return 0;
}
To run your application under Cilk screen, simply prefix the command to invoke your application with the “cilkscreen --” command. The “--” indicates the start of your command and is optional:
C:\sum\x64\Debug>cilkscreen -- sum.exe
Cilkscreen Race Detector V2.0.0, Build 3327 for Intel64
Race condition on location 000000013F87A150
write access at 000000013F8712E5: (C:\sum.cpp:8, sum.exe!add_to_sum+0x45)
read access at 000000013F8712DF: (C:\sum.cpp:8, sum.exe!add_to_sum+0x3f)
called by 000000013F871474: (C:\sum.cpp:14, sum.exe!main+0x182)
Variable: 000000013F87A150 - sum
Race condition on location 000000013F87A150
write access at 000000013F8712E5: (C:\sum.cpp:8, sum.exe!add_to_sum+0x45)
write access at 000000013F8712E5: (C:\sum.cpp:8, sum.exe!add_to_sum+0x45)
called by 000000013F871474: (C:\sum.cpp:14, sum.exe!main+0x182)
Variable: 000000013F87A150 - sum
The sum is: 3
2 errors found by Cilkscreen
Cilkscreen suppressed 1 duplicate error messages
By default Cilk screen will write its output to stderr. You can redirect it using the -r option to specify a file to receive the information in text format, or the -x option to specify that the information is to be written as XML. You can get a full list of the Cilk screen options using -? or --help. Here is what each part of the report means:
Race condition on location 000000013F87A150
Cilk screen detected a race condition at memory location 0x13F87A150.
write access at 000000013F8712E5: (C:\sum.cpp:8, sum.exe!add_to_sum+0x45)
The first access that participated in the race was a write from the instruction at 0x13F8712E5, which is 0x45 bytes from the start of the function add_to_sum(), which is in image sum.exe. The instruction corresponds to line 8 of the source file sum.cpp.
read access at 000000013F8712DF: (C:\sum.cpp:8, sum.exe!add_to_sum+0x3f)
The second access that participated in the race was a read from the instruction at 0x13F8712DF, which is 0x3F bytes from the start of the function add_to_sum(), which is in image sum.exe. The instruction corresponds to line 8 of the source file sum.cpp.
called by 000000013F871474: (C:\sum.cpp:14, sum.exe!main+0x182)
Cilk screen displays the callstack for the second memory reference involved in the race. add_to_sum() was called by main() at line 14 in sum.cpp, which corresponds to the CALL instruction at offset 0x182 of main() in sum.exe.
While reporting the callstack for both accesses would be preferable, maintaining the callstack for every memory access in case there’s a race would be expensive in both memory and time.
Variable: 000000013F87A150 - sum
On Windows, Cilk screen will attempt to symbolize the memory address. This information is not available on other platforms at this time.
The sum is: 3
This line is output from the program. Since we didn’t redirect the diagnostic output to a file, it is intermingled with the output from the program.
2 errors found by Cilkscreen
Cilkscreen suppressed 1 duplicate error messages
At the conclusion of the run, Cilk screen will summarize the errors it found and how many duplicate errors it suppressed. You can use the -a option to have Cilk screen report all errors if you wish.
It’s important to emphasize that you do not need to have sources available to use Cilk screen. We worked with a customer who found races in a third-party library they were calling from their Cilk code. You can see this for yourself using the following code:
#include <cilk/cilk.h>
#include <iostream>

int main(int argc, char **argv)
{
    cilk_for (int i = 0; i < 20; i++)
    {
        std::cout << i << std::endl;
    }
    return 0;
}
It’s well known that C++ stream I/O is not thread safe. Running this program under Cilk screen will show you why.