
Non-Obvious Implementation Choices for the Monte Carlo Simulation


This article was written for the Monte Carlo Simulation of European Swaptions sample.

Visit the Wikipedia pages for the Monte Carlo algorithm and swaptions to learn more about what this code is doing.
 
There are a few parts of the Monte Carlo code that serve a non-obvious purpose and benefit from an expanded explanation here.
 

Separated "for" loops for simulations and additions to achieve consistent results

 
The astute programmer might notice that the second “for” loop in the Monte Carlo loop (in Simulations.cpp) could be rolled into the first, thus eliminating the redundancy of iterating to c_num_simulations twice, like so:
for (int path=0; path<c_num_simulations; ++path) {
    calculate_path_for_swaption_kernel_scalar(initial_LIBOR_rate, volatility,
        normal_distribution_rand+(path*c_time_steps), discounted_swaption_payoffs+path);
    total_payoff += discounted_swaption_payoffs[path];
}
The astute parallel programmer might even consider using an Intel® Cilk™ Plus reducer so that a correct answer can be maintained even in the cilk_for version of the loop, like so:
cilk::reducer_opadd<float_point_precision> total_payoff;
cilk_for (int path=0; path<c_num_simulations; ++path) {
    calculate_path_for_swaption_kernel_scalar(initial_LIBOR_rate, volatility,
        normal_distribution_rand+(path*c_time_steps), discounted_swaption_payoffs+path);
    total_payoff += discounted_swaption_payoffs[path];
}
This is a valid and efficient way to derive the final reduction value, but this Monte Carlo simulation involves floating-point operations, which muddies the waters a little. Because of the approximate nature of floats, mathematical operations can lose accuracy in the final answer, and the problem is magnified when an operation combines a very large number with a very small one. It just so happens that the random numbers generated from the normal distribution by the Intel® Math Kernel Library (Intel® MKL) contain both very large and very small values. As a result, operations normally considered associative, where the order does not matter, such as addition, can produce results that depend on the order in which they are carried out. The nondeterministic scheduling of parallel code would not normally affect the answer, but when many float additions are performed in an effectively random order, the final answer can change between runs. This is not to say that the answer being calculated is incorrect (the calculation itself is the same every time); rather, the way intermediate results are rounded changes the final answer. (For a more in-depth explanation, see our “Consistency of Floating-Point Results using the Intel® Compiler” white paper.)
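To make the effect concrete, here is a minimal standalone sketch (not part of the sample; the values are chosen purely for illustration) that adds the same three single-precision numbers in two different groupings and, on a typical SSE2 build with round-to-nearest arithmetic, gets two different rounded results:
#include <cstdio>

int main() {
    // A large value and a small value: the classic case where float
    // addition stops behaving associatively.
    float big   = 1.0e8f;   // exactly representable; spacing between floats here is 8
    float small = 3.0f;

    // Left-to-right: each (big + small) rounds the small contribution away.
    float sum_a = (big + small) + small;

    // Grouping the small values first preserves enough of their contribution
    // to push the result to the next representable float.
    float sum_b = big + (small + small);

    // With round-to-nearest single-precision arithmetic the two sums differ.
    std::printf("sum_a = %.1f\nsum_b = %.1f\n", sum_a, sum_b);
    return 0;
}
A parallel reduction reorders additions in exactly this way, which is why the order of the summation, not the summation itself, is what the sample pins down.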
 
Thus it is up to the programmer to decide whether bit-for-bit reproducibility is important. For this particular sample, in order to maintain consistency, the accumulation and averaging of all simulation results are done outside of the parallel loop; this makes the floating-point rounding order exactly the same whether the simulation loop runs as “for” or “cilk_for”.
cilk_for (int path=0; path<c_num_simulations; ++path) {   // or a plain "for" in the serial version
    calculate_path_for_swaption_kernel_scalar(initial_LIBOR_rate, volatility,
        normal_distribution_rand+(path*c_time_steps), discounted_swaption_payoffs+path);
}
for (int path=0; path<c_num_simulations; ++path) {
    total_payoff += discounted_swaption_payoffs[path];
}
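For reference, here is a minimal self-contained sketch of this two-loop pattern; the payoff kernel is a stand-in rather than the sample's actual calculate_path_for_swaption_kernel_scalar(), the constants are made up, and building it requires a compiler with Intel® Cilk™ Plus support:
#include <cilk/cilk.h>
#include <cmath>
#include <cstdio>

const int c_num_simulations = 8;   // illustrative value only
float discounted_swaption_payoffs[c_num_simulations];

// Stand-in for the real path kernel: it only needs to write one payoff per
// path so that the two-loop structure can be demonstrated.
static void dummy_path_kernel(int path, float *payoff) {
    *payoff = std::exp(-0.05f * static_cast<float>(path));
}

int main() {
    // Parallel loop: each iteration writes only its own slot, so there is no
    // shared accumulator and no race.
    cilk_for (int path = 0; path < c_num_simulations; ++path) {
        dummy_path_kernel(path, discounted_swaption_payoffs + path);
    }

    // Serial loop: the additions always happen in the same order, so the
    // rounded sum is identical from run to run and matches a serial build.
    float total_payoff = 0.0f;
    for (int path = 0; path < c_num_simulations; ++path) {
        total_payoff += discounted_swaption_payoffs[path];
    }
    std::printf("total payoff = %f\n", total_payoff);
    return 0;
}
Replacing cilk_for with a plain for gives the serial version and, because the reduction order is unchanged, the same result bit for bit.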
 

Accessing the normal distribution in a uniform fashion for both scalar and Array Notation

 
The array notation kernel (calculate_path_for_swaption_kernel_array()) is almost identical to its scalar counterpart (calculate_path_for_swaption_kernel_scalar()). At first glance, the only difference between the two is the support of vectorization in the form of appropriately placed [:]. There is, however, one other major difference: accessing the normal distribution.
 
This normal distribution is used to grow the evolving swap rate of each swaption portfolio in a pseudo-organic way. While the ordering of the numbers is random, to maintain a sense of consistency, every test should consume the same random numbers in the same places. Simply accessing the array in order does not keep the scalar and array notation kernels consistent with each other. This simplified example explains why:
 
Let’s say the random (or in this case, not very random) normal distribution array looks like
RND: [1, 2, 3, 4, 5, 6]
Let’s say we want to complete two simulations; that is,
c_num_simulations = 2
This would mean that each simulation will use half of the numbers in the normal distribution, or 3 numbers each.
c_time_steps = 3
In the scalar kernel, each simulation draws all of its numbers at once, such that
simulation 1: [1, 2, 3]
simulation 2: [4, 5, 6]
Now think about accessing the normal distribution array in an in-order fashion for the array notation kernel. This would look like:
sqiz[:] = sqrt(c_reset_interval) * normal_distribution_rand[j*c_simd_vector_length: c_simd_vector_length];
It draws two numbers at a time, where the first one is assigned to simulation 1 and the second one is assigned to simulation 2. Simulation 1 would get [1], and simulation 2 would get [2], and so on. Thus,
simulation 1 w/ Array Notation: [1, 3, 5]
simulation 2 w/ Array Notation: [2, 4, 6]
Each time this array notation kernel is called, it actually follows a different route through the random numbers than its scalar counterpart. This results in a different, but computationally similar, final answer. As with the previous modification, this answer is not wrong, but simply different due to ordering. To fix the ordering, so that the scalar and vector kernels are bit-for-bit consistent, the access pattern for the normal distribution should be:
sqiz[:] = sqrt(c_reset_interval) * normal_distribution_rand[j:c_simd_vector_length:c_time_steps];
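As a rough standalone check of the index arithmetic (using the toy sizes from the example above, not the sample's real constants), the loop nest below prints which element each simulation reads at every time step under the three access patterns; the strided section is the one that reproduces the scalar pairing:
#include <cstdio>

int main() {
    const int c_num_simulations    = 2;  // toy values from the example above
    const int c_time_steps         = 3;
    const int c_simd_vector_length = 2;  // one SIMD lane per simulation here
    const int rnd[6] = {1, 2, 3, 4, 5, 6};

    // Scalar kernel: simulation 'path' reads rnd[path*c_time_steps + j].
    for (int path = 0; path < c_num_simulations; ++path) {
        std::printf("scalar           simulation %d:", path + 1);
        for (int j = 0; j < c_time_steps; ++j)
            std::printf(" %d", rnd[path * c_time_steps + j]);
        std::printf("\n");
    }

    // In-order section rnd[j*c_simd_vector_length : c_simd_vector_length]:
    // lane 'lane' reads rnd[j*c_simd_vector_length + lane], interleaving
    // the simulations.
    for (int lane = 0; lane < c_simd_vector_length; ++lane) {
        std::printf("in-order section simulation %d:", lane + 1);
        for (int j = 0; j < c_time_steps; ++j)
            std::printf(" %d", rnd[j * c_simd_vector_length + lane]);
        std::printf("\n");
    }

    // Strided section rnd[j : c_simd_vector_length : c_time_steps]:
    // lane 'lane' reads rnd[j + lane*c_time_steps], matching the scalar order.
    for (int lane = 0; lane < c_simd_vector_length; ++lane) {
        std::printf("strided section  simulation %d:", lane + 1);
        for (int j = 0; j < c_time_steps; ++j)
            std::printf(" %d", rnd[j + lane * c_time_steps]);
        std::printf("\n");
    }
    return 0;
}
Each simulation maps to one SIMD lane here, which is why the lane index plays the role of the simulation index in the sketch.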
 

fp:source and Qfast-transcendentals

 
fp:source specifies to the compiler that all intermediate calculations should be completed with the source floating-point precision. In other words, if two floats are multiplied, single-precision values are used for all steps of the multiplication. This is an important setting because the Intel compiler defaults to fp:double (Windows*) or fp:extended (Linux*/OS X*) when compiling 32-bit applications, which forces all floating-point calculations on floats to first up-convert the values to double/extended precision. While this can increase the accuracy of the final value, it may come at the cost of excessive casting, which can slow down the calculation. (Note: fp:source additionally implies fp:precise.)
 
Qfast-transcendentals allows the Intel compiler to make fast calls to transcendental functions even under fp:precise. Normally, fp:precise enables a slow but very accurate implementation of transcendental functions; enabling Qfast-transcendentals lets the compiler use faster SSE2 calls where there is hardware support. By default, these fast transcendental functions introduce variability into the calculations, but the compiler option Qimf-precision=high tells the compiler to use more accurate (and only slightly slower) SSE2 function calls that, at least for this sample, maintain bit-for-bit consistency.
