Introduction
SunGard’s Adaptiv Analytics* allows traders to run pre-deal cost-of-credit calculations. Because of the volume and complexity of products, these calculations are often time-consuming, causing delays that can lead to missed opportunities or to decisions made with incomplete information.
SunGard’s customers typically run multiple instances simultaneously rather than running a single instance as fast as possible, so running SunGard’s Adaptiv Analytics on systems with more cores can improve performance significantly. SunGard’s adoption of Intel® Advanced Vector Extensions 2 (Intel® AVX2), together with Intel’s investment in parallel computing through wider vectorization lanes and registers, has helped provide better scalability and performance for SunGard’s risk management solution. These improvements help meet the growing computational requirements of the market and the regulatory environment.
This paper describes how Adaptiv Analytics running on systems equipped with Intel® Xeon® processor E7-8890 v3 gained a performance improvement over running on systems with the previous generation of Intel® Xeon® processor E7-4890 v2.
SunGard’s Adaptiv Analytics and Intel® Xeon® Processor E7-8890 v3
On the hardware side, the Intel Xeon processor E7-8890 v3 has 18 cores, compared with 15 cores in the Intel Xeon processor E7-4890 v2, which increases the available parallelism. In addition, the E7-8890 v3 has higher memory bandwidth and uses DDR4 memory, while the E7-4890 v2 uses DDR3 memory, which speeds up execution.
On the instruction-set side, the Intel Xeon processor E7-8890 v3 supports Intel AVX2, while the Intel Xeon processor E7-4890 v2 supports only Intel® Advanced Vector Extensions (Intel® AVX). Let's see how Intel AVX2 improves the performance of this product.
Processor | # Cores | # Threads | Memory | Vectorization |
---|---|---|---|---|
Intel® Xeon® E7-4890 v2 | 15 | 30 | DDR3 | AVX |
Intel® Xeon® E7-8890 v3 | 18 | 36 | DDR4 | AVX2 |
Table 1. Processors Comparison
SunGard’s Adaptiv Analytics uses Monte Carlo simulation to perform risk analysis. Monte Carlo simulation is often used to analyze the behavior of activities or processes that involve uncertainty, such as risk management. The simulation calculates the results many times, each time using a random set of values, giving the decision maker a range of possible outcomes. The random values are generated from probability functions.
To increase the accuracy of the possible outcomes, a Monte Carlo simulation must repeat the calculation many times, possibly up to 10,000 times, which makes it long-running. This is where Intel® AVX2, along with the features of the E7-8890 v3 mentioned above, can provide advantages over the E7-4890 v2.
The following sections describe two functions that Monte Carlo simulations frequently use for vector and matrix manipulation and that are optimized with Intel AVX2.
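As a rough illustration of the repeated-sampling idea, the sketch below draws 10,000 scenarios and reports a range of outcomes. The normal distribution, its parameters, and the function names `simulate_outcomes` and `summarize` are assumptions made for this example only; they are not the models or APIs used by Adaptiv Analytics.

```python
import random
import statistics


def simulate_outcomes(num_scenarios=10_000, mean=0.0, stdev=1.0, seed=42):
    """Draw one random outcome per scenario.

    Assumption: outcomes follow a normal distribution; a real risk
    engine would use its own probability functions.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [rng.gauss(mean, stdev) for _ in range(num_scenarios)]


def summarize(outcomes):
    """Return the mean and an approximate 5th-95th percentile range."""
    ordered = sorted(outcomes)
    n = len(ordered)
    return {
        "mean": statistics.fmean(ordered),
        "p05": ordered[int(0.05 * n)],
        "p95": ordered[int(0.95 * n)],
    }


if __name__ == "__main__":
    print(summarize(simulate_outcomes()))
```

More scenarios narrow the sampling error of the summary, which is why the scenario count drives the run time.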
daxpy
Function daxpy computes the following operation on double-precision values:
α × A + B
Where:
A and B: Vectors
α: Constant (scalar)
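The operation above can be written as a plain-Python reference sketch. This is only meant to show the arithmetic; it is not the Intel MKL implementation, and the parameter names `alpha`, `a`, and `b` are chosen here to mirror the symbols in the text. Note that the BLAS routine overwrites its second vector in place, whereas this sketch returns a new list.

```python
def daxpy(alpha, a, b):
    """Return alpha * a + b, computed element by element.

    a and b are equal-length sequences of floats (vectors).
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    return [alpha * ai + bi for ai, bi in zip(a, b)]


if __name__ == "__main__":
    print(daxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Each element is one multiply and one add with no dependence on its neighbors, which is the kind of loop that vector instructions handle well.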
dgemv
Function dgemv calculates the following operation on double-precision values:
α × A × x + β × y
Or
α × Aᵀ × x + β × y
Where:
α and β : Constants
x and y : Vectors
A: Matrix
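A plain-Python reference sketch of both forms follows. It stores the matrix as a list of rows and uses a `transpose` flag to select the second form; these choices, and the function signature, are assumptions for illustration and do not reproduce the Intel MKL interface. As with daxpy, the BLAS routine overwrites y, while this sketch returns a new list.

```python
def dgemv(alpha, a, x, beta, y, transpose=False):
    """Return alpha * A * x + beta * y, or alpha * A^T * x + beta * y.

    a is a matrix given as a list of equal-length rows; x and y are vectors.
    """
    if transpose:
        # Turn columns into rows so the same loop handles A^T.
        a = [list(column) for column in zip(*a)]
    if len(a) != len(y) or any(len(row) != len(x) for row in a):
        raise ValueError("matrix and vector dimensions do not match")
    return [
        alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
        for row, yi in zip(a, y)
    ]


if __name__ == "__main__":
    matrix = [[1.0, 2.0], [3.0, 4.0]]
    print(dgemv(2.0, matrix, [1.0, 1.0], 3.0, [1.0, 1.0]))
    print(dgemv(2.0, matrix, [1.0, 1.0], 3.0, [1.0, 1.0], transpose=True))
```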
Functions daxpy and dgemv are implemented in the Intel® Math Kernel Library (Intel® MKL) and Intel® Integrated Performance Primitives (Intel® IPP). Starting with version 11 of Intel MKL and version 8 of Intel IPP, the two functions are optimized using Intel AVX2. SunGard’s Adaptiv Analytics uses the daxpy and dgemv versions from Intel MKL and Intel IPP, and so takes advantage of the Intel AVX2 performance improvements in the Intel Xeon processor E7-8890 v3. Using Intel’s libraries means that developers don’t have to modify their code to take advantage of new features in future Intel® Xeon® processors.
Performance test procedure
To show that Intel AVX2, along with the new microarchitecture in the Intel Xeon processor E7 v3, improves the performance of SunGard’s Adaptiv Analytics, we performed tests on two platforms. One system was equipped with the Intel Xeon processor E7-8890 v3 and the other with the Intel Xeon processor E7-4890 v2.
We created a launcher that executes a chosen number of instances of a command-line tool called RunCalcDef.exe, which performs the calculations using the SunGard Adaptiv Analytics engine. On the system equipped with the Intel Xeon processor E7-8890 v3, we launched 18 instances with 8 nodes per instance; on the system equipped with the Intel Xeon processor E7-4890 v2, we launched 30 instances with 4 nodes per instance. In SunGard’s Adaptiv Analytics, the number of nodes specifies how many threads operate on a subset of the 10,000 scenarios of the Monte Carlo simulation.
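A launcher of this kind might look roughly like the sketch below, which starts several copies of a command at once and times how long they all take to finish. This is not the launcher used in the tests: the function name `launch_instances` is hypothetical, and because the command-line arguments of RunCalcDef.exe are not given here, the example runs a trivial Python command as a stand-in.

```python
import subprocess
import sys
import time


def launch_instances(command, instance_count):
    """Start instance_count copies of command and wait for all of them.

    Returns the list of exit codes and the elapsed wall-clock seconds.
    """
    start = time.perf_counter()
    # Start every process before waiting so the instances run concurrently.
    processes = [subprocess.Popen(command) for _ in range(instance_count)]
    exit_codes = [process.wait() for process in processes]
    elapsed = time.perf_counter() - start
    return exit_codes, elapsed


if __name__ == "__main__":
    # Stand-in command; a real run would invoke the calculation tool instead.
    codes, seconds = launch_instances([sys.executable, "-c", "pass"], 4)
    print(codes, round(seconds, 3))
```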
Why didn’t we use the same number of instances and nodes on both systems? The system equipped with the Intel Xeon processor E7-8890 v3 has 4 sockets, each of which can handle 36 threads with hyper-threading on, for a total of 144 threads. The system equipped with the Intel Xeon processor E7-4890 v2 also has 4 sockets, but each can handle only 30 threads with hyper-threading on, for a total of 120 threads. Running the 18-instance, 8-node configuration (144 threads) on the E7-4890 v2 system would oversubscribe its cores and decrease performance, while running the 30-instance, 4-node configuration (120 threads) on the E7-8890 v3 system would leave hardware threads idle.
The tests computed throughput, in calculations per second, by dividing the total number of calculations executed (a value known in advance from the number of instances) by the average execution time in seconds.
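The thread counts can be checked with a few lines of arithmetic. The sketch below assumes that each node corresponds to one software thread, as the description of nodes suggests; the helper names are made up for this example.

```python
def hardware_threads(sockets, threads_per_socket):
    """Total hardware threads available on a system."""
    return sockets * threads_per_socket


def software_threads(instances, nodes_per_instance):
    """Total threads requested, assuming one thread per node."""
    return instances * nodes_per_instance


# Figures taken from the test description.
v3_hardware = hardware_threads(sockets=4, threads_per_socket=36)
v2_hardware = hardware_threads(sockets=4, threads_per_socket=30)
v3_requested = software_threads(instances=18, nodes_per_instance=8)
v2_requested = software_threads(instances=30, nodes_per_instance=4)

if __name__ == "__main__":
    print("E7-8890 v3:", v3_requested, "requested of", v3_hardware)
    print("E7-4890 v2:", v2_requested, "requested of", v2_hardware)
    # Swapping the configurations shows the mismatch in each direction.
    print("v3 config on v2 oversubscribes:", v3_requested > v2_hardware)
    print("v2 config on v3 leaves threads idle:", v2_requested < v3_hardware)
```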
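The throughput formula can be sketched as follows. The function name and the sample inputs are hypothetical and serve only to show the calculation; they are not measurements from these tests.

```python
import statistics


def throughput(total_calculations, execution_times_s):
    """Calculations per second: total work divided by average run time."""
    average_time = statistics.fmean(execution_times_s)
    if average_time <= 0:
        raise ValueError("average execution time must be positive")
    return total_calculations / average_time


if __name__ == "__main__":
    # Hypothetical inputs: 1,200 calculations, three timed runs in seconds.
    print(throughput(1200, [10.0, 12.0, 14.0]))
```

Averaging over several runs reduces the effect of run-to-run variation on the reported figure.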
Test configurations
System equipped with Intel Xeon processor E7-8890 v3
- System: Pre-production
- Processors: Intel Xeon processor E7-8890 v3 @2.5 GHz
- Cores: 18
- Memory: 384 GB DDR4-2133 MHz
System equipped with Intel Xeon processor E7-4890 v2
- System: Pre-production
- Processors: Intel Xeon processor E7-4890 v2 @2.8 GHz
- Cores: 15
- Memory: 512 GB DDR3-1600 MHz
Operating System: Microsoft Windows Server* 2012 R2
Application: SunGard Adaptiv Benchmark v13.1
Test results
Figure 1. Performance comparison between processors.
Figure 1 shows a 1.47x performance gain of the system with the Intel Xeon processor E7-8890 v3 over that of the system with the Intel Xeon processor E7-4890 v2. The performance gain is due to the enhanced microarchitecture, increase in core count, better memory type (DDR4 over DDR3), and Intel AVX2.
Conclusion
More cores, enhanced microarchitecture, and the support of DDR4 memory contributed to the performance improvement of SunGard’s Adaptiv Analytics on systems equipped with the Intel Xeon processor E7-8890 v3 compared to those with Intel Xeon processor E7-4890 v2. With the introduction of Intel AVX2, matrix manipulations get a boost. In addition, applications that make use of Intel MKL and Intel IPP will receive a performance boost without having to change the source code, since their functions are optimized using Intel AVX2.