
Vectorization: A Key Tool To Improve Performance On Modern CPUs


Vectorization is the process of converting an algorithm from operating on a single value at a time to operating on a set of values at one time. Modern CPUs provide direct support for vector operations where a single instruction is applied to multiple data (SIMD).

The Rise of Parallelism

Moore's law has continued to hold over the past decade, but while chipmakers keep packing more transistors into every square inch of silicon, the focus of innovation has shifted away from higher clock speeds and towards multicore and manycore architectures.

A great deal of focus has been given to engineering applications that are capable of exploiting the growing number of CPU cores by running multi-threaded or grid-distributed calculations. This type of parallelism has become a routine part of designing performance-critical software.

At the same time, as multicore chip design has given rise to task parallelism in software design, chipmakers have also been increasing the power of a second type of parallelism: instruction-level parallelism. Alongside the trend towards higher core counts, the width of SIMD (single instruction, multiple data) registers has been steadily increasing. The software changes required to exploit instruction-level parallelism are known as 'vectorization'.

The most recent processors have many cores/threads and can apply a single instruction to an increasingly large set of data (the SIMD width).

A key driver of these architectural changes has been the power/performance trade-off of the alternative approaches:

• Wider SIMD – Linear increase in transistors & power

• Multi core – Quadratic increase in transistors & power

• Higher clock frequency – Cubic increase in power

SIMD provides a way to increase performance using less power.

Software design must adapt to take advantage of these new processor technologies. Multi-threading and vectorization are each powerful tools on their own, but only by combining them can performance be maximized. Modern software must exploit both threading and vectorization to get the highest possible performance from the latest generation of processors.
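As a rough illustration of combining the two (a sketch, not code from the article; the function name and data layout are hypothetical), the following C++/OpenMP fragment distributes the outer loop across cores while asking the compiler to vectorize the inner loop. It would typically be built with OpenMP enabled, for example -qopenmp with the Intel compilers or -fopenmp with GCC.

#include <cstddef>

// Sketch only: the outer loop is threaded across cores (task parallelism),
// the inner loop is vectorized (instruction-level parallelism).
void scale_rows(float* y, const float* x, float a,
                std::size_t rows, std::size_t cols)
{
    #pragma omp parallel for
    for (std::size_t r = 0; r < rows; ++r) {
        #pragma omp simd
        for (std::size_t c = 0; c < cols; ++c) {
            y[r * cols + c] += a * x[r * cols + c];
        }
    }
}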

Why Vectorize?

Vectorization is the process of converting an algorithm from operating on a single value at a time to operating on a set of values (a vector) at one time. Modern CPUs provide direct support for vector operations, where a single instruction is applied to multiple data (SIMD). For example, a CPU with a 512-bit register can hold sixteen 32-bit single-precision floating-point values and process all of them with a single instruction, up to 16 times faster than executing one operation at a time. Combining this with threading across multiple cores leads to orders-of-magnitude performance gains.
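To make the idea concrete, here is a minimal sketch (not from the article) using AVX-512 intrinsics, where each 512-bit register holds 16 single-precision floats and a single add instruction processes all 16 lanes. It assumes an AVX-512 capable CPU and, for simplicity, that n is a multiple of 16.

#include <immintrin.h>

// Sketch: element-wise addition of two float arrays, 16 values per instruction.
void add_arrays(float* dst, const float* a, const float* b, int n)
{
    for (int i = 0; i < n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);  // load 16 floats from a
        __m512 vb = _mm512_loadu_ps(b + i);  // load 16 floats from b
        __m512 vc = _mm512_add_ps(va, vb);   // one instruction, 16 additions
        _mm512_storeu_ps(dst + i, vc);       // store 16 results
    }
}

In practice hand-written intrinsics are rarely necessary; compilers can auto-vectorize plain loops like this when given suitable optimization flags.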

Implementing Vectorization

There is a range of alternatives and tools for implementing vectorization, varying in complexity, flexibility and future compatibility. The simplest way to implement vectorization is to start with Intel's 6-step process. This process uses Intel tools to provide a clear path for transforming existing code into modern, high-performance software that exploits multicore and manycore processors.

Applying Vectorization to CVA Aggregation

The finance domain provides many good candidates for vectorization. A particularly good example is the aggregation of Credit Value Adjustment (CVA) and other measures of counterparty risk. The most common general-purpose approach to calculating CVA is based on a Monte Carlo simulation of the distribution of forward values for all derivative trades with a counterparty. The evolution of market prices over a series of forward dates is simulated, then the value of each derivative trade is calculated at each forward date using the simulated market prices. This gives us a 'path' of projected values over the lifetime of each trade. By running a large number of these randomized simulated 'paths', we can estimate the distribution of forward values, giving both the expected and extreme 'exposures'.

The simulation step results in a 3-dimensional array of exposures (dates × paths × trades). The task of calculating CVA from these exposures occurs in several steps: netting, collateralisation, integration over paths, and integration over dates.
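A heavily simplified sketch of these aggregation steps is shown below. The function, the array layout and the basic CVA formula used (expected positive exposure weighted by per-date default probability and discount factor, scaled by loss given default) are illustrative assumptions, not the implementation from the whitepaper, and collateralisation is omitted for brevity.

#include <vector>
#include <cstddef>
#include <algorithm>

// Illustrative sketch of the aggregation steps described above.
// exposures is laid out as [date][path][trade]; defaultProb and discount
// hold the per-date default probability and discount factor.
// Collateral is ignored. None of these names come from the whitepaper.
double cva_from_exposures(const std::vector<double>& exposures,
                          std::size_t nDates, std::size_t nPaths,
                          std::size_t nTrades,
                          const std::vector<double>& defaultProb,
                          const std::vector<double>& discount,
                          double recoveryRate)
{
    double cva = 0.0;
    for (std::size_t d = 0; d < nDates; ++d) {
        double expectedExposure = 0.0;
        for (std::size_t p = 0; p < nPaths; ++p) {
            // Netting: sum the trade values in the netting set. The trade
            // values are contiguous in memory, so this loop vectorizes well.
            double netted = 0.0;
            #pragma omp simd reduction(+:netted)
            for (std::size_t t = 0; t < nTrades; ++t)
                netted += exposures[(d * nPaths + p) * nTrades + t];
            // Exposure is the positive part of the netted value.
            expectedExposure += std::max(netted, 0.0);
        }
        expectedExposure /= static_cast<double>(nPaths);  // average over paths
        // Integration over dates, weighted by default probability and discounting.
        cva += expectedExposure * defaultProb[d] * discount[d];
    }
    return (1.0 - recoveryRate) * cva;  // scale by loss given default
}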

More Details

Check out this whitepaper (PDF).

There is also a complete webinar (on Quantifi's site) and an associated slide deck (PDF).

 

