Selftime-based FLOPS computing (Vectorization Advisor)

Let's look at some specifics of how FLOPS are computed that can significantly affect how FLOPS data is interpreted, especially in the Roofline chart.

At the moment, we compute FLOPS using self time (non-inclusive). This means that if you have nested loops, the FLOPS and arithmetic intensity computed for an outer loop do not include operations performed in the inner loops. We recommend keeping this in mind when you use FLOPS and Roofline information for an outer loop.

This becomes trickier when you call functions inside your loop. Again, a non-inclusive approach is used, so operations performed inside such a function are not counted toward the loop's FLOPS and arithmetic intensity. Only the self time of the loop is used to compute the loop's FLOPS. Roofline analysis results for such a loop can therefore lead to a wrong conclusion and a wrong action plan.
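As a rough illustration of the non-inclusive approach, the sketch below compares self-time-based and inclusive FLOPS for a loop that calls a function. The site_profile structure and both function names are hypothetical and are not part of any Advisor API; the inclusive variant simply sums self counts and self times, which is an assumption about how one might aggregate them by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical profile record for one loop or function (not an Advisor type). */
typedef struct {
    double self_flop;     /* floating-point operations in the site's own code */
    double self_seconds;  /* time spent in the site's own code, callees excluded */
} site_profile;

/* Self-time-based GFLOPS: only the site's own operations and time count. */
double self_gflops(site_profile s)
{
    assert(s.self_seconds > 0.0);
    return s.self_flop / s.self_seconds / 1e9;
}

/* Inclusive GFLOPS: the loop plus every nested loop or called function. */
double inclusive_gflops(site_profile loop, const site_profile *nested, size_t count)
{
    double flop = loop.self_flop;
    double seconds = loop.self_seconds;
    for (size_t i = 0; i < count; i++) {
        flop += nested[i].self_flop;
        seconds += nested[i].self_seconds;
    }
    assert(seconds > 0.0);
    return flop / seconds / 1e9;
}
```

With made-up numbers, a loop that executes 1e9 operations of its own in 1 second and calls a function executing 12e9 operations in 2 seconds would report 1 GFLOPS on a self-time basis but roughly 4.3 GFLOPS inclusively.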

Consider the following example where a modified matrix multiplication is used.

double compute(double a, double b)
{
    double factor = a/b;
    return (((((1+factor)*factor+1)*factor+1)*factor+1)*factor+1)*factor+1;
}
void multiply_d_noinline(int arrSize, double **aMatrix, double **bMatrix, double **product)
{
    for(int i=0;i<arrSize;i++) {
        for(int j=0;j<arrSize;j++) {
            double sum = 0;
            for(int k=0;k<arrSize;k++) {
#pragma noinline
                sum += compute(aMatrix[i][k],bMatrix[k][j]);
            }
            product[i][j] = sum;
        }
    }
}

void multiply_d_inline(int arrSize, double **aMatrix, double **bMatrix, double **product)
{
    for(int i=0;i<arrSize;i++) {
        for(int j=0;j<arrSize;j++) {
            double sum = 0;
#pragma novector
            for(int k=0;k<arrSize;k++) {
                sum += compute(aMatrix[i][k],bMatrix[k][j]);
            }
            product[i][j] = sum;
        }
    }
}

In the multiply_d_noinline function, most of the computation is performed in the compute routine, which is called from the innermost loop. As a result, the computational and memory operations inside compute are excluded from the FLOPS and arithmetic intensity of that loop. In the multiply_d_inline function, by contrast, the same computations are inlined into the loop and are included in its FLOPS metric.
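To get a feel for the size of the gap, here is a back-of-the-envelope count of the floating-point operations attributed to the innermost loop in each variant. This is a source-level hand count, not what the tool measures: the compiler may fuse or reorder operations, and the helper name loop_flop is made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

enum {
    COMPUTE_FLOP = 1 + 5 + 6,  /* compute(): 1 division, 5 multiplications, 6 additions */
    LOOP_SELF_FLOP = 1         /* innermost loop body: the "sum +=" addition */
};

/* Source-level FLOP attributed to the innermost loop over all arrSize^3 iterations.
 * inlined != 0 models multiply_d_inline, inlined == 0 models multiply_d_noinline. */
uint64_t loop_flop(int arrSize, int inlined)
{
    assert(arrSize >= 0);
    uint64_t iterations = (uint64_t)arrSize * arrSize * arrSize;
    uint64_t per_iteration = LOOP_SELF_FLOP + (inlined ? COMPUTE_FLOP : 0);
    return iterations * per_iteration;
}
```

Under this count, the non-inlined loop is credited with 1 operation per iteration and the inlined loop with 13, so 12 of every 13 operations are attributed to compute rather than to the loop that calls it.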

Let's see what the Roofline plot looks like.

Significant difference in the position of inlined and not inlined loops on the roofline plot

Although both loops do similar compute work, their positions on the plot diverge considerably. Moreover, the Roofline plot suggests that the non-inlined version of the loop is memory bound and needs better cache usage, whereas the inlined loop is compute bound and needs vectorization to improve performance. What both loops actually require is better vectorization.

To account for these specifics, we recommend enabling "Loops and Functions" filtering in the filter bar. An extra dot then appears on the chart, representing the FLOPS of the compute function. So, when interpreting Roofline data for a loop with nested calls, you should take into account not only the loop itself but also all the nested calls.

Roofline functions view is enabled

The sample code used in this article can be downloaded from the following link.
