This document describes a suggested workflow for using the Roofline feature of Intel® Advisor.
- Intel® Advisor XE can run as a standalone GUI tool, as an integrated Visual Studio plug-in on Windows, or from the command line. If you plan to use the command line or the standalone GUI, first run advixe-vars.bat (on Windows) or source advixe-vars.sh (on Linux) to set up the environment variables.
- To launch the standalone GUI, run the advixe-gui.exe or advixe-gui command.
- In the standalone tool, create an Advisor project. Make sure the “Collect information about FLOPS…” checkbox is selected on the “Survey Trip Count Analysis” page of the project settings.
- In Visual Studio, you can configure the project settings by pressing the toolbar button.
- To get Roofline analysis results, you have to run two different analyses from the vectorization workflow. First, run Survey to collect information about program structure and loop execution times. Press the button below the Survey Target analysis or press the button on the toolbar.
- After the Survey analysis completes, you will have general information about the loops in your program. Several types of information are collected for every loop. Refer to the Advisor documentation for help on what is displayed in the grid and how to use it to improve the performance of your application.
The second analysis to run is the "Trip Counts and FLOPS" analysis. Run it to collect the number of floating-point operations and memory operations. Press the button under “Find Trip Counts and FLOPs” to run it.
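The two quantities that position a loop on a roofline chart follow from these counts by simple arithmetic. The sketch below is not Advisor code: the function names and the loop's numbers are hypothetical, chosen only to show how arithmetic intensity (FLOP per byte) and performance (GFLOPS) are derived from a floating-point operation count, a byte count, and an elapsed time.

```python
def arithmetic_intensity(flop_count, bytes_moved):
    """FLOP performed per byte of memory traffic."""
    return flop_count / bytes_moved


def gflops(flop_count, seconds):
    """Achieved performance in billions of FLOP per second."""
    return flop_count / seconds / 1e9


# Hypothetical measurements for one loop (illustrative values only).
loop_flops = 4_000_000_000    # floating-point operations executed
loop_bytes = 16_000_000_000   # bytes read and written
loop_seconds = 2.0            # time spent in the loop

ai = arithmetic_intensity(loop_flops, loop_bytes)
perf = gflops(loop_flops, loop_seconds)
print(f"arithmetic intensity: {ai} FLOP/byte, performance: {perf} GFLOPS")
```

On a roofline chart, the first value is the horizontal coordinate of the loop and the second is the vertical one.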
Open the “roofline chart” tab to see the roofline data collected for your application.
On the chart you can see the different rooflines available on your machine: memory/cache bounds and compute bounds. These rooflines are obtained dynamically by running a small benchmark before your application runs. The memory/cache rooflines define the performance ceiling that applies when the data does not fit into a particular cache. The compute rooflines show the compute performance bounds when scalar, single/double precision vector, or FMA computations are used.
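To make the idea of a ceiling concrete, here is a minimal sketch of the standard roofline model: attainable performance is the smaller of the compute peak and the product of arithmetic intensity and bandwidth. All bandwidth and peak values below are made-up placeholders, not benchmark results from Advisor, and the roof names are only illustrative.

```python
def attainable_gflops(ai, bandwidth_gbs, peak_gflops):
    """Roofline model: performance is capped by memory traffic or by compute."""
    return min(peak_gflops, ai * bandwidth_gbs)


# Hypothetical roofs (GB/s for memory levels, GFLOPS for compute).
memory_roofs = {"L1": 400.0, "L2": 150.0, "DRAM": 20.0}
compute_roofs = {"Scalar add": 10.0, "DP vector add": 40.0, "DP vector FMA": 80.0}

ai = 0.5  # FLOP/byte of a hypothetical loop
peak = compute_roofs["DP vector FMA"]

for level, bandwidth in memory_roofs.items():
    ceiling = attainable_gflops(ai, bandwidth, peak)
    print(f"data served from {level}: ceiling {ceiling} GFLOPS")
```

With these placeholder numbers, the same loop is capped far lower when its data comes from DRAM than when it fits in L1, which is why the chart draws a separate roofline per memory level.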
You can enable or disable rooflines in the toolbox hidden behind the small three-stripe box in the upper-right corner of the chart. The display parameters for hot loops can also be tuned there.
The red dot at the bottom represents the position of my hottest loop on the roofline plot. As you can see, there is a huge performance improvement opportunity in this application. Sometimes, when you open the roofline tab, some of your loops are missing. This means they are so slow that they fall below the bottom of the chart. To locate them, simply resize the roofline chart panel.
If your application is not threaded, you can use single-threaded rooflines by selecting the checkbox at the top of the chart. Zooming controls are also available.
Selecting a particular loop on the roofline plot displays the source code of that loop in the bottom pane. You can also select loops on the survey report page and then switch to the roofline page, where the selected loop is highlighted.
The bottom pane contains several tabs with various information about loops. Refer to the Advisor documentation for additional help.
If you have nested loops in nested routines, changing the filtering mode to “Loops And Functions” can be helpful, because only the self-time FLOPS metric is calculated. To analyse FLOPS data for outer loops, all nested loops and function calls should be carefully reviewed. For more information on this topic, refer to the following article: Selftime-based FLOPS computing.
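The difference between self and total FLOPS for nested loops can be illustrated with a small tree. This is only a sketch of the concept under my own assumptions: the `Node` class, the loop names, and the counts are hypothetical and do not reflect Advisor's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A loop or function with its own (self) FLOP count and nested children."""
    name: str
    self_flops: int
    children: list = field(default_factory=list)


def total_flops(node):
    """Self FLOP of the node plus the FLOP of everything nested inside it."""
    return node.self_flops + sum(total_flops(child) for child in node.children)


# Hypothetical nest: the outer loop does no arithmetic of its own.
helper = Node("helper_function", self_flops=500)
inner = Node("inner_loop", self_flops=1000, children=[helper])
outer = Node("outer_loop", self_flops=0, children=[inner])

for node in (outer, inner, helper):
    print(f"{node.name}: self={node.self_flops}, total={total_flops(node)}")
```

In this example the outer loop reports zero self FLOP even though all of the work happens inside it, which is why the nested loops and function calls have to be reviewed together.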
For every hot loop in your program, analyse the loop's position on the roofline plot. Identify performance gaps and opportunities for every loop. Use the other information and recommendations provided by Advisor to improve the performance of your application.
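One way to quantify a performance gap is to compare a loop's measured performance with the nearest roof above it. The sketch below assumes a single memory roof and a single compute roof with placeholder values; the function name and all numbers are hypothetical, and Advisor itself does not expose this function.

```python
def roofline_gap(ai, measured_gflops, bandwidth_gbs, peak_gflops):
    """Return the limiting roof, the ceiling in GFLOPS, and the headroom factor."""
    memory_ceiling = ai * bandwidth_gbs
    ceiling = min(memory_ceiling, peak_gflops)
    bound = "memory" if memory_ceiling < peak_gflops else "compute"
    headroom = ceiling / measured_gflops
    return bound, ceiling, headroom


# Hypothetical loop: 0.25 FLOP/byte, measured at 1 GFLOPS.
# Hypothetical machine: 20 GB/s memory roof, 80 GFLOPS compute roof.
bound, ceiling, headroom = roofline_gap(
    ai=0.25, measured_gflops=1.0, bandwidth_gbs=20.0, peak_gflops=80.0
)
print(f"{bound}-bound, ceiling {ceiling} GFLOPS, headroom {headroom}x")
```

A large headroom factor suggests the loop is far from the roof and worth optimizing. A memory-bound result points toward improving data reuse, while a compute-bound result points toward vectorization or FMA usage.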
If you have any questions or problems, please contact the Advisor team by email at vector_advisor@intel.com.