Whether you’re tuning development code for the first time or conducting advanced performance optimizations, Intel® VTune™ Amplifier turns raw profiling data into performance insight. If you need to find bottlenecks, sync points, and CPU hotspots in PC game code developed with Unreal Engine*, you can use the graphical user interface to sort, filter, and visualize data from a local or remote target, with low overhead.
In Unreal Engine* 4.19, Intel® software engineers worked with Unreal* to add support for Intel VTune Amplifier instrumentation and tracing technology (ITT) markers. This guide shows you how to use the new integration to generate annotated traces of Unreal Engine 4 (UE4) inside the Intel VTune Amplifier 2018 UI. Download UE4 from Unreal Engine. Download a free trial of Intel VTune Amplifier.
Capturing Unreal Engine* 4 Traces
Scoped events are cumulative CPU timings of blocks of code, analyzed frame by frame. Scoped events at the function or “between braces” level can now be captured and viewed in Intel VTune Amplifier using ITT events. Setting up scoped events can help you track standard engine statistics.
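To make the “between braces” idea concrete, here is a minimal, self-contained C++ sketch of a scoped event. It is not UE4's implementation: `EventLog`, `ScopedEvent`, and `TickFrame` are hypothetical names invented for illustration, and the log is a plain vector. In an ITT-enabled build, the begin and end hooks would presumably forward to the ITT task API (`__itt_task_begin` and `__itt_task_end`) rather than record strings.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a profiler's begin/end hooks. A real
// integration would call into the profiler here instead of storing text.
struct EventLog {
    std::vector<std::string> entries;
    void Begin(const std::string& name) { entries.push_back("begin " + name); }
    void End(const std::string& name) { entries.push_back("end " + name); }
};

// RAII guard: the event opens at construction and closes when the
// enclosing braces are left, including on early return or exception.
class ScopedEvent {
public:
    ScopedEvent(EventLog& log, std::string name)
        : log_(log), name_(std::move(name)) {
        log_.Begin(name_);
    }
    ~ScopedEvent() { log_.End(name_); }

    // Copying a guard would close the same event twice, so forbid it.
    ScopedEvent(const ScopedEvent&) = delete;
    ScopedEvent& operator=(const ScopedEvent&) = delete;

private:
    EventLog& log_;
    std::string name_;
};

// Example frame: a function-level event with a nested block-level event.
void TickFrame(EventLog& log) {
    ScopedEvent frame(log, "TickFrame");
    {
        ScopedEvent particles(log, "UpdateParticles");
        // ... work being measured ...
    }
}
```

Calling `TickFrame` once records the nested event strictly inside the outer one, which is the same nesting a timeline view relies on to draw child tasks under their parents.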
To get started, run the Intel VTune profiler as an Administrator.
For the application, choose the UE4 Editor by including the entire path.
For application parameters, specify the game with any necessary settings, such as resolution. In the example below, the UE4 Particle Effects demo is profiled. Make sure you add “-VTune” at the end of the application parameters command line (see figure 1). If you need help with the command-line arguments in addition to the -VTune switch, refer to the Command-Line Arguments section of the UE4 documentation.
Select the checkbox to set the application directory as the working directory. If you need help with any of the other settings on this screen, use the F1 key to access VTune’s help system.
Figure 1. Setting up the application, game, and application parameters under the analysis target tab.
Next, move to the Analysis Type tab and choose “Advanced Hotspots” under the Algorithm Analysis heading (see figure 2).
Set the CPU sampling interval to 1 ms.
For this example, to keep overhead down, select “Hotspots” under “Select a level of details provided with event-based sampling collection.”
Set the Event mode to “All.”
Select the checkbox for “Analyze user tasks, events, and counters.”
Figure 2. Setting up advanced hotspots under the analysis type tab.
Next, start the game through the Intel VTune profiler.
In the Unreal Engine dev console, which you can open with the ~ (tilde) key while the workload is running, type “stat NamedEvents.” Scoped events will now be tracked. Note that you need a Development build to make this feature work. For more information, refer to the Build Configurations section of the UE4 help system.
When finished collecting statistics, stop the profiler.
Viewing Unreal Engine 4 Traces
After the results are processed, the summary shows statistics for the top task types captured, similar to figure 3.
Figure 3. Statistics gathered for the top tasks.
On the Advanced Hotspots screen, move to the “Bottom-up” tab (see figure 4). The Bottom-up view shows an in-depth look at the tasks. Use the “Grouping” pull-down menu to select the view for “Task Domain / Task Type / Task Duration Type / Function / Call Stack.”
Figure 4. The bottom-up view shows an in-depth look at the reported tasks.
You can keep exploring the report for your code profile from additional tabs on the Advanced Hotspots screen. For example, the “Platform” view shows timing for named events (see figure 5).
Figure 5. Timing for named events seen from the platform view.
These reports contain a great deal of information to inspect. For more detail, see the Intel VTune Amplifier tutorials, which include HTML and PDF documents that walk you through examples, along with sample code covering Windows*, Linux*, C++, Fortran, and OpenMP*; energy-usage challenges on Android*; hotspot detection; identifying locks and waits that prevent parallelization; and more.
Custom Events
Any code snippet inside UE4 that you want to optimize can be investigated by wrapping it in cycle counters, as described in this guide. This lets you define custom events and follow their execution on the thread timeline in the Intel VTune Amplifier UI.
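As a rough illustration of what wrapping a snippet in a cycle counter does, the sketch below accumulates elapsed time and call counts per named event over a frame. It is a standalone approximation, not engine code: `FrameStats`, `ScopedCycleCounter`, and `SumParticles` are hypothetical names, and it measures wall-clock time with `std::chrono::steady_clock` rather than CPU cycles. Inside UE4 you would use the engine's own stat macros (such as `DECLARE_CYCLE_STAT` and `SCOPE_CYCLE_COUNTER`) instead.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Cumulative timing for one named event within the current frame.
struct StatEntry {
    std::chrono::nanoseconds total{0};
    int calls = 0;
};

// Hypothetical per-frame accumulator keyed by event name.
class FrameStats {
public:
    void Add(const std::string& name, std::chrono::nanoseconds elapsed) {
        StatEntry& entry = stats_[name];
        entry.total += elapsed;
        ++entry.calls;
    }
    // Returns a zeroed entry for names that were never recorded.
    StatEntry Get(const std::string& name) const {
        auto it = stats_.find(name);
        if (it == stats_.end()) return StatEntry{};
        return it->second;
    }
    // Call at the start of each frame to begin a fresh accumulation.
    void ResetFrame() { stats_.clear(); }

private:
    std::map<std::string, StatEntry> stats_;
};

// Times the enclosing scope and adds the result to FrameStats on exit.
class ScopedCycleCounter {
public:
    ScopedCycleCounter(FrameStats& stats, std::string name)
        : stats_(stats), name_(std::move(name)),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedCycleCounter() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        stats_.Add(name_,
                   std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    }
    ScopedCycleCounter(const ScopedCycleCounter&) = delete;
    ScopedCycleCounter& operator=(const ScopedCycleCounter&) = delete;

private:
    FrameStats& stats_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Example snippet under investigation, wrapped in a custom event.
int SumParticles(FrameStats& stats, int count) {
    ScopedCycleCounter counter(stats, "SumParticles");
    int sum = 0;
    for (int i = 0; i < count; ++i) sum += i;
    return sum;
}
```

Calling `SumParticles` several times in one frame yields a single entry whose call count and total time grow with each call, which mirrors the cumulative, frame-by-frame nature of scoped events described earlier.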
Conclusion
Performance on modern processors requires much more than optimizing single-thread performance. High-performing code must be:
- Threaded and scalable to utilize multiple CPUs
- Vectorized for efficient use of SIMD units
- Tuned to take advantage of non-uniform memory architectures and caches
With Intel VTune Amplifier, you get advanced profiling capabilities with a single, user-friendly analysis interface. UE4 and Intel VTune Amplifier work together to let you investigate your code and profile it to run smoothly across multiple cores. In addition, the optimization tools allow you to create faster code, get more accurate data about the CPU and GPU, and investigate threading and memory usage—all with low overhead. Plus, you’ll get answers more quickly thanks to easy analysis that turns data into insights. Download the most recent versions of the Unreal Engine and the Intel VTune Amplifier today to get ready to take your game-dev efforts to the next level.