Introduction
In terms of processing power and user experience, virtual reality (VR) systems fall into three tiers: premium, mainstream, and entry-level. Premium VR is the high end and covers products driven by high-specification, high-performance PCs or game consoles. The main headsets that support premium VR are HTC Vive*, Oculus Rift*, and Sony PlayStation* VR.
Mainstream VR hardware does not match premium VR in performance, but it still relies on PC processors for VR computing power. Entry-level VR comprises mobile VR devices, such as Gear* VR and Google Cardboard*, and includes VR glasses and all-in-one headsets that use mobile phone chips for computing.
This article describes methods of testing and profiling VR games based on HTC Vive* and Oculus Rift* on a PC. Compared to traditional PC games, VR games differ in gameplay design, input mode, and performance requirements. Gameplay and input are outside the scope of this article. Instead, we look at how the performance requirements of a VR game differ from those of a traditional game.
The number of pixels processed per second is an important measure of the VR experience. The screen resolution of the current HTC Vive and Oculus Rift CV1 is 2160x1200, but the actual rendering needs more samples to offset the resolution loss caused by lens distortion. For HTC Vive and Oculus Rift CV1, the render target is scaled up to as much as 140 percent along each axis. At 90 FPS, the pixel processing load for VR reaches a surprising 457 million pixels per second.
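The 457 million figure can be reproduced with a short calculation. The sketch below assumes that "140 percent" means a 1.4x scale on each axis of the 2160x1200 panel and that the headset refreshes at 90 Hz; the function name is illustrative only.

```python
from fractions import Fraction


def pixels_per_second(width, height, scale_per_axis, refresh_hz):
    """Return the pixels shaded per second for a supersampled render target."""
    scaled_width = width * scale_per_axis
    scaled_height = height * scale_per_axis
    return int(scaled_width * scaled_height * refresh_hz)


# Assumption: the 140 percent figure is applied to both width and height.
# Fraction keeps the arithmetic exact (1.4 is not exactly representable as a float).
vive_pixels = pixels_per_second(2160, 1200, Fraction("1.4"), 90)
print(f"{vive_pixels:,} pixels per second")  # 457,228,800 pixels per second
```

For comparison, the same function with a scale of 1 and a 60 Hz refresh rate gives the load of a traditional 1920x1080 game: about 124 million pixels per second.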
Performance testing and analysis are important parts of VR game development. These tasks help verify that the game meets the performance requirements and makes full use of the CPU and GPU. Before testing, disable Asynchronous Spacewarp (ASW) for Oculus and asynchronous reprojection for SteamVR*, so that the VR runtime's compensation does not interfere with the performance analysis.
To disable ASW, run the Oculus Debug Tool, located at Program Files\Oculus\Support\oculus-diagnostics\OculusDebugTool.exe.
Figure 1. Asynchronous Spacewarp configuration.
To disable the reprojection function in SteamVR, use Settings/Performance.
Figure 2. SteamVR* configuration.
Tools for VR
Software tools play a vital role in testing and analyzing VR games. The main tools for these tasks include Fraps*, GamePP*, Unreal Engine* console commands, the Windows* Assessment and Deployment Kit (ADK), SteamVR frame timing, and Intel® Graphics Performance Analyzers (Intel® GPA).
The Fraps FPS (Frames Per Second) Counter
The Fraps FPS (frames per second) counter is a traditional frame-rate benchmarking tool, which developers can use to measure the maximum, minimum, and average frame rates over a period of time. The results can be easily imported into an Excel* file to generate charts (see the top of Figure 3). The chart shows whether the frame rate changes smoothly throughout the test. In addition, Fraps is handy for taking screenshots, which can be saved for reporting purposes.
Figure 3. (Top) Frame rate change over time, charted from Fraps data. (Bottom) The maximum, minimum, and average frame rates reported by Fraps.
The bottom of Figure 3 shows the maximum, minimum, and average frame rates that Fraps recorded during the test.
As the data shows, the frame rate of the scene in this VR game is low most of the time, only about 45 FPS, and does not meet HTC's requirements. To prevent discomfort or motion sickness during VR play, headset manufacturers require a stable frame rate of 90 FPS. With performance this far below that target, the player will experience dizziness or motion sickness. At this point, we can use GPUVIEW in the Windows ADK to determine whether the problem lies in the GPU or the CPU.
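The maximum, minimum, and average frame rates discussed above can also be derived from a list of per-frame durations. The following is a minimal sketch: the input list is hypothetical sample data, and the minimum and maximum are computed per frame, so they may differ from the values Fraps itself reports.

```python
def fps_summary(frame_times_ms):
    """Summarize a capture given the duration of each frame in milliseconds."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    total_seconds = sum(frame_times_ms) / 1000.0
    return {
        # Average: frames rendered divided by elapsed time.
        "avg_fps": len(frame_times_ms) / total_seconds,
        # The slowest frame gives the minimum instantaneous frame rate.
        "min_fps": 1000.0 / max(frame_times_ms),
        # The fastest frame gives the maximum instantaneous frame rate.
        "max_fps": 1000.0 / min(frame_times_ms),
    }


# Hypothetical capture: two 10 ms frames followed by two 20 ms frames.
summary = fps_summary([10.0, 10.0, 20.0, 20.0])
print(summary)
```

Note that the average is not the mean of the per-frame rates: the slow frames occupy more wall-clock time, so they pull the average down further than a simple mean would suggest.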
Benchmark Utility GamePP*
Although Fraps is freely available, it has not been updated for a long time, and it cannot be used to test DirectX* 12 games (for those, you can use another tool, PresentMon, to collect FPS data). GamePP* is a similar benchmark utility from China (http://gamepp.com/). When you run this utility, a tool window automatically displays at the top of the game window and shows the FPS, CPU temperature, GPU utilization, and CPU, graphics card, and memory usage, among other values, in real time (see Figure 4).
Figure 4. GamePP* real-time data display interface.
The tool window at the top of the game window provides real-time monitoring of the running game and its performance. However, both Fraps and GamePP are designed for traditional games and can display their data only on a monitor.
Unreal Engine Tool
Testers wearing a head-mounted display (HMD) cannot see a game's data changes in real time on the monitor. They have two options for viewing real-time performance data in the headset:
- If the game is based on Unreal Engine, use the stat FPS console command.
- Use the SteamVR frame timing method, which shows real-time performance data in both the headset and the monitor. The SteamVR frame timing settings can be found here: https://developer.valvesoftware.com/wiki/SteamVR/Frame_Timing
Figure 5 shows the SteamVR frame timing data.
Figure 5. The missed frame in the head-mounted display.
When the game drops frames, a Missed Frames box displays in the HMD, as shown in Figure 5. The denser the red bars in this box, the more frequently frames were dropped.
Figure 6. The CPU and GPU running on the PC.
Figure 6 shows more detailed data on the PC display. You can also show this display in the HMD by selecting the Show in Headset option. In the data readout, blue indicates the GPU rendering time and tan indicates the GPU free time.
As shown in Figure 6, the GPU rendering of some frames exceeds 11.11 milliseconds (ms), the per-frame budget at 90 FPS (1000 ms / 90). Those frames miss Vsync and are reported as missed frames, so the game cannot hold 90 FPS. The SteamVR frame timing tool tells us that a frame is GPU bound, but it cannot tell whether the GPU rendering itself took too long or whether the CPU render thread submitted rendering commands late and left a bubble in the GPU timeline.
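The 11.11 ms budget follows directly from the 90 Hz refresh rate. As a minimal sketch, the check below flags frames whose GPU time exceeds that budget; the sample timings are hypothetical and are not taken from Figure 6.

```python
REFRESH_HZ = 90
FRAME_BUDGET_MS = 1000.0 / REFRESH_HZ  # about 11.11 ms per frame


def missed_frames(gpu_times_ms, budget_ms=FRAME_BUDGET_MS):
    """Return the indices of frames whose GPU time exceeds the Vsync budget."""
    return [i for i, t in enumerate(gpu_times_ms) if t > budget_ms]


# Hypothetical per-frame GPU times in milliseconds.
sample = [9.8, 10.5, 13.7, 10.9, 11.6]
print(f"budget: {FRAME_BUDGET_MS:.2f} ms")  # budget: 11.11 ms
print(missed_frames(sample))                # [2, 4]
```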
Unreal Engine's Console Command Tool
If the game was developed using Unreal Engine 4 and is a development build, not a release build, you can view the real-time performance data of the game using the Unreal Engine console commands.
Press the ~ key in the game to display the command line window. The following are some of the common console commands:
- Stat FPS: Displays the frame rate and the frame time. This command is easy to use in VR because the frame rate is displayed in the VR headset, which makes it convenient for testers to observe the real-time performance of the game.
- Stat Unit: Shows the total time of each frame, the time consumed by the game logic thread, the time consumed by the render thread, and the GPU time (see Figure 7). In general, if the frame time is close to the game logic thread time, the bottleneck is in the game logic thread. If the frame time is close to the render thread time, the bottleneck is in the render thread. If the frame time is close to the GPU time, the bottleneck is on the graphics card.
Figure 7. Screenshot of the Stat Unit command.
- Stat SceneRendering: Shows the various parameter values on the game's render thread (see Figure 8).
Figure 8. Screenshot of the Stat SceneRendering command.
- Stat Game: Shows the real-time view of parameter values running on the game logic thread, such as artificial intelligence (AI), physics, blueprint, memory allocation, and so on (see Figure 9).
Figure 9. Screenshot of the Stat Game command.
- Stat GPU: Shows the time parameters of the GPU main render content in each frame in real time (see Figure 10).
Figure 10. Screenshot of the Stat GPU command.
- Stat InitViews: Shows the time and efficiency data that culling takes (see Figure 11).
Figure 11. Screenshot of the Stat InitViews command.
- Stat LightRendering: Displays the render time required for lighting and shading (see Figure 12).
Figure 12. Screenshot of the Stat LightRendering command.
Additional stat commands are described on the official Unreal Engine webpages:
- https://docs.unrealengine.com/udk/Three/ConsoleCommands.html
- https://docs.unrealengine.com/latest/INT/Engine/Performance/StatCommands/
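The rule of thumb given for Stat Unit can be sketched as a small helper. This is only an illustration: the function name, the 1 ms tolerance, and the sample readings are assumptions, and the four inputs stand for the frame, game logic thread, render thread, and GPU times that Stat Unit displays.

```python
def classify_bottleneck(frame_ms, game_ms, draw_ms, gpu_ms, tolerance_ms=1.0):
    """List the stages whose time is within tolerance_ms of the frame time."""
    candidates = {
        "game thread": game_ms,
        "render thread": draw_ms,
        "GPU": gpu_ms,
    }
    close = [
        name
        for name, stage_ms in candidates.items()
        if abs(frame_ms - stage_ms) <= tolerance_ms
    ]
    # No stage tracking the frame time usually means a closer look is needed.
    return close or ["unclear"]


# Hypothetical readings: render thread and GPU both track the frame time.
print(classify_bottleneck(frame_ms=22.2, game_ms=8.0, draw_ms=21.5, gpu_ms=21.9))
# ['render thread', 'GPU']
```

When more than one stage is returned, as in this example, Stat Unit alone cannot separate them, which is where the tools described next come in.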
GPUVIEW Tool
Further analysis can be done using GPUVIEW and Windows Performance Analyzer (WPA) in the Windows ADK. GPUVIEW is a powerful tool (for more information, refer to https://graphics.stanford.edu/~mdfisher/GPUView.html).
Of the Unreal Engine console commands, Stat Unit gives a preliminary indication of whether a frame is GPU bound or CPU bound. Sometimes the result is inaccurate, for example when a CPU thread causes a bubble in the middle of a GPU frame. In that case, the GPU time reported by Stat Unit is actually the sum of the real rendering time and the bubble time, and both GPUVIEW and WPA are needed for the analysis.
For example, as shown in Figure 13, there is a 2 ms bubble in the middle of each frame during which the GPU is idle. The rendering work itself takes less than 11 ms, but with the bubble the frame takes longer than the 11.11 ms that 90 FPS allows, so the following frame misses Vsync and a frame drop occurs.
Figure 13. GPUVIEW interface: The middle of each frame has a 2 ms bubble
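To make the effect of a bubble concrete, the sketch below splits one frame's GPU timeline into busy time and idle gaps. The interval values are hypothetical, chosen only to match the description above (under 11 ms of real work plus a 2 ms bubble); they are not read from the trace in Figure 13.

```python
def gpu_busy_and_bubbles(intervals_ms):
    """Return (busy_ms, bubble_ms) for one frame.

    intervals_ms is a list of (start, end) GPU busy intervals in milliseconds,
    sorted by start time and non-overlapping.
    """
    busy = sum(end - start for start, end in intervals_ms)
    span = intervals_ms[-1][1] - intervals_ms[0][0]
    return busy, span - busy


# Hypothetical frame: 5 ms of work, a 2 ms gap, then another 5 ms of work.
busy, bubble = gpu_busy_and_bubbles([(0.0, 5.0), (7.0, 12.0)])
budget_ms = 1000.0 / 90

print(f"busy {busy} ms, bubble {bubble} ms")  # busy 10.0 ms, bubble 2.0 ms
print(busy <= budget_ms)                       # True: the work alone fits
print(busy + bubble <= budget_ms)              # False: with the bubble it does not
```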
At this point, we can open the same Merged.etl file in WPA, find the time window of the bubble on the timeline, and see which CPU thread is busy and what it is running at that time (see Figure 14).
Figure 14. Windows* Performance Analyzer interface: the time window of the bubble through the timeline
If GPUVIEW shows that the GPU rendering time of a frame exceeds 11.11 ms, the frame is GPU bound, and Intel® GPA can then be used to analyze which parts of the pipeline are overloaded.
Intel® GPA
Intel GPA is a powerful, free graphics performance analyzer tool, which can be downloaded at https://software.intel.com/en-us/gpa. Intel GPA includes the following independent tools:
- System Analyzer: Displays game performance indicators in real time.
- Graphics Frame Analyzer: Analyzes a single captured frame in detail.
- Platform Analyzer: Locates the CPU and GPU workloads.
- Graphics Trace Analyzer: Captures detailed event traces for analysis in the other analyzers.
Intel GPA Graphics Frame Analyzer is used in conjunction with GPUVIEW. It lets you inspect the draw calls, render targets, textures, overdraw, and shaders of a given frame in a game. By running experiments, such as simplifying the shader, you can detect which part of the rendering affects performance and identify the key part to optimize (see Figure 15).
Figure 15. Interface of the Intel® Graphics Performance Analyzers Graphics Frame Analyzer.
Case Study
Let’s use an example to showcase how we can test and analyze a VR game.
You can use the dxdiag command to view the machine configuration before testing:
Component | Specification |
---|---|
CPU | Intel® Core™ i7-6700K processor 4.00 GHz |
GPU | NVIDIA GeForce* GTX 1080 |
Memory | 1x8 GB DDR3 |
OS | Windows* 10 Pro 64-bit (10.0, Build 10586) |
Driver | 22.21.13.8233 |
First, we run Fraps for a period of time and chart the frame rate changes. As shown in Figure 16, some scenes in the first half of the test reach 90 FPS, but for most of the second half the frame rate fluctuates around 45 FPS, which does not meet the required standard. Further analysis is required.
Figure 16. FPS display.
Use the Unreal Engine console command stat FPS to find a scene with a low frame rate to analyze. If the game changes too fast to capture the data, you can use the console command PAUSE to pause the game, which makes it easier to open the tools you need. Judging from the values shown by the stat Unit command, there are probably bottlenecks in both the CPU render thread and the GPU (see Figure 17).
Figure 17. Screenshot of stat unit command.
To confirm this, we analyze the same scene with both GPUVIEW and WPA.
Figure 18. GPU rendering time.
The first thing we can see in GPUVIEW is that the rendering time of a frame is 13.69 ms, which exceeds the 11.11 ms budget, so the game cannot reach 90 FPS (see Figure 18).
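One plausible reading of why a 13.69 ms frame shows up as roughly 45 FPS rather than about 73 FPS is Vsync quantization. The sketch below assumes that every frame is held until the next Vsync of a 90 Hz display, that reprojection is disabled, and that the frame time is steady; the function name is illustrative.

```python
import math


def vsync_locked_fps(frame_time_ms, refresh_hz=90):
    """Estimate the displayed frame rate when presentation waits for Vsync."""
    budget_ms = 1000.0 / refresh_hz
    # A frame that overruns the budget occupies a whole number of refresh intervals.
    intervals = max(1, math.ceil(frame_time_ms / budget_ms))
    return refresh_hz / intervals


print(vsync_locked_fps(13.69))  # 45.0
print(vsync_locked_fps(10.0))   # 90.0
```

Under these assumptions, any frame time between 11.11 ms and 22.22 ms yields 45 FPS, which is consistent with the frame rate that fluctuates around 45 FPS in the Fraps test.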
Next, we see a window of about 10 ms on the CPU in which only the audio thread is running (see Figure 19). The other threads are basically idle, which means the game is not making full use of the CPU. This leaves headroom to use the CPU for additional features, such as more AI, physics, materials, or particle effects.
Figure 19. CPU idle time.
WPA confirms this: the game and render threads are basically idle during that window (see Figure 20).
Figure 20. CPU running thread displayed on the Windows* Performance Analyzer.
Using Intel GPA, you can see that there are fewer than 1,000 draw calls, which is a reasonable number.
Figure 21. All the draw calls in the Intel® Graphics Performance Analyzers Frame Analyzer.
Select all the targets and run the experiments to see roughly where the frame time is spent.
Test Target | Before the Test | After the Test |
---|---|---|
2x2 Textures | 60.4 FPS | 71.4 FPS |
1x1 Scissor Rect | 60.4 FPS | 160.9 FPS |
Simple Pixel Shader | 60.4 FPS | 133 FPS |
The 2x2 textures experiment replaces the textures in the real scene with simple textures. The result shows that simple textures bring only a modest performance improvement, so texture optimization is a low priority.
The 1x1 scissor rect experiment removes the pixel processing stage from the rendering pipeline. In this experiment, performance improves significantly.
The simple pixel shader experiment, as the name suggests, replaces the original shader with a simplified pixel shader. Performance also improves greatly in this experiment.
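Frame rates are easier to compare once they are converted to frame times, because the time saved per frame is additive while FPS is not. The following sketch uses only the values from the experiment table; the variable names are illustrative.

```python
def frame_ms(fps):
    """Convert a frame rate to a frame time in milliseconds."""
    return 1000.0 / fps


baseline_fps = 60.4
experiments = {
    "2x2 Textures": 71.4,
    "1x1 Scissor Rect": 160.9,
    "Simple Pixel Shader": 133.0,
}

# Milliseconds saved per frame by each experiment relative to the baseline.
savings = {
    name: frame_ms(baseline_fps) - frame_ms(fps)
    for name, fps in experiments.items()
}
for name, saved in savings.items():
    print(f"{name}: {saved:.2f} ms saved per frame")
```

Out of a baseline frame of about 16.56 ms, the texture experiment saves about 2.55 ms, while the scissor rect and pixel shader experiments save about 10.34 ms and 9.04 ms respectively.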
The experiments above show that the pixel processing workload in the GPU rendering pipeline is rather heavy. Another way to see which operation in a frame takes the most time is to enter ToggleDrawEvents on the command line in Unreal Engine 4.
This command attaches the specific function names to each draw call. If you then capture a frame using Intel GPA, the Intel GPA Frame Analyzer shows the time spent by each draw call together with its function name.
Figure 22. Intel® Graphics Performance Analyzers shows the execution functions of each Draw call.
Below is a table of a few time-consuming modules. This information will help you focus on the modules that have a high time ratio and selectively do the optimization.
Modules | Time Spent Ratio |
---|---|
BasePass | 15 percent |
Lights | 32.4 percent |
PostProcessing | 15.9 percent |
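To decide which module to tackle first, it can help to estimate how much the frame time would shrink if one module were made faster. The sketch below is an Amdahl-style estimate under two loudly stated assumptions: that the ratios in the table apply to the 13.69 ms frame measured in GPUVIEW, and that a module could be made twice as fast, which is a purely hypothetical target.

```python
def frame_time_after_speedup(frame_ms, module_ratio, speedup):
    """Estimate the new frame time if only one module gets faster."""
    module_ms = frame_ms * module_ratio
    return frame_ms - module_ms + module_ms / speedup


FRAME_MS = 13.69  # assumption: the table's ratios refer to this frame
ratios = {"BasePass": 0.15, "Lights": 0.324, "PostProcessing": 0.159}

estimates = {
    name: frame_time_after_speedup(FRAME_MS, ratio, speedup=2.0)
    for name, ratio in ratios.items()
}
for name, new_ms in estimates.items():
    print(f"{name}: {new_ms:.2f} ms")
```

Under these assumptions, halving the cost of Lights brings the frame to about 11.47 ms, still just above the 11.11 ms budget, which suggests that more than one module would need attention.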
For more detailed Intel GPA analysis, please refer to another article from Intel® Developer Zone: https://software.intel.com/en-us/android/articles/analyze-and-optimize-windows-game-applications-using-intel-inde-graphics-performance
Summary
Optimization is how you deliver a high-quality game when the hardware alone cannot provide the performance that an immersive VR experience requires. Finding the bottlenecks requires combining the tools and methods described in this article. This article also showed how experiments and parameter adjustments can locate a game's CPU or GPU performance bottlenecks, so that the game experience can be improved.