By: Wenliang Wang
Virtual reality (VR) can bring an unprecedented sense of user immersion, but characteristics such as binocular rendering, low latency, high resolution, and forced vertical synchronization (vsync) put great pressure on the CPU render thread, the CPU logic thread, and GPU computation [1]. Effectively analyzing the bottlenecks of VR application performance and optimizing the CPU threads to improve parallelization across worker threads, thereby reducing GPU wait time and improving GPU utilization, is key to whether a VR application runs smoothly, is free of dizziness, and is immersive.
Unreal Engine* 4 (UE4) is one of the two major game engines currently used by VR developers. Understanding the CPU thread structure and the associated optimization tools of UE4 can help in developing better UE4-based VR applications. This paper covers CPU performance analysis and debugging instructions, thread structure, and optimization methods and tools in UE4. It also covers how to make full use of idle CPU computing resources to enhance VR content, providing audio and visual quality matched to the different hardware configurations of players. The goal is a game that delivers the best immersive VR experience.
Why Optimize PC VR Games
Asynchronous timewarp (ATW), asynchronous spacewarp (ASW), and asynchronous reprojection are technologies provided by the VR runtime that generate a composite frame when the VR application drops a frame, which is equivalent to reducing latency. However, they are not perfect solutions, and each has its limitations. ATW and asynchronous reprojection can compensate for the motion-to-photon (MTP) latency caused by rotational head movement, but if the head position moves, or there are moving objects on the screen, they cannot reduce the MTP latency. In addition, ATW and asynchronous reprojection must be inserted between the draw calls of the GPU; when a draw call is too long (for example, post-processing) or the time left for ATW and asynchronous reprojection is insufficient, the frame insertion fails. ASW locks the frame rate at 45 frames per second (fps) when rendering cannot keep up, giving each frame 22.2 milliseconds (ms) to render, and inserts a composite frame between two rendered frames using traditional image motion estimation, as shown in Figure 1.
Figure 1: ASW interpolation effect.
In a synthetic frame, rapid movement or transparent parts of the frame produce deformation (for example, the areas within the red circles in Figure 1); drastic lighting changes are also prone to estimation errors. When consecutive frames are inserted using ASW, users can easily perceive picture judder. These VR runtime technologies are not good solutions to frequent frame drops. Developers should ensure that VR applications can run stably at 90 fps in most cases, and rely on the above methods only to cover occasional frame drops.
Introduction to Unreal Engine* 4 Performance Debugging Instructions
Applications developed with UE4 can query various real-time performance data via the stat commands in the console [2-3]. The stat unit instruction shows the total frame time (Frame), rendering thread time (Draw), logic thread time (Game), and GPU time (GPU), from which you can see which part is limiting the frame time, as shown in Figure 2. Combined with the show or showflag instructions, individual rendering features can be switched on and off dynamically to observe their impact on render time and find the factors that hurt performance; during this process the pause command can be executed to suspend the logic thread while observing the result.
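These commands can also be issued from code rather than typed into the console, which is convenient for a debug menu; a minimal sketch using the standard UKismetSystemLibrary::ExecuteConsoleCommand call (the ShowPerfStats helper is illustrative, not part of UE4):

```cpp
#include "Kismet/KismetSystemLibrary.h"

// Illustrative helper: drive the profiling commands described above from C++.
void ShowPerfStats(UObject* WorldContextObject)
{
    // Per-thread frame times: Frame / Game / Draw / GPU.
    UKismetSystemLibrary::ExecuteConsoleCommand(WorldContextObject, TEXT("stat unit"));
    // Rendering thread details: draw calls, visibility culling, lights.
    UKismetSystemLibrary::ExecuteConsoleCommand(WorldContextObject, TEXT("stat scenerendering"));
    // Suspend the logic thread while inspecting the numbers.
    UKismetSystemLibrary::ExecuteConsoleCommand(WorldContextObject, TEXT("pause"));
}
```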
It should be noted that the reported GPU time includes both GPU working time and GPU idle time, so even if stat unit shows the GPU taking the longest time, the problem is not necessarily on the GPU. A CPU bottleneck can leave the GPU idle most of the time and extend the time it takes the GPU to complete a frame. You therefore need to combine other tools, such as GPUView* [4], to analyze the CPU and GPU timelines and locate the real bottleneck.
Figure 2: Stat unit statistics.
In addition, because VR runs with forced vertical synchronization, a frame render time even 0.1 ms over 11.1 ms causes the frame to take two full vertical synchronization cycles to complete. As a consequence, a slight scene change can easily degrade a VR application's performance. For better profiling, use the -emulatestereo command-line flag with the resolution set to 2160 x 1200 and the screen percentage (r.ScreenPercentage) set to 140, which lets you analyze performance without a VR headset attached and with vertical synchronization disabled.
The performance data associated with the rendering thread can be seen through stat scenerendering, including the number of draw calls, visibility culling time, lighting processing time, and so on. For visibility culling, the stat initviews instruction can be used to further analyze the processing time of each part, including view frustum culling, precomputed visibility culling, and dynamic occlusion culling.
To judge the efficiency of each culling stage, enter the stat sceneupdate command to see the time spent updating the world scene, including adding, updating, and removing lights. In addition, the stat dumphitches instruction writes frame rendering information to the log whenever a frame's render time exceeds t.HitchThreshold.
To match game effects to different PC capabilities, stat physics, stat anim, and stat particles are frequently used CPU-related instructions, corresponding to physics computation time (cloth simulation, destruction effects, and so on), skinned mesh computation time, and CPU particle computation time. Because these workloads can be assigned to different worker threads for parallel processing in UE4, they can be scaled accordingly so that a VR application adapts effectively to different levels of hardware; the VR immersive experience and overall performance can then grow with the number of CPU cores.
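If you want to drive this scaling from code, one hedged approach is to read the core count and map it to a content tier; a minimal sketch (FPlatformMisc::NumberOfCores is standard UE4, while the tier thresholds and their meanings are illustrative assumptions):

```cpp
#include "HAL/PlatformMisc.h"

// Map physical core count to a hypothetical content tier that the game then
// uses to size cloth simulation, skinned mesh counts, and CPU particle budgets.
int32 ChooseContentTier()
{
    const int32 Cores = FPlatformMisc::NumberOfCores();
    if (Cores >= 8) return 2;  // e.g., full cloth sim, destruction, max CPU particles
    if (Cores >= 4) return 1;  // e.g., reduced particle counts, simpler cloth
    return 0;                  // minimal CPU-side effects
}
```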
In addition, you can enter the console commands stat startfile and stat stopfile to collect real-time running data for a designated period, and then use the Stats Viewer in the UE4 Session Frontend to view CPU thread utilization and call stacks, find the CPU hot spots, and carry out the corresponding optimization, as shown in Figure 3. The functionality is similar to the Windows* Performance Analyzer (WPA) in the Windows* Assessment and Deployment Kit (ADK).
Figure 3: The Stats Viewer built into the UE4 Session Frontend.
CPU Optimization Techniques for UE4 VR Applications
When CPU performance problems appear during VR development, we not only need to find where the bottleneck is, but also must master the tools UE4 provides to remove it. By understanding the usage, effects, and differences of each tool, we can quickly identify and select the most appropriate strategy for optimizing a VR application. This section focuses on the UE4 optimization tools.
Rendering Thread Optimization
Due to performance, bandwidth, and multisample anti-aliasing (MSAA) considerations, current VR applications usually use forward rendering instead of deferred rendering. In UE4's forward rendering pipeline, a prepass before the base pass forces early-z to generate the depth buffer and reduce GPU overdraw, so relatively little GPU work is submitted before the base pass. In addition, DirectX* 11 rendering is essentially single-threaded, with poor multi-threading ability, and when a VR scene contains large numbers of draw calls or primitives, culling time grows. The phase before the base pass is therefore likely to produce GPU bubbles whenever the rendering thread is the bottleneck, reducing GPU utilization and triggering frame drops. Optimizing the rendering thread is of vital importance in VR development.
Figure 4 shows an example of a VR game that is limited by the CPU rendering thread. The game runs on HTC Vive* at an average frame rate of 60 fps. Although the GPU appears to be the main performance bottleneck according to the console command stat unit, each frame's rendering thread time (Draw) is very long. The frame timing in SteamVR* clearly shows that the CPU even has a late start, which means the rendering thread's workload is very heavy. (In SteamVR, the rendering thread of a frame starts its calculation 3 ms before the frame's vertical sync, which is known as the running start. The intention is to trade 3 ms of extra latency for letting the rendering thread work in advance, so that the GPU can start working immediately after the frame's vertical sync, maximizing GPU efficiency. If a frame's rendering thread has not finished when the next frame's 3-ms running start should begin, it blocks that running start, which is called a late start. A late start delays the rendering thread's work, producing GPU bubbles.)
In addition, the SteamVR frame timing shows the GPU time being higher on every other frame; the analysis below shows that this extra time is actually the GPU bubble before the prepass.
If we use GPUView to analyze the scene in Figure 4, we get the result shown in Figure 5, where the red arrows mark the times at which the CPU rendering thread starts. Because of the running start, the first red arrow is 3 ms before the vertical sync; but when the vertical sync arrives, the GPU still has no work to do until 3.5 ms into the frame, where the GPU works briefly and then idles for 1.2 ms. Only then can the CPU submit the prepass work to the CPU context queue, and 2 ms after the prepass completes, the base pass work is submitted to the CPU context queue for the GPU to execute.
In Figure 5, the locations indicated by the red circles are GPU idle time; this total (also known as GPU bubbles) adds up to nearly 7 ms, which directly causes the frame drop, as the GPU cannot finish rendering within 11.1 ms and needs two vertical synchronization cycles to complete the frame. We can use WPA to analyze the rendering thread's call stack during the GPU bubbles and find out which functions cause the bottleneck [1]. The second red arrow marks the start of the rendering thread for the next frame; because this frame dropped, the rendering thread of the following frame gains a full extra vertical synchronization cycle for its calculation.
When the GPU starts working on the next frame after a vertical sync, the rendering thread has already filled the CPU context queue, so the GPU has enough work to do and no GPU bubbles are generated. Without bubbles, the frame renders in 9 ms, so the next frame does not drop. In total, three vertical sync cycles are needed to render two frames (two frames per 3 x 11.1 ms), which is why the average frame rate is 60 fps.
The analysis of Figure 5 shows that in this example the GPU is not actually the performance bottleneck; once the real bottleneck on the CPU rendering thread is solved, the VR game can reach 90 fps. In fact, rendering thread bottlenecks exist in most VR applications developed with UE4, so familiarity with the following UE4 rendering thread optimization tools can greatly improve VR application performance.
Figure 4: An example of a VR game with a CPU rendering thread bottleneck, showing the SteamVR* statistics on per-frame CPU and GPU time.
Figure 5: The GPUView* timeline for the example in Figure 4; the CPU rendering thread bottleneck leaves the GPU idle, which triggers the frame drop.
Instanced Stereo Rendering
VR doubles the number of draw calls because of binocular rendering, which easily leads to rendering thread bottlenecks. With instanced stereo rendering, each object is submitted in a single draw call, and the GPU then applies the corresponding left-eye and right-eye transformation matrices so that the object is drawn into both views. This transfers CPU work to the GPU: it increases the GPU vertex shader work but saves half of the draw calls, so it typically reduces the rendering thread load and yields a performance increase of about 20 percent for VR applications, unless the number of draw calls in the scene is low (<500). Instanced stereo rendering can be turned on or off in the project settings.
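Because the project setting is applied at startup through the vr.InstancedStereo rendering setting, code can only read it back at run time; a minimal sketch of such a check (the IsInstancedStereoEnabled helper is ours, the console-variable API is standard UE4):

```cpp
#include "HAL/IConsoleManager.h"

// Query whether instanced stereo rendering is active. The cvar is fixed at
// startup by the project settings; this only reads it back.
bool IsInstancedStereoEnabled()
{
    static const auto* CVar =
        IConsoleManager::Get().FindTConsoleVariableDataInt(TEXT("vr.InstancedStereo"));
    return CVar && CVar->GetValueOnGameThread() != 0;
}
```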
Visibility Culling
Rendering thread bottlenecks in VR applications usually have two major causes: static mesh rendering and visibility culling. Static mesh rendering can be optimized by merging draw calls or meshes, while visibility culling requires reducing the number of primitives or the cost of dynamic occlusion culling.
Visibility culling bottlenecks are particularly severe in VR applications, because VR reduces latency by limiting the CPU rendering thread to starting its per-frame calculation only 3 ms before vertical sync (running start/queue ahead), while UE4's InitViews stage (including visibility culling and dynamic shadow setup) generates no GPU work. Once InitViews takes more than 3 ms, GPU bubbles appear, GPU utilization drops, and frames are likely to drop, so visibility culling must be a major focus of VR optimization.
Visibility culling in UE4 consists of four parts, ordered here from lowest to highest computational complexity:
1. Distance culling
2. View frustum culling
3. Precomputed occlusion culling
4. Dynamic occlusion culling, including hardware occlusion queries and hierarchical z-buffer occlusion
During design, the best approach is to remove the majority of primitives with stages 1 through 3 as much as possible, in order to reduce the InitViews bottleneck, because the computation cost of stage 4 (dynamic occlusion culling) is much greater than that of the other three. The following focuses on view frustum culling and precomputed occlusion culling.
View Frustum Culling
In UE4, view frustum culling of a VR application is performed once for each eye camera separately, which means all primitives in the scene must be tested twice to complete view frustum culling. But we can change the UE4 code to implement super-frustum culling [5], namely merging the left-eye and right-eye frusta into a single frustum, so that one pass completes the culling and the rendering thread saves roughly half of its view frustum culling time.
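To illustrate the idea, not UE4's actual implementation, here is a simplified, engine-agnostic sketch: since both eye frusta share orientation and near/far planes, a combined frustum keeps the left eye's left plane and the right eye's right plane, and each primitive is then tested once:

```cpp
#include <array>

struct Plane  { float X, Y, Z, W; };       // outward normal (X,Y,Z), offset W
struct Sphere { float X, Y, Z, Radius; };  // bounding sphere of a primitive

using Frustum = std::array<Plane, 6>;      // left, right, top, bottom, near, far

// The union of the two eye frusta is bounded by the left eye's left plane
// and the right eye's right plane; all other planes are shared.
Frustum MakeSuperFrustum(const Frustum& LeftEye, const Frustum& RightEye)
{
    Frustum Combined = LeftEye;
    Combined[1] = RightEye[1];             // take the right plane from the right eye
    return Combined;
}

// A sphere is culled if it lies fully outside any plane; with the combined
// frustum this single test replaces the per-eye pair of tests.
bool IsVisible(const Frustum& F, const Sphere& S)
{
    for (const Plane& P : F)
    {
        const float Dist = P.X * S.X + P.Y * S.Y + P.Z * S.Z - P.W;
        if (Dist > S.Radius)
            return false;                  // fully outside this plane
    }
    return true;
}
```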
Precomputed Occlusion Culling
After distance culling and view frustum culling, we can use precomputed occlusion culling to further reduce the number of primitives that must be sent to the GPU for dynamic occlusion culling. This reduces the time the rendering thread spends processing visibility culling and, at the same time, reduces the popping artifacts of the dynamic occlusion system (because a GPU occlusion query takes one frame before its result returns, visibility errors are likely when the view rotates quickly or when an object is near a corner).
Precomputed occlusion culling trades increased memory usage and lighting build time for a lower rendering thread load; the larger the scene, the more memory the pre-stored data occupies and the longer its decompression takes. However, VR scenes are generally smaller than those of traditional games, most objects in them are static, and the user's movable area is limited, all of which favor precomputed occlusion culling; it is an optimization that VR application development should always include.
In practice, precomputed occlusion culling automatically divides the scene into visibility cells of equal size, based on the parameter settings, covering all possible positions of the view camera. For each cell it stores the primitives that are guaranteed to be occluded from anywhere within that cell. At runtime, a lookup table (LUT) is read at the camera's current position to obtain the primitives to cull; primitives stored as precomputed occluded need no dynamic occlusion culling at runtime.
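A hedged sketch of the runtime lookup just described, with a hypothetical data layout (UE4's real storage is compressed and engine-internal):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical layout: a 2D grid of cells over the camera's movable area,
// each cell holding one bit per primitive (1 = always occluded from here).
struct FPrecomputedVisibility
{
    float OriginX = 0.0f, OriginY = 0.0f, CellSize = 200.0f;
    int32_t CellsX = 0, CellsY = 0;
    std::vector<std::vector<uint8_t>> OccludedBits; // CellsX * CellsY bitsets

    // LUT read: map the camera position to a cell and return its bitset.
    const std::vector<uint8_t>* LookUp(float CamX, float CamY) const
    {
        const int32_t IX = static_cast<int32_t>((CamX - OriginX) / CellSize);
        const int32_t IY = static_cast<int32_t>((CamY - OriginY) / CellSize);
        if (IX < 0 || IY < 0 || IX >= CellsX || IY >= CellsY)
            return nullptr;                 // outside the precomputed volume
        return &OccludedBits[IY * CellsX + IX];
    }
};

// Primitives whose bit is set skip the dynamic occlusion query entirely.
inline bool IsStaticallyOccluded(const std::vector<uint8_t>& Bits, int32_t Prim)
{
    return (Bits[Prim >> 3] >> (Prim & 7)) & 1;
}
```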
We can use the console command stat initviews to see Statically Occluded Primitives, which reports how many primitives were removed by precomputed occlusion culling, and Decompress Occlusion, which shows the per-frame decompression time of the stored data; Precomputed Visibility Memory in stat memory shows the memory usage of the pre-stored data. Occluded Primitives counts the primitives removed by both precomputed and dynamic occlusion culling, and raising the ratio of Statically Occluded Primitives to Occluded Primitives (above 50 percent) helps significantly reduce InitViews time. The detailed setup steps and limitations of precomputed occlusion culling in UE4 can be found in [6-7].
Figure 6: Precomputed occlusion culling example.
Static Mesh Actor Merging
The Merge Actors tool in UE4 can automatically merge multiple static meshes into a single mesh to reduce draw calls, and its settings let you choose whether to merge materials, lightmaps, or physics data according to actual needs; the setup process is described in [8]. UE4 offers a related tool, Hierarchical Level of Detail (HLOD) [9]; the difference is that HLOD only merges objects at their distant levels of detail (LODs).
Instancing
When the same mesh or object appears many times in a scene (such as haystacks or boxes), it can be drawn with instanced meshes, which require only a single draw call; when drawing, the GPU applies the corresponding coordinate transformation for each instance based on its location. If a scene contains many identical meshes, instancing effectively reduces the rendering thread's draw call cost. Instancing can be set up in a blueprint (Blueprint API -> Components -> InstancedStaticMesh (ISM)) [10]; if you want each instanced object to have its own LODs, use the hierarchical ISM (HISM) component instead.
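The same setup is also possible from C++; a minimal sketch using the standard UInstancedStaticMeshComponent API (the AddCrates function and its wiring are illustrative):

```cpp
#include "Components/InstancedStaticMeshComponent.h"
#include "GameFramework/Actor.h"

void AddCrates(AActor* Owner, UStaticMesh* CrateMesh)
{
    UInstancedStaticMeshComponent* ISM =
        NewObject<UInstancedStaticMeshComponent>(Owner);
    ISM->SetStaticMesh(CrateMesh);
    ISM->RegisterComponent();        // all instances share one draw call

    for (int32 i = 0; i < 100; ++i)
    {
        FTransform T(FVector(i * 150.0f, 0.0f, 0.0f));
        ISM->AddInstance(T);         // per-instance transform, applied on the GPU
    }
}
```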
Monoscopic Far-Field Rendering
Limited by the interpupillary distance, the human eye perceives depth differently at different distances. Based on the average human interpupillary distance of 65 mm, depth perception is strongest between 0.75 m and 3.5 m; beyond eight meters, depth differences are hard to perceive, and sensitivity continues to drop as distance grows.
Based on this characteristic, Oculus* and Epic Games* introduced monoscopic far-field rendering in the forward rendering pipeline of UE 4.15, which renders each object monoscopically or stereoscopically depending on its distance from the view camera [11]. If a scene contains many distant objects, monoscopic far-field rendering can reduce their rendering and pixel shading cost.
For example, monoscopic far-field rendering reduces the per-frame rendering cost of the Oculus Sun Temple scene by 25 percent. Note that monoscopic far-field rendering in UE4 can currently only be used on Gear VR*; PC VR support will come in a newer version of UE4. The detailed setup method is described in [12]. You can also view the contents of the stereoscopic or monoscopic buffer by entering the console command vr.MonoscopicFarFieldMode (values 0-4).
Logical Thread Optimization
In UE4's VR rendering pipeline, the logic thread runs one frame ahead of the rendering thread; the rendering thread creates proxies from the previous frame's logic thread results and renders from those, so that the state being rendered cannot change mid-frame, while the logic thread simultaneously computes the update that will be displayed in the next frame. Because the logic thread runs one frame ahead, it does not become a performance bottleneck unless it takes more than one vertical sync period (11.1 ms). The catch is that in UE4 the logic work runs on a single thread: blueprints in gameplay, actor ticking, artificial intelligence, and similar calculations are all handled by the logic thread. If many actors or interactions in the scene push the logic thread over one vertical synchronization cycle, it needs to be optimized. Here are two performance optimization techniques for logic threads.
Blueprint Nativization
In UE4's default blueprint execution, a VM interprets the blueprint, and the VM overhead causes performance loss. UE 4.12 introduced blueprint nativization: all or a chosen subset of blueprints (inclusive/exclusive) can be compiled directly into C++ code and loaded at run time as a DLL, avoiding the VM overhead and improving logic thread efficiency. Detailed settings can be found in [13].
Note that if the blueprint itself is already optimized (for example, heavy calculations are implemented directly in C++), the performance gain from blueprint nativization is limited. Also, a UFUNCTION called from a blueprint cannot be inlined; for repeatedly called logic, use blueprint math nodes (which are inlined) or call an inline function from within a UFUNCTION. The best approach, of course, is to move the work onto other threads [14-15].
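As a sketch of that last option, assuming the heavy work has no game-thread dependencies, UE4's FAutoDeleteAsyncTask can move it onto the thread pool (DoHeavyAIWork is a hypothetical stand-in for the expensive calculation):

```cpp
#include "Async/AsyncWork.h"

// Minimal fire-and-forget task on the engine thread pool. Must not touch
// UObjects in ways that are unsafe off the game thread.
class FHeavyWorkTask : public FNonAbandonableTask
{
    friend class FAutoDeleteAsyncTask<FHeavyWorkTask>;

    void DoWork()
    {
        // DoHeavyAIWork();  // hypothetical expensive, thread-safe calculation
    }

    FORCEINLINE TStatId GetStatId() const
    {
        RETURN_QUICK_DECLARE_CYCLE_STAT(FHeavyWorkTask, STATGROUP_ThreadPoolAsyncTasks);
    }
};

void KickOffHeavyWork()
{
    // The task deletes itself when DoWork completes.
    (new FAutoDeleteAsyncTask<FHeavyWorkTask>())->StartBackgroundTask();
}
```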
Skeletal Meshes
If too many actors cause logic thread bottlenecks in the scene, then in addition to lowering skeletal mesh LODs and reducing animation ticking, you can use an LOD- or distance-based hierarchical approach to scale down the interaction behavior of actors. Sharing skeleton assets among several skeletal meshes is another viable option [16].
CPU Differentiation in UE4 VR Applications
The preceding sections describe several CPU optimization techniques for VR applications, but optimization only ensures that a VR application does not drop frames or cause motion sickness; it cannot by itself further enhance the experience. To enhance the VR experience, you must make the greatest possible use of the computing power the hardware provides, and translate those computing resources into content, effects, and picture quality for the end user. This requires providing differentiated content based on the CPU's computing power. Five CPU differentiation techniques follow.
Cloth Painting
Cloth simulation in UE4 is performed mainly on worker threads assigned by the physics engine, so its impact on the logic thread is small. Cloth must be simulated every frame, even when it lies outside the visible screen area, to determine whether its update becomes visible, so its cost is relatively stable. The cloth simulation scheme can be chosen according to the CPU capacity being targeted [17].
Destructible Mesh
In UE4, destructible meshes are likewise processed mainly on worker threads assigned by the physics engine, and this part can be scaled up when a high performance CPU is available: more objects can be made destructible, and more fragments can appear in the scene and persist for a longer time. Destructible meshes greatly enhance the scene's realism and the sense of immersion; the setup process is described in [18].
CPU Particles
CPU particles are a relatively easy module to scale. Although the CPU can simulate fewer particles than the GPU, making full use of multi-core CPU computing power reduces the burden on the GPU, and CPU particles offer the following unique capabilities. They can:
- Glow
- Be given their own materials and parameters (metallic, translucent materials, and so on)
- Be controlled by specific gravitational trajectories (attracted by points, lines, or other particles)
- Produce shadows
During development, you can then configure the corresponding CPU particle effects for different CPU tiers.
Figure 7: Particle differentiation in the Hunting Project*.
Steam Audio*
For VR applications, besides the picture, the other key element of immersion is audio; directional 3D audio is an effective way to enhance the immersive VR experience. Oculus offers the Oculus Audio* SDK [19] to simulate 3D audio, but that SDK's environmental sound simulation is relatively simple, and it has not been widely adopted. Steam Audio* [20] is a new 3D audio SDK offered by Valve* that supports Unity* 5.2 or newer and UE 4.16 or newer, and provides a C-language interface. Steam Audio has the following features:
- Provides 3D audio effects based on real physical simulation, supporting head-related transfer function (HRTF) directional audio filtering and environmental sound effects (including sound occlusion, real-world sound propagation, reflection, and mixing); it can also take in the inertial data of the VR headset.
- The material and its parameters (scattering coefficient, absorption rate at different frequencies, and so on) can be set for each object in the scene. Environmental sound can be simulated in real time or baked, according to the CPU's computing power.
- Many of the ambient sound settings and parameters can be adjusted to balance quality against performance, such as the HRTF interpolation method, the number of audio rays traced, the number of reflections, and the form of mixing.
- Compared to the Oculus Audio SDK, which supports only the shoebox model and no sound occlusion, Steam Audio's 3D audio simulation is more realistic and complete, and provides finer quality control.
- It is free, and not bound to any VR headset or platform.
Steam Audio collects the state of sources and the listener from UE4's logic thread, and uses worker threads for ray tracing and simulation of environmental sound reflections. The computed impulse responses are then passed to the audio rendering thread for the corresponding filtering and mixing of each sound source, and the result is output to the headphones by the operating system's audio thread (such as Windows* XAudio2).
The entire process is done on CPU threads, and adding 3D audio increases neither the rendering thread nor the logic thread load, so the game's original performance is unaffected; this makes it very suitable for enhancing a VR experience. The detailed setup process can be found in the Steam Audio documentation [21].
Scalability
The scalability settings of UE4 are a set of tools that control picture quality and performance via parameters, to fit different computing platforms [22]. For the CPU, scalability is mainly reflected in the following parameters (a sketch of applying them per detected CPU tier follows the list):
- View distance: distance culling scale ratio (r.ViewDistanceScale 0 - 1.0f)
- Shadows: Shadow quality (sg.ShadowQuality 0 - 3)
Figure 8: Shadow differentiation in the Tencent* VR game Hunting Project*.
- Foliage: Number of foliage being rendered each time (FoliageQuality 0 - 3)
Figure 9: Foliage differentiation in the Tencent* VR game Hunting Project*.
- Skeletal mesh LOD bias (r.SkeletalMeshLODBias)
- Particle LOD bias (r.ParticleLODBias)
- Static mesh LOD distance scale (r.StaticMeshLODDistanceScale)
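Putting these together, a hedged sketch that applies the CPU-side console variables above according to a detected CPU tier; the thresholds and chosen values are illustrative, not recommended defaults:

```cpp
#include "HAL/IConsoleManager.h"
#include "HAL/PlatformMisc.h"

static void SetCVarInt(const TCHAR* Name, int32 Value)
{
    if (IConsoleVariable* Var = IConsoleManager::Get().FindConsoleVariable(Name))
        Var->Set(Value);
}

static void SetCVarFloat(const TCHAR* Name, float Value)
{
    if (IConsoleVariable* Var = IConsoleManager::Get().FindConsoleVariable(Name))
        Var->Set(Value);
}

void ApplyCpuScalability()
{
    const int32 Cores = FPlatformMisc::NumberOfCores();
    const int32 Tier  = Cores >= 8 ? 2 : (Cores >= 4 ? 1 : 0);

    // Longer view distance and better shadows on stronger CPUs.
    SetCVarFloat(TEXT("r.ViewDistanceScale"), Tier == 2 ? 1.0f : Tier == 1 ? 0.8f : 0.6f);
    SetCVarInt(TEXT("sg.ShadowQuality"), Tier + 1);
    // Bias skeletal mesh and particle LODs downward on weaker CPUs.
    SetCVarInt(TEXT("r.SkeletalMeshLODBias"), 2 - Tier);
    SetCVarInt(TEXT("r.ParticleLODBias"), 2 - Tier);
}
```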
Summary
Limited by its length, this article only briefly covers a variety of CPU performance analysis tools, optimization methods, and differentiation techniques; refer to the references to learn more. Proficiency with these tools and techniques makes it possible to quickly find bottlenecks and optimize accordingly, which is very important for VR applications. In addition, by putting otherwise idle multi-threaded resources to work, you can give an application better visuals and performance, and so a better VR experience.
References
1. Performance Analysis and Optimization for PC-Based VR Applications: From the CPU's Perspective: https://software.intel.com/en-us/articles/performance-analysis-and-optimization-for-pc-based-vr-applications-from-the-cpu-s
2. Unreal Engine Stat Commands: https://docs.unrealengine.com/latest/INT/Engine/Performance/StatCommands/index.html
3. Unreal Engine 3 Console Commands: https://docs.unrealengine.com/udk/Three/ConsoleCommands.html
4. GPUView: http://graphics.stanford.edu/~mdfisher/GPUView.html
5. The Vanishing of Milliseconds: Optimizing the UE4 renderer for Ethan Carter VR: http://www.gamasutra.com/blogs/LeszekGodlewski/20160721/272886/The_Vanishing_of_Milliseconds_Optimizing_the_UE4_renderer_for_Ethan_Carter_VR.php
6. Precomputed Visibility Volumes: http://timhobsonue4.snappages.com/culling-precomputed-visibility-volumes
7. Precomputed Visibility: https://docs.unrealengine.com/udk/Three/PrecomputedVisibility.html
8. Unreal Engine Actor Merging: https://docs.unrealengine.com/latest/INT/Engine/Actors/Merging/
9. Unreal Engine Hierarchical Level of Detail: https://docs.unrealengine.com/latest/INT/Engine/HLOD/index.html
10. Unreal Engine Instanced Static Mesh: https://docs.unrealengine.com/latest/INT/BlueprintAPI/Components/InstancedStaticMesh/index.html
11. Hybrid Mono Rendering in UE4 and Unity: https://developer.oculus.com/blog/hybrid-mono-rendering-in-ue4-and-unity/
12. Hybrid Monoscopic Rendering (Mobile): https://developer.oculus.com/documentation/unreal/latest/concepts/unreal-hybrid-monoscopic/
13. Unreal Engine Nativizing Blueprints: https://docs.unrealengine.com/latest/INT/Engine/Blueprints/TechnicalGuide/NativizingBlueprints/
14. Unreal Engine Multi-Threading: How to Create Threads in UE4: https://wiki.unrealengine.com/Multi-Threading:_How_to_Create_Threads_in_UE4
15. Implementing Multithreading in UE4: http://orfeasel.com/implementing-multithreading-in-ue4/
16. Unreal Engine Skeleton Assets: https://docs.unrealengine.com/latest/INT/Engine/Animation/Skeleton/
17. Unreal Engine Clothing Tool: https://docs.unrealengine.com/latest/INT/Engine/Physics/Cloth/Overview/
18. How to Create a Destructible Mesh in UE4: http://www.onlinedesignteacher.com/2015/03/how-to-create-destructible-mesh-in-ue4_5.html
19. Oculus Audio SDK Guide: https://developer.oculus.com/documentation/audiosdk/latest/concepts/book-audiosdk/
20. A Benchmark in Immersive Audio Solutions for Games and VR: https://valvesoftware.github.io/steam-audio/
21. Download Steam Audio: https://valvesoftware.github.io/steam-audio/downloads.html
22. Unreal Engine Scalability Reference: https://docs.unrealengine.com/latest/INT/Engine/Performance/Scalability/ScalabilityReference/
About the Author
Wenliang Wang is a senior software engineer in Intel's Software and Services Group. He works with VR content developers on Intel CPU performance optimization and differentiation, sharing CPU optimization experience so that the CPU is used more efficiently. Wenliang is also responsible for the implementation, analysis, and optimization of multimedia video codecs and real-time applications; he has over 10 years of experience in video codecs, image analysis algorithms, computer graphics, and performance optimization, and has published numerous papers in the industry. Wenliang graduated from the Department of Electrical Engineering and the Communication Engineering Research Institute of Taiwan University.