Intro:
F1* 2015 is the latest FORMULA ONE* game produced by Codemasters, based on a unique iteration of their proprietary EGO engine. The game was created almost entirely from scratch, with the new game engine dramatically improving both the visual quality and AI abilities. The F1 2015 engine is Codemasters’ first to target the eighth generation of consoles (PS4* and Xbox One*) and PCs. The new engine architecture was designed to run on the multiple cores found in the consoles with the intention that it would scale to work equally well on the PC. A patch in November 2015 added an updated audio system that, together with higher quality settings for the CPU-driven particle system, makes full use of high-end gaming PCs. This article details most of the work done for the patch.
Figure 1: F1* 2015
Adapting to PC hardware
The ability of game code to adapt to different underlying CPU hardware is becoming increasingly important as hardware evolves to encompass a wider range of clock speeds and core counts. Games designed when PCs typically came with single and dual core processors or for earlier console generations were normally designed around a limited number of threads. These primary threads were dedicated to rendering and the main game logic, while smaller tasks were distributed to any other processors in the system. On the PC this design method favored processors with very high single-threaded performance. Today both eighth-generation consoles have eight individual, relatively low-performance, high-efficiency x86 CPU cores with the game code using between six and seven of those cores (the remainder being reserved for the OS). This compares with modern consumer PCs that have anywhere between two and eight CPU cores. If the core supports simultaneous multi-threading (SMT), each PC core can appear to the OS as two logical processors with technologies like Intel® Hyper-Threading Technology (Intel® HT Technology)i. This means a PC game can be running on up to 16 logical processors shared with the OS. Therefore, the old approach isn’t suitable for newer hardware, and while the code for consoles can be optimized to run on a very specific configuration, on a PC the game code is required to adapt to the hardware.
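As a minimal sketch of adapting to the hardware at run time, the snippet below queries the number of logical processors and derives a worker-thread count from it. The function name `plan_worker_threads` and the policy of holding one logical processor back are illustrative assumptions, not the EGO engine's actual logic.

```python
import os


def plan_worker_threads(logical_processors, reserved=1):
    """Return a worker count that leaves `reserved` logical processors free.

    Hypothetical policy for illustration: never return fewer than one worker.
    """
    return max(1, logical_processors - reserved)


# os.cpu_count() reports logical processors (it may return None on some platforms).
detected = os.cpu_count() or 1
workers = plan_worker_threads(detected)
print(f"{detected} logical processors detected, {workers} worker threads planned")
```

On a 16-logical-processor PC this policy would plan 15 workers, while a dual-core machine without SMT would get a single worker, so the same code path covers both ends of the range described above.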
First PC Tests
As the game neared its July 10, 2015 release date, it became clear that the new engine had succeeded in its goal of more efficiently utilizing multiple CPU cores and had very different performance characteristics compared to previous FORMULA ONE titles that had been designed for the last generation of consoles. Figure 2 shows how the CPU and GPU workloads compare between Codemasters’ previous FORMULA ONE title, F1 2014 (left) and F1 2015 (right), running at their highest quality settings at an unlocked frame rate.
Figure 2: GPUView on F1* 2014 vs F1* 2015.
The images are captured using Microsoft GPUView (included in the Windows* Performance Toolkit as part of the Windows Platform SDK) and show how well parallelized the work is between the CPU and the GPU. GPUView can also be used to detect synchronization and GPU starvation issues. The top section of the graph represents the GPU activity and a close up is shown in Figure 3. The gaps in the Purple line at the top left show breaks in the GPU activity on F1 2014, which means the GPU is idling and not doing any actual work. The F1 2014 game is not GPU-limited on the high-end system tested, and at a resolution of 1080p it was CPU-bound.
Figure 3: Close up of GPU activity on the F1* 2014 and F1* 2015 games
The corresponding line on the right in Figure 3 (in blue) is solid, meaning the GPU is constantly active in F1 2015. In Figure 2 the lines below the Yellow separator represent the CPU thread activity; on F1 2014 (left), a single CPU thread is almost constantly busy and is the limiting factor. A close up of this is also shown below in Figure 4.
Figure 4: CPU activity
The different colors used for the bars represent the logical processors in the system. The changing color of the threads means they tend to run on a different logical processor on each frame. Another active thread can be seen at the bottom of the F1 2014 graph, and it corresponds to the graphics driver workload, while on the image on the right-hand side (F1 2015) all the CPU threads have some idle time and there is no obvious dependency between threads. When the GPU and CPU views are combined you can see that F1 2014 was limited by a critical path through the engine that feeds the graphics API from a single thread. Thus, F1 2014 represents the classic engine designed to run on previous consoles, with a workload optimized to run well on a fast dual-core processor with some benefits from moving to a quad core, but adding more than 4 cores didn’t provide any tangible benefit as the limiting factor is the single main rendering thread.
The new engine on the other hand significantly reduced the CPU overhead associated with keeping the graphics card busy, making full use of Intel HT Technology and distributing the work evenly across logical processors.
Figure 5: GPUView of F1* 2015 at 60FPS
The coding team quickly saw that the PC was significantly faster than the console hardware for which they were doing much of their optimizing. While it had taken a lot of effort to achieve typical frame processing at 60FPS (16 ms of processing) on the consoles, on a 4th generation Intel® Core™ processor (such as the Intel® Core™ i5-4670) the CPU was completing the work in a fraction of a frame, in many cases under 10 ms. Figure 5 (above) shows the GPUView when the title is limited to a 60Hz refresh rate. Faster processors (e.g., 6th generation Intel® Core™ i7-6700K processors) made the imbalance even greater. Figure 6 shows the game running at an unlocked frame rate on a 5th generation Intel® Core™ i7-5960x processor with an NVIDIA GTX980 video card.
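The frame-time figures above can be checked with a little arithmetic. The sketch below uses two hypothetical helper functions to compute the per-frame budget at 60 FPS and the CPU time left over when the CPU work finishes in 10 ms, the upper bound quoted in the text.

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / target_fps


def cpu_headroom_ms(target_fps, cpu_frame_ms):
    """Unused CPU time per frame when the CPU finishes its work early."""
    return frame_budget_ms(target_fps) - cpu_frame_ms


budget = frame_budget_ms(60)          # about 16.7 ms
headroom = cpu_headroom_ms(60, 10.0)  # about 6.7 ms left over at 10 ms of CPU work
print(f"budget {budget:.1f} ms, headroom {headroom:.1f} ms ({headroom / budget:.0%})")
```

At a locked 60 FPS, roughly 40% of each frame's CPU time was therefore unused on the quad-core PC, which is the spare capacity the later sections put to work.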
Figure 6: Unlocked GPUView on Intel® Core™ i7-5960x processor with 8 cores (16 threads)
When compared to the quad core system (Figure 2), the system with the Intel Core i7-5960x processor with 16 logical processors was idling the CPU for a larger percentage of the frame. This shows that the new engine benefits not only from increased single-threaded performance, but is also capable of benefiting from additional CPU cores. The result shows that even at slightly lower CPU frequencies the Intel Core i7-5960x processor with eight cores can outperform the higher clocked Intel Core i7-6700K processor (quad core).
These initial tests showing the GPU was already being fully utilized shifted the emphasis on the PC. GPU-side optimizations continued, but instead of spending developer resources optimizing the CPU rendering path (such as moving work to the GPU), the studio started investigating other ways to better utilize the CPU to improve user experience and improve the realism of the game.
Improving Realism
The challenge was to improve the game’s realism in ways that benefit users without affecting gameplay due to online multi-player requirements and without adding significantly more GPU-side work. As such, AI changes and improved physics accuracy were ruled out, as any improvements had to be achievable on all PC hardware that would be used to play the game, to ensure the multiplayer experience wouldn’t be compromised. Even in single-player mode any changes to the cars’ behavior would be problematic, requiring very careful rebalancing of the cars' handling. Instead, Codemasters concentrated on improving realism in two areas: an upgraded audio engine and an increase in the amount of dynamic visual content via an upgraded particle system. These systems were chosen because they were already CPU-based and were previously limited by their console resource budget. The PC gave the designers room to create a more immersive experience using many of the effects they had originally been prevented from doing because of hardware limitations.
Audio
Improving the audio was seen as a scalable way to enhance the user experience without affecting gameplay. The audio in F1 2015 is handled by a middleware package that creates its own thread for mixing audio on the CPU. Codemasters previously found that if this thread was stalled or delayed, the result was audible dropouts. To prevent dropouts on the consoles, this audio thread would get a CPU core dedicated to itself to ensure nothing could delay processing. On the PC, the audio system had a dedicated logical processor with the game task system using the remaining cores. Consoles force the thread affinity to the worker threads, whereas the PC uses SetThreadIdealProcessor(), which gives the OS scheduler a preferred processor for each thread as a hint rather than a hard constraint.
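For readers unfamiliar with the Windows call mentioned above, the following is a minimal sketch of how SetThreadIdealProcessor() can be invoked for the calling thread. It goes through Python's ctypes purely for illustration; the wrapper name `set_ideal_processor_hint` is hypothetical, the game itself would make this call from native code, and on any platform other than Windows the sketch simply does nothing.

```python
import sys


def set_ideal_processor_hint(processor_index):
    """Hint the Windows scheduler to prefer one logical processor for the
    calling thread. Returns the previous ideal processor, or None when the
    call fails or the platform is not Windows.
    """
    if sys.platform != "win32":
        return None  # no equivalent hint is attempted on other platforms

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentThread.restype = wintypes.HANDLE
    kernel32.SetThreadIdealProcessor.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    kernel32.SetThreadIdealProcessor.restype = wintypes.DWORD

    previous = kernel32.SetThreadIdealProcessor(
        kernel32.GetCurrentThread(), processor_index
    )
    # The API returns (DWORD)-1 on failure.
    return None if previous == 0xFFFFFFFF else previous
```

Because this is only a hint, the scheduler remains free to run the thread elsewhere when the preferred processor is busy, which differs from the hard affinity used on the consoles.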
Even with a dedicated core/logical processor, it was important to complete the mixing at a sufficient rate, and so the maximum number of audio voices was limited by the amount that could be processed in a worst case scenario, such as a crash. The limit for the audio was originally set to 5 cars plus the player’s.
With a significantly more powerful processor to handle the CPU mixing and potentially more CPU cores, the OS was less likely to attempt to schedule additional jobs on the processor assigned to mix the audio. Correspondingly, a high-quality audio option could be added to the PC version that had the following improvements:
- An increase in the number of cars contributing audio around the player from 6 (5 AI cars + player car) to 11 (10 AI cars + player car).
- Removal of some instance limits so more voices would play when transitioning from one object to another, avoiding sudden cut-offs.
- A replacement of some middleware reverb effects with more advanced ones, using more sophisticated algorithms, more reflections, and the use of pre-defined impulse files to simulate environments such as grandstands, bridges, tunnels, and the track side barriers together with passing cars.
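The first item in the list above amounts to raising a cap on how many AI cars feed the mixer. The sketch below shows one way such a cap could be applied, picking the nearest cars to the player. The distance-based selection, the function name, and the 2D positions are all assumptions made for illustration; the source only states the limits (5 AI cars originally, 10 with the high-quality option).

```python
import heapq
import math


def select_audible_cars(player_pos, ai_car_positions, max_ai_cars):
    """Return indices of the AI cars nearest the player, closest first."""
    def distance(index):
        return math.dist(player_pos, ai_car_positions[index])

    return heapq.nsmallest(max_ai_cars, range(len(ai_car_positions)), key=distance)


ai_cars = [(50.0, 0.0), (5.0, 0.0), (20.0, 0.0), (-10.0, 0.0)]
standard = select_audible_cars((0.0, 0.0), ai_cars, max_ai_cars=2)
print(standard)
```

Raising `max_ai_cars` from 5 to 10 is then a single setting change, with the extra mixing cost absorbed by the faster PC processor.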
Particles and Weather
Another improvement was an upgraded particle system. The particle system had been limited by the available CPU/GPU resources on the consoles, and art work had been authored within these constraints. The particle system was already CPU-based, as Codemasters’ graphics programmer, Andrew Wright, explains:
“We anticipated (correctly) that we would be tighter on GPU time than on CPU time, so the particles system was always designed as an efficient heavily vectorized CPU system.”
This meant that the coding team had already started on a system that could scale based on the amount of CPU resources available. What’s more, they could do so in a way that didn’t necessarily increase the GPU work at the same time. Keeping the particles on the CPU also had other benefits, as Andrew explains:
“The CPU system is very versatile, and it handles collision against the track for particles flagged for that – mostly stuff like gravel and grass. This works for particles that are not visible, so a swift change of camera will catch previously invisible particles in mid-bounce. Collisions can trigger sound effects. This part might be hard on a GPU.”
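The behavior described in the quote can be illustrated with a small CPU-side update loop. This is a simplified sketch under stated assumptions (a flat track at height zero, one vertical axis, a fixed restitution factor, and hypothetical names throughout), not the game's vectorized implementation. The point it shows is that the update ignores visibility and reports bounces so that other systems, such as audio, can react.

```python
from dataclasses import dataclass

GRAVITY = -9.81      # m/s^2, acting along the y axis
TRACK_HEIGHT = 0.0   # flat track surface assumed at y = 0


@dataclass
class Particle:
    y: float
    vy: float
    collides: bool = True   # only flagged particles test against the track
    visible: bool = True    # deliberately ignored by the update


def step(particles, dt, restitution=0.5):
    """Advance all particles by dt seconds and return the bounce count."""
    bounces = 0
    for p in particles:
        p.vy += GRAVITY * dt
        p.y += p.vy * dt
        if p.collides and p.y < TRACK_HEIGHT and p.vy < 0.0:
            p.y = TRACK_HEIGHT
            p.vy = -p.vy * restitution  # lose energy on each bounce
            bounces += 1                # a caller could trigger a sound here
    return bounces
```

Because off-screen particles are simulated the same way as visible ones, a sudden camera cut finds them already in a physically plausible state.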
The first part of the task was to increase the amount of generated particles with small and subtle changes to their art content, such as reducing the size of the particles as the particle density increased. These changes give a similar visual effect from a distance, but show much more detail up close. This was done for the various kick-up effects created when the car tires interact with the track (both on- and off-track surfaces). This effect is shown in Figure 7.
Figure 7: Improved gravel effects
In a similar fashion, the car tire smoke effects were improved with relatively large “billboard” particles being replaced by much smaller particles that could better reflect the shape of the smoke, which worked particularly well. The effect improved the volumetric lighting applied to the smoke as the smaller particles allowed a better mathematical representation of the light falloff within the volume, as shown in Figure 8.
Figure 8: Improved tire smoke
Although the improvements to the particle systems were significant, they were only visible for short periods of time; for example, when the player or AI lost control of the car. The studio quickly realized that the enhanced particle system could be used to significantly improve wet weather effects--a part of the game that didn’t depend on the player’s skill level to showcase the improved visuals.
One of the main upgrades Codemasters promoted was the game’s improved handling, with significant improvements to the cars’ behavior in wet conditions. Wet weather plays a significant part in the real Formula 1 calendar, with many races renowned for the extreme weather conditions that affect the race. Table 1 shows the probability of rain in a race in the game's Champion and Pro season modes. On average 34% of the game’s races will be affected by rain for at least some part of a four-hour race.
The change to smaller, denser particles meant the water behavior could be modeled much more accurately. The particle system handles data curves over particle lifetime for properties like color, alpha, erosion, angular drag, linear drag, and gravity; it even ties in with the same wind system as the rain.
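A data curve over particle lifetime can be thought of as a set of keyed values sampled by the particle's normalized age. The sketch below evaluates a piecewise-linear curve. The key layout, the function name, and the example alpha curve are assumptions for illustration, since the article does not describe how the engine stores or interpolates its curves.

```python
from bisect import bisect_right


def sample_curve(keys, t):
    """Evaluate a piecewise-linear curve at normalized lifetime t in [0, 1].

    `keys` is a list of (time, value) pairs sorted by time.
    """
    times = [k[0] for k in keys]
    if t <= times[0]:
        return keys[0][1]
    if t >= times[-1]:
        return keys[-1][1]
    i = bisect_right(times, t)           # first key strictly after t
    (t0, v0), (t1, v1) = keys[i - 1], keys[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical alpha curve: fade in quickly, then fade out over the rest of the life.
alpha_curve = [(0.0, 0.0), (0.1, 1.0), (1.0, 0.0)]
print(sample_curve(alpha_curve, 0.55))
```

The same sampler could serve any of the listed properties (color channels, drag, gravity scale), with one curve per property and the particle's age divided by its lifetime as the input.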
In Figures 9 and 10 you can see debug images showing the movement of the individual spray from the car wheels and its interaction with the air passing over the car surface. The effect is to create vortices spiralling off the back of the car.
Figure 9: Debug vortices showing water spray movement
Figure 10: Debug vortices rear view
Using smaller particles also meant the existing lighting model worked much better on the spray. Lighting from emitters like the car engines could be much more accurately modelled on small particles than “billboards” representing a large volume of spray. Figure 11 shows the type of lighting seen in the spray trails.
Figure 11: Better lighting
Another upgraded area was the interaction with the surface water, including the amount of water kicked up into the trailing vortices that gets sprayed up from the contact points between the wheel and track surface. This helps visually ground the car in much the same way shadows do on a sunny day.
Figure 12 shows the before and after for the puddle interaction.
Figure 12: Puddle interaction
Figure 13 shows the track spray laid down behind each car (left) and a debug view taken from Intel® INDE Graphics Performance Analyzers for DirectX* (right).
Figure 13: Track spray
The final upgrade was an improved rain simulation. Originally, the game used a simple GPU-based algorithm that rendered a few thousand rain particles per frame, applying gravity to each rain drop as it fell. The new rain simulation moved more of the update routine to the CPU, allowing interaction with the wind data used in other parts of the game. The number of rain drops was increased by a factor of 10, with the size of the individual streaks reduced to keep pixel coverage fairly similar. In Figure 14, debug images of the rain, taken from the same starting location, are shown on the right. The lower image has 217K rain primitives compared to just 21K for the old system above. Despite the extra primitives, the number of pixels affected grew far less, from 72K to 119K.
Figure 14: Rain Debug View
When increasing the amount of rain and water vortices, care was also taken to adjust the transparency values used in the particle system to produce a similar overall distance fogging effect to ensure gameplay wasn’t altered by the visual settings.
Rebalancing the CPU Workload
Normally, F1 2015's limiting factor in PC performance was the GPU workload. This load was especially heavy on lower-end video cards. The most important part of developing and enabling these new effects was to ensure that these extra particles did not increase the GPU workload. The second resource consideration was to achieve good distribution of the CPU calculations, balancing the work across the available CPU resources and ensuring there was no increase in the work for the critical render path in the engine.
The GPU load control was achieved in two ways. First, vertex work was moved from the vertex shader to the CPU. This did not remove the per-vertex GPU work completely, but it halved it, so rendering ten times as many particles resulted in only a five-fold increase in the vertex processing cost. The second large change was to reduce fill rate costs, such as with rain. A tenfold increase in raindrops resulted in an increase of only 1.65x in actual rendered pixels. In the case of the vortex trails behind the cars the changes were even more pronounced. An increase in vertices from 3K to 70K actually resulted in a drop in rendered pixels from 5800K to 2500K, more than halving the fill rate cost of the effect. The end result was an effect made up of roughly 20 times the number of particles with no significant increase in GPU rendering cost.
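The ratios quoted for the rain and vortex effects follow directly from the primitive and pixel counts given in the text. The short calculation below reproduces them; the `ratio` helper is just a convenience for this sketch.

```python
def ratio(after, before):
    """How many times larger (or smaller) `after` is than `before`."""
    return after / before


# Rain (Figure 14): primitives and affected pixels, old system vs new.
rain_primitives = ratio(217_000, 21_000)      # roughly 10x more primitives
rain_pixels = ratio(119_000, 72_000)          # about 1.65x more pixels

# Vortex trails: vertex count vs rendered pixels.
vortex_vertices = ratio(70_000, 3_000)        # roughly 23x more vertices
vortex_pixels = ratio(2_500_000, 5_800_000)   # about 0.43x the pixels

# Halving the per-vertex GPU cost while drawing ten times the particles.
vertex_cost = 10 * 0.5                        # 5x the original vertex cost

print(f"rain: {rain_primitives:.1f}x primitives, {rain_pixels:.2f}x pixels")
print(f"vortices: {vortex_vertices:.1f}x vertices, {vortex_pixels:.2f}x pixels")
print(f"vertex processing cost: {vertex_cost:.0f}x")
```

The pattern in both cases is the same: primitive counts grow by an order of magnitude while fill rate, usually the more expensive part of particle rendering, grows far less or even shrinks.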
Load balancing on the CPU was done by distributing the particle work across as many available logical processors as possible. Figure 15 shows the distribution of work on four-core and six-core systems (both with Intel HT Technology enabled). The purple and red blocks represent the particle and weather systems during very heavy rain on a six-core system (twelve logical processors). The particle work is done on 6 of those 12 processors, while on the four-core system, the work uses five out of the eight logical processors, but has to share those five with other tasks in the engine's task system.
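Distributing particle work across logical processors typically starts by splitting the particle array into contiguous, non-overlapping chunks, one per worker. The sketch below shows that partitioning with hypothetical function names and a trivial update. Note that standard Python threads do not run CPU-bound code in parallel, so this illustrates the partitioning pattern only, not the speedup a native task system would achieve.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(item_count, worker_count):
    """Split [0, item_count) into near-equal contiguous (start, end) chunks."""
    base, extra = divmod(item_count, worker_count)
    chunks, start = [], 0
    for worker in range(worker_count):
        end = start + base + (1 if worker < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


def update_chunk(heights, velocities, start, end, dt):
    """Move one contiguous slice of particles; slices never overlap."""
    for i in range(start, end):
        heights[i] += velocities[i] * dt


def update_all(heights, velocities, dt, worker_count):
    """Update every particle, one chunk per worker."""
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        futures = [
            pool.submit(update_chunk, heights, velocities, start, end, dt)
            for start, end in split_range(len(heights), worker_count)
        ]
        for future in futures:
            future.result()  # re-raise any worker exception


print(split_range(10, 4))
```

Because each chunk touches a disjoint index range, the workers need no locking, and changing `worker_count` from 5 to 6 mirrors the difference between the four-core and six-core cases in Figure 15.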
Figure 15: CPU load balancing, 6 core system (left) and 4 core (right)
The F1 2015 engine uses a task-based system for its work distribution, with different task scheduling predefined for 2, 4, 8, 12, and 16 logical processor systems. The system tries to schedule tasks to reduce any dependencies.
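One plausible way to map a detected processor count onto a fixed set of predefined schedules is to pick the largest profile that does not exceed it. The selection rule and the names below are assumptions; the article states only which logical processor counts have predefined schedules.

```python
from bisect import bisect_right

# Logical processor counts with a predefined schedule, as listed in the text.
PREDEFINED_PROFILES = (2, 4, 8, 12, 16)


def pick_profile(logical_processors):
    """Pick the largest predefined profile that fits the detected count.

    Falls back to the smallest profile when fewer processors are present.
    """
    i = bisect_right(PREDEFINED_PROFILES, logical_processors)
    return PREDEFINED_PROFILES[max(i - 1, 0)]


for count in (1, 6, 12, 32):
    print(count, "->", pick_profile(count))
```

Under this rule a six-core processor without SMT (6 logical processors) would run the 4-processor schedule, and anything above 16 would use the 16-processor schedule.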
Visual Comparisons
You can see in Figure 16 a side-by-side comparison of the original (left) and ultra particle systems (right). Note the improvements to the car vortices, with the spray more clearly tied to the individual cars that are generating it.
Figure 16: Car vortices
Improvements to the spray that the cars kick up when they lose grip on the track are clearly shown in Figure 17, with individual spray droplets visible against the F1 2015 logo on the higher quality settings. It’s also possible to see the improvements to the tire's interaction with the track.
Figure 17: Skid
Performance scaling
F1 2015 includes a built-in benchmark that uses the game's current graphics settings and can be configured to run under clear or stormy conditions. Figure 19 shows the benchmark performance numbers recorded on an Intel Core i7-5960x processor, running at a fixed 3.0 GHz (its base frequency) using an NVIDIA TitanX* video card. Figure 20 shows the settings we used. The numbers reported are the average frame rate for the full benchmark run.
Figure 19: Multi-core Performance scaling
Figure 20: Multi-core Benchmark Settings
The number of physical cores was modified in the BIOS with all other hardware settings untouched. The tests were then repeated with Intel HT Technology disabled.
The game wasn’t run on a system with just 2 cores and no Intel HT Technology as this was below the game's listed minimum system requirements. In general, enabling Intel HT Technology increased performance by approximately as much as adding 2 more cores to the PC. Moving from two cores with Intel HT Technology to four cores with Intel HT Technology gave an increase of 79% in the game frame rate, while moving to a six-core system with Intel HT Technology gave an additional 17%. When utilizing the full 8 cores on the Intel Core i7-5960x processor, the benchmark numbers showed a 27% performance increase compared to the same system running with 4 cores and Intel HT Technology.ii
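The quoted percentages can be chained into relative frame rates, normalized to the two-core configuration with Intel HT Technology. This sketch assumes the "additional 17%" for six cores is measured relative to the four-core result, which the text does not state explicitly; the values are derived from the quoted gains, not new measurements.

```python
# Gains quoted in the text, all with Intel HT Technology enabled.
GAIN_2C_TO_4C = 0.79   # two cores -> four cores
GAIN_4C_TO_6C = 0.17   # four cores -> six cores (assumed relative to four cores)
GAIN_4C_TO_8C = 0.27   # four cores -> eight cores

relative = {"2C+HT": 1.0}
relative["4C+HT"] = relative["2C+HT"] * (1 + GAIN_2C_TO_4C)
relative["6C+HT"] = relative["4C+HT"] * (1 + GAIN_4C_TO_6C)
relative["8C+HT"] = relative["4C+HT"] * (1 + GAIN_4C_TO_8C)

# Implied step from six to eight cores under the same assumption.
gain_6c_to_8c = relative["8C+HT"] / relative["6C+HT"] - 1

for config, value in relative.items():
    print(f"{config}: {value:.2f}x")
print(f"6C -> 8C: {gain_6c_to_8c:.1%}")
```

Read this way, each additional pair of cores still helps, but with diminishing returns: the step from six to eight cores is worth under 10%, compared with 79% for the step from two to four.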
Given that the same system was used for all tests, the performance gains are a measure of how well the game's workload is threaded and how well the system configures the work based on the number of available logical processors. In all tests the game benefits from the large 20 MB cache, even when running with a reduced core count. The performance numbers can’t be compared directly to a retail four-core system due to the differences in frequency and cache. Because of the large cache in the Intel Core i7-5960x processor, performance is better than a similarly clocked system with four cores.
Conclusions
F1 2015 leads the industry in showing how a modern CPU can be used to improve a game’s audio and visuals. Balancing the load between the CPU and GPU can bring better visuals to a game without a corresponding need to upgrade the GPU. Through the use of a CPU-based particle system, Codemasters was able to add a set of state-of-the-art weather effects that complement the other work on F1 2015 to improve the cars’ handling, especially in wet conditions, and that closely tie the new visuals to the game physics systems. By making the game engine fully utilize both Intel HT Technology and all available CPU cores, significant performance gains were realized, providing smooth gameplay even with challenging visuals.
About the Author
Leigh Davies is a Senior Application Engineer at Intel with over 15 years of programming experience in the PC gaming industry, originally working with several developers in the UK and then with Intel. He is currently a member of the European Visual Computing Software Enabling Team, providing technical support to game developers. Over the last few years Leigh has worked on a wide variety of technology enabling areas from graphics techniques (optimization, Order Independent Transparency, and Adaptive Volumetric Shadow Mapping) to multi-core scaling, plus enabling platform optimizations such as touch and sensor controls. For the last two years Leigh has worked on Windows (DirectX 11 and 12) and Android* (GLES 3.1).
Codemasters Credits
Below is a list of the people who directly contributed to the multi-core optimizations and visual enhancements performed for the PC platform.
Codemasters F1 Team:
Tom Hammersley, Principal Programmer
Leigh Bradburn, Principal Programmer
Andrew Wright, Principal Programmer
David Larsson, Experienced Programmer
Andrew Stewart, Senior VFX Artist
Adrian Smith, Principal Programmer
Lars Hammer, Senior Programmer
Russell Wood, Senior Programmer
Craig Hupin, Experienced Programmer
David Beirne, Experienced Programmer
Glenn McDonald, Senior Level Designer
Ricky O’Toole, Level Designer
With thanks to:
Richard Kettlewell
Robert Rodriguez
Ben Pottage
Peter Tolnay
References
http://www.formula1-game.com/us/home
https://en.wikipedia.org/wiki/Hyper-threading
https://software.intel.com/en-us/gpa
https://dev.windows.com/en-us/downloads/windows-10-sdk
i Intel technologies may require enabled hardware, specific software, or services activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer.
ii Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.