Optimizing Android* Game mTricks Looting Crown on the Intel® Atom™ Platform

Abstract

Games for smartphones and tablets are the most popular category on app stores. In the early days, mobile devices had significant CPU and GPU constraints that limited performance, so most games had to be simple. Now that CPU and GPU performance has increased, more high-end games are being produced. Nevertheless, a mobile processor still delivers less performance than a PC processor.

With the growth in the mobile market, many PC game developers are now making games for the mobile platform. However, traditional game design decisions and the graphic resources of a PC game are not a good fit for mobile processors and may not perform well. This article shows how to analyze and improve the performance of a mobile game and how to optimize graphic resources for a mobile platform, using mTricks Looting Crown as an example. The IA version of Looting Crown is available at the following link.

https://play.google.com/store/apps/details?id=com.barunsonena.looting

Figure 1. mTricks Looting Crown

1. Introduction

mTricks has significant experience in PC game development using a variety of commercial game engines. While planning its next project, mTricks forecasted that the mobile market was ready for a complex MMORPG, given the performance growth of mobile CPUs and GPUs. So it changed the game target platform for its new project from the PC to mobile.

mTricks first ported the PC codebase to Android*. However, the performance was less than expected on the target mobile platforms, including an Intel® Atom™ processor-based platform (code named Bay Trail).

mTricks was encountering two problems that often face PC developers who transition to mobile:

  1. The low processing power of the mobile processor means that traditional PC graphic resources and designs are unsuitable.
  2. Due to capability and performance variations among mobile CPUs and GPUs, game display and performance vary on different target platforms.

2. Executive summary

Looting Crown is an SNRPG (Social Network + RPG) style game that supports full 3D graphics and various multi-play modes (PvP, PvE, and Clan vs Clan). mTricks developed and optimized it on a Bay Trail reference design, whose specification is listed in Table 1.

Table 1. Bay Trail reference design specification and 3DMark score

| | Bay Trail reference design 10” |
| --- | --- |
| CPU | Intel® Atom™ processor Quad Core 1.46 GHz |
| RAM | 2 GB |
| Resolution | 2560 x 1440 |
| 3DMark ICE Storm Unlimited Score | 15,094 |
| Graphics score | 13,928 |
| Physics score | 21,348 |

mTricks used Intel® Graphics Performance Analyzers (Intel® GPA) to find CPU and GPU bottlenecks during development and used the analysis to solve issues of graphic resources and performance.

The baseline performance was 23 fps, and Figure 2 shows GPU Busy and Target App CPU Load statistics during a 2 minute run. The average of GPU Busy is about 91%, and the Target App CPU Load is about 27%.

Figure 2. Comparing CPU and GPU load of the baseline version with Intel® GPA System Analyzer

3. Where is the bottleneck between CPU and GPU?

There are two ways to know where the bottleneck is between CPU and GPU. One is to use an override mode, and the other is to change CPU frequency.

Intel GPA System Analyzer provides the “Disable Draw Calls” override mode to help developers find where the bottleneck is between CPU and GPU. After running this override mode, compare each result with/without the override mode and check the following guidelines:

Table 2. How to analyze games with Disable Draw Calls override mode

| Performance change with “Disable Draw Calls” override mode | Bottleneck |
| --- | --- |
| FPS doesn’t change much | The game is CPU bound; use the Intel® GPA Platform Analyzer or Intel® VTune™ Amplifier to determine which functions are taking the most time |
| FPS improves | The game is GPU bound; use the Intel GPA Frame Analyzer to determine which draw calls are taking the most time |

Intel GPA System Analyzer can simulate the application performance with various CPU settings, which is useful for bottleneck analysis. To determine whether your application performance is CPU bound, do the following:

  1. Verify that your application is not Vertical Sync (Vsync) bound.
    Check the Vsync status. Vsync is enabled if the Vsync icon is highlighted in gray in the Intel GPA System Analyzer Notification pane.
    • If Vsync is disabled, proceed to step 2.
    • If Vsync is enabled, review the frame rate in the top-right corner of the Intel GPA System Analyzer window. If the frame rate is around 60 FPS, your application is Vsync bound, and there is no opportunity to increase FPS. Otherwise, proceed to step 2.
  2. Force a different CPU frequency using the sliders in the Platform Settings pane (Figure 3) of the Intel GPA System Analyzer window. If the FPS value changes when you modify the CPU frequency, the application is likely to be CPU bound.

Figure 3. Modify the CPU frequency in the Platform Settings pane

Table 3 shows the simulation results for Looting Crown. With “Disable Draw Calls” override on, the FPS remained unchanged. This would normally indicate the game was CPU bound. However, the “Highest CPU freq” override also didn’t change FPS, implying that Looting Crown was GPU bound. To resolve this, we returned to the data in Figure 2, which showed that the GPU load was about 91% and CPU load was about 27% on the Bay Trail device. The CPU could not be utilized well due to the GPU bottleneck. We proceeded with the plan to optimize the GPU usage first and then retest.

Table 3. The FPS result of the baseline version with Disable Draw Calls and Highest CPU Frequency.

| Bay Trail device | FPS |
| --- | --- |
| Original | 23 |
| Disable Draw Calls | 23 |
| Highest CPU freq. | 23 |
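
The decision procedure in Tables 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the `Measurement` fields, the 5% tolerance, and the fallback to comparing GPU Busy with CPU load are assumptions made for this sketch and are not part of Intel GPA.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    baseline_fps: float
    no_draw_fps: float      # FPS with the "Disable Draw Calls" override on
    max_cpu_fps: float      # FPS with the highest CPU frequency forced
    gpu_busy_pct: float
    cpu_load_pct: float
    vsync_enabled: bool = False


def changed(before: float, after: float, tolerance: float = 0.05) -> bool:
    """True if FPS moved by more than `tolerance` (relative, assumed 5%)."""
    return abs(after - before) > tolerance * before


def classify_bottleneck(m: Measurement, vsync_fps: float = 60.0) -> str:
    # Step 1: an application already at the Vsync rate cannot go faster.
    if m.vsync_enabled and m.baseline_fps >= 0.95 * vsync_fps:
        return "vsync-bound"
    draw_calls_matter = changed(m.baseline_fps, m.no_draw_fps)
    cpu_freq_matters = changed(m.baseline_fps, m.max_cpu_fps)
    if draw_calls_matter and not cpu_freq_matters:
        return "gpu-bound"
    if cpu_freq_matters and not draw_calls_matter:
        return "cpu-bound"
    # The two overrides disagree, so fall back to the utilization figures.
    if m.gpu_busy_pct > m.cpu_load_pct:
        return "gpu-bound (from utilization)"
    return "cpu-bound (from utilization)"


# Baseline values from Table 3 and Figure 2.
baseline = Measurement(23, 23, 23, gpu_busy_pct=91, cpu_load_pct=27, vsync_enabled=True)
print(classify_bottleneck(baseline))
```

With the baseline numbers, neither override changes the FPS, so the sketch falls back to the utilization figures, which mirrors the reasoning in the paragraph above.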

4. Identifying GPU bottlenecks

We found that the performance bottleneck was in the GPU. As a next step, we analyzed the cause of the GPU bottleneck with Intel GPA Frame Analyzer. Figure 4 shows the captured frame information of the baseline version.

Figure 4. Intel® GPA Frame Analyzer view of the baseline version

4.1 Decrease the number of draw calls by merging hundreds of static meshes into one static mesh and using a bigger texture

Tables 4 and 5 show the information captured by Intel GPA Frame Analyzer.

Table 4. The captured frame information of the baseline version

| Metric | Value |
| --- | --- |
| Total Ergs | 1,726 |
| Total Primitive Count | 122,204 |
| GPU Duration | 23 ms |
| Time to show frame | 48 ms |

Table 5. Draw call cost of the baseline version

| Type | Erg | Time (ms) | % |
| --- | --- | --- | --- |
| Clear | 0 | 0.2 | 0.5 |
| Ocean | 1 | 6 | 13.7 |
| Terrain | 2~977 | 20 | 41.9 |
| Grass | 19~977 | 18 | 39.0 |
| Character, building and effect | 978~1676 | 19 | 40.6 |
| UI | 1677~1725 | 1 | 3.4 |

The total time for “Terrain” is 20 ms, of which “Grass” accounts for 18 ms, or about 90% of the “Terrain” processing time. We therefore analyzed further to see why “Grass” took so long to process.

Figures 5 and 6 show the output of the ergs for “Terrain” and “Grass”.

Figure 5. The terrain

Figure 6. Texture of “Grass”

Looting Crown drew the terrain by drawing a small grass quad repeatedly, so the number of draw calls in “Terrain” was 960. Drawing one small grass quad takes very little time, but each draw call carries its own overhead, which makes the whole operation expensive. We therefore recommended decreasing the number of draw calls by merging the hundreds of static meshes into one static mesh and using a bigger texture. Table 6 shows the result.

Table 6. Comparison of draw cost between small and big texture

| Texture | Draw time | Number of ergs |
| --- | --- | --- |
| Small texture | 18 ms | 960 |
| Big texture | 6 ms | 1 |

Figure 7. The changed terrain

Although each tile was simple, the tile-based terrain required a lot of draw calls. Decreasing the number of draw calls saved 12 ms on drawing the “Grass”.
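
The merging step described in this section can be sketched as follows. This is a minimal illustration, not the game's actual code: the `make_tile` and `merge_meshes` helpers, the vertex layout, and the 40 x 24 grid (chosen only because it yields 960 tiles) are all assumptions. Each tile's texture coordinates address its own region of one big shared texture, so the merged mesh can be submitted with a single draw call.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float, float, float]  # x, y, z, u, v
Mesh = Tuple[List[Vertex], List[int]]               # vertices, triangle indices


def make_tile(col: int, row: int, cols: int, rows: int, size: float = 1.0) -> Mesh:
    """One flat grass quad; its UVs point into one big shared texture."""
    corners = [(col, row), (col + 1, row), (col + 1, row + 1), (col, row + 1)]
    vertices = [(cx * size, 0.0, cz * size, cx / cols, cz / rows)
                for cx, cz in corners]
    return vertices, [0, 1, 2, 0, 2, 3]  # two triangles per quad


def merge_meshes(meshes: List[Mesh]) -> Mesh:
    """Concatenate meshes into one, shifting each mesh's indices by its offset."""
    merged_vertices: List[Vertex] = []
    merged_indices: List[int] = []
    for vertices, indices in meshes:
        base = len(merged_vertices)  # index offset of this mesh in the merged buffer
        merged_vertices.extend(vertices)
        merged_indices.extend(base + i for i in indices)
    return merged_vertices, merged_indices


COLS, ROWS = 40, 24  # hypothetical layout giving 960 tiles
tiles = [make_tile(c, r, COLS, ROWS) for r in range(ROWS) for c in range(COLS)]
terrain_vertices, terrain_indices = merge_meshes(tiles)

print(f"{len(tiles)} draw calls before merging, 1 after")
print(f"{len(terrain_vertices)} vertices, {len(terrain_indices)} indices")
```

The geometry is the same either way; only the number of submissions changes, which is where the per-draw-call overhead is saved.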

4.2 Optimizing graphics resources

Tables 7 and 8 show the new information captured by Intel GPA Frame Analyzer after applying the big texture for grass.

Table 7. The captured frame information of the 1st optimization version

| Metric | Value |
| --- | --- |
| Total Ergs | 179 |
| Total Primitive Count | 27,537 |
| GPU Duration | 24 ms |
| Time to show frame | 27 ms |

Table 8. Draw call cost of the 1st optimization version

| Type | Erg | Time (ms) | % |
| --- | --- | --- | --- |
| Clear | 0 | 2 | 10.4 |
| Ocean | 18 | 6 | 23.6 |
| Terrain | 1~17, 19, 23~96 | 14 | 54.3 |
| Grass | 19 | 6 | 23.2 |
| Character, building and effect | 20~22, 97~131 | 1 | 5.9 |
| UI | 132~178 | 1 | 5.7 |

We then checked whether the game was still GPU bound by repeating the measurement with the “Disable Draw Calls” and “Highest CPU Frequency” simulations.

Table 9. The FPS result of 1st optimization version with “Disable Draw Calls” and “Highest CPU Frequency”

| Bay Trail device | FPS |
| --- | --- |
| Original | 40 |
| Disable Draw Calls | 60 |
| Highest CPU freq. | 40 |

In Table 9, “Disable Draw Calls” simulation increased the FPS number while “Highest CPU Frequency” simulation didn’t change the FPS number. So, we knew Looting Crown was still GPU bound. And we also checked CPU load and GPU Busy again.

Figure 8. CPU and GPU load of the 1st optimization version with Intel® GPA System Analyzer

Figure 8 shows that the GPU load is about 99% and the CPU load is about 13% on Bay Trail. Because the GPU was still the bottleneck, the CPU still could not be a source of speedup on Bay Trail.

Looting Crown was originally developed for PCs, so the existing graphic resources were not suitable for mobile devices, which have lower GPU and CPU processing power. We did several optimizations to the graphic resources as follows.

  1. Minimizing Draw Calls
    1. Reduced the number of materials: The number of object materials was reduced from 10 to 2.
    2. Reduced the number of particle layers.
  2. Minimizing the number of polygons
    1. Applied LOD (level of detail) for characters using the “Simplygon” tool.

      Figure 9. A character with progressively reduced LOD

    2. Minimized number of polygons used for terrain: First, we minimized the number of polygons for faraway mountains that did not require much detail. Second, we minimized the number of polygons for flat terrain that could be represented by two triangles.
  3. Using optimized light maps
    1. Removed the dynamic lights for “Time of Day”.
    2. Minimized the light map size of each mesh: Reduced the number of light maps used for the background.
  4. Minimizing the changes of render states
    1. Reduced the number of materials, which also reduced render state changes and texture changes.
  5. Decoupling the animation part in static mesh
    1. The Havok engine did not support a partial update of the animated part of an object, so an object with only a small moving mesh was updated in full, including its static mesh. We therefore separated the animated part (the smoke, circled in red in Figure 10) from the rest of the object, dividing it into two separate object models.

Figure 10. Decoupled animation of the smoke from the static mesh
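
Items 1 and 4 in the list above both come down to how often the renderer has to switch materials. The sketch below counts material binds for a hypothetical list of 100 draws that cycle through the available materials. The `count_material_binds` and `make_draws` helpers are invented for illustration. Sorting the draws by material is a common companion technique that the article does not describe; it is included only to show the lower bound that the material count imposes.

```python
from itertools import groupby
from typing import List


def count_material_binds(draw_order: List[str]) -> int:
    """One bind per run of consecutive draws that share a material."""
    return sum(1 for _ in groupby(draw_order))


def make_draws(num_draws: int, num_materials: int) -> List[str]:
    """Hypothetical scene whose draws cycle through the available materials."""
    return [f"material_{i % num_materials}" for i in range(num_draws)]


for materials in (10, 2):
    draws = make_draws(100, materials)
    print(f"{materials} materials:",
          count_material_binds(draws), "binds in submission order,",
          count_material_binds(sorted(draws)), "binds when sorted by material")
```

In this sketch the best case equals the number of distinct materials, so reducing the materials from 10 to 2 lowers the minimum number of render state changes as well.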

4.3 Apply Z-culling efficiently

When an object is rendered by the 3D graphics card, the three-dimensional data is projected into two-dimensional data (x-y), and the Z-buffer, or depth buffer, stores the depth information (z coordinate) of each screen pixel. If two objects in the scene must be rendered to the same pixel, the GPU compares the two depths and overwrites the current pixel only if the new object is closer to the observer. The Z-buffer therefore reproduces the usual depth perception correctly. Z-culling takes advantage of this by drawing the closest objects first, so that a closer object hides a farther one and the hidden pixels can be skipped. Z-culling improves performance when rendering hidden surfaces.

In Looting Crown, there were two kinds of terrain drawing: Ocean drawing and Grass drawing. Because large portions of the ocean were behind grass, much of the ocean was hidden. However, the ocean was rendered earlier than the grass, which prevented efficient Z-culling. Figures 11 and 12 show the GPU duration of drawing the ocean and the grass, respectively; erg 18 is for ocean and erg 19 is for grass. If the grass is rendered before the ocean, the depth test shows that the hidden ocean pixels do not need to be drawn, which decreases the GPU duration of drawing the ocean. Figure 13 shows the ocean drawing cost after the second optimization. The GPU duration decreased from 6 ms to 0.3 ms.

Figure 11. Ocean drawing cost of 1st optimization

Figure 12. Grass drawing cost of 1st optimization

Figure 13. Ocean draw cost of 2nd optimization
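
The effect of draw order on the depth test can be reproduced with a tiny software depth buffer. The sketch below is only a model: the 20 x 10 screen, the layer depths, and the grass coverage are hypothetical values, and it assumes that fragments failing the depth test are rejected before they are shaded.

```python
from typing import Callable, Dict, List, Tuple

Layer = Tuple[str, float, Callable[[int, int], bool]]  # name, depth, covers(x, y)

WIDTH, HEIGHT = 20, 10  # tiny hypothetical screen


def render(layers: List[Layer]) -> Dict[str, int]:
    """Draw layers in order; count the fragments of each that pass the depth test."""
    depth = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
    shaded = {name: 0 for name, _, _ in layers}
    for name, z, covers in layers:
        for y in range(HEIGHT):
            for x in range(WIDTH):
                if covers(x, y) and z < depth[y][x]:  # closer than the stored depth
                    depth[y][x] = z
                    shaded[name] += 1
    return shaded


ocean: Layer = ("ocean", 10.0, lambda x, y: True)   # far away, covers every pixel
grass: Layer = ("grass", 5.0, lambda x, y: x < 19)  # near, hides most of the ocean

print("ocean first:", render([ocean, grass]))
print("grass first:", render([grass, ocean]))
```

Drawing the ocean first shades all 200 ocean pixels and then 190 grass pixels on top of them. Drawing the grass first shades the same 190 grass pixels but only the 10 ocean pixels that remain visible.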

Results

By taking these steps, mTricks optimized all graphics resources for mobile devices without compromising graphics quality. The number of ergs decreased from 1,726 to 124, and the primitive count decreased from 122,204 to 9,525.

Figure 14. The change of graphics resource

Figure 15 and Table 10 show the outcome of all these optimizations. After optimizations, FPS changed from 23 FPS to 60 FPS on the Bay Trail device.

Figure 15. FPS Increase

Table 10. Changed FPS, GPU Busy, and App CPU Load

| | Baseline | 1st Optimization | 2nd Optimization |
| --- | --- | --- | --- |
| FPS | 23 FPS | 45 FPS | 60 FPS |
| GPU Busy (%) | 91% | 99% | 71% |
| App CPU Load (%) | 27% | 13% | 22% |

After the first optimization, the game was still GPU bound on Bay Trail. The second optimization reduced the GPU workload by optimizing the graphic resources and Z-buffer usage. With that, the Bay Trail device reached the maximum of 60 FPS. Because Android uses Vsync, 60 FPS is the maximum performance on the Android platform.
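
To relate the frame rates in Table 10 to per-frame time budgets, a short calculation helps. The helper names below are made up for this sketch, the 60 Hz refresh rate is inferred from the 60 FPS maximum stated above, and the Vsync cap is modeled as a simple minimum that ignores buffering effects.

```python
def frame_time_ms(fps: float) -> float:
    """Average time available per frame at a given frame rate."""
    return 1000.0 / fps


def vsync_capped_fps(uncapped_fps: float, refresh_hz: float = 60.0) -> float:
    """With Vsync on, the displayed frame rate cannot exceed the refresh rate."""
    return min(uncapped_fps, refresh_hz)


for label, fps in [("Baseline", 23), ("1st optimization", 45), ("2nd optimization", 60)]:
    print(f"{label}: {fps} FPS = {frame_time_ms(fps):.1f} ms per frame")

print(f"Budget at 60 Hz: {frame_time_ms(60):.1f} ms per frame")
print(f"A renderer capable of 75 FPS would display {vsync_capped_fps(75):.0f} FPS")
```

Reaching 60 FPS means that every frame has to finish in roughly 16.7 ms, compared with about 43.5 ms per frame at the 23 FPS baseline.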

Conclusion

When you start to optimize a game, first determine where the application bottleneck is. Intel GPA can help you do this with some powerful analytic tools. If your game is CPU bound, then Intel VTune Amplifier is a helpful tool. If your game is GPU bound, then you can find more detail using Intel GPA. To fix GPU bottlenecks, look for efficient ways to reduce draw calls, polygon count, and render state changes. You can also check that the terrain textures, animated objects, and light maps are the right size, and that objects are drawn in the right order for Z-culling.

About the Authors

Tai Ha is an application engineer focusing on enabling online games in the APAC region. He has been working for Intel since 2005, covering Intel® Architecture optimization on Healthcare, Server, Client, and Mobile platforms. Before joining Intel, Tai had worked since 1999 as a security middleware architect for biometric companies based in Santa Clara, USA. He received his BS in Computer Science from Hanyang University, Korea.

Jackie Lee is an Applications Engineer with Intel's Software Solutions Group, focused on performance tuning of applications on Intel® Atom™ platforms. Prior to Intel, Jackie Lee worked at LG in the electronics CTO department. He received his MS and BS in Computer Science and Engineering from The ChungAng University.

References

The IA version of Looting Crown is available on Google Play:

https://play.google.com/store/apps/details?id=com.barunsonena.looting

Intel® Graphics Performance Analyzers
https://software.intel.com/en-us/vcsource/tools/intel-gpa

Havok
http://www.havok.com

mTricks
https://www.facebook.com/mtricksgame

Intel, the Intel logo, and Atom are trademarks of Intel Corporation in the U.S. and/or other countries.
Copyright © 2014 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.

