
Enhancing VR Immersion with the CPU in Star Trek™: Bridge Crew


View PDF [2MB]

Introduction

Traditionally in games, the CPU is not considered a large contributor to immersive and visually striking scenes. In the past, a few games exposed settings that let users adjust CPU usage, but many developers believe it’s more trouble than it’s worth to implement multiple tiers of CPU-based systems for different hardware levels. With this article, and the awesome work showcased in Star Trek™: Bridge Crew* through a partnership with Ubisoft’s Red Storm Entertainment*, we aim to correct that misconception. Virtual Reality (VR) is a segment where the combination of environmental interaction, enhanced physics simulation, and destruction can become the frosting on the cake that keeps a player in your game and wanting more. Given the low-end hardware specifications required for Oculus*, it is more important than ever to abandon the idea of tailoring CPU work to the minimum specification globally. Leveraging available system resources to enhance dynamism and immersion will help you create your ideal game while giving as many players access as possible, and we’ve made it easier than ever.

This article walks through each of the CPU-intensive features implemented in Star Trek™: Bridge Crew*, with instructions on using the systems they’re built upon. This is followed by a brief section on determining how much CPU work is too much for each performance tier. The final section shows how to easily set up CPU performance categories that auto-detect where your end user’s hardware level sits.

Star Trek™: Bridge Crew* was built using Unity*, which will be the focus of this article, but all the concepts apply to other engines as well.

Check out the following video to see side-by-side comparisons of the game running these effects.

CPU Intensive Features in Star Trek™: Bridge Crew*

Bridge Damage – Combination of Physics Particles, Rigidbody Physics, and Realtime Global Illumination (GI)

Overview

The bridge of the USS Aegis is one of the main focal points of the game. Virtually all gameplay requires the player to be on the bridge, making it obvious that the bulk of the CPU work should be applied there to give the player the most bang for their buck. Most of the effort went into improving the bridge’s various destruction sequences, adding intensity to the scene. For example, when the bridge is damaged, big set pieces fly off; sparks ricochet off walls and floors; and fires spring up, throwing a glow on surrounding set pieces not in direct view of the light source.

What Makes it CPU Intensive?

Applying damage to the bridge makes use of Unity’s* realtime Rigidbody physics; Particle Systems with large numbers of small particles and collision support enabled; and realtime global illumination (GI) updates driven by the various fire and spark effects, all of which scale across available CPU cores. Various debris objects that use Rigidbody physics are spawned during damage events, and the particle counts are pushed significantly higher when the high-end effects are active. World collision support for the particles was added, along with collision primitives detailed enough to get the desired bouncing and scattering behavior for the sparks. Some of Unity’s* other CPU-based particle features were added to enhance the scene, such as sub-emitters that add trails to fireballs and some sparks. The bridge damage particles were kept small in screen coverage to keep the GPU impact as low as possible while still achieving the desired look. When damage events occur, some of the lights and emissive surfaces flicker to simulate power interruption. The GI is updated while the lights are flickering and while fires are active on the bridge. Next, we’ll go through each system and show how it can be leveraged separately.

Sparks

Overview

Unity’s* built-in Particle System component allows a lot of variation in both aesthetics and behavior. It also happens to scale very well across available CPU cores under the hood. With the click of a button you can have your particles collide with and react to the environment, or, if you want more customized behavior, you can script the movement of each particle (more on this later). When the built-in collision behavior shown below is used, the underlying engine splits the work among available cores, allowing the system to go as wide as possible. Because of this, you can scale your particle counts based on the number of cores available, while also considering processor frequency and cache size. To activate collisions on your particles, simply go to the Particle System component of interest, check the Collision checkbox, and then select the desired settings associated with it.

There are quite a few options in the Collision settings group. The main decision is whether particles collide against the World or against a set of planes you define. The former produces the most realistic simulation, since virtually every existing collider in your scene is considered in each particle update, but this of course comes with additional CPU cost. Games usually define a set of key planes that approximate the surrounding geometry, keeping compute as low as possible to make room for other CPU-intensive effects. Which setting you choose depends entirely on the layout of your game and what you’d like to achieve visually. For example, the following defines three planes as colliders: a floor and two walls.

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Assertions;

public class ParticleSystemController : MonoBehaviour
{
    public static ParticleSystemController Singleton = null;

    ParticleSystem[] ParticleSystems;
    public Transform[] CollisionPlanes;

    void Awake()
    {
        if(!Singleton)
        {
            Singleton = this;
            Debug.Log("Creating ParticleSystemController");
        }
        else
        {
            Assert.IsNotNull(Singleton, "(Obj:" + gameObject.name + ") Only 1 instance of ParticleSystemController needed at once");
            DestroyImmediate(this);
        }
    }

    public void Init()
    {
        ParticleSystems = gameObject.GetComponentsInChildren<ParticleSystem>();
        Debug.Log("Initializing ParticleSystemController");
    }

    void Start()
    {
        SetCPULevel(CPUCapabilityManager.Singleton.CPUCapabilityLevel);
    }

    public void SetCPULevel(CPUCapabilityManager.SYSTEM_LEVELS sysLevel)
    {
        if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.HIGH)
        {
            for (int i = 0; i < ParticleSystems.Length; i++)
            {
                var particleSysMain = ParticleSystems[i].main;
                var particleSysCollision = ParticleSystems[i].collision;
                var particleSysEmission = ParticleSystems[i].emission;
                particleSysEmission.rateOverTime = 400.0f;
                particleSysMain.maxParticles = 20000;
                particleSysCollision.enabled = true;
                particleSysCollision.type = ParticleSystemCollisionType.World;
            }
        }
        else if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.MEDIUM)
        {
            for (int i = 0; i < ParticleSystems.Length; i++)
            {
                var particleSysMain = ParticleSystems[i].main;
                var particleSysCollision = ParticleSystems[i].collision;
                var particleSysEmission = ParticleSystems[i].emission;
                particleSysEmission.rateOverTime = 300.0f;
                particleSysMain.maxParticles = 10000;
                particleSysCollision.enabled = true;
                particleSysCollision.type = ParticleSystemCollisionType.World;
            }
        }
        else if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.LOW)
        {
            for (int i = 0; i < ParticleSystems.Length; i++)
            {
                var particleSysMain = ParticleSystems[i].main;
                var particleSysCollision = ParticleSystems[i].collision;
                var particleSysEmission = ParticleSystems[i].emission;
                particleSysEmission.rateOverTime = 200.0f;
                particleSysMain.maxParticles = 5000;
                particleSysCollision.enabled = true;
                particleSysCollision.type = ParticleSystemCollisionType.Planes;
                for (int j = 0; j < CollisionPlanes.Length; j++)
                {
                    particleSysCollision.SetPlane(j, CollisionPlanes[j]);
                }
            }
        }
        else if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.OFF)
        {
            for (int i = 0; i < ParticleSystems.Length; i++)
            {
                var particleSysMain = ParticleSystems[i].main;
                var particleSysCollision = ParticleSystems[i].collision;
                var particleSysEmission = ParticleSystems[i].emission;
                particleSysEmission.rateOverTime = 100.0f;
                particleSysMain.maxParticles = 3000;
                particleSysCollision.enabled = false;
            }
        }
    }
}

See a more optimized version in the CPUCapabilityTester sample.

Realtime Global Illumination (GI)

Overview

Realtime GI is the simulation of light rays bouncing within a scene and indirectly illuminating objects. This feature was something the team really wanted to leverage because the big window at the front of the Aegis would allow for astral bodies and damage effects to update the interior of the bridge. Moving the Aegis in front of a massive sun or nebula changes the appearance of the bridge to reflect the incoming light, increasing immersion by giving the scene a cohesive look and making the vistas feel much more real.

What Makes it CPU Intensive?

Unity’s* realtime GI is computed heavily on the CPU and leverages a percentage of the available cores depending on the fidelity desired.

Is it Built into Unity*?

Yes. When the realtime GI effects are enabled, the application uses the highest CPU usage setting Unity* allows with an immediate update rate to get the best results.

How it’s Done

To enable this effect, check the Realtime Lighting checkbox in the Lighting window (Window > Lighting). (Note: Editor performance settings for Realtime GI are hidden in recent versions of Unity* and handled under the hood. Scripted update settings are still available; see the sample for details.) In older versions of Unity*, check the Precomputed Realtime GI checkbox (still within Window > Lighting). Two settings heavily affect CPU usage: Realtime Resolution and CPU Usage.

  • Realtime Resolution determines how many texels per unit should be computed. Unity* published a tutorial that goes into detail on how to set this value properly. A useful rule of thumb is that visually rich indoor scenes require more texels per unit to achieve as much realism as possible. In large outdoor scenes, indirect lighting transitions are not as noticeable, allowing the compute power to be spent elsewhere.
  • CPU Usage determines how many of the engine’s available worker threads will be leveraged for the realtime GI computation. It is best practice to determine the amount of CPU power available on various system levels and set this accordingly. For lower-end systems it’s best to keep this low/medium; for higher-end systems it’s better to use high or unlimited. Descriptions of these settings can be found in the Unity* documentation shipped with versions that expose them.


Settings in Unity* 5.6.1f1


Settings in older versions of Unity*

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Assertions;

public class GIController : MonoBehaviour {

    public static GIController Singleton = null;

    void Awake()
    {
        if (!Singleton)
        {
            Singleton = this;
            Debug.Log("Creating GIController");
        }
        else
        {
            Assert.IsNotNull(Singleton, "(Obj:" + gameObject.name + ") Only 1 instance of GIController needed at once");
            DestroyImmediate(this);
        }
    }

    public void Init()
    {
        Debug.Log("Initializing GIController");
    }

    void Start () {
        SetCPULevel(CPUCapabilityManager.Singleton.CPUCapabilityLevel);
    }

    public void SetCPULevel(CPUCapabilityManager.SYSTEM_LEVELS sysLevel)
    {
        if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.HIGH)
        {
            DynamicGI.updateThreshold = 0;
        }
        else if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.MEDIUM)
        {
            DynamicGI.updateThreshold = 25;
        }
        else if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.LOW)
        {
            DynamicGI.updateThreshold = 50;
        }
        else if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.OFF)
        {
            DynamicGI.updateThreshold = 100;
        }
        Debug.Log("(" + gameObject.name + ") System capability set to: " + CPUCapabilityManager.Singleton.CPUCapabilityLevel + ", so setting GI update threshold to: " + DynamicGI.updateThreshold);
    }
}
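To tie this back to the bridge-damage behavior described earlier (flickering emissive surfaces driving GI updates), here is a minimal sketch, not the shipping code: it modulates an emissive panel with noise and pushes the result into the realtime GI system via DynamicGI.SetEmissive. The EmissivePanel renderer and the flicker parameters are illustrative assumptions.

using UnityEngine;

public class EmissiveFlicker : MonoBehaviour
{
    public Renderer EmissivePanel;       // assumed: a renderer contributing to realtime GI
    public Color BaseColor = Color.cyan; // illustrative base emissive color
    public float FlickerSpeed = 8.0f;    // illustrative flicker rate

    void Update()
    {
        // Modulate emissive intensity with noise to simulate power interruption.
        float intensity = Mathf.PerlinNoise(Time.time * FlickerSpeed, 0.0f);

        // Pushing the new emissive color into DynamicGI causes the realtime GI
        // system to pick up the flicker on surrounding surfaces.
        DynamicGI.SetEmissive(EmissivePanel, BaseColor * intensity);
    }
}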

Dynamic Asteroids

Overview

When the Aegis navigates asteroid fields, additional asteroids are generated outside the player’s view frustum and launched into view. These asteroids collide with the existing in-place asteroids and kick up dust.

Many of the game’s maps also contain asteroid field generators, which scatter large static asteroids within a cylindrical or spherical zone. When high-end CPU effects are enabled, these zones also place dynamic asteroids with Rigidbody physics a certain distance from the ship while it’s moving. This helps give the impression that the asteroid field is full of smaller fragments colliding with each other and with the larger asteroids. There is also a small chance that a dynamic asteroid will spawn with a velocity already applied, to keep things moving and the scene active. Finally, some asteroids break apart into smaller fragments when colliding with the player’s ship or other asteroids, while others bounce off but remain intact.

These changes pull the player’s attention away from the skybox, creating a sense that the player truly is in space, all without disrupting gameplay.
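As a rough sketch of the spawning behavior described above (not Red Storm’s actual system; the AsteroidPrefab, distances, and launch chance are illustrative assumptions), a spawner might look like this:

using UnityEngine;

public class AsteroidSpawner : MonoBehaviour
{
    public GameObject AsteroidPrefab;    // hypothetical prefab with a Rigidbody attached
    public float SpawnDistance = 150.0f; // illustrative distance outside the view frustum
    public float LaunchChance = 0.2f;    // small chance to launch with an initial velocity
    public float LaunchSpeed = 20.0f;

    public void SpawnDynamicAsteroid()
    {
        // Pick a point behind/beside the camera so the spawn is outside the view frustum.
        Transform cam = Camera.main.transform;
        Vector3 spawnPos = cam.position
            - cam.forward * SpawnDistance
            + Random.insideUnitSphere * (SpawnDistance * 0.5f);

        GameObject asteroid = Instantiate(AsteroidPrefab, spawnPos, Random.rotation);

        // Occasionally launch the fragment so the field stays active.
        if (Random.value < LaunchChance)
        {
            // Aim roughly back toward the player so the fragment drifts into view.
            Vector3 towardPlayer = (cam.position - spawnPos).normalized;
            asteroid.GetComponent<Rigidbody>().velocity = towardPlayer * LaunchSpeed;
        }
    }
}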

What Makes it CPU Intensive?

Having large numbers of dynamic asteroid fragments flying around asteroid fields using Rigidbody physics, instantiating un-pooled fragments while moving, and generating additional fragments when asteroids break apart all use a lot of CPU time.

Is it Built into Unity*?

The dynamic asteroids use Unity’s* Rigidbody Physics and Particles Systems, but the system to generate the asteroids was written and customized by the Star Trek™: Bridge Crew* team. Check out the sample below to see how you can implement a similar system yourself.

How it’s Done

If the player’s machine is capable, previously static models in the scene that don’t need to remain static can have Rigidbody physics enabled. This can be done dynamically in script by adding new Rigidbody components to existing objects, or by generating prefabs of preconfigured objects on-the-fly. Dynamic objects and objects that can be interacted with do a lot to increase immersion in games, especially in VR.

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Assertions;

public class StaticDynamicController : MonoBehaviour {

    public static StaticDynamicController Singleton = null;
    public GameObject[] PotentiallyDynamicObjects;
    int NumDynamicObjects = 0;

    void Awake()
    {
        if (!Singleton)
        {
            Debug.Log("Creating StaticDynamicController");
            Singleton = this;
        }
        else
        {
            Assert.IsNotNull(Singleton, "(Obj:" + gameObject.name + ") Only 1 instance of StaticDynamicController needed at once");
            DestroyImmediate(this);
        }
    }

    public void Init()
    {
        Debug.Log("Initializing StaticDynamicController");
    }

    void Start () {
        SetCPULevel(CPUCapabilityManager.Singleton.CPUCapabilityLevel);
    }

    public void SetCPULevel(CPUCapabilityManager.SYSTEM_LEVELS sysLevel)
    {
        if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.HIGH)
        {
            NumDynamicObjects = PotentiallyDynamicObjects.Length;
        }
        else if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.MEDIUM)
        {
            NumDynamicObjects = PotentiallyDynamicObjects.Length / 2;
        }
        else if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.LOW)
        {
            NumDynamicObjects = PotentiallyDynamicObjects.Length / 3;
        }
        else if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.OFF)
        {
            NumDynamicObjects = 0;
        }

        Debug.Log("(Obj:" + gameObject.name + ") System capability set to: " + CPUCapabilityManager.Singleton.CPUCapabilityLevel + ", so setting number of dynamic objects to: " + NumDynamicObjects);

        for (int i = 0; i < NumDynamicObjects; i++)
        {

            Rigidbody objRigidBody = PotentiallyDynamicObjects[i].AddComponent<Rigidbody>();
            objRigidBody.useGravity = true;
            objRigidBody.mass = 10;
            // Collider type assumed for illustration; the original sample elides it.
            PotentiallyDynamicObjects[i].AddComponent<SphereCollider>();
        }
    }
}

Cloud Wakes and Solar Flares

Overview

Cloud wakes increase immersion by creating the illusion that enemy ships and the Aegis are displacing dust as they move through space. Solar flares accomplish the same thing by distracting the eye from the skybox, making the player feel like they are in the far reaches of space.

What Makes it CPU Intensive?

The cloud wakes and solar flares use scripted particle behaviors, which require updating each particle individually from a script on the main thread. Looping through several hundred to a few thousand particles and updating their properties in script uses a lot of CPU time, but it allows custom behavior that wouldn’t be possible with the particle system properties Unity* offers out of the box. Keep in mind that this must currently be done on the main thread, so this system can’t go as wide across cores as the particle collision system mentioned earlier. Stay tuned for Unity’s* new C# job system, announced at Unite Europe 2017, which will extend the Unity* API to allow better multi-threading in script code.

Is it Built into Unity*?

Cloud wakes and solar flares use Unity’s* Particle System, but how the particles move and change over time was scripted by Red Storm Entertainment. The wake effect emits particle trails from several emitter points on the ship using a single particle system. The size and lifetime of the particles in a single trail are based on its emitter. The trail particles are emitted in world space, but the emitter points stay attached to the ship so that they continue to emit from the correct locations as the ship turns and banks. The custom particle behavior script adds virtual “attractor” objects behind the ship that oscillate randomly to pull nearby particles towards them, introducing turbulence to the trails behind the ships while passing through clouds. The solar flares also use the attractor behavior to either splash the particles outward or pull them back towards the sun’s surface after initially having been emitted outward. The following simple example shows how to make all particles head towards the world origin.

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class ParticleBehavior : MonoBehaviour {

    public ParticleSystem MyParticleSystem;
    ParticleSystem.Particle[] MyParticles = new ParticleSystem.Particle[4000];
    public float ParticleSpeed = 10.0f;

	void Update () {
        int numParticles = MyParticleSystem.GetParticles(MyParticles);

        for(int i = 0; i < numParticles; i++)
        {
            MyParticles[i].position = Vector3.MoveTowards(MyParticles[i].position, Vector3.zero, Time.deltaTime * ParticleSpeed);
        }

        MyParticleSystem.SetParticles(MyParticles, numParticles);
	}
}
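Building on the same pattern, the oscillating-attractor behavior described above could be sketched as follows; the oscillation and pull parameters are illustrative assumptions, not Red Storm’s values.

using UnityEngine;

public class ParticleAttractor : MonoBehaviour
{
    public ParticleSystem MyParticleSystem;
    ParticleSystem.Particle[] MyParticles = new ParticleSystem.Particle[4000];
    public float PullStrength = 5.0f;          // illustrative value
    public float OscillationAmplitude = 2.0f;  // illustrative value

    void Update()
    {
        // Oscillate a virtual attractor point to introduce turbulence.
        Vector3 attractorPos = transform.position + new Vector3(
            Mathf.Sin(Time.time * 3.0f),
            Mathf.Cos(Time.time * 2.0f),
            0.0f) * OscillationAmplitude;

        int numParticles = MyParticleSystem.GetParticles(MyParticles);
        for (int i = 0; i < numParticles; i++)
        {
            // Accelerate each particle toward the oscillating attractor.
            Vector3 toAttractor = attractorPos - MyParticles[i].position;
            MyParticles[i].velocity += toAttractor.normalized * PullStrength * Time.deltaTime;
        }
        MyParticleSystem.SetParticles(MyParticles, numParticles);
    }
}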

Ship Destruction

Overview

The ship destruction feature gives players a more satisfying payoff when defeating an enemy. Traditionally, games occlude exploding objects with an explosion effect to mask the pop when a discarded GameObject is removed from the scene. With the CPU power available in higher-end setups, we can instead split the model into pieces, launch them all in different directions, and even add sub-destruction. Each piece can collide with scene dressing and then either disappear or linger if the system can handle it.

What Makes it CPU Intensive?

The artists break the ships into many parts, each containing a Rigidbody component, which are animated via physics forces when initialized. Collision with other objects (e.g., asteroids, ships) is enabled to ensure realistic behavior in the environment. Furthermore, each exploded ship part has particle trails attached to it.

Is it Built into Unity*?

The Rigidbody and physics aspects of this feature are entirely built in, and Unity*-specific methods are used to add explosion forces to the ship parts. Afterwards, they are animated and collide with objects using Unity’s* Rigidbody physics system. A Unity* Particle System is used to emit particles that have sub-emitters creating trails behind the pieces, but the top-level particle positions are managed in script to ensure they remain attached to the exploded ship parts without worrying about parent coordinate spaces.

How it’s Done

Build out your models in pieces separated by various break points. Outfit each game object containing a mesh renderer in Unity* with a Rigidbody component. When the object should be destroyed, enable the Rigidbody components on each mesh and apply an explosive force to all of them. See Unity’s* Rigidbody documentation for more details.

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Assertions;

public class ExplosionController : MonoBehaviour {

    // Explosion arguments
    public float ExplosiveForce;
    public float ExplosiveRadius;
    public Transform ExplosiveTransform;    // Centerpoint of explosion

    public Rigidbody BaseRigidBody;
    public GameObject[] PotentiallyDetachableCubes;
    List<Rigidbody> ObjRigidbodies = new List<Rigidbody>();
    bool IsCPUCapable = false;
    bool HasExploded = false;

	void Start ()
    {
        SetCPULevel(CPUCapabilityManager.Singleton.CPUCapabilityLevel);
    }

    public void SetCPULevel(CPUCapabilityManager.SYSTEM_LEVELS sysLevel)
    {
        // Only use if CPU deemed medium or high capability
        if (sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.HIGH
            || sysLevel == CPUCapabilityManager.SYSTEM_LEVELS.MEDIUM)
        {
            IsCPUCapable = true;

            // add rigidbodies to all little cubes
            for (int i = 0; i < PotentiallyDetachableCubes.Length; i++)
            {
                Rigidbody CurrRigidbody = PotentiallyDetachableCubes[i].AddComponent<Rigidbody>();
                CurrRigidbody.isKinematic = true;
                CurrRigidbody.useGravity = false;
                ObjRigidbodies.Add(CurrRigidbody);
            }
            Debug.Log("(ExplosionController) System capability set to: " + CPUCapabilityManager.Singleton.CPUCapabilityLevel + ", so object (" + gameObject.name + ") is destructible");
        }
        else
        {

            Debug.Log("(ExplosionController) System capability set to: " + CPUCapabilityManager.Singleton.CPUCapabilityLevel + ", so object (" + gameObject.name + ") not destructible");
        }
    }

    public void ExplodeObject()
    {
        HasExploded = true;
        if (IsCPUCapable)
        {
            BaseRigidBody.useGravity = false;
            BaseRigidBody.isKinematic = true;
            BoxCollider[] BaseColliders = GetComponents<BoxCollider>();
            for(int i = 0; i < BaseColliders.Length; i++)
            {
                BaseColliders[i].enabled = false;
            }
            for (int i = 0; i < ObjRigidbodies.Count; i++)
            {
                Rigidbody CurrRigidbody = ObjRigidbodies[i];
                CurrRigidbody.isKinematic = false;
                CurrRigidbody.useGravity = true;
                CurrRigidbody.AddExplosionForce(ExplosiveForce, ExplosiveTransform.position, ExplosiveRadius);
                // Component type assumed for illustration; the original sample elides it.
                ObjRigidbodies[i].gameObject.AddComponent<BoxCollider>();
            }
        }
        else
        {
            // Boring destruction implementation
            BaseRigidBody.AddExplosionForce(ExplosiveForce, ExplosiveTransform.position, ExplosiveRadius);
        }
    }

    void OnCollisionEnter(Collision collision)
    {
        if(!HasExploded)
        {
            ExplosiveTransform.position = collision.contacts[0].point;
            ExplodeObject();
        }
    }
}

CPU Capability Detection Plugin

Ok, so we’ve been through each of the features added to Star Trek™: Bridge Crew*, but how do we determine what our target system can handle? To make this as painless as possible, we’ve created an easy-to-use Unity* plugin with source. It comes with example code for both Unity* and native implementations, and acts as a toolbox that surfaces system metrics to help you define your target system categories. Many of the above examples are integrated into the sample to make it easy to hit the ground running. Here are the steps:

  1. Define your CPU performance tiers.
    public enum SYSTEM_LEVELS
        {
            OFF,
            LOW,
            MEDIUM,
            HIGH,
            NUM_SYSTEMS
        };
  2. Set your CPU value thresholds. Various metrics are supplied from the plugin, such as logical/physical core count, max frequency, system memory, etc. However, you can always add your own if you’d like to consider other factors. For most basic uses, the supplied metrics should suffice.
            // i5-4590
            LowSettings.NumLogicalCores = 4;
            LowSettings.UsablePhysMemoryGB = 8;
            LowSettings.MaxBaseFrequency = 3.3;
            LowSettings.CacheSizeMB = 6;
    
            // i7 - 7820HK - Set to turbo mode
            MedSettings.NumLogicalCores = 8;
            MedSettings.UsablePhysMemoryGB = 8;
            MedSettings.MaxBaseFrequency = 3.9;
            MedSettings.CacheSizeMB = 8;
    
            // i7-6700k
            HighSettings.NumLogicalCores = 8;
            HighSettings.UsablePhysMemoryGB = 8;
            HighSettings.MaxBaseFrequency = 4.0;
            HighSettings.CacheSizeMB = 8;
    
  3. Initialize the plugin and determine if the user is running on an Intel® processor.
    void QueryCPU()
        {
            InitializeResources();
            if (IsIntelCPU())
            {
                // Your performance categorization code
            }
            else
            {
                Debug.Log("You are not running on an Intel CPU");
            }
        }
  4. Query the target system.
    StringBuilder cpuNameBuffer = new StringBuilder(BufferSize);
                GetProcessorName(cpuNameBuffer, ref BufferSize);
                SysLogicalCores = GetNumLogicalCores();
                SysUsablePhysMemoryGB = GetUsablePhysMemoryGB();
                SysMaxBaseFrequency = GetMaxBaseFrequency();
                SysCacheSizeMB = GetCacheSizeMB();
  5. Compare your threshold values to determine which previously defined performance tier the system tested belongs in.
    bool IsSystemHigherThanThreshold(SystemThreshold threshold)
        {
            if (threshold.NumLogicalCores < SysLogicalCores && threshold.MaxBaseFrequency < SysMaxBaseFrequency && threshold.UsablePhysMemoryGB < SysUsablePhysMemoryGB && threshold.CacheSizeMB < SysCacheSizeMB)
            {
                return true;
            }
            return false;
        }
    SYSTEM_LEVELS MySystemLevel = SYSTEM_LEVELS.OFF;
    
    if (IsSystemHigherThanThreshold(HighSettings) || IsWhitelistedCPU(SYSTEM_LEVELS.HIGH))
            {
                MySystemLevel = SYSTEM_LEVELS.HIGH;
            }
            else if (IsSystemHigherThanThreshold(MedSettings) || IsWhitelistedCPU(SYSTEM_LEVELS.MEDIUM))
            {
                MySystemLevel = SYSTEM_LEVELS.MEDIUM;
            }
            else if (IsSystemHigherThanThreshold(LowSettings) || IsWhitelistedCPU(SYSTEM_LEVELS.LOW))
            {
                MySystemLevel = SYSTEM_LEVELS.LOW;
            }
            else
            {
                MySystemLevel = SYSTEM_LEVELS.OFF;
            }
    
            Debug.Log("Your system level has been categorized as: " + MySystemLevel);

Performance Profiling and Considerations

Just like with GPU work, we need to verify that our feature set’s combined CPU utilization doesn’t exceed our low, medium, and high targets and constantly trigger Asynchronous Spacewarp (ASW; trying very hard to resist a terrible Star Trek™ pun) and reprojection. We wanted to make sure the game maintained a consistent 90 frames per second while still maximizing the CPU, no matter what machine it was running on. The Star Trek™: Bridge Crew* team decided on three levels of feature sets: Off, Partial, and Full. So, we tested the Full group of features on a machine that matched our Off threshold.


GPUView showing work distribution on desktop system with HSW i5-4590 CPU + GTX 1080 GPU

| CPU | Graphics Card | Scenario | Configuration | Run | Refresh Intervals | New Frames | Dropped Frames | Synthetic Frames Generated |
|-----|---------------|----------|---------------|-----|-------------------|------------|----------------|----------------------------|
| HSW i5-4590 | GTX 1080 | Mission 5 after initial warp | Full Settings | 1 | 11861 | 5993 | 58 | 5810 |
| | | | | 2 | 11731 | 6584 | 56 | 5091 |
| | | | | 3 | 11909 | 6175 | 101 | 5633 |
| | | | | Averages | 11833.67 | 6250.67 | 71.67 | 5511.33 |

The non-zero synthetic frame count indicates the CPU work exceeded the 11.1 ms per-frame threshold with the full feature set on the lower-end CPU.
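For reference, the thresholds follow from simple frame-budget arithmetic at the HMD's 90 Hz refresh:

\[ t_{\text{frame}} = \frac{1000\ \text{ms}}{90\ \text{Hz}} \approx 11.1\ \text{ms}, \]

and when ASW engages it synthesizes every other frame, halving the effective rate to 45 fps, i.e. \(1000/45 \approx 22.2\) ms from present to present.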

The above GPUView screenshot shows ~22 ms passing from present to present (highlighted). Present indicates when the final frame has been generated and is ready for submission to the head-mounted display (HMD). In frame-rate terms, that converts to 45 fps. Going from 90 to 45 fps means we are consistently triggering ASW with this configuration running on our ‘Off’ tier system. Looking at three test runs over Mission 5, we see an average of ~5.5k synthetic frames generated by ASW triggers. Squeezing these immersive features onto the Oculus min-spec didn’t work out, as we expected. But rather than keeping the feature sets off across all configurations, we bound them to hardware levels determined at run-time, activating the appropriate set so that players at every hardware level experience the game as it should be experienced. If we look at the same configuration running on our high-end target (Intel® Core™ i7-7700K processor), things change.


GPUView showing work distribution on desktop system with KBL i7-7700K CPU + GTX 1080 GPU

| CPU | Graphics Card | Scenario | Configuration | Run | Refresh Intervals | New Frames | Dropped Frames | Synthetic Frames Generated |
|-----|---------------|----------|---------------|-----|-------------------|------------|----------------|----------------------------|
| KBL i7-7700K | GTX 1080 | Mission 5 after initial warp | Full Settings | 1 | 11703 | 11666 | 37 | 0 |
| | | | | 2 | 11654 | 11617 | 37 | 0 |
| | | | | 3 | 11700 | 11672 | 28 | 0 |
| | | | | Averages | 11685.67 | 11651.67 | 34.00 | 0.00 |

The zero synthetic frame count indicates the CPU work never exceeded the 11.1 ms per-frame threshold with the full feature set on the higher-end CPU.

With the additional logical cores, increased frequency, and bigger cache of our high-end target, all the work can spread out and complete within the allotted 11.1 ms required to hit 90 fps. The average duration of CPU work per frame ranges from 9 to 10.3 ms from head to tail. This means we are pushing our high-end target nearly to its limit while still maintaining a solid 90 fps and utilizing all of the resources available to us. We’ve hit the sweet spot! Ok, so we’ve got our ‘Off’ and ‘Full’ feature sets tested. At this point, we needed to select a subset of the ‘Full’ features to enable on Intel® Core™ i7-7820HK processor-based VR-ready notebooks; this is our mid-tier target for the ‘Partial’ feature set. We wanted to keep the features that most affected the inside of the bridge, so we prioritized those and removed the others one by one until we hit the sweet spot. In the end, we only had to cut the dynamic wake effects and dynamic asteroids to comfortably push out 90 fps on the laptop. Here is a GPUView* screen capture showing the ‘Partial’ feature set running on our test VR-ready notebook.


GPUView showing work distribution on VR Gaming Laptop with KBL i7-7820HK CPU + GTX 1080 GPU

| CPU | Graphics Card | Scenario | Configuration | Run | Refresh Intervals | New Frames | Dropped Frames | Synthetic Frames Generated |
|-----|---------------|----------|---------------|-----|-------------------|------------|----------------|----------------------------|
| KBL i7-7820HK | GTX 1080 | Mission 5 after initial warp | Full Settings | 1 | 11887 | 11242 | 116 | 529 |
| | | | | 2 | 11881 | 11315 | 110 | 456 |
| | | | | 3 | 11792 | 10912 | 125 | 755 |
| | | | | Averages | 11853.33 | 11156.33 | 117.00 | 580.00 |

The non-zero synthetic frame count indicates the CPU work exceeded the 11.1 ms per-frame threshold with the full feature set on the VR-ready laptop.

| CPU | Graphics Card | Scenario | Configuration | Run | Refresh Intervals | New Frames | Dropped Frames | Synthetic Frames Generated |
|-----|---------------|----------|---------------|-----|-------------------|------------|----------------|----------------------------|
| KBL i7-7820HK | GTX 1080 | Mission 5 after initial warp | Partial Settings | 1 | 11882 | 11844 | 38 | 0 |
| | | | | 2 | 10171 | 10146 | 25 | 0 |
| | | | | 3 | 11971 | 11933 | 38 | 0 |
| | | | | Averages | 11341.33 | 11307.67 | 33.67 | 0.00 |

The zero synthetic frame count indicates the CPU work never exceeded the 11.1 ms per-frame threshold with the partial feature set on the VR-ready laptop.

Conclusion

Overall, CPU usage increases the most from more realistic, higher-resolution simulations and from larger numbers of dynamic entities; physics simulations previously thought too expensive can now be enabled on many CPUs. Additionally, various other CPU-intensive systems, such as animation/inverse kinematics (IK), cloth simulation, flocking, fluid simulation, and procedural generation, can be used to create a richer, more realistic world. The industry has had settings tiers for graphics for a while now, and it’s time we start thinking the same way about CPU settings. When developing a game, think about all your untapped compute potential on different hardware levels and consider how it can be harnessed to make your game something special. Check the links below for more information. Happy hunting.

  • Special thanks to Kevin Coyle and the rest of the Red Storm Entertainment team, who worked with us on this partnership and helped put together this article.

Additional Resources

“Set Graphics to Stun: Enhancing VR Immersion with the CPU in Star Trek™: Bridge Crew*”

The author presented the information in this article at Unite 2016.

Session Description – Many games and experiences these days put a huge emphasis on GPU work and let the many cores built into modern mainstream CPUs sit idle on the sidelines. This talk explores how Ubisoft’s Red Storm studio and Intel partnered to push immersion as far as possible in Star Trek™: Bridge Crew, using Unity* to take advantage of these available resources. Learn how you can achieve stunning visuals with minimal performance impact on the GPU in your own games!

Catlike Coding

Catlike Coding offers a number of great CPU/math-heavy tutorials that anybody can pick up and run with. The tutorials are Unity*-focused but can apply to any other engine as the meat of the content doesn’t depend on any particular API. It’s highly recommended for those interested in procedural generation, leveraging curves/splines, mesh deformation, texture manipulation, noise, and more.

Fluid Simulation for Video Games (Series)

This is a well-written tutorial on implementing fluid simulation for video games that leverages many cores. The article is great for beginners, walking them through everything from concept to implementation. By the end of the article the reader will have source code to add to their own engine and an understanding of how to manipulate the code to emulate various fluid types.

Link: https://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-1


How to tell the CPU model when running under a hypervisor that spoofs CPUID


When running in a virtual machine, you can never be sure which physical CPU you are running on; the hypervisor can pass anything as CPUID. For best performance, it helps to use the best instruction set the physical CPU supports, be it AVX-512, AVX2, AVX, SSE4.1, AES-NI, or another accelerated instruction set. Enhanced Platform Awareness features use a top-down approach to close this gap, but a bottom-up approach is also possible.
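As one bottom-up illustration (a sketch, not part of any tooling the article describes): on .NET Core 3.0 and later, the System.Runtime.Intrinsics.X86 IsSupported flags query CPUID feature bits directly, which hypervisors typically pass through even when the brand/model string is spoofed.

using System;
using System.Runtime.Intrinsics.X86;

class IsaProbe
{
    static void Main()
    {
        // Feature bits are generally passed through by hypervisors even when
        // the CPU brand/model string is spoofed, so probe them directly.
        Console.WriteLine($"SSE4.1: {Sse41.IsSupported}");
        Console.WriteLine($"AES-NI: {Aes.IsSupported}");
        Console.WriteLine($"AVX:    {Avx.IsSupported}");
        Console.WriteLine($"AVX2:   {Avx2.IsSupported}");
    }
}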

 

Domain Transform with MKL -- Fourier Transform & Discrete Cosine Transform


Domain transforms are a group of mathematical calculations that transform a set of signal data from one domain into another; for example, from the time domain, where the signal changes over time, to the frequency domain, where the signal is described by its content in each frequency band over a range of frequencies. In some cases, no latent information can be extracted by analyzing a signal in the time domain, yet the information hidden in the signal is easily found by analyzing it in the frequency domain.

This document introduces two kinds of domain transforms, the Fourier transform (FT, including the fast Fourier transform, FFT) and the discrete cosine transform (DCT), as well as how to perform these transforms using MKL API functions.
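For reference, the discrete Fourier transform that the FFT computes maps \(N\) time-domain samples \(x_n\) to \(N\) frequency-domain coefficients:

\[ X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}, \qquad k = 0, 1, \ldots, N-1. \]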

Contents in this article:

 

1. Preface
2. Introduction of domain transform
3. FFT/DFT with MKL
4. DCT with MKL
5. Summary

 

Using GraphicsMagick with Intel® Integrated Performance Primitives


GraphicsMagick (http://www.graphicsmagick.org/) is a popular library for image processing, encoding, and decoding. The patch provided with Intel® IPP 2018 injects functions from the Intel® IPP library into the C Magick API of GraphicsMagick to boost performance with minimal user effort.

Please refer to the Readme file of the Intel® IPP patch for GraphicsMagick for more detailed information about its features, supported functionality, and system requirements.

This document introduces the Intel® IPP 2018 patch for GraphicsMagick, which enhances the performance of several GraphicsMagick functions.

This document is organized as follows:

  • Introduction.
  • Build GraphicsMagick with Intel® IPP library.
  • Performance & New Functionalities.

Intel Optimized Tensorflow Wheel Now Available


Intel's Tensorflow optimizations are now available for Linux as a wheel installable through pip.

For more information on the optimizations as well as performance data, see this blog post.

To install the wheel into an existing Python installation, simply run

# Python 2.7
pip install https://anaconda.org/intel/tensorflow/1.2.1/download/tensorflow-1.2.1-cp27-cp27mu-linux_x86_64.whl

# Python 3.5
pip install https://anaconda.org/intel/tensorflow/1.2.1/download/tensorflow-1.2.1-cp35-cp35m-linux_x86_64.whl

# Python 3.6
pip install https://anaconda.org/intel/tensorflow/1.2.1/download/tensorflow-1.2.1-cp36-cp36m-linux_x86_64.whl

To create a conda environment with Intel Tensorflow that also takes advantage of the Intel Distribution for Python’s optimized numpy, run

conda create -n tf -c intel python=<2|3> pip numpy
. activate tf
# Python 3.5
pip install https://anaconda.org/intel/tensorflow/1.2.1/download/tensorflow-1.2.1-cp35-cp35m-linux_x86_64.whl
# Python 2.7
pip install https://anaconda.org/intel/tensorflow/1.2.1/download/tensorflow-1.2.1-cp27-cp27mu-linux_x86_64.whl


Complexity Sciences Center, University of California, Davis


Principal Investigators:

Jim Crutchfield teaches nonlinear physics at the University of California, Davis, directs its Complexity Sciences Center, and promotes science interventions in nonscientific settings. He's mostly concerned with what patterns are, how they are created, and how intelligent agents discover them; see http://csc.ucdavis.edu/~chaos/

 

Description:

A novel approach to data-driven prediction and unsupervised learning of coherent structures in climate dynamics. We will extend our unsupervised machine learning methods in two fundamental ways. First, our methods will facilitate pattern discovery: inferring both known patterns and novel, as-yet-unseen patterns and coherent structures from the data. We let the data tell us the appropriate representations to use to describe patterns, as opposed to selecting a single favorite functional basis or comparing candidates by trial and error. Second, we will adapt our methods to spatiotemporal data, in which spatial configurations (e.g., velocity vector fields) evolve over time. The goal is to implement structural inference in a principled way that naturally includes temporal dynamics. A wholly new approach such as this, facilitating the discovery of emergent dynamical patterns in spatiotemporal data, is ideally matched to the fundamental algorithmic challenges posed by climate modeling.

 

Related websites:

http://csc.ucdavis.edu/

http://csc.ucdavis.edu/~chaos/

http://informationengines.org/

Intel® Parallel Computing Center at University of Oxford


Principal Investigators:

Dr. Wood is an associate professor in the Department of Engineering Science at the University of Oxford, a Fellow of the Alan Turing Institute, a Governing Body Fellow of Kellogg College, and a founder of Invrea, Ltd and infinitemonkeys.ai. Formerly Dr. Wood was an assistant professor of Statistics at Columbia University and a postdoctoral fellow of the Gatsby Computational Neuroscience Unit of the University College London.  He received his PhD from Brown University in computer science and his BS from Cornell University.  Dr. Wood has raised over $5M from DARPA, BP, Google, Intel, and Microsoft.  Prior to his academic career Dr. Wood was a successful entrepreneur. 

Description:

In collaboration with NYU, LBNL, and Intel, we propose to use probabilistic programming techniques to filter and explain high energy physics experiments by automatically inferring the structure of events, including the particles produced and their properties, directly from observed experimental results. The physics community already has a series of simulation software tools, both for the underlying physics and for modeling the interaction of the underlying particles with the experimental detector. Bringing these together using probabilistic programming, powered by massively parallel high performance computing, will enable us to tackle the fundamental inference problem in particle physics directly for the first time, offering a new way for particle physicists to tackle the detection of novel physics signatures, ultimately at Large Hadron Collider data and computation scale.

Related websites:

http://www.robots.ox.ac.uk/~fwood

Intel® Parallel Computing Center at The Molecular Sciences Software Institute


Principal Investigators:

Prof. T. Daniel Crawford’s research expertise includes the development of high-accuracy quantum chemical models for the spectroscopic properties of chiral molecules in both gas and liquid phases. For more than 20 years he has been a lead developer of the PSI quantum chemistry package, which was one of the first electronic structure packages to be distributed under a fully open-source license and is used by thousands of molecular scientists worldwide. He is the 2010 winner of the Dirac Medal of the World Association of Theoretical and Computational Chemists (WATOC).

Description:

The Molecular Sciences Software Institute (MolSSI) is a new initiative funded by the U.S. National Science Foundation to serve as a nexus for science, education, and cooperation for the community of computational molecular scientists, a broad field that includes biomolecular simulation, quantum chemistry, and materials science. The MolSSI will provide software-engineering expertise, education, and leadership to enable molecular scientists to tackle problems that are orders of magnitude larger and more complex than those currently within our grasp. The MolSSI is a joint effort by Virginia Tech, Rice University, Stony Brook University, U.C. Berkeley, Rutgers University, the University of Southern California, Stanford University, and Iowa State University.

Related Websites:

molssi.org

Intel® Parallel Computing Center at University of California, Berkeley


Principal Investigators:

Jeffrey Regier is a postdoctoral researcher at UC Berkeley in the Department of Electrical Engineering and Computer Science. His research focuses on Bayesian modeling, variational inference, and optimization for large-scale scientific applications. Jeff holds a PhD in statistics from UC Berkeley, as well as MS degrees in mathematics (UC Berkeley) and computer science (Columbia University).

Description:

Astronomical surveys are the primary source of information about the Universe beyond our solar system. They are essential for addressing key open questions in astronomy and cosmology about topics such as the life-cycles of stars and galaxies, the nature of dark energy, and the origin and evolution of the Universe.

We are developing new methods for constructing catalogs of light sources, such as stars and galaxies, for astronomical imaging surveys. These catalogs are generated by identifying light sources in survey images and characterizing each according to physical parameters such as brightness, color, and morphology. Astronomical catalogs are the starting point for many scientific analyses, such as theoretical modeling of individual light sources, modeling groups of similar light sources, or modeling the spatial distribution of galaxies. Catalogs also inform the design and operation of follow-on surveys using more advanced or specialized instrumentation (e.g., spectrographs). For many downstream analyses, accurately quantifying the uncertainty of parameters' point estimates is as important as the accuracy of the point estimates themselves.

Our approach is based on Bayesian inference, a highly accurate method that is notoriously demanding computationally. We use supercomputers containing the latest Intel hardware to quickly solve our Bayesian inference problems.

Related websites:

https://github.com/jeff-regier/Celeste.jl

https://people.eecs.berkeley.edu/~jregier/

The inside scoop on how we accelerated NumPy Umath functions


NumPy UMath Optimizations

One of the great benefits of the Intel® Distribution for Python is the performance boost gained from leveraging SIMD and multithreading in (select) NumPy UMath arithmetic and transcendental operations, across a range of Intel CPUs, from Intel® Core™ to Intel® Xeon™ and Intel® Xeon Phi™. With stock Python as our baseline, we demonstrate the scalability of the Intel® Distribution for Python using functions that are intensively used in financial math applications and machine learning:

One can see that stock Python (pip-installed NumPy from PyPI) on Intel® Core™ i5 performs basic operations such as addition, subtraction, and multiplication just as well as Intel Python, but not on Intel® Xeon™ and Intel® Xeon Phi™, where Intel Python adds at least another 10x speedup. This is because basic arithmetic operations in stock NumPy use hard-coded AVX intrinsics (and thus already leverage SIMD, but do not scale to other ISAs, e.g., AVX-512). These operations in stock Python also do not leverage multiple cores (there is no multi-threading of loops under the hood of NumPy for such operations). Intel Python’s implementation achieves this scalability by utilizing the respective Intel® MKL VML primitives, which are CPU-dispatched (to leverage the appropriate ISA) and multi-threaded (to leverage multiple cores) under the hood, and Intel® SVML intrinsics, a compiler-provided short vector math library that vectorizes math functions for both IA-32 and Intel® 64 architectures on supported operating systems. Depending on the problem size, NumPy chooses one of the two approaches. On much smaller array sizes, Intel® SVML outperforms VML due to VML’s inherent cost of setting up the environment to multi-thread loops. For other problem sizes, VML outperforms SVML thanks to its ability to both vectorize math functions and multi-thread loops.

Specifically, on Intel® Core™ i5, Intel Python delivers greater performance on transcendentals (log, exp, erf, etc.) by utilizing both SIMD and multi-threading. We do not see any visible benefit from multi-threading basic operations (as shown on the graph) unless NumPy arrays are very large (not shown on the graph). On Intel® Xeon™, the 10x-1000x boost is explained by leveraging both (a) AVX2 instructions in transcendentals and (b) multiple cores (32 in our setup). The even greater scalability of Intel® Xeon Phi™ relative to Intel® Xeon™ is explained by its larger number of cores (64 in our setup) and a wider SIMD.

The following charts provide another view of Intel Python performance versus stock Python on arithmetic and transcendental vector operations in NumPy by measuring how close UMath performance is to respective native MKL call:

Again, on Intel® Core™ i5, stock Python performs well on basic operations (due to hard-coded AVX intrinsics, and because Intel Python’s multi-threading does not add much on basic operations) but does not scale on transcendentals (loops with transcendentals are not vectorized in stock Python). Intel Python delivers performance close to native speeds (90% of MKL) on relatively big problem sizes. While running our UMath optimization benchmarks on different architectures, we discovered that the performance of the UMath functions did not scale as expected on Intel® Xeon Phi™. We identified an issue with Intel® OpenMP that made the MKL VML function calls perform poorly in multiprocessing mode. Our team is working closely with the Intel® MKL and Intel® OpenMP teams to resolve this issue.

To demonstrate the benefits of vectorization and multi-threading in a real-world application, we chose to use the Black Scholes model, used to estimate the price of financial derivatives, specifically European vanilla stock options. A Python implementation of the Black Scholes formula gives an idea of how NumPy UMath optimizations can be noticed at the application level:
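For reference, the standard Black Scholes closed-form price for a European call, which such a benchmark evaluates elementwise over large arrays of option parameters, is:

\[ C = S_0 N(d_1) - K e^{-rT} N(d_2), \qquad d_1 = \frac{\ln(S_0/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}, \quad d_2 = d_1 - \sigma\sqrt{T}, \]

where \(S_0\) is the spot price, \(K\) the strike, \(r\) the risk-free rate, \(\sigma\) the volatility, \(T\) the time to expiry, and \(N(\cdot)\) the standard normal CDF.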

One can see that on Intel® Core™ i5, the Black Scholes formula scales nicely with Intel Python on small problem sizes but does not perform well on bigger problem sizes, which is explained by small cache sizes. Stock Python scales only marginally by leveraging AVX instructions on basic arithmetic operations. It is a whole different story on Intel® Xeon™ and Intel® Xeon Phi™: with Intel Python running the same code on server processors, much greater scalability is delivered on much greater problem sizes. Intel® Xeon Phi™ scales better due to its bigger number of cores and, as expected, stock Python does not scale on server processors due to the lack of AVX2/AVX-512 support for transcendentals and no multi-threading utilization.
