Chat Heads is a sample that uses the Intel® RealSense™ SDK to overlay background segmented (BGS) player images on a 3D scene or video playback in a multiplayer setting. The code is written in C++ and uses DirectX*.
In this article, we demonstrate a novel Intel RealSense SDK use case that can improve the e-sport experience of a game by overlaying players’ background segmented video streams on the game. This example will help you understand the various pieces in the implementation (using the Intel RealSense SDK, multiplayer networking, and media encode and decode), their interactions, and the resulting performance.
Figure 1: Screenshot of the sample showing two players, with a League of Legends* video clip playing in the background.
Installing, Building, and Running the Sample
Download the sample at: https://github.com/GameTechDev/ChatHeads
The sample has several dependencies. It uses RakNet for networking, the Theora Playback Library to play back Ogg videos, and ImGui for the UI. These are included in the source code.
Windows Media Foundation* (WMF) is a required dependency for encoding and decoding the BGS video streams. The WMF runtime/SDK should be installed by default on Windows* 8 or later. If it is not already installed, install the Windows SDK.
Building and Running the Sample:
Install the Intel® RealSense™ SDK (v5 or higher) prior to building the sample. The header and library include paths use the RSSDK_DIR environment variable, which is set during the SDK installation.
The solution file is at ChatheadsNativePOC\ChatheadsNativePOC and should build successfully with VS2013 and VS2015.
Install the Intel® RealSense™ Depth Camera Manager, which includes the camera driver, before running the sample. The sample has been tested on Windows 8.1 and Windows 10 using both the external and embedded Intel® RealSense™ cameras.
When you start the sample, the option panel shown in Figure 2 displays:
Figure 2: Option panel at startup.
There are four named sections that comprise three actions to take for startup:
- Scene selection. Choose among a League of Legends* video, a Hearthstone* video, and a CPUT (3D) scene. Click the Load Scene button to render the selection. This does not start the Intel RealSense software; that happens in a later step.
- Resolutions. The Intel RealSense SDK background segmentation modality supports a handful of profiles (color stream resolutions). Setting a new resolution shuts down the current Intel RealSense SDK session and initializes a new one.
- Is Server / IP Address. To run as the server, select the Is Server box and then click Start. To run as a client, enter the server's IP address and then click Start. This initializes the network and the Intel RealSense SDK and plays the selected scene. The maximum number of connected machines (server plus clients) is hardcoded to 4 in NetworkLayer.h.
Note: While a server and client can be started on the same system, they cannot use different profiles (color stream resolutions). Attempting to do so will crash the Intel RealSense SDK runtime since two different profiles can’t run simultaneously on the same camera.
After the network and Intel RealSense SDK initialize successfully, the panels shown in Figure 3 display:
Figure 3: Chat Heads option panels.
The Option panel has multiple sections, each with its own control settings. The sections and their fields are:
- Chat Head Option Panel
  - Scenes - Select a scene, and then click Load Scene to render it.
  - Resolutions - Select a resolution for the background segmentation modality, and then click Set Resolution to apply it.
    Note: the client and server cannot use different resolutions when run on the same machine.
- BGS/Media controls
  - Show BGS Image - If disabled, the unsegmented color stream is shown instead (BGS processing still runs). This affects the remote Chat Heads as well (that is, if both sides have the option disabled, you’ll see the background in the video stream).
  - Pause BGS - Pause the BGS modality (the CPU work for segmentation doesn't happen).
  - BGS frame skip interval - How often the BGS algorithm runs. Enter 0 to run every frame, 1 to run once every two frames, and so on. The maximum exposed by the Intel RealSense SDK is 4.
  - Encoding threshold - An 8-bit value compared against the alpha of the segmented image before encoding; pixels whose alpha falls below it are treated as background. See the Implementation section for details.
  - Decoding threshold - An 8-bit value compared against the decoded Y, U, and V channels; pixels whose channels fall below it are treated as background. See the Implementation section for details.
- Size/Pos controls
  - Size - Click/drag within the boxes to resize the sprite. Use it with different resolutions to compare quality.
  - Pos - Click/drag within the boxes to reposition the sprite.
- Network control/information
  - Network send interval (ms) - Interval, in milliseconds, between sends of video update data.
  - Sent - Graph of data sent by a client or server.
  - Rcvd - Graph of data received by a client or server. Clients send their updates to the server, which then broadcasts them to the other clients. For reference, the recommended bandwidth for streaming 1080p Netflix* video is 5 Mbps (about 625 KB/s).
- Metrics
  - Process metrics
    - CPU Used - The BGS algorithm runs on several Intel® Threading Building Blocks threads and, in the context of a game, can use more CPU resources than desired. Experiment with the Pause BGS and BGS frame skip interval options, and change the Chat Head resolution, to see how they affect CPU usage.
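The BGS frame skip interval can be modeled with a few lines of C++. This is a minimal sketch of the counting rule described above, not code from the sample: the helper names shouldRunBgs and countBgsRuns are hypothetical, and the sample itself passes the interval to the Intel RealSense SDK rather than counting frames on its own.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the sample): returns true when segmentation
// should run on this frame. skipInterval = 0 runs every frame, 1 runs once
// every two frames, and so on.
inline bool shouldRunBgs(std::uint64_t frameIndex, unsigned skipInterval) {
    assert(skipInterval <= 4);  // the article states the SDK limit is 4
    return frameIndex % (skipInterval + 1u) == 0;
}

// Counts how many of the first frameCount frames would be segmented.
inline unsigned countBgsRuns(unsigned frameCount, unsigned skipInterval) {
    unsigned runs = 0;
    for (unsigned i = 0; i < frameCount; ++i) {
        if (shouldRunBgs(i, skipInterval)) ++runs;
    }
    return runs;
}
```

Under this model, a skip interval of 4 segments only 12 out of every 60 frames, which is why raising the interval is a quick way to trade segmentation freshness for CPU time.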
Implementation with Intel® RealSense™ Camera
Because the Intel RealSense SDK updates its color buffer at intervals, and AcquireFrame() blocks until all color samples are ready, calling it on the application thread is costly. All calls to the Intel RealSense SDK therefore happen on a separate thread. Because AcquireFrame() blocks, synchronization primitives between the application thread and the RealSense thread are not necessary. The networking thread handles incoming messages and is woken up by the application thread on every Update().
Figure 4 shows the post-initialization interaction and data flow between these systems (threads).
Figure 4: Interaction flow between local and remote Chat Heads.
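One way to picture the "woken up on every Update()" relationship between the application thread and the networking thread is a worker that sleeps on a condition variable. The class below is a sketch under our own assumptions: NetworkWorker, wake(), and handledUpdates() are hypothetical names, the message handling is a placeholder, and the sample's actual networking code (built on RakNet) may be organized differently.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the sample's networking thread.
class NetworkWorker {
public:
    NetworkWorker() : thread_([this] { run(); }) {}
    ~NetworkWorker() { stop(); }

    // Called from the application thread once per Update().
    void wake() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++pendingWakes_;
        }
        cv_.notify_one();
    }

    // Lets the worker drain any pending wake-ups, then joins it.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        cv_.notify_one();
        if (thread_.joinable()) thread_.join();
    }

    int handledUpdates() const { return handled_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return pendingWakes_ > 0 || quit_; });
            if (pendingWakes_ == 0) return;  // quit requested and nothing left
            --pendingWakes_;
            lock.unlock();
            // Placeholder: handle incoming network messages here.
            ++handled_;
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    int pendingWakes_ = 0;
    bool quit_ = false;
    std::atomic<int> handled_{0};
    std::thread thread_;  // declared last so it starts after the other members exist
};
```

The application thread would call wake() at the end of each Update(); between wake-ups the worker consumes no CPU time.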
Color Conversion
The color conversion step before encoding packs two BGRA pixels (8 bytes) into one 4-byte YUYV macropixel. The Intel RealSense camera BGS image uses PXCImage::PixelFormat::PIXEL_FORMAT_RGB32 with alpha set to 0 for background pixels. That format maps directly to the DirectX texture format DXGI_FORMAT_B8G8R8A8_UNORM_SRGB.
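The packing step can be sketched as follows. This is an illustration under stated assumptions rather than the sample's conversion code: the function names toByte and packBgraPairToYuyv are hypothetical, the coefficients are the common full-range BT.601 ones, and the chroma of the two pixels is averaged. The sample may use different coefficients or sample chroma differently. Alpha is ignored here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Rounds and clamps a floating-point channel value to 0..255.
inline std::uint8_t toByte(double v) {
    return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, std::round(v))));
}

// Packs two BGRA pixels (8 bytes) into one YUYV macropixel (4 bytes: Y0 U Y1 V).
// Assumptions: full-range BT.601 coefficients, chroma averaged over the pair.
inline void packBgraPairToYuyv(const std::uint8_t bgra[8], std::uint8_t yuyv[4]) {
    double y[2] = {0.0, 0.0};
    double u = 0.0, v = 0.0;
    for (int i = 0; i < 2; ++i) {
        const double b = bgra[4 * i + 0];
        const double g = bgra[4 * i + 1];
        const double r = bgra[4 * i + 2];
        y[i] = 0.299 * r + 0.587 * g + 0.114 * b;
        u += (-0.168736 * r - 0.331264 * g + 0.5 * b + 128.0) / 2.0;
        v += (0.5 * r - 0.418688 * g - 0.081312 * b + 128.0) / 2.0;
    }
    yuyv[0] = toByte(y[0]);
    yuyv[1] = toByte(u);
    yuyv[2] = toByte(y[1]);
    yuyv[3] = toByte(v);
}
```

Each pixel keeps its own luma (Y), while the pair shares one U and one V value, which is what halves the data size relative to BGRA.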
However, YUYV doesn't have alpha, so we use a simple hack: the Y, U, and V channels are set to 0 for background pixels. The YUYV bitstream is then encoded using WMF’s H.264 encoder. Because the compression is lossy, the decoded YUYV values for background pixels can be non-zero. We work around the lack of an alpha channel with the encoding and decoding thresholds exposed in the UI.
If the segmented image alpha is less than the encoding threshold for either of the BGRA pixels, the resulting YUYV pixel is set to 0. If the Y1, U, Y2, and V channels of the decoded pixel are less than the decoding threshold, the resulting BGRA pixels have alpha set to 0. When the decoding threshold is set to 0, you'll notice green highlights around the remote player(s). This is because an all-zero YUYV value converts to green, not black, in BGR.
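The two thresholds, and the green fringe at a decoding threshold of 0, can be reproduced with a short sketch. The names isBackgroundOnEncode and decodeYuyvToBgraPair are hypothetical, the conversion uses full-range BT.601 coefficients as an assumption, and we read "less than the decoding threshold" as all four channels being below it; the sample's exact rule may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

inline std::uint8_t clampToByte(double v) {
    return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, std::round(v))));
}

// Encode side: the macropixel is treated as background (YUYV forced to 0)
// when either source pixel's alpha is below the encoding threshold.
inline bool isBackgroundOnEncode(std::uint8_t alpha0, std::uint8_t alpha1,
                                 std::uint8_t encodeThreshold) {
    return alpha0 < encodeThreshold || alpha1 < encodeThreshold;
}

// Decode side: expands one YUYV macropixel (Y0 U Y1 V) into two BGRA pixels.
// Assumption: background means all four channels are below the threshold.
inline void decodeYuyvToBgraPair(const std::uint8_t yuyv[4],
                                 std::uint8_t decodeThreshold,
                                 std::uint8_t bgra[8]) {
    const bool background =
        yuyv[0] < decodeThreshold && yuyv[1] < decodeThreshold &&
        yuyv[2] < decodeThreshold && yuyv[3] < decodeThreshold;
    const double u = yuyv[1] - 128.0;
    const double v = yuyv[3] - 128.0;
    for (int i = 0; i < 2; ++i) {
        const double y = yuyv[2 * i];  // Y0 for pixel 0, Y1 for pixel 1
        bgra[4 * i + 0] = clampToByte(y + 1.772 * u);                  // B
        bgra[4 * i + 1] = clampToByte(y - 0.344136 * u - 0.714136 * v);  // G
        bgra[4 * i + 2] = clampToByte(y + 1.402 * v);                  // R
        bgra[4 * i + 3] = background ? 0 : 255;                        // A
    }
}
```

With a threshold of 0 no channel can be below it, so an all-zero macropixel is decoded as opaque green (B = 0, G = 135, R = 0 with these coefficients). A small non-zero threshold marks the same macropixel, and slightly noisy ones produced by lossy compression, as transparent.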
Bandwidth
The amount of data sent depends on the network send interval and the local Chat Head resolution. At a 320x240 resolution, reducing the send interval from 70 ms to 10 ms raises the bandwidth from ~10 KBps (80 kbps) to ~100 KBps (800 kbps). Increasing the resolution doesn’t increase the amount of encoded data linearly, because a large part of the image is background (YUYV set to 0), which compresses much better.
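To put these figures in perspective, the average payload per send can be derived from the bandwidth and the send interval. The helpers below are a back-of-the-envelope sketch with hypothetical names, assuming 1 KB = 1000 bytes and treating the approximate figures quoted above as exact.

```cpp
#include <cassert>

// Average payload per send, in bytes, for a given bandwidth and send interval.
// Assumption: 1 KB = 1000 bytes.
inline double averagePayloadBytes(double kilobytesPerSecond, double sendIntervalMs) {
    assert(sendIntervalMs > 0.0);
    const double sendsPerSecond = 1000.0 / sendIntervalMs;
    return kilobytesPerSecond * 1000.0 / sendsPerSecond;
}

// Size of one uncompressed YUYV frame: 2 bytes per pixel.
inline long rawYuyvFrameBytes(int width, int height) {
    return static_cast<long>(width) * height * 2;
}
```

With these numbers, each send averages roughly 700 bytes at a 70 ms interval and roughly 1000 bytes at a 10 ms interval, compared with 153,600 bytes for a single uncompressed 320x240 YUYV frame.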
Performance
The sample uses Intel® Instrumentation and Tracing Technology (Intel® ITT) markers and Intel® VTune Amplifier XE to help measure and analyze performance. To enable the markers, uncomment the following line in ChatheadsNativePOC\itt\include\VTuneScopedTask.h and rebuild:
//#define ENABLE_VTUNE_PROFILING // uncomment to enable marker code
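For readers unfamiliar with ITT markers, the following sketch shows how a scoped marker guarded by ENABLE_VTUNE_PROFILING might look. It is not the contents of VTuneScopedTask.h: the class name ScopedTaskSketch, the domain name, and the encodeFrameSketch function are hypothetical. The ITT calls used (__itt_domain_create, __itt_string_handle_create, __itt_task_begin, __itt_task_end) come from ittnotify.h, which ships with VTune; with the define left commented out, the class compiles to a no-op and needs no ITT headers.

```cpp
// Uncommenting this define requires the ITT headers and library from VTune.
//#define ENABLE_VTUNE_PROFILING

#ifdef ENABLE_VTUNE_PROFILING
#include <ittnotify.h>
#endif

// Hypothetical RAII marker; the sample's own VTuneScopedTask.h may differ.
class ScopedTaskSketch {
public:
    explicit ScopedTaskSketch(const char* name) {
#ifdef ENABLE_VTUNE_PROFILING
        __itt_task_begin(domain(), __itt_null, __itt_null,
                         __itt_string_handle_create(name));
#else
        (void)name;  // nothing to do when profiling is disabled
#endif
    }

    ~ScopedTaskSketch() {
#ifdef ENABLE_VTUNE_PROFILING
        __itt_task_end(domain());
#endif
    }

    ScopedTaskSketch(const ScopedTaskSketch&) = delete;
    ScopedTaskSketch& operator=(const ScopedTaskSketch&) = delete;

private:
#ifdef ENABLE_VTUNE_PROFILING
    static __itt_domain* domain() {
        static __itt_domain* d = __itt_domain_create("ChatHeads.Sketch");
        return d;
    }
#endif
};

// Example of marking a unit of work; the body is a placeholder.
inline int encodeFrameSketch(int frameIndex) {
    ScopedTaskSketch task("EncodeFrame");  // task ends when this scope exits
    return frameIndex + 1;
}
```

Tying the marker to scope means each instrumented region appears as a named task in the VTune timeline without explicit begin/end calls at every return path.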
A concurrency view with varying BGS work (taken on an Intel® Core™ i7-4770R processor with 8 logical cores) is shown below.
Conclusion
The Chat Heads usage is targeted at improving the game experience without sacrificing the game’s quality or performance, both as a player seeing friends’ reactions as you play and as a spectator seeing pro players react as they play. This sample lets you tinker with some of the options and judge the resulting experience.
Acknowledgements
A huge thanks to Jeff Laflam for pair-programming the sample. Thanks also to Brian Mackenzie for the WMF based encoder/decoder implementation and Doug McNabb for CPUT clarifications.