Executive Summary
On Windows* OS, Direct3D is usually used for video processing. However, many applications still use OpenGL* for its cross-platform capability, so that they keep the same GUI and look and feel from platform to platform. Recent Intel graphics drivers support NV_DX_interop, which enables D3D-to-OpenGL surface sharing and can be used in conjunction with Intel® Media SDK. Intel® Media SDK can be configured to use Direct3D, and with NV_DX_interop its frame buffers can be used by OpenGL without the expensive texture copy from GPU to CPU and back to GPU. This sample code and white paper demonstrate how to set up Intel® Media SDK to use D3D for encoding and decoding, convert from the NV12 color format (Media SDK's natural color format) to RGBA (OpenGL's natural color format), and then map the D3D surface to an OpenGL texture. This pipeline completely bypasses copying the textures from GPU to CPU, which used to be one of the biggest bottlenecks when using OpenGL with Intel® Media SDK.
System Requirements
The sample code is written using Visual Studio* 2013 and serves two purposes: (1) demonstrating Miracast and (2) demonstrating Intel® Media SDK / OpenGL texture sharing, in which the surfaces decoded by Intel® Media SDK are shared with OpenGL textures with zero copies, making it very efficient. The MJPEG decoder is hardware accelerated on Haswell and later processors; on earlier processors, Media SDK automatically falls back to its software decoder. In either case, an MJPEG-capable camera (either onboard or a USB webcam) is required.
Most of the techniques used in the sample code and white paper should also apply to Visual Studio 2012, with the exception of identifying the Miracast connection type. The sample code is based on Intel® Media SDK 2014 for Client, which can be downloaded from https://software.intel.com/sites/default/files/MediaSDK2014Clients.zip. Installing the SDK creates a set of environment variables that Visual Studio uses to find the correct paths for the header files and libraries.
Application Overview
The application takes MJPEG input from the camera and runs it through a pipeline that decodes the MJPEG video, encodes the stream to H264, and then decodes the H264 stream. The decoded MJPEG camera stream and the final decoded H264 stream are displayed in an MFC-based GUI. On Haswell systems, the two decoders and one encoder (at 1080p resolution) run sequentially for readability, but they are fast enough thanks to hardware acceleration that the camera speed is the only limit on fps. In a real-world scenario, the encoders and decoders should run in separate threads, and performance should not be a problem.
In a single-monitor configuration, the camera feed is displayed as a PIP on top of the H264 decoded video in the OpenGL-based GUI (Figure 1). When Miracast is connected, the software automatically identifies the Miracast-connected monitor and displays the H264 decoded video there in a full-screen window, while the main GUI displays the raw camera video, so that the difference between the original and the encoded video is clearly visible. Finally, the View->Monitor Topology menu can not only detect the current monitor topology but also change it. Unfortunately, it cannot initiate a Miracast connection: that can be done only from the OS charm menu (slide in from the right -> Devices -> Project), and there is no known API for making a Miracast connection. Interestingly, you can disconnect Miracast by setting the monitor topology to internal only. If multiple monitors are connected by wires, the menu can change the topology at any time.
Figure 1. Single Monitor Topology. MJPEG camera is shown in the lower right corner. H264 encoded video fills up the GUI. When multi-monitor is enabled such as Miracast, the software detects the change and MJPEG camera and H264 encoded video are separated to each monitor automatically.
Main Entry Point for the Pipeline Setup
The sample code is MFC based, and the main entry point for setting up the pipeline is CChildView::OnCreate(), which initializes the camera, the MJPEG-to-H264 transcoder, and the H264 decoder, and then binds the textures from the transcoder and decoder to the OpenGL renderer. The transcoder is simply a subclass of the decoder that adds an encoder on top of the base decoder. Finally, OnCreate starts a worker thread that keeps pumping the camera feed, with frames handled serially. Each time the worker thread reads a frame, it sends a message to the OnCamRead function, which decodes the MJPEG, encodes to H264, decodes the H264, and updates the textures in the OpenGL renderer. At the top level, the whole pipeline is clean and simple to follow.
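The hand-off from the camera worker thread to the per-frame handler can be sketched in portable C++. The sample itself delivers each frame with a window message; the sketch below swaps in a small blocking queue as a stand-in, and the names Frame and FrameQueue are hypothetical and do not come from the sample.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// One compressed MJPEG frame as read from the camera (hypothetical type).
using Frame = std::vector<std::uint8_t>;

// Blocking FIFO standing in for the window message the sample uses to
// pass each camera frame from the worker thread to OnCamRead.
class FrameQueue {
public:
    // Called by the camera thread for each captured frame.
    void Push(Frame frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);  // pushing after Close() is a caller bug
            frames_.push(std::move(frame));
        }
        ready_.notify_one();
    }

    // Called by the camera thread when capture stops.
    void Close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Called by the consumer. Blocks until a frame is available and
    // returns false once the queue is closed and fully drained.
    bool Pop(Frame* out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !frames_.empty(); });
        if (frames_.empty()) {
            return false;
        }
        *out = std::move(frames_.front());
        frames_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Frame> frames_;
    bool closed_ = false;
};
```

In a setup like the one described, the camera loop would run on its own std::thread and call Push for every frame, while the consumer would loop on Pop and run the decode, encode, decode, and texture-update steps for each frame in arrival order.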
Initializing Decoder / Transcoder
Both the decoder and the transcoder are initialized to use D3D9Ex. Intel® Media SDK can be configured to use software, D3D9, or D3D11. In this sample, D3D9 is used for ease of color conversion. Intel® Media SDK's natural color format is NV12, and either IDirect3DDevice9::StretchRect or IDirectXVideoProcessor::VideoProcessBlt can be used to convert it to RGBA. For simplicity, this white paper uses StretchRect, but VideoProcessBlt is generally recommended because it offers additional post-processing capabilities. Unfortunately, D3D11 doesn't support StretchRect, and color conversion there can be convoluted. Also, the decoder and transcoder in this paper use separate D3D devices to allow various experiments such as mixing software and hardware, but a single D3D device can be shared between the two to conserve memory. Once the pipeline is set up this way, the decoder's output is of type (mfxFrameSurface1 *). This is simply a wrapper for a D3D9 surface: mfxFrameSurface1->Data.MemId can be cast to (IDirect3DSurface9 *) and then used by StretchRect or VideoProcessBlt in the CDecodeD3d9::ColorConvert function after decoding. Media SDK's output surface is non-sharable, but since a color conversion is needed anyway before OpenGL can use the frame, a sharable surface is created to store the result of the color conversion.
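To make the NV12-to-RGBA step concrete, here is a minimal CPU reference sketch of the per-pixel arithmetic. It is not what the sample does (the sample lets StretchRect perform the conversion on the GPU), and it assumes BT.601 limited-range coefficients; the matrix a given driver actually applies may differ. The function name Nv12ToRgba and the tightly packed buffer layout are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an intermediate value to the 0..255 range of an 8-bit channel.
static std::uint8_t ClampToByte(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// CPU reference for NV12 -> RGBA, assuming BT.601 limited-range video.
// nv12 holds width*height luma bytes followed by interleaved U,V pairs
// subsampled by two in both directions, with no row padding.
std::vector<std::uint8_t> Nv12ToRgba(const std::vector<std::uint8_t>& nv12,
                                     int width, int height) {
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    const std::size_t lumaSize = static_cast<std::size_t>(width) * height;
    assert(nv12.size() == lumaSize * 3 / 2);

    const std::uint8_t* yPlane = nv12.data();
    const std::uint8_t* uvPlane = nv12.data() + lumaSize;
    std::vector<std::uint8_t> rgba(lumaSize * 4);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Each 2x2 block of pixels shares one U,V pair; a chroma row
            // is width bytes long (width/2 pairs of two bytes).
            const std::size_t uvIndex =
                static_cast<std::size_t>(y / 2) * width + (x / 2) * 2;
            const int c = yPlane[static_cast<std::size_t>(y) * width + x] - 16;
            const int d = uvPlane[uvIndex] - 128;      // U
            const int e = uvPlane[uvIndex + 1] - 128;  // V

            std::uint8_t* out =
                &rgba[(static_cast<std::size_t>(y) * width + x) * 4];
            out[0] = ClampToByte((298 * c + 409 * e + 128) / 256);            // R
            out[1] = ClampToByte((298 * c - 100 * d - 208 * e + 128) / 256);  // G
            out[2] = ClampToByte((298 * c + 516 * d + 128) / 256);            // B
            out[3] = 255;                                                     // opaque alpha
        }
    }
    return rgba;
}
```

A reference like this is mainly useful for checking a GPU conversion path against known pixel values, for example that limited-range black (Y=16) and white (Y=235) with neutral chroma map to 0 and 255 in every color channel.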
Initializing Transcoder
The output of the transcoder's decode stage is fed directly into the encoder, so ensure that MFX_MEMTYPE_FROM_DECODE is used when allocating the surfaces.
Binding Textures between D3D and OpenGL
The code to bind the texture is in the CRenderOpenGL::BindTexture function. Ensure that WGLEW_NV_DX_interop is defined, then call wglDXOpenDeviceNV, wglDXSetResourceShareHandleNV, and wglDXRegisterObjectNV, in that order. This binds the D3D surface to an OpenGL texture. The binding does not update the texture automatically, however; calling wglDXLockObjectsNV / wglDXUnlockObjectsNV updates it (see CRenderOpenGL::UpdateCamTexture and CRenderOpenGL::UpdateDecoderTexture). Once the texture is updated, you can use it like any other OpenGL texture.
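The call order described above can be sketched as follows. To keep the sketch compilable on any platform, the NV_DX_interop entry points are reached through a hypothetical table of callbacks (DxInteropApi) whose members mirror the argument lists of the real functions; on Windows each member would simply forward to the corresponding wglDX...NV function. The names DxInteropApi, SharedTexture, BindSharedTexture, and DrawWithSharedTexture are assumptions for illustration and are not taken from the sample.

```cpp
#include <cassert>
#include <functional>

using Handle = void*;         // stands in for the Win32 HANDLE type
using GlUint = unsigned int;  // stands in for GLuint
using GlEnum = unsigned int;  // stands in for GLenum

const GlEnum kGlTexture2D = 0x0DE1;       // GL_TEXTURE_2D
const GlEnum kAccessReadOnly = 0x0000;    // WGL_ACCESS_READ_ONLY_NV

// Hypothetical callback table mirroring the NV_DX_interop entry points.
struct DxInteropApi {
    std::function<Handle(void* dxDevice)> openDevice;                 // wglDXOpenDeviceNV
    std::function<bool(void* dxObject, Handle share)> setShareHandle; // wglDXSetResourceShareHandleNV
    std::function<Handle(Handle device, void* dxObject, GlUint name,
                         GlEnum type, GlEnum access)> registerObject; // wglDXRegisterObjectNV
    std::function<bool(Handle device, int count, Handle* objects)> lockObjects;   // wglDXLockObjectsNV
    std::function<bool(Handle device, int count, Handle* objects)> unlockObjects; // wglDXUnlockObjectsNV
};

// Handles needed later to lock and unlock the shared texture.
struct SharedTexture {
    Handle device = nullptr;
    Handle object = nullptr;
};

// One-time binding: open the device, attach the share handle, register.
// Cleanup on failure (closing the device, unregistering) is omitted here.
bool BindSharedTexture(const DxInteropApi& api, void* d3dDevice, void* d3dSurface,
                       Handle shareHandle, GlUint texture, SharedTexture* out) {
    assert(out != nullptr);
    Handle device = api.openDevice(d3dDevice);
    if (!device) return false;
    if (!api.setShareHandle(d3dSurface, shareHandle)) return false;
    Handle object = api.registerObject(device, d3dSurface, texture,
                                       kGlTexture2D, kAccessReadOnly);
    if (!object) return false;
    out->device = device;
    out->object = object;
    return true;
}

// Per-frame use: OpenGL may touch the texture only between lock and unlock.
bool DrawWithSharedTexture(const DxInteropApi& api, const SharedTexture& tex,
                           const std::function<void()>& draw) {
    Handle object = tex.object;
    if (!api.lockObjects(tex.device, 1, &object)) return false;
    draw();
    return api.unlockObjects(tex.device, 1, &object);
}
```

Keeping the lock and unlock calls in one helper makes it harder to leave a surface locked by accident, which matters because D3D should write to the shared surface only while it is unlocked.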
Things to Consider for Multi-Monitor Topology Change
In theory, it may seem simple enough to put up another window on an external monitor and control it based on topology change detection. In reality, however, it can take a while from the moment the OS initiates the switch until the monitor configuration is complete and the content is shown. Combined with the encoder, decoder, D3D, OpenGL, and everything that comes with them, this can be quite complicated to debug. The sample tries to reuse most of the pipeline during the switch, but it may actually be easier to shut down the whole pipeline and re-initialize it, because a lot can go wrong when adding a monitor takes more than 10 seconds, even over an HDMI or VGA connection.
Future Work
The sample code for this white paper is written for D3D9 and doesn't include a D3D11 implementation. It's not clear what the most efficient way is to convert from NV12 to RGBA in the absence of StretchRect or VideoProcessBlt. The paper and sample code will be updated when the D3D11 implementation is ironed out.
Contributions
Thanks to Petter Larsson, Michel Jeronimo, Thomas Eaton, and Piotr Bialecki for their contributions to this paper.
Intel, the Intel logo and Xeon are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others
Copyright© 2013 Intel Corporation. All rights reserved.