
Applying Intel® RealSense™ SDK Face Scans to a 3D Mesh


This sample uses the Intel® RealSense™ SDK to scan and map a user’s face onto an existing 3D character model. The code is written in C++ and uses DirectX*. The sample requires Intel® RealSense™ SDK R5 or later.

The sample is available at https://github.com/GameTechDev/FaceMapping.

Scanning

The face scanning module is significantly improved in the R5 release of the SDK. Improvements include:

  • Improved color data
  • Improved consistency of the scan by providing hints to direct the user’s face to an ideal starting position
  • Face landmark data denoting the positions of key facial features

These improvements enable easier integration into games and other 3D applications by producing more consistent results and requiring less user modification.

The scanning implementation in this sample guides the user’s head to a correct position by using the hints provided from the Intel® RealSense™ SDK. Once the positioning requirements are met, the sample enables the start scan button.

This sample focuses on the face mapping process, and therefore the GUI for directing the user during the scan process is not ideal. The interface for an end-user application should better direct the user to the correct starting position as well as provide instructions once the scan begins.

The output of the scan is an .OBJ model file and an associated texture, which are consumed in the face mapping phase of the sample.
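The sample drives this flow through the SDK’s 3D scanning module. Below is a rough sketch of that flow, assuming the PXC3DScan interface from the R5 SDK; the configuration fields, enum values, and UI hooks shown here are simplified and should be checked against the SDK headers and the sample source.

```cpp
// Minimal face-scan flow, assuming the PXC3DScan module from the R5 SDK.
// Exact enum and method names should be verified against the SDK headers.
#include "pxcsensemanager.h"
#include "pxc3dscan.h"

bool ScanFaceToObj(const pxcCHAR* objPath)
{
    PXCSenseManager* sm = PXCSenseManager::CreateInstance();
    if (!sm) return false;

    sm->Enable3DScan();                     // enable the 3D scanning module
    PXC3DScan* scanner = sm->Query3DScan();
    sm->Init();

    // Configure the module for face scanning with texture and landmark output.
    PXC3DScan::Configuration config = scanner->QueryConfiguration();
    config.mode = PXC3DScan::FACE;
    config.options = (PXC3DScan::ReconstructionOption)
                     (PXC3DScan::TEXTURE | PXC3DScan::LANDMARKS);
    scanner->SetConfiguration(config);

    // Pump frames until the user ends the scan. The preview image and module
    // hints guide the user into the ideal starting position, after which the
    // application sets config.startScan via SetConfiguration().
    bool scanDone = false;                  // set by the application's UI
    while (!scanDone && sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR)
    {
        PXCImage* preview = scanner->AcquirePreviewImage();
        if (preview) preview->Release();    // the sample displays this image
        sm->ReleaseFrame();
    }

    // Reconstruct writes the .OBJ mesh plus its associated texture file.
    pxcStatus result = scanner->Reconstruct(PXC3DScan::OBJ, objPath);
    sm->Release();
    return result >= PXC_STATUS_NO_ERROR;
}
```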


Figure 1: The face scanning module provides a preview image that helps the user maximize scan coverage.


Figure 2: Resulting scanned mesh. The image on the far right shows landmark data. Note that the scan is only the face and is not the entire head. The color data is captured from the first frame of the scan and is projected onto the face mesh; this approach yields high color quality but results in texture stretching on the sides of the head.

Face Mapping

The second part of the sample consumes the user’s scanned face color and geometry data and blends it onto an existing head model. The challenge is to create a complete head from the scanned face. This technique displaces the geometry of an existing head model as opposed to stitching the scanned face mesh onto the head model. The shader performs vertex displacement and color blending between the head and face meshes. This blending can be performed every time the head model is rendered, or a single time by caching the results. This sample supports both approaches.

The high-level process of this mapping technique includes:

  1. Render the scanned face mesh using an orthographic projection matrix to create a displacement map and a color map.
  2. Create a matrix to project positions on the head model onto the generated displacement and color maps. This projection matrix accounts for scaling and translations determined by face landmark data.
  3. Render the head model using the projection matrix to map vertex positions to texture coordinates on the displacement and color maps.
  4. Sample the generated maps to deform the vertices and color the pixels. The blending between the color map and the original head texture is controlled by an artist-created control map.
  5. (Optional) Use the same displacement and blending methodologies to create a displaced mesh and single diffuse color texture that incorporates all blending effects.

Art Assets

The following art assets are used in this sample:

Head model. The head model that the scanned face is applied to. The model benefits from higher resolution in the facial area, where the vertices are displaced.

Feature map. Texture mapped to the UVs of the head model that affects the brightness of the head.

Detail map. Repeated texture that applies additional detail to the feature map.

Color transfer map. Controls blending between two base skin tones. This allows different tones to be applied at different locations of the head. For example, the cheeks and ears can have a slightly different color than the rest of the face.

Control map. Controls blending between the displacement and color maps and existing head model data. Each channel of the control map has a separate purpose:

  • The red channel is the weight for vertex Z displacement. A weight of zero uses the vertex position of the head model, a weight of one modifies the Z vertex position based on the generated displacement map, and intermediate values result in a combination of the two.
  • The green channel is the weight for blending between the head diffuse color and the generated face color map. Zero is full head diffuse and one is full face color.
  • The blue channel is an optional channel that covers the jawbone. This can be used in conjunction with the green channel to allow a user’s jawbone color to be applied instead of the head model’s diffuse color. This might be useful in the case where the user has facial hair.

All maps are created in head model UV space.


Figure 3: Head model asset with a highly tessellated face. The scanned face will be extruded from the high-resolution area.


Figure 4: Feature map (left) and detail map (right). The detail map is mapped with head model UVs but repeated several times to add detail.


Figure 5: Color transfer map (left) and the color transfer map applied to the head model (right). This map determines the weights of the two user-selected skin colors.


Figure 6: The control map (left) and the control map applied to the head model (right). The red channel is the area that should be affected by the displacement map. The green channel is the area that should receive the scanned color map. The blue channel represents the jawbone area. Applying the color map in the jawbone area captures distinct jawbone features such as facial hair.

Displacement and Color Maps

The first step of the face mapping process is generating a displacement map and color map based on the scanned face mesh. These maps are generated by rendering the face mesh using an orthographic projection matrix. This sample uses multiple render targets to generate the depth displacement map and the color map in a single draw call. It sets the projection matrix so that the face is fully contained within the viewport.
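A minimal sketch of that setup is shown below, using Direct3D 11 and DirectXMath. The render-target names and the face bounds are placeholders, not the sample’s actual variable names.

```cpp
#include <d3d11.h>
#include <DirectXMath.h>
using namespace DirectX;

// Bind the displacement (depth) map and color map as two simultaneous render
// targets, and build an orthographic projection that tightly bounds the face.
void RenderFaceMaps(ID3D11DeviceContext* ctx,
                    ID3D11RenderTargetView* displacementRTV,  // e.g. R16_UNORM
                    ID3D11RenderTargetView* colorRTV,         // e.g. R8G8B8A8_UNORM
                    ID3D11DepthStencilView* dsv,
                    const XMFLOAT3& faceMin, const XMFLOAT3& faceMax)
{
    ID3D11RenderTargetView* rtvs[2] = { displacementRTV, colorRTV };
    ctx->OMSetRenderTargets(2, rtvs, dsv);

    // Orthographic projection sized so the scanned face fills the viewport.
    XMMATRIX ortho = XMMatrixOrthographicOffCenterLH(
        faceMin.x, faceMax.x,      // left, right
        faceMin.y, faceMax.y,      // bottom, top
        faceMin.z, faceMax.z);     // near, far (face depth range)

    // ... update the constant buffer with 'ortho', then draw the face mesh.
    // The pixel shader writes depth-as-displacement to SV_Target0 and the
    // scanned color to SV_Target1.
    (void)ortho;
}
```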


Figure 7: The displacement and color maps generated from the scanned face mesh.

Map Projection Matrix

Now we use the landmark data from the scanned face model and the head model to create a transformation matrix to convert from head model vertex coordinates in model space to texture coordinates in the displacement and color map space. We’ll call this the map projection matrix because it effectively projects the displacement maps onto the head model.

The map projection matrix consists of a scale and a translation transformation (rotation is optional); a sketch of constructing it follows the list:

  • Scale transform. The scaling factor is calculated by the ratio of the distances between the eyes of the scanned face mesh (in projected map coordinates) and eyes of the head model.
  • Translation transform. The vertex translation is calculated using the head model and scanned face mesh landmark data. It aligns the point directly between the eyes of the head model with the corresponding point on the displacement map. To calculate that point, we take the midpoint of the left and right eye landmarks and transform it by the orthographic projection matrix used when generating the displacement and color maps.
  • Rotation transform. This sample assumes that the scanned face mesh is axially aligned and does not require a rotation. The sample includes GUI controls for introducing rotation for artistic control.
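Below is a sketch of constructing the map projection matrix from the eye landmarks, using DirectXMath. The landmark inputs and function name are placeholders, and the clip-space-to-UV conventions are simplified relative to the sample.

```cpp
#include <DirectXMath.h>
using namespace DirectX;

// Build a matrix that maps head-model vertex positions (model space) to
// coordinates in the displacement/color map space, by scaling and translating
// so the eye landmarks of both meshes line up.
XMMATRIX BuildMapProjectionMatrix(
    const XMMATRIX& mapOrtho,          // ortho matrix used to render the maps
    XMFLOAT3 scanLeftEye,  XMFLOAT3 scanRightEye,   // scanned-face landmarks
    XMFLOAT3 headLeftEye,  XMFLOAT3 headRightEye)   // head-model landmarks
{
    // Project the scanned eye landmarks into map space (the space the
    // displacement and color maps were rendered in).
    XMVECTOR scanL = XMVector3TransformCoord(XMLoadFloat3(&scanLeftEye),  mapOrtho);
    XMVECTOR scanR = XMVector3TransformCoord(XMLoadFloat3(&scanRightEye), mapOrtho);
    XMVECTOR headL = XMLoadFloat3(&headLeftEye);
    XMVECTOR headR = XMLoadFloat3(&headRightEye);

    // Scale: ratio of eye-to-eye distance in map space vs. head-model space.
    float scanEyeDist = XMVectorGetX(XMVector3Length(scanR - scanL));
    float headEyeDist = XMVectorGetX(XMVector3Length(headR - headL));
    float scale = scanEyeDist / headEyeDist;

    // Translation: align the head model's between-the-eyes midpoint with the
    // corresponding midpoint in map space.
    XMVECTOR scanMid = (scanL + scanR) * 0.5f;
    XMVECTOR headMid = (headL + headR) * 0.5f;
    XMVECTOR offset  = scanMid - headMid * scale;

    // No rotation: the scanned mesh is assumed to be axially aligned.
    return XMMatrixScaling(scale, scale, scale) *
           XMMatrixTranslationFromVector(offset);
}
```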


Figure 8: Generated color map (left) being orthographically projected onto the head model. The map is translated and scaled so that the yellow anchor points between the eyes align.

Rendering

The sample applies the generated displacement and color maps at render time in vertex and pixel shaders.

Vertex Shader

The vertex shader displaces the model’s Z coordinate based on the displacement map. The displacement map texture coordinates are sent to the pixel shader where they’re used to sample the color map for blending.

The vertex shader steps include (the per-vertex displacement math is sketched after the list):

  1. Transform vertex position by the map projection matrix to get color/displacement map texture coordinates.
  2. Sample the displacement map texture at calculated coordinates.
  3. Convert the displacement sample to a model space Z value. The range and scale of the displacement map are passed through a constant buffer.
  4. Blend between the displaced Z value and the original Z value based on the control map’s red component. The control map lets the artist decide what vertices get displaced and allows for a gradual, smooth transition to the displaced position.
  5. Pass the displacement map UV coordinates to the pixel shader to be used to sample the color map.
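The displacement blend itself is only a few operations. The sketch below expresses the per-vertex math in C++ with DirectXMath for illustration; in the sample it runs in the HLSL vertex shader, and the parameter names here are placeholders.

```cpp
#include <DirectXMath.h>
using namespace DirectX;

// Per-vertex displacement, mirroring the vertex shader logic: project the
// vertex into map space, convert the displacement sample to a model-space Z
// value, then blend by the control map's red channel.
XMFLOAT3 DisplaceVertex(
    XMFLOAT3 position,              // head-model vertex position (model space)
    const XMMATRIX& mapProjection,  // model space -> displacement/color map space
    float displacementSample,       // displacement map sample in [0,1]
    float displacementMinZ,         // model-space Z range encoded in the map,
    float displacementMaxZ,         //   passed via constant buffer in the sample
    float controlRed)               // control map red channel at this vertex
{
    // Map-space coordinates double as the UVs used to sample both maps; in the
    // shader these UVs are also passed on to the pixel shader.
    XMVECTOR mapPos = XMVector3TransformCoord(XMLoadFloat3(&position), mapProjection);
    (void)mapPos;

    // Convert the [0,1] displacement sample to a model-space Z value.
    float displacedZ = displacementMinZ +
                       displacementSample * (displacementMaxZ - displacementMinZ);

    // Blend between the original Z and the displaced Z using the control map.
    position.z = position.z + controlRed * (displacedZ - position.z);
    return position;
}
```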

Pixel Shader

The pixel shader uses the control map’s green channel to blend between the head color and the generated color map texture. Because the sample allows the user to change the head color to better match the scanned face color, the color is blended into greyscale art assets in the pixel shader. The skin color is calculated for each pixel by blending between two user-selected colors based on the color transfer map. That skin color is multiplied by the greyscale intensity to produce a final head color.
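The color math reduces to two interpolations and a multiply. The sketch below expresses it in C++ with DirectXMath for illustration; in the sample it runs in the HLSL pixel shader, and the input names are placeholders.

```cpp
#include <DirectXMath.h>
using namespace DirectX;

// Per-pixel color blend, mirroring the pixel shader logic.
XMVECTOR ShadeHeadPixel(
    XMVECTOR skinColor1,       // first user-selected skin tone
    XMVECTOR skinColor2,       // second user-selected skin tone
    float    colorTransfer,    // color transfer map sample (blend between tones)
    float    featureIntensity, // greyscale feature/detail map intensity
    XMVECTOR faceMapColor,     // sample from the generated face color map
    float    controlGreen)     // control map green channel (0 = head, 1 = face)
{
    // Skin tone for this pixel: blend the two user-selected colors by the
    // color transfer map, then modulate by the greyscale head intensity.
    XMVECTOR skinColor = XMVectorLerp(skinColor1, skinColor2, colorTransfer);
    XMVECTOR headColor = skinColor * featureIntensity;

    // Final color: blend the head color with the generated face color map.
    return XMVectorLerp(headColor, faceMapColor, controlGreen);
}
```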


Figure 9: Demonstration of head model blending without applying the displacement or color maps.


Figure 10: Final result composited in real-time using the sample.

Exporting

This technique applies several layers of blending in the pixel shader and modifies vertex positions in the vertex shader every time the model is rendered. To avoid that per-frame cost, the sample also supports exporting the composited texture and the deformed mesh to an .OBJ file.

The entire compositing and deformation process still occurs on the GPU using a variation of the original shaders. The new vertex shader uses Direct3D* stream-output support to capture the deformed vertices. The vertex shader also uses the input texture coordinates as the output position; this effectively renders a new UV mapped texture.
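A sketch of the mesh-capture half of that setup is shown below for Direct3D 11: stream output is exposed through a geometry shader created from the deforming vertex shader’s bytecode, a buffer is bound as the stream-output target, and the deformed vertices are read back after the draw. The names and the output layout are placeholders and must match the actual vertex shader output.

```cpp
#include <d3d11.h>

// Create a pass-through geometry shader that streams the deformed vertices
// out to a buffer, using the deforming vertex shader's compiled bytecode.
HRESULT CreateStreamOutShader(ID3D11Device* device,
                              const void* vsBytecode, SIZE_T vsBytecodeSize,
                              ID3D11GeometryShader** gsOut)
{
    // Output layout: position, normal, texcoord (must match the VS output).
    D3D11_SO_DECLARATION_ENTRY soDecl[] =
    {
        { 0, "SV_POSITION", 0, 0, 3, 0 },
        { 0, "NORMAL",      0, 0, 3, 0 },
        { 0, "TEXCOORD",    0, 0, 2, 0 },
    };
    UINT stride = (3 + 3 + 2) * sizeof(float);   // bytes per streamed vertex

    return device->CreateGeometryShaderWithStreamOutput(
        vsBytecode, vsBytecodeSize,
        soDecl, _countof(soDecl),
        &stride, 1,
        D3D11_SO_NO_RASTERIZED_STREAM,   // no rasterization needed for capture
        nullptr, gsOut);
}

// At draw time: bind the stream-output buffer, draw the head model with the
// deforming vertex shader plus this geometry shader, then copy the buffer to
// a staging resource and Map() it on the CPU to write the .OBJ file.
void CaptureDeformedMesh(ID3D11DeviceContext* ctx,
                         ID3D11Buffer* soBuffer, UINT vertexCount)
{
    UINT offset = 0;
    ctx->SOSetTargets(1, &soBuffer, &offset);
    ctx->Draw(vertexCount, 0);

    ID3D11Buffer* nullBuffer = nullptr;
    ctx->SOSetTargets(1, &nullBuffer, &offset);   // unbind before readback
}
```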

Once composited, the model can be rendered with lower overhead and without any custom shaders, allowing it to easily be loaded by 3D modeling tools and game engines.


Figure 11: Exported .OBJ model (left) and the composited head texture (right), UV mapped to the head model’s UVs.

