Shwetha Doss, Senior Application Engineer, Intel Corporation
Harshit Shrivastava, Founder and CEO, Intugine Technologies
Abstract
Intel® RealSense™ technology helps developers enable a natural user interface (NUI) for gesture recognition platforms. Intugine's gesture recognition platform integrates seamlessly with Intel RealSense technology across application segments on Microsoft Windows* platforms. The platform handles all interaction between the user and the Intel® RealSense™ SDK, ensuring that no code changes are required in individual applications.
This paper highlights how Intugine (http://www.intugine.com/) enabled its gesture recognition platform for Intel® RealSense™ technology. It also discusses how the same methodology can be applied to other applications, such as games and productivity software.
Introduction
Intel® RealSense™ technology adds “human-like” senses to computing devices. Intel is working with OEMs to create future computing devices that can hear, see, and feel the environment, as well as understand human emotion and context. These devices will interact with humans in immersive, natural, and intuitive ways.
Intel® RealSense™ technology understands four important modes of communication: hands, the face, speech, and the surrounding environment. This multimodal processing enables devices to behave more like humans.
The Intel® RealSense™ Camera
The Intel® RealSense™ camera uses depth-sensing technology so that computing devices see more like you do. To harness the possibilities of Intel® RealSense™ technology, developers need to use the Intel® RealSense™ SDK along with the Intel® RealSense™ camera. There are two camera options: the F200 and the R200. These Intel-developed depth cameras support full VGA depth resolution and full 1080p RGB resolution, and require USB 3.0. Both cameras support depth and IR processing at 640×480 resolution at 60 frames per second (FPS).
There are many OEM devices with integrated Intel® RealSense™ cameras available, including Ultrabooks*, tablets, notebooks, 2 in 1s, and all-in-one form factors.
Figure 1. Intel® RealSense™ cameras.
Figure 2. The Intel® RealSense™ camera (F200).
The infrared (IR) laser projector on the Intel RealSense camera (F200) sends non-visible patterns (coded light) onto the object. The IR camera captures the reflected patterns. These patterns are processed by the ASIC, which assigns depth values to each pixel to create a depth video frame.
Applications see both depth and color video streams. The ASIC syncs depth with the color stream (texture mapping) using a UVC time stamp and generates data flags for each depth value (valid, invalid, or motion detected). The range of the F200 camera is about 120 cm.
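The following minimal sketch shows how an application can acquire these synchronized color and depth frames through the SDK's C++ PXCSenseManager interface. The stream resolutions, frame rate, and frame count are illustrative choices for this sketch, not requirements.

```cpp
#include "pxcsensemanager.h"

int main() {
    // Create the SDK pipeline.
    PXCSenseManager *sm = PXCSenseManager::CreateInstance();
    if (!sm) return 1;

    // Request time-synchronized color and depth streams.
    sm->EnableStream(PXCCapture::STREAM_TYPE_COLOR, 1920, 1080, 30);
    sm->EnableStream(PXCCapture::STREAM_TYPE_DEPTH, 640, 480, 60);
    if (sm->Init() < PXC_STATUS_NO_ERROR) return 1;

    for (int i = 0; i < 300; ++i) {
        // Block until an aligned pair of frames is available.
        if (sm->AcquireFrame(true) < PXC_STATUS_NO_ERROR) break;

        PXCCapture::Sample *sample = sm->QuerySample();
        PXCImage *color = sample->color;  // RGB frame
        PXCImage *depth = sample->depth;  // per-pixel depth with validity flags

        // ... process the frames here ...

        sm->ReleaseFrame();               // let the pipeline advance
    }
    sm->Release();
    return 0;
}
```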
Figure 3. The Intel® RealSense™ camera (R200).
The R200 actually has three cameras, providing RGB (color) and stereoscopic IR to produce depth. With the help of a laser projector, the camera does 3D scanning for scene perception and enhanced photography. The indoor range is approximately 0.5–3.5 meters, and the outdoor range is up to 10 meters.
Intel® RealSense™ SDK
The Intel® RealSense™ SDK includes a set of pattern detection and recognition algorithm implementations exposed through standardized interfaces. These implementations shift the application developer's focus from coding algorithm details to innovating on the usage of these algorithms.
Intel® RealSense™ SDK Architecture
The SDK library architecture consists of several components. The essence of the SDK functionality lies in the I/O modules and the algorithm modules. The I/O modules retrieve input from an input device or send output to an output device.
The algorithm modules include various pattern detection and recognition algorithms related to face recognition, gesture recognition, and speech recognition.
Figure 4. The Intel® RealSense™ SDK architecture.
Figure 5. The Intel® RealSense™ SDK provides 78-point face landmarks.
Figure 6. The Intel® RealSense™ SDK provides skeletal tracking.
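As a concrete example of using an algorithm module, the sketch below enables the SDK's hand-tracking module, which underlies the skeletal tracking shown in Figure 6, and reads the index fingertip joint each frame. It is a minimal illustration that assumes a PXCSenseManager created as in the earlier sketch; the choice of joint is arbitrary.

```cpp
// Assumes a PXCSenseManager *sm created as in the previous sketch.
sm->EnableHand();                                 // turn on the hand-tracking module
PXCHandModule *handModule = sm->QueryHand();
PXCHandData   *handData   = handModule->CreateOutput();
sm->Init();

while (sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR) {
    handData->Update();                           // refresh tracking results for this frame

    for (pxcI32 i = 0; i < handData->QueryNumberOfHands(); ++i) {
        PXCHandData::IHand *hand = 0;
        if (handData->QueryHandData(PXCHandData::ACCESS_ORDER_BY_TIME, i, hand)
                >= PXC_STATUS_NO_ERROR) {
            PXCHandData::JointData joint;
            hand->QueryTrackedJoint(PXCHandData::JOINT_INDEX_TIP, joint);
            // joint.positionWorld holds the index fingertip position in meters.
        }
    }
    sm->ReleaseFrame();
}
handData->Release();
```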
Intugine Nimble*
Intugine Nimble* is a high-accuracy, motion-sensing wearable device. The setup consists of a USB sensor and two wearable devices: a ring and a finger clip. The sensor tracks the movement of the rings in 3D space with sub-millimeter accuracy and low latency. The device is based on computer vision: the rings emit a specific pattern in a narrow (nanometer-scale) wavelength band, and the sensor is filtered to see only that wavelength. A software algorithm on the host device recognizes the emitted pattern and tracks each ring individually, generating coordinates at a high frame rate of over 60 coordinates per second for each ring.
Figure 7. The Intugine Nimble* effectively replaces the mouse and keyboard.
Applications With Nimble
Some of the available applications that Nimble can control are games such as Fruit Ninja*, Angry Birds*, and Counter-Strike*, and utility applications such as Microsoft PowerPoint* and media players. These applications are normally controlled by mouse and keyboard input. To control them with Nimble, the keyboard and mouse events must be generated programmatically.
The software module that handles these keyboard and mouse events is called the interaction layer. Nimble uses a proprietary software interaction layer to interact with existing games and applications. The interaction layer maps the user's fingertip coordinates to mouse and keyboard events that the application and OS recognize.
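On Windows, such an interaction layer can synthesize input with the Win32 SendInput API. The sketch below is illustrative only: the normalized fingertip coordinates and the helper names are assumptions, since Nimble's actual interaction layer is proprietary.

```cpp
#include <windows.h>

// Hypothetical helper: map a fingertip position, normalized to 0..1 across
// the tracking volume, to an absolute cursor move via Win32 SendInput.
void MoveCursorTo(float nx, float ny) {
    INPUT in = {};
    in.type = INPUT_MOUSE;
    // SendInput expects absolute coordinates scaled to the 0..65535 range.
    in.mi.dx = (LONG)(nx * 65535.0f);
    in.mi.dy = (LONG)(ny * 65535.0f);
    in.mi.dwFlags = MOUSEEVENTF_MOVE | MOUSEEVENTF_ABSOLUTE;
    SendInput(1, &in, sizeof(INPUT));
}

// Hypothetical helper: synthesize a left click (press, then release),
// e.g., in response to a recognized pinch or tap gesture.
void Click() {
    INPUT in[2] = {};
    in[0].type = INPUT_MOUSE;
    in[0].mi.dwFlags = MOUSEEVENTF_LEFTDOWN;
    in[1].type = INPUT_MOUSE;
    in[1].mi.dwFlags = MOUSEEVENTF_LEFTUP;
    SendInput(2, in, sizeof(INPUT));
}
```

Because the events arrive through the same OS path as a physical mouse, existing games and applications respond to them without modification.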
Nimble with the Intel® RealSense™ SDK
The Intel® RealSense™ SDK can detect IR emissions at 860 nm, and the patterned emission of the Nimble rings can be customized to a certain wavelength range. By replacing the emission source in the ring with an 860 nm emitter, the ring produces the same patterns in the 860 nm range. The Intel® RealSense™ SDK can sense these emissions as an IR image stream, which can then be tracked using the SDK. By implementing Nimble's pattern recognition and tracking algorithms on top of the Intel® RealSense™ SDK, we get the coordinates of the individual rings at 60 FPS.
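A minimal sketch of this approach: read the camera's IR stream through the SDK and locate a bright emitter. The brightness threshold and the simple centroid computation are placeholders standing in for Intugine's proprietary pattern-recognition algorithm; it again assumes the PXCSenseManager pipeline from the earlier sketches.

```cpp
// Assumes a PXCSenseManager *sm created as in the earlier sketches.
sm->EnableStream(PXCCapture::STREAM_TYPE_IR, 640, 480, 60);
sm->Init();

while (sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR) {
    PXCImage *ir = sm->QuerySample()->ir;
    PXCImage::ImageData data;
    if (ir->AcquireAccess(PXCImage::ACCESS_READ, PXCImage::PIXEL_FORMAT_Y8,
                          &data) >= PXC_STATUS_NO_ERROR) {
        // Centroid of pixels above an assumed brightness threshold:
        // a crude stand-in for the real ring-pattern recognition.
        long sx = 0, sy = 0, n = 0;
        for (int y = 0; y < 480; ++y) {
            pxcBYTE *row = data.planes[0] + y * data.pitches[0];
            for (int x = 0; x < 640; ++x)
                if (row[x] > 240) { sx += x; sy += y; ++n; }
        }
        if (n > 0) {
            float cx = (float)sx / n;  // ring position (x) in the IR image
            float cy = (float)sy / n;  // ring position (y) in the IR image
        }
        ir->ReleaseAccess(&data);
    }
    sm->ReleaseFrame();
}
```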
The Intel® RealSense™ SDK's design avoids most lens and curvature distortions, which allows better-scaled motion tracking of the Nimble rings. The 640×480 IR resolution generates refined spatial coordinate information. The Intel® RealSense™ SDK supports up to 300 FPS on the IR stream, which gives Nimble's tracking almost zero latency and provides an extremely responsive experience.
Nimble technology is designed to track only the emissions of the rings, and thus misses the skeletal-tracking details that some applications might require.
Figure 8. The Intugine Nimble* along with Intel® RealSense™ technology.
Value Proposition for Intel® RealSense™ Technology
Nimble along with Intel® RealSense™ technology can support a wide range of existing applications. Currently over 100 applications work seamlessly without needing any source-code modifications, and potentially most Microsoft Windows* and Android* applications can work with this solution.
Currently the Intel® RealSense™ camera (F200) supports a range of about 120 cm. With the addition of Nimble, this range can extend to over 15 feet.
Nimble allows sub-millimeter accurate finger tracking within a range of 3 feet and sub-centimeter accurate tracking within a range of 15 feet. This enables many high-accuracy games and applications to be used with better control.
Nimble along with Intel® RealSense™ technology reduces the application latency to less than 5 milliseconds.
Nimble along with Intel® RealSense™ technology can support multiple rings together; we have tested up to eight rings with Intel® RealSense™ technology.
Summary
Nimble's interaction layer along with Intel® RealSense™ technology can help add gesture support to any application without any changes to the source code. Using this technology, applications on Windows* and Android* platforms can add gesture support with minimal effort.
For More Information