This year’s first release (2016 R1, a.k.a. v8) of the Intel® RealSense™ SDK comes with a bundle of improvements, but I’m most interested in the JavaScript interface, specifically how quickly I can get and use 3D camera data in a web-based game. Bob Duffy wrote a solid article about this, but some things have changed since then, and I want to use as little code as possible, which makes it simpler to refactor proof-of-concept test code into larger systems.
In this article, I’ll tap the face tracking functionality to gather a user’s bounding box and nose point data for potential control and/or in-game avatar applications. For example, using the nose to aim essentially turns the player’s head into a joystick, while having the player’s in-game character mirror their movement capitalizes on the atypical interface to create an engaging environment. Fortunately, the SDK includes a selection of predefined algorithms that simplify the process.
Getting Started
First we need to set up the following:
- Intel® RealSense™ camera – Install your camera’s most recent Intel® RealSense™ Depth Camera Manager (DCM).
- Intel RealSense SDK – Download the most recent version of the SDK.
- Web App Runtime – Download and install the most recent version (currently v8), and then restart the browser.
- File location – To follow the demo most easily, copy realsense.js from the SDK file tree to the desktop. With the current version’s default path, it’s in C:\Program Files (x86)\Intel\RSSDK\framework\common\JavaScript\
- Text editor – Pretty much any editor will work.
- File creation – Create a new .html file in the same location as your realsense.js.
- Contents – Copy the sample code below into your file, or (preferably) write it by hand as we walk through it.
Sample Code
To illustrate the simplicity of our task, here’s the full source (aside from a disclaimer comment in the file itself):
<!doctype HTML>
<html>
<head>
    <title>RS-JS Face Tracking</title>
    <script type="text/javascript" src="https://autobahn.s3.amazonaws.com/autobahnjs/latest/autobahn.min.jgz"></script>
    <script src="realsense.js"></script>
</head>
<body>
<canvas id="mainCanvas" width=600 height=400 style="border:1px solid black"></canvas>
<script>
var can = document.getElementById("mainCanvas");
var ctx = can.getContext("2d");
ctx.textAlign = "center"; // Set text to be centered at draw point
var scale = 500;          // Scale nose point movement to significance
var rsf = intel.realsense.face;
var sm, fm, fc;           // Sense Manager, Face Module, Face Config

var onFaceData = function (sender, data) {
    if (data.faces.length > 0) {                    // Ignore frames with no face
        var face = data.faces[0];                   // Use first face visible
        var rect = face.detection.boundingRect;     // Get face bounding box
        var n = face.landmarks.points[29].world;    // Get nose landmark data
        var px = rect.x + rect.w / 2 + n.x * scale; // Anchor to bounding box
        var py = rect.y + rect.h / 2 - n.y * scale; // Invert y-axis shift
        ctx.clearRect(0, 0, can.width, can.height); // Clear canvas each frame
        ctx.strokeRect(rect.x, rect.y, rect.w, rect.h); // Show bounding box
        ctx.fillText(Math.round(n.z * 100), px, py);    // Draw z value at nose point
    }
}

intel.realsense.SenseManager.createInstance().then(function (instance) {
    sm = instance;
    return rsf.FaceModule.activate(sm);    // Activate face module
}).then(function (instance) {
    fm = instance;
    fm.onFrameProcessed = onFaceData;      // Set face data handler
    return fm.createActiveConfiguration(); // Configure face tracking
}).then(function (instance) {
    fc = instance;
    fc.detection.isEnabled = true;         // Enable face detection
    return fc.applyChanges();
}).then(function (result) {
    fc.release();
    return sm.init();                      // Sense manager initialization
}).then(function (result) {
    return sm.streamFrames();              // Start streaming
});

window.onblur = function () {  // Pause face module when window loses focus
    if (fm != undefined) { fm.pause(true); }
}
window.onfocus = function () { // Unpause face module when window regains focus
    if (fm != undefined) {
        sm.captureManager.device.restorePropertiesUponFocus();
        fm.pause(false);
    }
}
window.onbeforeunload = function () { // Release sense manager on window close
    if (sm != undefined) {
        sm.release().then(function () { sm = fm = undefined; });
    }
}
</script>
</body>
</html>
Depending on your experience and background, the code above may be easily readable or it might look like an overwhelming wall of code. If it makes sense to you, get started and have fun; the rest of this article will focus on digesting what we have here.
Functional Breakdown
HTML section
<!doctype HTML>
<html>
<head>
    <title>RS-JS Face Tracking</title>
    <script type="text/javascript" src="https://autobahn.s3.amazonaws.com/autobahnjs/latest/autobahn.min.jgz"></script>
    <script src="realsense.js"></script>
</head>
<body>
<canvas id="mainCanvas" width=600 height=400 style="border:1px solid black"></canvas>
<script> … </script>
</body>
</html>
This is literally all the HTML code we have on the page. Including the autobahn script is vital: realsense.js (the file copied earlier) relies on it for the under-the-hood legwork that makes the camera work in a browser.
The only real element displayed on the page is the canvas, arbitrarily designated here to display the data as it’s captured. More information on the basics of this approach is available on IDZ.
The remainder of our code will go in the third script tag. Placing it in the head might seem ideal, but it’s positioned at the bottom of the body so that the canvas element already exists by the time the script runs. Note: This (as well as the IDZ link above) is another example of “hackathon style” JavaScript, where the code’s form isn’t intended to survive the demo; once it’s tweaked to serve your purposes, harvest the functional bits for integration into a more scalable and proper structure.
Variables and Usage
var can = document.getElementById("mainCanvas");
var ctx = can.getContext("2d");
ctx.textAlign = "center"; // Set text to be centered at draw point
var scale = 500;          // Scale nose point movement to significance
var rsf = intel.realsense.face;
var sm, fm, fc;           // Sense Manager, Face Module, Face Config

var onFaceData = function (sender, data) { … }
The first few lines are preparation for our specific example, where we’ll use the aforementioned canvas; the scale variable is an arbitrary inclusion to make the user’s nose point movement more noticeable.
Another preferential convention: the use of short variable names keeps later code from exploding with long strings of namespaces and attributes.
The onFaceData function is where all our custom code will reside. We pass this function to the SDK so it knows how we want to handle the data—thus we call it a “handler” function. We’ll get to its content in a bit; first we need to get the camera sending data.
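To make the handler idea concrete, here’s a minimal stand-in with the same (sender, data) signature. It only logs how many faces the SDK found in each frame (logFaceCount is a made-up name for illustration; it would be assigned the same way onFaceData is later):

var logFaceCount = function (sender, data) {
    // Same shape as onFaceData, but just report the face count per frame
    console.log("Faces detected: " + data.faces.length);
};
// fm.onFrameProcessed = logFaceCount; // assigned exactly like onFaceData below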
Intel® RealSense™ Camera Initialization
intel.realsense.SenseManager.createInstance().then(function (instance) {
    sm = instance;
    return rsf.FaceModule.activate(sm);    // Activate face module
}).then(function (instance) {
    fm = instance;
    fm.onFrameProcessed = onFaceData;      // Set face data handler
    return fm.createActiveConfiguration(); // Configure face tracking
}).then(function (instance) {
    fc = instance;
    fc.detection.isEnabled = true;         // Enable face detection
    return fc.applyChanges();
}).then(function (result) {
    fc.release();
    return sm.init();                      // Sense manager initialization
}).then(function (result) {
    return sm.streamFrames();              // Start streaming
});
It looks chaotic at first, but it’s really a nice series of sequential function calls.
- Create Sense Manager, saved as “sm”.
- Using sm, activate the Face Module, saved in “fm”.
- Set our data handler and use fm to create an active face configuration, “fc”.
- Activate face detection in fc, then release it (as we have no further configuration changes to make).
- We can then initialize sm and start capturing data.
This is the entire initialization process. For other modules or senses, the pattern stays generally the same.
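As an illustration of how little the shape changes, here’s the same chain sketched for hand tracking. Treat this as an unverified sketch: the intel.realsense.hand.HandModule namespace and the onHandData handler name are assumptions modeled on the face module’s pattern, so check the SDK documentation for the exact names before relying on them.

var rsh = intel.realsense.hand;            // Assumed namespace; verify in SDK docs
var hm, hc;                                // Hand Module, Hand Config

var onHandData = function (sender, data) { // Placeholder handler
    // Inspect hand tracking data here
};

intel.realsense.SenseManager.createInstance().then(function (instance) {
    sm = instance;
    return rsh.HandModule.activate(sm);    // Activate hand module instead of face
}).then(function (instance) {
    hm = instance;
    hm.onFrameProcessed = onHandData;      // Set hand data handler
    return hm.createActiveConfiguration();
}).then(function (instance) {
    hc = instance;
    return hc.applyChanges();              // Enable any settings you need first
}).then(function (result) {
    hc.release();
    return sm.init();
}).then(function (result) {
    return sm.streamFrames();
});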
Camera Handling
window.onblur = function () {  // Pause face module when window loses focus
    if (fm != undefined) { fm.pause(true); }
}
window.onfocus = function () { // Unpause face module when window regains focus
    if (fm != undefined) {
        sm.captureManager.device.restorePropertiesUponFocus();
        fm.pause(false);
    }
}
window.onbeforeunload = function () { // Release sense manager on window close
    if (sm != undefined) {
        sm.release().then(function () { sm = fm = undefined; });
    }
}
We need these additional functions to stop the camera when it’s not being used—specifically pausing the module when the window loses focus, unpausing when focus is regained, and releasing the resources if the window will be closed. With the boilerplate stuff out of the way, we can move on to our custom code.
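As an optional aside before moving on: modern browsers also expose the standard Page Visibility API, which could pause the module when the tab is hidden even if the window technically keeps focus. A small sketch, not part of the original demo:

// Optional sketch using the standard Page Visibility API (not in the original demo):
// pause the face module whenever the tab is hidden, resume when it's visible again.
document.addEventListener("visibilitychange", function () {
    if (fm != undefined) { fm.pause(document.hidden); }
});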
Data Handling
The SDK attempts to recognize landmark points on a detected face, as illustrated in the image below. For this demo, we are interested in the landmark point at index 29—the tip of the nose.
Figure 1: Index of each landmark point displayed at its location on a face.
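To reproduce a rough version of Figure 1 yourself, one sketch (reusing only the anchoring trick this demo already uses) is to draw every landmark’s index at its scaled offset from the bounding box center, inside the onFaceData handler shown next, where face, rect, ctx, and scale are all in scope:

// Sketch: draw each landmark's index, anchored the same way as the nose point.
// Assumes it runs inside onFaceData, where face, rect, ctx, and scale exist.
for (var i = 0; i < face.landmarks.points.length; i++) {
    var p = face.landmarks.points[i].world;
    ctx.fillText(i, rect.x + rect.w / 2 + p.x * scale,
                    rect.y + rect.h / 2 - p.y * scale);
}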
var onFaceData = function (sender, data) {
    if (data.faces.length > 0) {                    // Ignore frames with no face
        var face = data.faces[0];                   // Use first face visible
        var rect = face.detection.boundingRect;     // Get face bounding box
        var n = face.landmarks.points[29].world;    // Get nose landmark data
        var px = rect.x + rect.w / 2 + n.x * scale; // Anchor to bounding box
        var py = rect.y + rect.h / 2 - n.y * scale; // Invert y-axis shift
        ctx.clearRect(0, 0, can.width, can.height); // Clear canvas each frame
        ctx.strokeRect(rect.x, rect.y, rect.w, rect.h); // Show bounding box
        ctx.fillText(Math.round(n.z * 100), px, py);    // Draw z value at nose point
    }
}
When we receive a frame of data, we check to make sure there is a face detected and lock on to the first one we see. This face data structure contains significantly more information than we’re looking for, so we just want to pull out the bounding box (“rect”) and the nose point (“n”).
The px and py variables make up the location I’ve chosen to represent the nose point in our space: starting from the center of the bounding box, the nose point’s world displacement is scaled up and added to produce the new x- and y-coordinates. The y displacement is subtracted because canvas coordinates grow downward while the camera’s world y-axis grows upward.
Those variables provide the location at which to draw our nose point’s “z value”, its depth or distance from the camera (it is multiplied by 100 for readability, as the raw value is a decimal between 0 and 1 at typical use distances). This and the bounding box are drawn to the canvas after it is cleared.
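As a quick worked example with made-up numbers (not captured data), here’s how those lines play out for one frame:

// Worked example with made-up numbers:
// rect = { x: 250, y: 150, w: 100, h: 120 } and n = { x: 0.02, y: -0.01, z: 0.45 }
// px = 250 + 100/2 + 0.02 * 500  = 250 + 50 + 10 = 310
// py = 150 + 120/2 - (-0.01) * 500 = 150 + 60 + 5 = 215
// fillText then draws Math.round(0.45 * 100) = 45 at (310, 215)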
Next Steps
Hopefully this demo gave you enough awareness of the system, and a useful starting point, to try it out and adapt it to your work. Give it some platform detection and error handling, change the structure so it can keep drawing without updated data (see the sketch below), try using more of the landmarks, or do whatever else you need while keeping processing minimal.
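For instance, one way to keep drawing without updated data (a sketch reusing only pieces already in the demo, plus the standard requestAnimationFrame API) is to have the handler merely store the latest face and let a separate loop redraw every frame:

// Sketch: decouple drawing from data arrival. The handler stores the most
// recent face; a requestAnimationFrame loop redraws on every display frame.
// Assumes can, ctx, and scale are defined as in the demo above.
var lastFace; // Most recently seen face, or undefined

var onFaceData = function (sender, data) {
    if (data.faces.length > 0) { lastFace = data.faces[0]; }
};

var draw = function () {
    ctx.clearRect(0, 0, can.width, can.height);
    if (lastFace != undefined) {
        var rect = lastFace.detection.boundingRect;
        var n = lastFace.landmarks.points[29].world;
        ctx.strokeRect(rect.x, rect.y, rect.w, rect.h);
        ctx.fillText(Math.round(n.z * 100),
                     rect.x + rect.w / 2 + n.x * scale,
                     rect.y + rect.h / 2 - n.y * scale);
    }
    window.requestAnimationFrame(draw); // Schedule the next redraw
};
window.requestAnimationFrame(draw);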
This SDK can be used in many ways to augment input in apps and games. Remember to keep the user’s range of motion in mind to make interaction feel natural, their energy expenditure in mind to keep it comfortable, and the frame rate steady to keep it smooth. Effective application throughout the user experience is vital for this kind of interface, which will only grow in utility and ubiquity.
References
Intel® RealSense™ SDK 2016 R1 Documentation: https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?doc_face_face_tracking_and_recognition.html
Intel® RealSense™ Depth Camera Manager (DCM): https://downloadcenter.intel.com/download/25044/Intel-RealSense-Depth-Camera-Manager-DCM-
Intel® RealSense™ SDK: https://software.intel.com/en-us/intel-realsense-sdk/download
RealSense - Your Face as a Game Controller using Javascript: https://software.intel.com/en-us/blogs/2014/12/08/realsense-your-face-as-a-controller-using-javascript?language=en
What's New - Intel RealSense SDK (Windows) v8 (2016 R1) release: https://software.intel.com/en-us/blogs/2016/01/28/new-realsense-v8
About the Author
Brad Hill is a software engineer at Intel in the Developer Relations Division. Brad investigates new technologies on Intel® hardware and shares the best methods with software developers through the Intel® Developer Zone and at developer conferences. His focus is on turning students and developers into game developers and helping them change the world.