
Moo-O* Uses Intel® RealSense™ to Put Child in Stories for Beginning Readers


Download  Moo-O Case Study.pdf

With an eye toward the future—and toward understanding the huge transformative potential of facial analysis and gesture control technologies in education—Singapore-based EyePower Games created its web-based teaching and learning application Moo-O* using the Intel® RealSense™ SDK. Moo-O, which means “puppet” in Mandarin, is easy-to-use reading software. The application detects a child’s face, tracks its motion, and then uses this data to animate characters in a story. And children learn to read as they use Moo-O to collaboratively create stories in which they’re the on-screen stars.

Founded in 2004, EyePower Games decided early to focus on developing software for facial detection and tracking. After some initial research and development, the team stumbled upon OpenCV, an open-source computer vision library from Intel, which provided many useful functions, including face detection in a video frame.

“We started to adopt the library for our works,” said Benson Loo, EyePower CEO and cofounder. “At that time, OpenCV was still in its early beta stage. To optimize our software for speed, we wrote our software in C/C++ because video processing is always CPU intensive. While seeking an application for an early version of our software, we chose to focus on the education market, specifically the instruction of English as a second language (ESL).”  


Figure 1: Children use Moo-O* to collaboratively create stories in which they’re the on-screen stars; they also learn to read.

Using Face Detection and Gesture Recognition Code in the Intel® RealSense™ SDK

After a few years of work, EyePower eventually had a working face-detection application, though certain obstacles proved difficult to surmount. It turned out, for example, that OpenCV could only detect the position of the face in the video frame. The library could not provide information about the face’s yaw, pitch, and roll, so the on-screen effects could not be made to map precisely onto a user’s face.

“It took a few hundred milliseconds to perform face detection in a single video frame; we could only process a few frames per second on a standard CPU, which caused the video to be choppy during real-time tracking,” said Loo. “Our own custom algorithm solved that by combining OpenCV’s face-detection function with other real-time optical tracking techniques to track the face across the video frames once the initial face is detected.”

It was impressive work, and the algorithm worked well and fast enough for translation and for the roll movement (when the face rotates in the plane facing the camera) of the face. However, it would typically lose the tracking during pitch and yaw movements. As a result, a user’s face was often masked by the face of the animated character, which of course failed to achieve the experience Loo was trying to deliver—that of being transported into the on-screen story.  

The face-detection and tracking code in the Intel® RealSense™ SDK helped EyePower solve many of these problems. The face module, configured through the PXC[M]FaceConfiguration interface described in the SDK documentation, tracks 78 facial landmark points that support avatar creation, emotion recognition, and facial animation. It also reports head orientation along three axes (yaw, pitch, and roll), so it can be used to control the side-to-side motion of a character. Indeed, one of the main pleasures of playing with Moo-O is not only seeing your face atop a character, but also bringing that character to life by wagging your head from side to side. The facial landmark points are also used to extract facial contours and to calculate the transformation that maps other graphics onto the face.
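The following is a minimal sketch, assuming the SDK's managed (C#) interface, of how that configuration and pose data can be queried. It is illustrative rather than EyePower's production code, and error handling and cleanup are omitted.

//Minimal sketch (not EyePower's code): enable landmark and pose tracking and read head angles.
PXCMSenseManager sm = PXCMSenseManager.CreateInstance();
sm.EnableFace();

PXCMFaceModule faceModule = sm.QueryFace();
PXCMFaceConfiguration config = faceModule.CreateActiveConfiguration();
config.landmarks.isEnabled = true;   //78 facial landmark points
config.pose.isEnabled = true;        //yaw, pitch, and roll
config.ApplyChanges();
config.Dispose();

sm.Init();
PXCMFaceData faceData = faceModule.CreateOutput();

while (sm.AcquireFrame(true) >= pxcmStatus.PXCM_STATUS_NO_ERROR)
{
    faceData.Update();
    if (faceData.QueryNumberOfDetectedFaces() > 0)
    {
        PXCMFaceData.Face face = faceData.QueryFaceByIndex(0);
        PXCMFaceData.PoseData pose = face.QueryPose();

        PXCMFaceData.PoseEulerAngles angles;
        if (pose != null && pose.QueryPoseAngles(out angles))
        {
            //angles.yaw drives the character's side-to-side motion; pitch and roll are also available.
        }
    }
    sm.ReleaseFrame();
}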

Thanh Hai Pham, cofounder and CTO of EyePower Games, recommends that developers looking to use the Intel RealSense SDK to create gesture-recognition games begin with the sample code in the SDK: make sure it works correctly in your application first, and only then start adapting the SDK code, one change at a time. “Test as you go,” said Pham. “That way, you're more likely to know when things break, so you can stop to make fixes before moving on and considering the bigger scheme of things in your development project.”

To convert Intel RealSense SDK raw images into a Windows Presentation Foundation (WPF) bitmap image for rendering, EyePower used the following code, annotated here by Loo. 

//FusionEngine is our own image processing library.
FusionEngine engine = new FusionEngine();

//The main RealSense processing thread where we extract out the face image and face transformation from each video frame
while (!m_stop)
{
    if (pp.AcquireFrame(true) < pxcmStatus.PXCM_STATUS_NO_ERROR) break;

    //adding the face landmarks to our own array of face points structure
    var landmarks = face.QueryLandmarks();
    if (landmarks!=null)
    {
        PXCMFaceData.LandmarkPoint[] points;
        var res = landmarks.QueryPoints(out points);

        facePoints.Clear();
        for (int i = 0; i < points.Length; i++)
        {
            //Remember some special landmark points (eyes, nose, and chin)
            if (points[i].source.alias == PXCMFaceData.LandmarkType.LANDMARK_EYE_LEFT_CENTER)
                this.LeftEye = new FacePoint(points[i]);

            //Remember the right eye position
            if (points[i].source.alias == PXCMFaceData.LandmarkType.LANDMARK_EYE_RIGHT_CENTER)
                this.RightEye = new FacePoint(points[i]);

            //Remember the chin position
            if (points[i].source.alias == PXCMFaceData.LandmarkType.LANDMARK_CHIN)
                this.Chin = new FacePoint(points[i]);

            //Remember the tip of the nose
            if (points[i].source.alias == PXCMFaceData.LandmarkType.LANDMARK_NOSE_TIP)
                this.NoseTip = new FacePoint(points[i]);

            facePoints.Add(new FacePoint(points[i]));
        }
    }

    //acquire video frame data from RealSense.
    status = sample.color.AcquireAccess(PXCMImage.Access.ACCESS_READ, PXCMImage.PixelFormat.PIXEL_FORMAT_RGB24, out imageData);
    if (status < pxcmStatus.PXCM_STATUS_NO_ERROR)
    {
        Trace.WriteLine("Failed to acquire color image");
        continue;
    }

    try
    {
        //Convert the RealSense frame data to a raw byte array
        byte[] arr = imageData.ToByteArray(0, w * h * 3);

        sample.color.ReleaseAccess(imageData);

        //Use our custom algorithm to extract just the face image from the image data
        this.faceData = ExtractFace(arr);

        if (FrameAcquired != null)
        {
            //Event callback for our application thread. Note that this is still in the thread of RealSense, so
            //we cannot update UI here.
            FrameAcquired(arr, null);
        }

        ... //code for acquiring hand position, gestures, etc.

“The data points and the depth of information that the Intel RealSense SDK provided were a fantastic benefit,” he said. “It actually gave us a look at what we might want to do with Moo-O in the future, as well. The Intel RealSense SDK would make it possible for us to implement a 3D version of Moo-O, which we are excited about and actively exploring.”

For now, when mapping graphics onto the user’s head, which has six degrees of freedom in the video frame, Loo and his team needed just three landmark points for the calculation: the user’s two eyes and chin.
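As a rough illustration of what those three points make possible, the following hypothetical helper (not EyePower’s code) estimates the position, roll, and scale of the face from the two eye centers and the chin in 2D screen space; the reference constant is an assumption.

//Illustrative only: estimate translation, roll, and scale of a face from the two eye
//centers and the chin (2D screen-space points). Not EyePower's actual mapping code.
using System;
using System.Windows;

static class FaceTransform
{
    public static void Estimate(Point leftEye, Point rightEye, Point chin,
                                out Point center, out double rollRadians, out double scale)
    {
        //Position: midpoint between the eyes.
        center = new Point((leftEye.X + rightEye.X) / 2.0, (leftEye.Y + rightEye.Y) / 2.0);

        //Roll: angle of the line joining the eyes.
        rollRadians = Math.Atan2(rightEye.Y - leftEye.Y, rightEye.X - leftEye.X);

        //Scale: distance from the eye midpoint to the chin, relative to a reference face.
        const double referenceEyeToChin = 100.0;   //hypothetical calibration constant (pixels)
        double eyeToChin = Math.Sqrt(Math.Pow(chin.X - center.X, 2) + Math.Pow(chin.Y - center.Y, 2));
        scale = eyeToChin / referenceEyeToChin;
    }
}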

EyePower worked to incorporate Intel RealSense SDK-enabled gesture recognition into a related product, Moo-O Goose Nursery Rhymes, which was unveiled in early March at the 2015 Mobile World Congress in Barcelona. In the new product, children can use hand gestures to trigger certain actions in the story’s characters. 


Figure 2: Benson Loo presenting Moo-O Goose Nursery Rhymes* at Mobile World Congress 2015.

“For example, in the ‘Hey Diddle Diddle’ nursery rhyme, opening and closing your hand will make the main character in the story shine brighter,” said Loo. “It thus provides an additional dimension of engagement and expression.”


Figure 3: Loo demonstrating how, by opening a hand, a user can make the main character shine brighter in EyePower’s rendition of the “Hey Diddle Diddle” nursery rhyme.

The capturing of a gesture by the Intel RealSense SDK and triggering a click-event on the UI is shown in the following code, also annotated by Loo:

public Dictionary<string, DateTime> GestureStartTime = new Dictionary<string,DateTime>();

/// <summary>
/// Whether a given gesture is on
/// </summary>
public Dictionary<string, bool> GesturesOn = new Dictionary<string, bool>();

//The main RealSense processing thread
//Inside this thread, we only set the flag on/off for each gesture first. The main application thread will
//check the flags and perform action accordingly.
while (!m_stop)
{
    //Get the fire gestures from RealSense SDK
    int gestures = handOutput.QueryFiredGesturesNumber();

    //Structure to query more data of each gesture.
    PXCMHandData.GestureData gestureData;

    //If there are some gestures fired
    if (gestures > 0)
    {
        for (int i = 0; i < gestures; i++)
        {
            //Get more data for the given gesture
            status = handOutput.QueryFiredGestureData(i, out gestureData);
            if (status == pxcmStatus.PXCM_STATUS_NO_ERROR)
            {
                PXCMHandData.IHand handData;
                status = handOutput.QueryHandDataById(gestureData.handId, out handData);
                if (status == pxcmStatus.PXCM_STATUS_NO_ERROR)
                {
                    //Get the body side of the hand with the gesture
                    var side = handData.QueryBodySide();

                    //Invert the side returned by the SDK.
                    string sideName = side == PXCMHandData.BodySideType.BODY_SIDE_RIGHT ? "Left" : "Right";
                    string name = gestureData.name + sideName;

                    if (gestureData.state == PXCMHandData.GestureStateType.GESTURE_STATE_START)
                    {
                        GestureStartTime[name] = DateTime.Now;
                        GesturesOn[name] = true;
                    }
                    else if (gestureData.state == PXCMHandData.GestureStateType.GESTURE_STATE_END)
                    {
                        GesturesOn[name] = false;
                    }
                    }
                }
                else
                {
                    Trace.WriteLine("QueryHandDataById error " + status);
                }
            }
            else
            {
                Trace.WriteLine("QueryFiredGestureData error " + status);
            }
        }
    }


    //Marshal back to the application (UI) thread so the gesture flags can be consumed safely.
    this.dispatcher.Invoke(new Action(() =>
    {
        if (FrameUpdated != null)
        {
            ... //raise the event; the main thread checks the GesturesOn flags and triggers the click event

Loo and his team discovered some nuances in using the SDK, which they reported to Intel as part of the dialogue that takes place with many of Intel’s software-enabling efforts. In Loo’s experience, a key part of the success with the SDK was figuring out which gestures to include in the application, since response times for calibration and gesture detection varied for composite gestures, such as making a fist. A workaround, Loo said, was to build the application so that any sequence of gestures began with a simple initial state. For example, a user would have to first show the full palm before squeezing the hand to make a fist. (Note that the performance of gesture recognition and removal of false positives is expected to improve in future SDK releases.)
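A minimal sketch of that gating, reusing the GesturesOn and GestureStartTime dictionaries from the code above, might look like the following. The gesture names “spreadfingers” and “fist” are SDK gesture identifiers; the two-second window and the helper itself are illustrative assumptions, not EyePower’s implementation.

//Illustrative only: accept a fist from a given hand only if the open palm ("spreadfingers")
//was seen shortly before the fist started. Called from the main application thread.
bool IsDeliberateFist(string sideName)
{
    string palm = "spreadfingers" + sideName;
    string fist = "fist" + sideName;

    bool fistOn;
    if (!GesturesOn.TryGetValue(fist, out fistOn) || !fistOn)
        return false;

    if (!GestureStartTime.ContainsKey(palm) || !GestureStartTime.ContainsKey(fist))
        return false;

    //Require the open palm to have preceded the fist by less than two seconds (assumed threshold).
    TimeSpan gap = GestureStartTime[fist] - GestureStartTime[palm];
    return gap > TimeSpan.Zero && gap < TimeSpan.FromSeconds(2);
}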

Moo-O also uses a separate thread for all Intel RealSense SDK processing and feeds the processed data to the application’s main thread, helping to simplify the integration effort so developers don’t have to make many changes to the structure of their application.
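The pattern can be sketched as follows; the RealSenseWorker class, its FrameAcquired event, and the stubbed processing method are illustrative assumptions rather than EyePower’s actual code.

//Illustrative sketch of the threading pattern: all SDK processing runs on a dedicated
//background thread, and results are handed to the WPF UI thread via an event.
using System;
using System.Threading;

class RealSenseWorker
{
    public event EventHandler<byte[]> FrameAcquired;   //raised on the worker thread
    private volatile bool m_stop;
    private Thread m_thread;

    public void Start()
    {
        m_thread = new Thread(ProcessLoop) { IsBackground = true };
        m_thread.Start();
    }

    public void Stop()
    {
        m_stop = true;
        m_thread.Join();
    }

    private void ProcessLoop()
    {
        while (!m_stop)
        {
            byte[] frame = AcquireAndProcessFrame();       //SDK calls, as in the loops shown earlier
            var handler = FrameAcquired;
            if (handler != null) handler(this, frame);     //still on the worker thread
        }
    }

    private byte[] AcquireAndProcessFrame() { /* SDK processing elided */ return null; }
}

//In the WPF window, the callback is marshaled onto the UI thread before touching controls:
//worker.FrameAcquired += (s, frame) => Dispatcher.Invoke(new Action(() => UpdatePreview(frame)));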

The flowchart in Figure 4 shows the order in which the Intel RealSense SDK processing thread executes, and it illustrates how the SDK fits into the main thread of the application.


Figure 4: The execution order of the Intel® RealSense™ SDK processing thread.

Picking the Right Platforms and Dealing with Privacy Issues

Versions of Moo-O are available on Windows*, Mac OS X*, and iOS*. The EyePower development team used the languages and frameworks expected on these platforms: C++ for video processing and C#/WPF for the UI (on Windows); Objective-C* and the Cocoa* framework (on Mac OS X and iOS). Incorporating the Intel RealSense SDK into the development flows for both platforms proved easy.

Loo noted that any developer who wants his or her software to be used in schools simply must have Windows and Mac OS X versions. “That covers 100 percent of the school market,” he said. “In Asia, nearly all of the schools are using Windows. In the United States, it’s mostly Windows with a small fraction of Mac users too, and many schools use machines that are several years old. The main issue with technology in schools is always cost.”

As a small company, EyePower sought opportunities to write cross-platform code and considered using game engines such as Cocos2D*, Unity*, and even the hybrid HTML5/native stack, PhoneGap*. However, Loo and his colleagues chose to write for each platform separately to leverage the video-processing capabilities that each OS offers. They made this choice because they needed to delve deeply into specific system libraries for processing video, as well as work with peripheral devices such as webcams and microphones, and these libraries usually have limited support in those cross-platform engines. “The bottom line,” said Loo, “is that it is sometimes hard to push the limits when writing cross-platform code.”

EyePower’s developers write unit tests for code with complex logic, such as the libraries that call the web APIs for user authentication, uploading videos, and checking for updates. The team relies on manual testing for the UI and most new features. Even so, some issues do not surface through any of this in-house testing, such as conflicts with certain hardware or with other third-party software (for example, codecs on Windows). For those issues, EyePower relies on log files submitted by pilot users to track down possible causes.
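As a hypothetical example of the kind of unit test described above (the UpdateChecker class, the endpoint URL, and the NUnit usage are all illustrative assumptions, not EyePower’s code), an update-check client can be exercised against a canned HTTP response instead of the live web API.

//Hypothetical example only: exercise an update-check client against a canned HTTP response.
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using NUnit.Framework;

//Hypothetical class under test: asks the web API for the latest version and compares it.
public class UpdateChecker
{
    private readonly HttpClient _http;
    public UpdateChecker(HttpClient http) { _http = http; }

    public async Task<bool> IsUpdateAvailableAsync(string currentVersion)
    {
        string json = await _http.GetStringAsync("https://example.com/api/latest-version");
        string latest = json.Split('"')[3];               //crude parse, sufficient for the sketch
        return new System.Version(latest) > new System.Version(currentVersion);
    }
}

//Fake handler that returns a fixed response, so the test never touches the network.
class FakeHandler : HttpMessageHandler
{
    private readonly string _json;
    public FakeHandler(string json) { _json = json; }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
    {
        return Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(_json)
        });
    }
}

[TestFixture]
public class UpdateCheckerTests
{
    [Test]
    public async Task ReportsUpdateWhenServerVersionIsNewer()
    {
        var http = new HttpClient(new FakeHandler("{\"latest\":\"2.1.0\"}"));
        var checker = new UpdateChecker(http);

        bool updateAvailable = await checker.IsUpdateAvailableAsync("2.0.0");

        Assert.IsTrue(updateAvailable);
    }
}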

So far, EyePower has mostly sold bulk Moo-O licenses to schools through a network of resellers. The company is now pondering how to move toward a traditional app-store-type model, selling directly to individuals, including teachers. This new approach will apply to both Moo-O and EyePower’s new Intel RealSense SDK-inspired product, which is still under wraps.

“We are still thinking about how to do this,” said Loo. “It’s uncommon for schools to have machines that can fully support [Intel] RealSense technology, especially cameras that can handle all the nuances of gesture recognition. Maybe we’ll buy the hardware and then loan it to a pilot school so they can try our software.”

Given that the game encourages the social sharing of the stories users create, Loo has been navigating varying attitudes and regulations about student privacy. As with nearly all apps associated with creative expression, Moo-O makes it easy for individual consumers using its software to publish their finished videos online and share them on social media channels. However, managing all the necessary permissions relating to sharing videos made by students in the classroom can be a challenge, particularly in the United States.

To address this issue, EyePower built its own portal for hosting videos on its website. Students uploading to the portal don’t have the option to share their work. Teachers are granted additional permissions and, with the appropriate consent given by parents and administrators, can make the videos public.

What’s Next for Moo-O

Loo travels widely to promote his company’s software, and his enthusiasm for the technology is quite visible. He declares that EyePower’s focus on teaching kids to read—ESL and otherwise—is a calling for him as he lists the new features and projects he is working on for 2015.

One feature is “remote reading.” It would allow, for example, a third grader living in Oregon to collaborate with her grandmother in California to create a story, provided that both users had computers that meet the minimum system requirements. After logging in, they could simply read and perform their parts of a story while in front of their webcams. Moo-O software will do all the necessary stitching together to create the final video.

“To achieve that, we implemented our own cloud storage platform to synchronize the project files across the users, something like Dropbox*,” said Loo. “In a collaborative project, each recorded video is time-stamped, uploaded to the server, and then automatically downloaded to other project members’ computers. So everyone on the project always has the latest recordings automatically.”
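The synchronization pattern Loo describes can be sketched roughly as follows; the ProjectSync class, the endpoint URL, and the newline-separated response format are hypothetical, not EyePower’s implementation.

//Illustrative only: poll the project server for recordings uploaded since the last sync
//and download them so every project member has the latest recordings.
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

class ProjectSync
{
    private readonly HttpClient _http = new HttpClient();
    private DateTime _lastSyncUtc = DateTime.MinValue;

    public async Task SyncAsync(string projectId, string localFolder)
    {
        //Ask the server which recordings were uploaded (and time-stamped) since the last sync.
        string url = string.Format(
            "https://example.com/api/projects/{0}/recordings?since={1:o}", projectId, _lastSyncUtc);
        string[] newRecordingUrls = (await _http.GetStringAsync(url))
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string recordingUrl in newRecordingUrls)
        {
            byte[] video = await _http.GetByteArrayAsync(recordingUrl);
            string fileName = Path.Combine(localFolder, Path.GetFileName(new Uri(recordingUrl).LocalPath));
            File.WriteAllBytes(fileName, video);   //downloaded automatically to each member's computer
        }

        _lastSyncUtc = DateTime.UtcNow;
    }
}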


Figure 5: The Moo-O* remote reading feature, with help from teachers in Australia, Indonesia, the Netherlands, Singapore, and the United States.

Another new feature, which EyePower calls Moo-O Retold, allows students to create their own stories by swapping and reusing scenes, rewriting the words, and so on. This feature is available now and was developed based on feedback from teachers using Moo-O. It is intended to spur creativity by giving students more do-it-yourself freedom to mash up existing graphics and other story elements. Sampling, after all, isn’t just for those creating new music.

Additional Resources

