Transforming the UI—Designing Tomorrow’s Interface Today (5 of 5):
Natural Interaction with Intuitive Computer Control
By Dominic Milano
Intel® RealSense™ technology is bringing human-like senses to devices and broadening perceptual computing. Explore apps that will let you navigate virtual worlds with gestures and words.
Advances in perceptual computing are bringing human-like senses to devices. Able to see and hear the world around them, devices equipped with Intel® RealSense™ technology recognize hand and finger gestures, analyze facial expressions, understand and synthesize spoken words, and more. As a result, people are enjoying more natural interactivity—pushing, pulling, lifting, and grabbing virtual objects; starting, stopping, and pausing content creation and entertainment apps; and navigating virtual worlds without touching their devices.
In this article, the last in a five-part series, the Virtual Air Guitar Company (VAGC) and Intel software developers share insights on using Intel RealSense technology to move human-computer interaction (HCI) beyond the keyboard and mouse.
Virtual Air Guitar Company
Founded in 2006 by computer vision and virtual reality researchers based in Espoo, Finland, VAGC specializes in creating unique motion games and applications that utilize full-body actions and precise fingertip control. The indie studio’s portfolio includes console and PC games as well as Windows* and Android* perceptual computing applications.
For Aki Kanerva, lead designer and founder, the natural interaction (NI) models enabled by Intel RealSense technology make computer use easier and more enjoyable. Although most of us learned to use computers that relied on traditional keyboard and mouse or trackpad-based interaction, touch screens opened a world of new possibilities. “Give a touch-screen device to a two-year-old and they’ll be able to use it almost immediately,” Kanerva said. “There’s no learning curve to speak of, which makes devices more fun to use. And even for serious tasks, you can work more intuitively.”
Whether you’re using a mouse, trackpad, or touch screen, the interaction takes place in two dimensions. Natural interaction based on gestures or voice commands provides a greater degree of freedom. When gesturing with your hands, you’re in 3D space, which gives six degrees of freedom and enables new use cases. “Let’s say you’re cooking or using a public terminal and don’t want to touch the device. Gesture and voice commands provide touch-free interaction,” said Kanerva.
Natural interaction can also be used to perform complex tasks, eliminating the need to memorize keyboard shortcuts. Or as Kanerva put it, “Natural interaction enables complex controls without being complicated.”
Simple Isn’t Easy
Kanerva spent years researching human-computer interaction models and knows from experience how difficult it can be to distill complex tasks and make them appear easy to do. The mouse and keyboard act as middlemen, translating user intention. In a game, for example, moving a mouse left or right could translate to directing a character’s movement. “It takes practice to correctly make that motion,” Kanerva said. “With natural interaction, you’re removing the middlemen. With five or six degrees of freedom, tracking the moves you make with your hands—up, down, left, right, forward, back—translates to character motion on the screen in a more intuitive manner.”
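Conceptually, removing that middleman can be as simple as translating a hand’s offset from a resting position directly into character velocity. The following JavaScript sketch illustrates the idea; the coordinates, thresholds, and tracking interface are assumptions for illustration, not VAGC’s code.

// Minimal sketch (assumed tracking interface, not VAGC's code): map a tracked
// hand position directly to character motion. `handPosition` is assumed to
// arrive in centimeters relative to the camera.
const NEUTRAL = { x: 0, y: 20, z: 40 };  // assumed resting hand position
const DEAD_ZONE_CM = 2;                  // ignore tiny jitters around rest
const SPEED_SCALE = 0.05;                // cm of hand travel -> world units per frame

function handToCharacterVelocity(handPosition) {
  const velocity = {};
  for (const axis of ['x', 'y', 'z']) {
    const offset = handPosition[axis] - NEUTRAL[axis];
    // Inside the dead zone the character stays still; outside it, motion is
    // proportional to how far the hand has moved, with no buttons in between.
    velocity[axis] = Math.abs(offset) < DEAD_ZONE_CM ? 0 : offset * SPEED_SCALE;
  }
  return velocity;
}

// Example: a hand 10 cm to the right of neutral drifts the character right.
console.log(handToCharacterVelocity({ x: 10, y: 20, z: 40 }));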
“A ‘natural interface’ can combine all six degrees of movement in such a way that the user never has to think about individual commands,” he continued. Creating a UI that responds to such a complex set of user actions is no simple feat. And Kanerva advises against designing UIs capable of responding to every possible motion.
The key is to create experiences in which the interface lets the users know what they can do before they do it. “Limit interaction to only the things that are necessary for your application, and focus your coding efforts on making those things feel responsive.”
Working with Intel, VAGC is developing an app that will let users fly a virtual helicopter through any location (Figure 1) using hand gestures tracked with an Intel® RealSense™ 3D camera. “By design, our helicopter cannot do barrel rolls—rotate around its axis. That could be fun, but adding a control for doing such a roll would have overlapped with the other controls, so we didn’t include that capability.”
Figure 1: The Helicopter’s Point of View
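Kanerva’s point about limiting interaction shows up directly in the control scheme. A sketch like the one below, written with hypothetical hand and helicopter fields rather than VAGC’s actual code, maps only the gestures the design calls for and simply leaves roll unmapped.

// Illustrative sketch only (field names are hypothetical, not VAGC code): the
// helicopter responds to just the gestures the design needs.
function updateHelicopter(hand, heli, dt) {
  heli.yaw      += hand.tiltSideways * dt;   // tilt the hand sideways to turn
  heli.pitch     = hand.tiltForward;         // tilt forward/back for nose up or down
  heli.altitude += hand.height * dt;         // raise or lower the hand to climb or descend
  heli.speed     = hand.distanceFromCamera;  // push toward the screen to speed up
  // No roll mapping: unused degrees of freedom are ignored rather than bound
  // to every possible motion, so a barrel roll simply cannot happen.
  return heli;
}

// Example frame update with made-up values.
console.log(updateHelicopter(
  { tiltSideways: 0.1, tiltForward: 0.0, height: 0.2, distanceFromCamera: 0.5 },
  { yaw: 0, pitch: 0, altitude: 10, speed: 0 },
  1 / 60
));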
Anatomy of the Gesture-controlled App
The Web-based app will support Microsoft Internet Explorer, Google Chrome, and Mozilla Firefox on Windows*. Written in JavaScript and HTML, it runs entirely in the browser and relies on a browser extension to provide localization services; the code is reliable and lightweight.
“This project has a lot of moving parts,” Kanerva explained. “The Web itself is a moving target, especially when it comes to extensions because we needed to write a different extension for each supported browser.”
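One way to keep the page itself browser-agnostic is to have each extension relay tracking data to the page through a common message format. The JavaScript sketch below shows that pattern; the message type and payload fields are invented for illustration and are not VAGC’s or Intel’s actual protocol.

// Page-side sketch (hypothetical message format, not the app's real protocol):
// each browser gets its own extension, but the page can stay identical if every
// extension relays hand-tracking data through the same window message.
window.addEventListener('message', (event) => {
  // Accept only messages relayed into this page (e.g., by the extension's content script).
  if (event.source !== window || !event.data || event.data.type !== 'HAND_FRAME') {
    return;
  }
  const { x, y, z, gesture } = event.data.payload;
  renderFrame(x, y, z, gesture);
});

function renderFrame(x, y, z, gesture) {
  // Placeholder for the app's own HTML/canvas drawing logic.
  console.log(`hand at (${x}, ${y}, ${z}), gesture=${gesture}`);
}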
Intel supplied VAGC with invaluable feedback and VAGC reciprocated by demonstrating real use cases that helped the Intel team refine the Intel RealSense SDK. “One of the early challenges with the SDK was that it had difficulty seeing a flat hand held vertically,” Kanerva explained (see Figure 2). “You cannot prevent a user from doing that.” Thanks to VAGC’s input, the latest version of the SDK supports that condition.
Figure 2: Camera view of vertical flat hand
Testing, Testing, Testing
Previous motion games by VAGC employed a proprietary automated test suite, but Kanerva prefers a different tack when testing gesture interaction. “With natural interaction, users have a knack for doing the unexpected, so we hire a usability testing firm.” Kanerva said that the best results come from shooting video of experienced and inexperienced users playing through the flight.
Tutorials, in spite of being expensive to produce, are an essential ingredient for NI projects. “Because the range of possible movements is so broad, you need very clear tutorials or animations that give users a starting point,” Kanerva said. “Put hand here. Hold in position 30 cm (12 inches) from the screen. Now do this... You must be very specific.” Kanerva advises developers to plan tutorials early in the design process. “It’s tempting to leave tutorials to the last minute, but they are critical. Even if you need to change them several times throughout your development workflow, tutorials should be an integral part of your process.”
Lessons Learned
Summarizing his experience with the Intel RealSense SDK and Intel RealSense 3D camera, Kanerva offered these rules of thumb for developers:
- Design applications that don’t lend themselves to—or aren’t even possible with—traditional input modalities.
- Emulating traditional input is a recipe for disaster.
- Design natural interactions specific to your use case.
- For flight-simulator controls, arrow keys aren’t as effective as a joystick, and gesture control is an effective alternative.
- Make controls continuous and minimize control latency (see the sketch after this list).
- Don’t wait for a gesture to be complete before translating its motion into your experience.
- Provide immediate feedback—user input should be reflected on screen at all times.
- Remember not to fatigue your users.
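The continuity and feedback points can be illustrated with a short sketch: rather than waiting for a completed gesture, each raw tracking sample is lightly smoothed and reflected on screen immediately. The tracking callback and constants below are assumptions for illustration, not Intel RealSense SDK code.

// Translate raw hand samples into on-screen motion every frame instead of
// waiting for a finished gesture. Light exponential smoothing keeps the
// response immediate without being jittery.
const SMOOTHING = 0.3;          // lower = smoother but laggier; higher = snappier but jittery
let smoothed = { x: 0, y: 0 };

function onHandSample(raw) {    // assume this is called for every tracking sample
  smoothed.x += SMOOTHING * (raw.x - smoothed.x);
  smoothed.y += SMOOTHING * (raw.y - smoothed.y);
  drawCursor(smoothed);         // feedback appears on screen at all times
}

function drawCursor(pos) {
  // App-specific rendering; a log keeps the sketch self-contained.
  console.log(`cursor at (${pos.x.toFixed(1)}, ${pos.y.toFixed(1)})`);
}

onHandSample({ x: 120, y: 80 });  // example sample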
For game developers, continuous feedback is key. Games typically use button presses to produce actions, but mapping that traditional input to natural interaction creates too much latency and imposes a steep learning curve.
Regarding fatigue, Kanerva counsels that you design controls that don’t require a user to be still for a long time. “I like the term ‘motion control,’ because it’s a reminder that you want users to move around so they’re not getting tired. We don’t ask users to rotate their wrist at right angles. For example, to fly forward, they simply point the hand straight while keeping it relaxed. Avoid strain and remind users it’s their responsibility to take breaks.”
Natural Entertainment
“Natural interaction is great for entertainment apps,” Kanerva concluded. His company got its name from their first app, which let users play air-guitar chords and solos using nothing more than hand gestures. “Hollywood often depicts people using gestures to browse data. It looks cool and there’s a wow factor, but simplicity driven by natural interaction will make computing accessible to a wider user base.”
To that end, the idea of having Intel RealSense technology embedded in tablets, 2 in 1s, and all-in-one devices thrills Kanerva. “Ubiquitous access to natural interaction through Intel RealSense technology will be great for marketing and sales, but it will be priceless to developers.”
Driving Windows with Hand Gestures
Yinon Oshrat spent seven years at Omek Studio, the first developer to use the nascent Intel RealSense SDK, before the studio was acquired by Intel. As a member of the Intel RealSense software development team, Oshrat has been working on a standalone application called Intel® RealSense™ Navigator. “Intel RealSense Navigator controls the Windows UI, enhancing current touch-based experiences and enabling hand gesture-based interaction.” The app allows users to launch programs and scroll through selections using hand gestures (Figure 3).
Figure 3: Natural human-computer interaction
Intel RealSense Navigator relies on the Intel® RealSense™ 3D Camera to track hand gestures at distances up to 60 cm (24 inches). “Think of Navigator as a driver and a mouse,” Oshrat said. “It’s an active controller. To use it, you simply enable it for zooming, mouse simulation, and so on.”
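At its core, that kind of mouse simulation comes down to mapping a tracked hand position onto screen coordinates. The sketch below shows the mapping in isolation, with normalized inputs and a screen size assumed for illustration; it is a conceptual sketch, not the Navigator implementation.

// Conceptual sketch of mouse simulation (assumed inputs, not Navigator code):
// map a normalized hand position from the camera's tracking range onto screen
// coordinates.
const SCREEN = { width: 1920, height: 1080 };   // assumed display size

function handToCursor(hand) {
  // hand.x and hand.y are assumed normalized to [0, 1] across the tracked
  // range (roughly the space in front of the camera, out to about 60 cm).
  const clamp = (v) => Math.min(1, Math.max(0, v));
  return {
    x: Math.round(clamp(hand.x) * (SCREEN.width - 1)),
    y: Math.round(clamp(1 - hand.y) * (SCREEN.height - 1)),  // camera y points up, screen y points down
  };
}

console.log(handToCursor({ x: 0.5, y: 0.5 }));  // center of the screen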
Building Blocks for Developers
In creating Intel RealSense Navigator, the Intel team has been adding building blocks to the Intel RealSense SDK that will let developers implement the same experiences Navigator enables in their own standalone Windows applications.
Like other modules of the Intel RealSense SDK, the Intel RealSense Navigator module is available in both C and C++. JavaScript support is planned for a future release.
Describing Intel RealSense Navigator gesture support, Oshrat said, “Think of it as a language. Tap to select. Pinch to grab. Move left or right to scroll. Move hand forward or back to zoom in or out. A hand wave returns users to the Start Screen.” Intel RealSense Navigator ships with video tutorials that demonstrate how to make accurate hand gestures. “The videos are much more effective than written documentation.”
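Treated as a language, those gestures amount to a small vocabulary mapped onto actions. The sketch below shows one way such a mapping could look in code; the gesture names and handlers are illustrative and are not the Intel RealSense SDK’s API.

// Toy dispatcher for the gesture "language" (illustrative names, not SDK API).
const gestureActions = {
  tap:         () => select(),           // tap to select
  pinch:       () => grab(),             // pinch to grab
  swipeLeft:   () => scroll(-1),         // move left or right to scroll
  swipeRight:  () => scroll(+1),
  pushForward: () => zoom(+1),           // move the hand forward or back to zoom
  pullBack:    () => zoom(-1),
  wave:        () => goToStartScreen(),  // wave to return to the Start screen
};

function onGesture(name) {
  const action = gestureActions[name];
  if (action) action();                  // anything outside the vocabulary is ignored
}

// Stub handlers so the sketch runs on its own.
function select() { console.log('select'); }
function grab() { console.log('grab'); }
function scroll(direction) { console.log('scroll', direction); }
function zoom(direction) { console.log('zoom', direction); }
function goToStartScreen() { console.log('start screen'); }

onGesture('wave');  // example: returns to the Start screen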
Inventing a Language
For more than a year, Oshrat and his colleagues designed and tested gestures by literally approaching people on the street and watching them react to a particular motion. “We worked in cycles, experimenting with what worked, what users’ expectations were versus what we thought they’d be. We discovered what felt natural and what didn’t.”
For example, people interpreted three “swipes” in a row as a “wave” and not a swipe. “That taught us to separate those gestures,” Oshrat said. It surprised him that defining effective gestures was so difficult. “It’s not programming. It’s working with user experiences.”
Lessons Learned
For developers interested in implementing gesture control in their applications using Intel RealSense Navigator, Oshrat offered this advice:
- Utilize the gesture models and building blocks supplied with the SDK; they will save you time by jump-starting your project with ideas that have been carefully vetted.
- Take advantage of Intel support channels such as user forums.
Like Aki Kanerva, Oshrat advises developers to think of gesture-based interaction as something completely different from mouse and keyboard-based interactions. “Don’t try to convert a mouse/keyboard experience to gestures. Movement takes place at a completely different speed when you’re gesturing versus using a mouse.”
Oshrat also noted that user behavior changes based on input modality. “We built a game where users had to run and jump to the side,” he explained. “On screen, a sign directed users to GO LEFT. When using the game controller, users noticed signs and instructions in the area surrounding the avatar. But when their body was used as a controller (gesture control), they were so focused on the avatar that they ignored everything else in the game. They even ran head-on into a sign despite the fact that instructions filled 90 percent of the screen!”
Current and Future Use Cases
One of the more practical applications of Intel RealSense Navigator that Oshrat has seen involved controlling a PowerPoint* presentation without a mouse or physical controller. “You can just wave your hand and change slides back and forth.” It’s also easy—and handy—to control a computer screen when you’re on the phone.
Asked whether gestures could be useful in controlling a video-editing application, Oshrat said, “Yes! That’s a holy grail. Many people want to do that. We’ve been working with games and gesture control of other applications for eight years. We’re heading in the right direction for video editing and similar experiences.”
What’s next? Oshrat envisions a future in which users no longer have to learn a new gesture vocabulary. “Our devices will know us better and understand more about what we want. Gestures, voice commands, facial analysis, 3D capture and share... all of the capabilities enabled by Intel RealSense technology will open even more exciting possibilities.”
Resources
Explore Intel RealSense technology further, learn about Intel RealSense SDK for Windows, and download a Developer Kit here.
Is your project ready to demonstrate? Join the Intel® Software Innovator Program. It supports developers who have forward-looking projects and provides speakership and demo opportunities.
Read part 1, part 2, part 3, and part 4 of this “Transforming the UI—Designing Tomorrow’s Interface Today” series.