Abstract
The Flashcard code sample demonstrates some of the speech recognition features in the Intel® RealSense™ SDK for Windows*. The SDK includes speech modules for integrating dictation and verbal command control into your applications. These two modes of operation provide the following:
- Dictation - The SDK module returns the user’s dictated sentence.
- Command and Control - The application defines a list of words as the command list and the SDK module recognizes speech based solely on the command list.
The Flashcard app uses the Command and Control mode to accept verbal input from the user; it does not demonstrate any Dictation features. The app displays simple multiplication problems and matches the user’s spoken response against the correct answer.
Introduction
This code sample demonstrates the basics of using the Command and Control speech recognition capabilities of the SDK. The app displays randomly generated multiplication problems and waits for verbal input from the user.
Figure 1: The Flashcard sample recognizes spoken numbers as input
If the user says the correct answer as shown in Figure 1, the app responds by displaying the user’s answer in green and indicating “Correct!” on the screen. After a short delay, the app displays another randomly created multiplication problem and awaits a response from the user.
Figure 2: Incorrect answers are displayed in red
If the user says the incorrect answer as shown in Figure 2, the app responds by displaying the user’s answer in red and shows the correct answer on the screen.
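The round logic behind these two outcomes is easy to sketch without any SDK calls. The short Python sketch below is our own illustration of that flow, not code from the sample, and its function names are placeholders:

```python
import random

def new_problem(rng=random):
    """Return a random single-digit multiplication problem as (a, b, product)."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return a, b, a * b

def check_answer(spoken_value, correct_answer):
    """Return the feedback text for an already-recognized numeric answer."""
    if spoken_value == correct_answer:
        return "Correct!"  # the sample shows this answer in green
    # for a wrong answer, the sample shows it in red with the correct result
    return "Incorrect. The answer is {}.".format(correct_answer)
```

In the real app the `spoken_value` comes from the speech recognition module; everything after that point is ordinary application logic.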
Purpose
The purpose of this code sample is to distill the complexities of the SDK down to the basics of using the speech recognition module and to present them in a simple use case.
Development Environment
The sample app can be built using Microsoft Visual Studio* Express 2013 for Windows Desktop or the professional versions of Visual Studio 2013.
Configuring the Speech Recognition Module
A method named ConfigureRealSense() is called on startup to prepare the app for accepting speech commands from the user. This method performs the following actions:
- Instantiates session and audio source objects
- Selects the audio device
- Sets the audio recording volume
- Creates a speech recognition instance
- Initializes the speech recognition module
- Builds and sets the active grammar
- Displays device information
The sample app selects the first audio device (index 0) from the audio source device list; however, the SDK provides a mechanism to scan and enumerate audio devices on the computer to allow a user to select the desired input device. This technique is shown in the SDK documentation.
In the sample app the recording volume is set to a fixed value. In a full-featured app, however, it is recommended to provide a control for setting this parameter and to give visual feedback indicating whether the recording volume is set appropriately.
Handling Speech Recognition Events
An OnRecognition() event handler is implemented to capture data from the speech recognition module when active recognition results are available. The RecognitionData structure passed to the handler describes the details of the recognition event (e.g., confidence, sentence, etc.).
The sample app uses a fixed threshold for evaluating the confidence level returned by the speech recognition module; however, the SDK documentation suggests that you “use thresholding to increase or decrease certain aspect of voice recognition. For example, your application may expose a graphical user interface control to let the user adjust what is the acceptable recognition rate. The application can use 50% as the baseline.”
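The thresholding idea quoted above can be illustrated in a few lines. The Python sketch below is our own stand-in (the `Candidate` type merely mimics the fields of interest in the SDK's RecognitionData structure): a result is acted on only when its confidence meets a user-adjustable threshold, with 50 as the baseline.

```python
from collections import namedtuple

# Stand-in for the fields of interest in the SDK's RecognitionData structure.
Candidate = namedtuple("Candidate", ["sentence", "confidence"])

class RecognitionFilter:
    """Accept recognition results whose confidence meets an adjustable threshold."""

    def __init__(self, threshold=50):
        # 50 is the baseline the SDK documentation suggests; a GUI control
        # could let the user raise or lower it.
        self.threshold = threshold

    def accept(self, candidate):
        """Return True if the result is confident enough to act on."""
        return candidate.confidence >= self.threshold
```

Raising the threshold reduces false recognitions at the cost of ignoring more genuine (but quiet or unclear) utterances, which is why exposing it to the user is worthwhile.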
Setting the Active Grammar
When using the Command and Control mode, the speech recognition module uses a list of commands (referred to as the “grammar”) and ignores any words or phrases not contained in the list. The commands can be loaded using either the BuildGrammarFromStringList() method to define the list programmatically or the BuildGrammarFromFile() method to read the grammar from a Java* Speech Grammar Format (JSGF) file. We use the latter method so that we can take advantage of a shorthand for our grammar, and not have to enter all possible answer numbers as distinct strings.
The Flashcard app uses the SDK’s BuildGrammarFromFile() method to open the grammar.jsgf file and build its grammar from the file’s contents. (For more information on the JSGF file format, refer to http://www.w3.org/TR/jsgf/.) The contents of grammar.jsgf are shown in the following listing.
#JSGF V1.0;
grammar Digits;
public <Digits> = ( <digit> ) + ;
<digit> = ( zero | one | two | three | four | five | six | seven | eight | nine | ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen | seventeen | eighteen | nineteen | twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety );
The notation used in this code sample is similar to the examples shown in the SDK’s documentation (RSSDK_DIR\doc\PDF\sdkmanuals.pdf), which you are encouraged to review for a more thorough explanation of the available formatting options. The public <Digits> rule is the grammar’s entry point, and the <digit> rule lists the words the speech recognition module will accept. The “+” sign signifies that whatever comes before it can occur one or more times. This format permits not only single words like “four” to be recognized, but also phrases like “forty four”.
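To see why the repetition operator matters, consider how a multi-word utterance maps to a number. The Python sketch below is independent of the SDK; it simply converts phrases the grammar above would accept (one or more digit words, such as “forty four”) into integers, and rejects anything outside the grammar’s word list, much as the recognizer ignores out-of-grammar speech.

```python
# Word values mirroring the <digit> rule in grammar.jsgf.
UNITS = {w: v for v, w in enumerate(
    ["zero", "one", "two", "three", "four", "five", "six", "seven",
     "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
     "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"])}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def phrase_to_int(phrase):
    """Convert a phrase such as 'forty four' into 44.

    Returns None for an empty phrase or any word outside the grammar's
    word list.
    """
    words = phrase.lower().split()
    if not words:
        return None
    total = 0
    for word in words:
        if word in TENS:
            total += TENS[word]  # e.g. "forty" contributes 40
        elif word in UNITS:
            total += UNITS[word]  # e.g. "four" contributes 4
        else:
            return None  # out-of-grammar word
    return total
```

Without the “+” in `( <digit> ) +`, only single words would match and two-word answers like “forty four” could never be recognized.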
Check It Out
Download the app and learn more about how speech recognition works in the Intel RealSense SDK for Windows.
About Intel® RealSense™ Technology
To get started and learn more about the Intel RealSense SDK for Windows, go to https://software.intel.com/en-us/intel-realsense-sdk
About the Author
Bryan Brown is a software applications engineer in the Developer Relations Division at Intel.