Speech recognition (“SR”), or automatic speech recognition (“ASR”), is a computerized process that identifies spoken words. Speech recognition has many uses, including speech transcription, speech translation, voice control of devices and software applications, call routing systems, and voice search of the Internet. Speech recognition systems can optionally be paired with spoken language understanding systems to extract meaning from an utterance and/or to identify commands to execute when interacting with systems.
Speech recognition systems are highly complex and operate by matching an acoustic signature of an utterance with acoustic signatures of words. This matching can optionally be combined with a statistical language model. Thus, both acoustic modeling and language modeling can be used in the speech recognition process. Acoustic models can be created from audio recordings of spoken utterances together with their associated transcriptions. The acoustic model then defines statistical representations of the individual sounds that make up corresponding words. A speech recognition system uses the acoustic model to identify a sequence of sounds, and uses the statistical language model to identify possible word sequences from the identified sounds.
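The interplay of the two models can be illustrated with a minimal Python sketch. Everything in it is assumed for illustration: the candidate transcriptions, the probability values, the bigram form of the language model, and the names `ACOUSTIC_LOG_PROB`, `BIGRAM_PROB`, `language_log_prob`, and `best_hypothesis` are hypothetical and do not describe any particular recognizer. The sketch only shows how an acoustic score and a language-model score might be combined to rank competing word sequences.

```python
import math

# Hypothetical acoustic scores: log P(audio | word sequence) for two
# candidate transcriptions that sound nearly identical.
ACOUSTIC_LOG_PROB = {
    ("recognize", "speech"): math.log(0.40),
    ("wreck", "a", "nice", "beach"): math.log(0.45),
}

# Hypothetical bigram language model: P(word | previous word).
# "<s>" marks the start of the utterance.
BIGRAM_PROB = {
    ("<s>", "recognize"): 0.010,
    ("recognize", "speech"): 0.200,
    ("<s>", "wreck"): 0.001,
    ("wreck", "a"): 0.100,
    ("a", "nice"): 0.020,
    ("nice", "beach"): 0.010,
}
UNSEEN_PROB = 1e-6  # assumed floor for word pairs missing from the table


def language_log_prob(words):
    """Sum the log bigram probabilities over a word sequence."""
    total = 0.0
    previous = "<s>"
    for word in words:
        total += math.log(BIGRAM_PROB.get((previous, word), UNSEEN_PROB))
        previous = word
    return total


def best_hypothesis(lm_weight=1.0):
    """Return the candidate with the highest combined log score."""
    def score(words):
        return ACOUSTIC_LOG_PROB[words] + lm_weight * language_log_prob(words)
    return max(ACOUSTIC_LOG_PROB, key=score)


if __name__ == "__main__":
    print(best_hypothesis(lm_weight=0.0))  # acoustic evidence only
    print(best_hypothesis(lm_weight=1.0))  # acoustic evidence plus language model
```

With these made-up numbers, the acoustic scores alone slightly favor the less plausible sequence, and adding the language-model score reverses the ranking.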
Speech recognition providing voice-activated or voice command functionality enables speakers to control devices and systems by speaking various instructions. For example, a speaker can utter a command to execute a specific task or utter a query to retrieve specific results. Spoken input can follow a rigid set of phrases that perform specific tasks, or spoken input can be natural language, which is interpreted by a natural language unit of a speech recognition system. Voice command functionality is becoming increasingly popular on portable devices, especially battery-powered portable devices such as cell phones, laptops, and tablet computers.
Generally, a real-time/front-end speech recognition system works such that text results are inserted into an edit control and command results are executed. If there are recognition errors, applications may provide functionality to edit the text, select recognition alternatives from a list, and/or undo text changes or command effects. In other words, recognition errors are first applied and only then corrected or remedied, depending on application-provided functionality. This can be particularly problematic for applications that do not provide high usability in text editing (especially with respect to correction) and for commands whose effects are not easily undone. Prominent examples of such errors include misrecognition of a voice command as text, misrecognition of text as a voice command, and confusion between different voice commands.
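The apply-first, correct-later flow described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the behavior of any specific application: the `EditControl`, `Result`, and `apply_result` names, the undo history, and the "send message" command are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EditControl:
    """Hypothetical text field with a simple undo history."""
    text: str = ""
    _history: list = field(default_factory=list)

    def insert(self, fragment):
        self._history.append(self.text)  # remember the state before the change
        self.text += fragment

    def undo(self):
        if self._history:
            self.text = self._history.pop()


@dataclass
class Result:
    """Hypothetical recognition result: dictated text or a voice command."""
    kind: str   # "text" or "command"
    value: str


def apply_result(result, control, commands):
    """Apply a result immediately, before the speaker can review it."""
    if result.kind == "command":
        commands[result.value]()      # the effect happens right away
    else:
        control.insert(result.value)  # the text lands in the edit control


if __name__ == "__main__":
    control = EditControl()
    sent = []  # stands in for an effect that is not easily undone
    commands = {"send message": lambda: sent.append(control.text)}

    apply_result(Result("text", "hello "), control, commands)
    # The speaker said the command "send message", but it was
    # misrecognized as dictated text and inserted.
    apply_result(Result("text", "send message"), control, commands)
    control.undo()  # correction is possible only after the error was applied
    apply_result(Result("command", "send message"), control, commands)
    print(control.text, sent)
```

In this sketch the misrecognized text can be removed because the hypothetical edit control keeps a history. The `sent` list has no such mechanism, which mirrors the case of commands whose effects are not easily undone.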