1. Technical Field
This invention relates to the field of computer speech recognition and more particularly to a method and system for correcting incorrectly recognized voice commands.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a microphone is converted to a set of text words by a computer. These recognized words may then be used in a variety of computer software applications for purposes such as document preparation, data entry, and command and control.
With regard to command recognition, in the simplest possible command and control grammar, each function that the system can perform has no more than one speech phrase associated with it. At the other extreme is a command and control system based on natural language understanding (NLU). In an NLU system, the user can express commands using natural language, thereby providing total linguistic flexibility in command expression. Current command and control systems have moved beyond the simple one-function-one-speech-phrase grammar and are beginning to incorporate NLU.
Speech recognition systems, including NLU systems, have a difficult time accurately recognizing all the words spoken by a user. Speech recognition systems may incorrectly recognize words due to the dictation techniques and the wide variety of pronunciations, accents and divergent speech characteristics of each individual speaker. For example, the speaker may speak very rapidly or softly, slur words or mumble. When transcribing speech dictation, this may result in: spoken words being converted into different words ("hold" recognized as "old"); improperly conjoined spoken words ("to the" recognized as "tooth"); and spoken words recognized as homonyms ("boar" instead of "bore"). When controlling and navigating through speech-enabled applications by voice, however, incorrect recognition or non-recognition typically results in the execution of unintended commands or no command at all.
To rectify incorrectly recognized voice commands, conventional speech recognition systems include a user-initiated interface or window containing a list of possible commands. The list may be a listing of the entire speech command vocabulary, or a partial listing constrained by acoustic, language or context modeling techniques known in the art. The constrained lists are much more user friendly, since the speaker does not have to read through a lengthy list to find an intended command. These constrained lists can be generated, for example, by executing an algorithm known in the art, much like a spell checking program in word processing applications, that searches a command grammar for words with characteristics similar to those of the incorrectly recognized words. Once the list is generated, the user may select the intended command by voice or input device. Alternatively, the user may key in the desired command in a text field within the user interface.
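By way of illustration only, the spell-checker-like generation of a constrained list may be sketched as follows. The sketch assumes a small hypothetical command grammar and compares text strings using the `difflib` module of the Python standard library; an actual system would more likely rely on acoustic, language or context models, and the names `COMMAND_GRAMMAR` and `candidate_commands` are assumptions made for this example rather than part of any known system.

```python
from difflib import get_close_matches

# Hypothetical command grammar; the phrases are illustrative only.
COMMAND_GRAMMAR = [
    "open file",
    "close file",
    "save file",
    "print file",
    "open folder",
    "copy text",
]


def candidate_commands(misrecognized, grammar=COMMAND_GRAMMAR, limit=3, cutoff=0.6):
    """Return up to `limit` grammar entries whose text is most similar
    to the misrecognized phrase, best match first.

    Similarity here is purely textual (character sequence matching),
    an assumption standing in for acoustic or language modeling.
    """
    return get_close_matches(misrecognized.lower(), grammar, n=limit, cutoff=cutoff)


# Build a short correction list for a misrecognized command.
candidates = candidate_commands("open pile")
for position, command in enumerate(candidates, start=1):
    print(position, command)
```

The user would then select the intended command from the short list by voice or input device. Raising the hypothetical `cutoff` parameter shortens the list further at the risk of omitting the intended command.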
These command-listing methods can be effective for standard speech recognition systems, both informing the speaker of available and likely voice commands and providing a simple means of executing the command. However, for NLU systems and other more sophisticated systems that can recognize many hundreds of commands, command listing is impractical and cumbersome. For example, an NLU system may recognize spoken commands such as: "open the file, please"; "would you kindly get the file for me?"; "hey, computer, open the file!"; and "I want to see the contents of the file." As can be seen, the phrasing can vary greatly, and the possible commands for performing each desired function would be too numerous to list.
Accordingly, there is a need to provide a quick and simple method of correcting incorrectly recognized voice commands in speech recognition systems, and in natural language understanding systems in particular.