1. Technical Field
This invention relates to the field of embedded speech recognition systems and more particularly to detecting speech recognition errors in an embedded speech recognition system.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a microphone is converted to a set of text words by a computer. These recognized words may then be used in a variety of computer software applications for purposes such as document preparation, data entry, and command and control. Speech recognition systems programmed or trained to the diction and inflection of a single person can successfully recognize the vast majority of words spoken by that person.
In operation, speech recognition systems can model and classify acoustic signals to form acoustic models, which are representations of basic linguistic units referred to as phonemes. Upon receipt of the acoustic signal, the speech recognition system can analyze the acoustic signal, identify a series of acoustic models within the acoustic signal and derive a list of potential word candidates for the given series of acoustic models. Subsequently, the speech recognition system can contextually analyze the potential word candidates using a language model as a guide.
The task of the language model is to express restrictions imposed on the manner in which words can be combined to form sentences. The language model can express the likelihood of a word appearing immediately adjacent to another word or words. Language models used within speech recognition systems typically are statistical models. Examples of well-known language models suitable for use in speech recognition systems include uniform language models, finite state language models, grammar based language models, and m-gram language models.
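By way of illustration only, the likelihood of one word appearing immediately after another can be estimated from counts of adjacent word pairs. The following minimal sketch assumes a two-word (bigram) statistical model; the class name `BigramModel`, its methods, and the sample sentences are hypothetical and are not part of the inventive arrangements.

```python
from collections import Counter, defaultdict


class BigramModel:
    """Toy statistical language model (hypothetical): P(next | previous) from counts."""

    def __init__(self):
        # Maps a word to a Counter of the words observed immediately after it.
        self.pair_counts = defaultdict(Counter)

    def observe(self, words):
        """Record every adjacent word pair in a correctly recognized word sequence."""
        for prev, nxt in zip(words, words[1:]):
            self.pair_counts[prev][nxt] += 1

    def probability(self, prev, nxt):
        """Relative frequency of `nxt` following `prev`; 0.0 if `prev` was never seen."""
        total = sum(self.pair_counts[prev].values())
        if total == 0:
            return 0.0
        return self.pair_counts[prev][nxt] / total


model = BigramModel()
model.observe("recognize speech with a language model".split())
model.observe("recognize speech in an embedded system".split())
model.observe("recognize words in context".split())

# "speech" followed "recognize" in two of the three observed sentences.
likelihood = model.probability("recognize", "speech")
```

A practical language model would also smooth the estimates so that unseen word pairs do not receive a probability of exactly zero; that refinement is omitted here for brevity.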
Notably, the accuracy of a speech recognition system can improve as word combination statistics collected in a language model are refined during the operation of the speech recognition system. That is, the speech recognition system can observe speech dictation as it occurs and can modify the statistics in the language model as correct combinations of words are observed. In consequence, when a misrecognition occurs, it is important to update the language model in order to properly reflect an accurate combination of words as specified by the user. In order to update the language model, however, generally the user first must inform the speech recognition system that a misrecognition has occurred.
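As a hedged illustration of such an update, the sketch below adjusts adjacent word pair counts once a misrecognition has been reported and the correct text is known. The function names are hypothetical, and the decision to decrement the counts contributed by the misrecognized text is an assumption of this sketch rather than a requirement stated above.

```python
from collections import Counter


def bigrams(words):
    """Return the list of adjacent word pairs in a word sequence."""
    return list(zip(words, words[1:]))


def apply_correction(counts, recognized, corrected):
    """Shift pair counts from the misrecognized text to the corrected text.

    Assumption of this sketch: counts contributed by the misrecognized
    text are removed, and never driven below zero.
    """
    for pair in bigrams(recognized):
        if counts[pair] > 0:
            counts[pair] -= 1
    for pair in bigrams(corrected):
        counts[pair] += 1
    return counts


counts = Counter()
recognized = "wreck a nice beach".split()
corrected = "recognize speech".split()

# Statistics collected from the text as it was (incorrectly) recognized.
counts.update(bigrams(recognized))

# After the user reports the misrecognition, the statistics are revised.
apply_correction(counts, recognized, corrected)
```

Without the user's notification, the counts collected from the misrecognized text would remain in place and would continue to favor the incorrect word combination.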
In an embedded computer system, typical personal computing peripherals such as a keyboard, mouse, display and graphical user interface (GUI) often do not exist. As such, the lack of a conventional mechanism for interacting with a user can inhibit effective user interaction with an embedded computer system. This problem can become exacerbated where a speech recognition system is an operational component of an embedded computer system. In particular, without an effective mechanism for notifying a speech recognition system when a misrecognition has occurred, the speech recognition system cannot appropriately update its language model so as to reduce future misrecognitions.
An embedded speech recognition system in accordance with the inventive arrangements can include an embedded computer system; a speech recognition system configured for operation in the embedded computer system; a speech-enabled application for processing text converted in the speech recognition system; and, misrecognition error logic for notifying the speech recognition system when a misrecognition error has occurred. The embedded speech recognition system can further include an activatable error notification button coupled to the embedded computer system, the button triggering the misrecognition error logic when activated. The embedded computer system in the embedded speech recognition system can include a central processing unit (CPU); memory; audio circuitry; and, an audio input device. An audio output device optionally can be included. In addition, the embedded speech recognition system can further include at least one speech recognition language model stored in the memory.
A method for processing a misrecognition error in an embedded speech recognition system during a speech recognition session can include speech-to-text converting audio input in the embedded speech recognition system based on an active language model, the speech-to-text conversion producing speech recognized text; presenting the speech recognized text through a user interface; detecting a user-initiated misrecognition error notification; and, responsive to detecting the error notification, providing the audio input and a reference to the active language model to a speech recognition system training process associated with the embedded speech recognition system.
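The steps of the foregoing method can be sketched as follows. This is an illustrative outline only: the names `Session`, `TrainingProcess`, `recognize`, and `on_error_notification` are hypothetical, the `decode` callable stands in for the actual speech-to-text conversion, and the active language model is represented by a simple string reference.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingProcess:
    """Stand-in for the training process; it only records what it is given."""
    received: list = field(default_factory=list)

    def submit(self, audio, language_model_ref):
        self.received.append((audio, language_model_ref))


@dataclass
class Session:
    """One speech recognition session using a single active language model."""
    active_model: str
    trainer: TrainingProcess
    last_audio: bytes = b""

    def recognize(self, audio, decode):
        # Speech-to-text convert the audio input based on the active language
        # model; the caller presents the returned text through the user interface.
        self.last_audio = audio
        return decode(audio, self.active_model)

    def on_error_notification(self):
        # Responsive to the user-initiated notification, provide the audio
        # input and a reference to the active language model to training.
        self.trainer.submit(self.last_audio, self.active_model)


trainer = TrainingProcess()
session = Session(active_model="dictation-lm", trainer=trainer)

# Hypothetical decoder that always produces the same (incorrect) text.
text = session.recognize(b"\x00\x01", lambda audio, model: "wreck a nice beach")
session.on_error_notification()
```

In this sketch the training process receives the original audio rather than the recognized text, so that it can re-examine what was actually spoken against the language model that was active at the time.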
Importantly, the detecting step can include receiving a hardware-generated notification caused by the activation of an error notification button. Alternatively, the detecting step can include receiving a software-generated notification caused by the receipt of an error notification speech command. An exemplary error notification speech command can include "Recognition Error" or "Misrecognition". Finally, the providing step can include storing the audio input; storing a reference to the active language model; and, providing the stored audio input and reference to the training process subsequent to the speech recognition session.
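The two forms of notification and the deferred providing step can be illustrated with the following sketch. All names are hypothetical; in particular, `on_button_press` stands in for whatever handler the hardware-generated notification would invoke, and the case-insensitive matching of the speech command is an assumption of this sketch.

```python
# The exemplary error notification speech commands, normalized to lower case.
ERROR_COMMANDS = {"recognition error", "misrecognition"}


def is_error_command(text):
    """Software-generated notification: recognized text matches an error command."""
    return text.strip().lower() in ERROR_COMMANDS


class DeferredErrorLog:
    """Stores audio input and language model references until the session ends."""

    def __init__(self):
        self.pending = []

    def record(self, audio, model_ref):
        self.pending.append((audio, model_ref))

    def flush(self, train):
        """Provide every stored item to the training callable, oldest first."""
        count = 0
        while self.pending:
            audio, model_ref = self.pending.pop(0)
            train(audio, model_ref)
            count += 1
        return count


log = DeferredErrorLog()


def on_button_press(audio, model_ref):
    """Hardware-generated notification: the error notification button was activated."""
    log.record(audio, model_ref)


# During the session: one error reported by button, one by speech command.
on_button_press(b"first utterance", "dictation-lm")
if is_error_command("Recognition Error"):
    log.record(b"second utterance", "command-lm")

# Subsequent to the session: hand the stored items to the training process.
trained = []
flushed = log.flush(lambda audio, model_ref: trained.append((audio, model_ref)))
```

Deferring the hand-off in this way keeps the training work out of the recognition session itself, which may be desirable on an embedded computer system with limited processing resources.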