1. Technical Field
This invention relates to the field of embedded speech recognition systems and more particularly to processing speech recognition errors in an embedded speech recognition system.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a microphone is converted to a set of text words by a computer. These recognized words may then be used in a variety of computer software applications for purposes such as document preparation, data entry, and command and control. Speech recognition systems programmed or trained to the diction and inflection of a single person can successfully recognize the vast majority of words spoken by that person.
In operation, speech recognition systems can model and classify acoustic signals to form acoustic models, which are representations of basic linguistic units referred to as phonemes. Upon receipt of the acoustic signal, the speech recognition system can analyze the acoustic signal, identify a series of acoustic models within the acoustic signal and derive a list of potential word candidates for the given series of acoustic models. Subsequently, the speech recognition system can contextually analyze the potential word candidates using a language model as a guide.
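The contextual analysis step can be illustrated with a minimal sketch. It assumes that each word candidate already carries a log-likelihood from acoustic matching and that the language model is a simple table of word-pair probabilities. The names `Candidate`, `rescore`, `lm_weight`, and `floor`, along with all of the scores, are hypothetical and serve only to show how acoustic evidence and language model context combine to select one candidate.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    word: str
    acoustic_log_prob: float  # hypothetical score from matching acoustic models


def rescore(
    candidates: List[Candidate],
    previous_word: str,
    bigram_prob: Dict[Tuple[str, str], float],
    lm_weight: float = 1.0,
    floor: float = 1e-6,
) -> Tuple[str, float]:
    """Return the candidate maximizing acoustic log prob + lm_weight * LM log prob."""
    best_word, best_score = "", -math.inf
    for cand in candidates:
        # Unseen word pairs get a small floor probability instead of zero.
        p = bigram_prob.get((previous_word, cand.word), floor)
        score = cand.acoustic_log_prob + lm_weight * math.log(p)
        if score > best_score:
            best_word, best_score = cand.word, score
    return best_word, best_score


candidates = [Candidate("speech", -2.0), Candidate("beach", -1.8)]
bigrams = {("recognize", "speech"): 0.2, ("recognize", "beach"): 0.001}
word, score = rescore(candidates, "recognize", bigrams)
print(word)  # speech
```

In this example "beach" scores slightly better acoustically, but the language model context following "recognize" gives "speech" the higher combined score.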
The task of the language model is to express restrictions imposed on the manner in which words can be combined to form sentences. The language model can express the likelihood of a word appearing immediately adjacent to another word or words. Language models used within speech recognition systems are typically statistical models. Examples of well-known language models suitable for use in speech recognition systems include uniform language models, finite state language models, grammar-based language models, and m-gram language models.
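To make "the likelihood of a word appearing immediately adjacent to another word" concrete, the following sketch builds an m-gram model with m = 2 (a bigram model) from a tiny corpus. It uses plain maximum-likelihood counts, so P(word | prev) = C(prev, word) / C(prev). The function name `train_bigram`, the `<s>`/`</s>` sentence markers, and the corpus are illustrative assumptions. A practical model would also apply smoothing so that unseen word pairs do not receive zero probability.

```python
from collections import Counter
from typing import Callable, Iterable


def train_bigram(sentences: Iterable[str]) -> Callable[[str, str], float]:
    """Maximum-likelihood bigram (m = 2) model: P(word | prev) = C(prev, word) / C(prev)."""
    history = Counter()  # how often each token appears as the left context
    pairs = Counter()    # how often each adjacent (prev, word) pair appears
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        history.update(tokens[:-1])
        pairs.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev: str, word: str) -> float:
        if history[prev] == 0:
            return 0.0  # unseen context; a real system would apply smoothing
        return pairs[(prev, word)] / history[prev]

    return prob


p = train_bigram(["call home", "call the office", "call home now"])
print(p("call", "home"))    # 2/3
print(p("call", "office"))  # 0.0, since "office" never directly follows "call"
```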
Notably, the accuracy of a speech recognition system can improve as the acoustic models for a particular speaker are refined during the operation of the speech recognition system. That is, the speech recognition system can observe speech dictation as it occurs and can modify the acoustic model accordingly. Typically, an acoustic model can be modified when a speech recognition training program analyzes both a known word and the recorded audio of a spoken version of that word. In this way, the training program can associate particular acoustic waveforms with corresponding phonemes contained within the spoken word.
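The adaptation step, pairing a known word with its recorded audio, can be sketched under heavily simplified assumptions. In this sketch, each phoneme is modeled by a single mean feature vector, the recorded audio has already been converted into feature frames, and frames are split evenly across the word's phonemes in place of a true forced alignment. The names `uniform_align`, `adapt`, `lexicon`, and `rate` are hypothetical. Real systems typically adapt richer models (for example, Gaussian mixtures within hidden Markov models) using techniques such as MAP or MLLR adaptation.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def uniform_align(
    frames: Sequence[Vector], phonemes: Sequence[str]
) -> List[Tuple[str, Sequence[Vector]]]:
    """Split frames evenly across phonemes (a crude stand-in for forced alignment)."""
    step = len(frames) / len(phonemes)
    return [
        (ph, frames[round(i * step): round((i + 1) * step)])
        for i, ph in enumerate(phonemes)
    ]


def adapt(
    model: Dict[str, Vector],
    known_word: str,
    frames: Sequence[Vector],
    lexicon: Dict[str, List[str]],
    rate: float = 0.1,
) -> Dict[str, Vector]:
    """Move each phoneme's mean feature vector toward the frames aligned to it."""
    for ph, segment in uniform_align(frames, lexicon[known_word]):
        if not segment:
            continue  # too few frames to cover this phoneme
        dim = len(segment[0])
        seg_mean = [sum(f[d] for f in segment) / len(segment) for d in range(dim)]
        # Interpolate between the old mean and the newly observed mean.
        model[ph] = [(1 - rate) * old + rate * new for old, new in zip(model[ph], seg_mean)]
    return model


lexicon = {"yes": ["y", "eh", "s"]}  # hypothetical pronunciation dictionary
model = {ph: [0.0, 0.0] for ph in ("y", "eh", "s")}
frames = [[1.0, 1.0], [1.0, 1.0], [2.0, 2.0], [2.0, 2.0], [3.0, 3.0], [3.0, 3.0]]
adapt(model, "yes", frames, lexicon, rate=0.5)
print(model["s"])  # [1.5, 1.5]
```

The `rate` parameter controls how strongly a single utterance shifts the speaker model. A small value keeps one misaligned or noisy recording from overwriting what has already been learned.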
In traditional computing systems in which speech recognition can be performed, extensive training programs can be used to modify acoustic models during the operation of speech recognition systems. Though time consuming, such training programs can be performed efficiently given the widely available user interface peripherals that facilitate a user's interaction with the training program. In an embedded computing device, however, typical personal computing peripherals such as a keyboard, mouse, display and graphical user interface (GUI) often do not exist. As such, the lack of a conventional mechanism for interacting with a user can inhibit the effective training of a speech recognition system, because such training can become tedious given the limited ability to interact with the embedded system. Yet, without an effective mechanism for training the acoustic model of the speech recognition system, when a speech recognition error has occurred, the speech recognition system cannot appropriately update the corresponding language model so as to reduce future misrecognitions.