1. Technical Field
This invention relates to the field of speech recognition computer applications, and more specifically, to a system for improving the command recognition accuracy of a speech recognition system.
2. Description of the Related Art
Speech recognition is the process by which acoustic signals, received via a microphone, are converted into words by a computer. Once recognized, the words may be used in a variety of computer software applications for purposes such as document preparation, data entry and command and control. Speech recognition is generally a difficult problem due to the wide variety of pronunciations, accents and speech characteristics of individual speakers. Consequently, sets of constraints are used to make decisions about the words a user spoke.
Typical speech dictation recognition systems use two sets of constraints, namely, an acoustic model and a language model. The acoustic model considers the sounds that make up the words and the language model considers the grammatical context in which the words are used. These models are often used to help reduce the search space of possible words and to resolve ambiguities between similar sounding words. Such models tend to be statistically-based systems and can be provided in a variety of forms. The simplest language model, for example, can be specified as a finite state network, where the permissible words following each word are given explicitly. However, more sophisticated language models have also been developed which are specified in terms of a context-free grammar.
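By way of illustration only, a finite state network of the kind described above can be sketched as a table listing, for each word, the words permitted to follow it. The vocabulary, the `<start>` and `<end>` markers and the function name `is_permitted` are assumptions made for this sketch and are not taken from any particular recognition system.

```python
# Hypothetical finite state network: each word maps to the set of words
# that are explicitly permitted to follow it.
NETWORK = {
    "<start>": {"open", "close"},
    "open": {"file", "window"},
    "close": {"file", "window"},
    "file": {"<end>"},
    "window": {"<end>"},
}


def is_permitted(words):
    """Return True if the word sequence is a complete path through NETWORK."""
    state = "<start>"
    # Append the end marker so that incomplete phrases are rejected.
    for word in list(words) + ["<end>"]:
        if word not in NETWORK.get(state, set()):
            return False
        state = word
    return True


print(is_permitted(["open", "file"]))   # a complete, permitted phrase
print(is_permitted(["file", "open"]))   # words in an order the network forbids
```

Such a network constrains the search space only to the extent that the phrase contains several words; a one-word phrase offers almost nothing for it to constrain.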
When using a speech recognition system to control system and software application operation and navigation, a set of commands is formulated for specific tasks and functions. Each command is typically one or two words or a short phrase representing a common expression for performing a given operation. Typical speech command recognition systems can have a large number of such commands. So that the speaker does not have to memorize the precise phrasing of the commands, sophisticated systems also recognize alternate expressions having the same meaning as a known command. Typically, language models, as used for dictation recognition, are employed to constrain the spoken commands syntactically.
However, because the commands, and their synonymous counterparts, are typically one or two words, syntax language models are often ineffective. Thus, conventional speech recognition systems rely heavily on acoustic models to select one of a set of commands, and as a result, they have difficulty recognizing the spoken commands. For example, if the spoken command sounds similar to other commands, the command recognition system may execute an unintended command, or the recognition system may not execute any command at all. In either case, the speaker will have to re-dictate the command or enter it with another input device.
Accordingly, it would be desirable to provide a system for improving the recognition accuracy of spoken commands for controlling system and application operation.
The present inventors have determined that the context in which a spoken command is executed can be utilized as a surrogate for the language models used for dictation recognition. In particular, event-based data structures, indicative of the context in which the command is given, are used as constraints in the recognition process. Thus, the present invention provides a system for improving command recognition accuracy of speech recognition systems.
Specifically, the present invention operates in a computer system for speech recognition that operates in various states and runs a program to perform various events. The method and system operate by monitoring the events and states and receiving a processed command corresponding to a spoken command. The processed command is analyzed according to one or more acoustic models to identify a probable acoustic match. Likewise, the command is analyzed according to at least one of the events and states to identify a probable context match. Based upon the probable acoustic and context matches, the system provides a recognized command.
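One way the acoustic and context matches might be combined is sketched below. The log-linear weighting, the function name `recognize`, the `context_weight` and `floor` parameters and the example probabilities are all assumptions chosen for illustration; the description above does not prescribe a particular combination rule.

```python
import math


def recognize(acoustic_scores, context_scores, context_weight=1.0, floor=1e-6):
    """Pick the command with the best combined acoustic and context score.

    Both arguments map candidate commands to probabilities. The weighted
    sum of log probabilities used here is an assumed combination rule.
    """
    best_command, best_score = None, -math.inf
    for command, p_acoustic in acoustic_scores.items():
        # Commands with no context evidence receive a small floor probability.
        p_context = context_scores.get(command, floor)
        score = (math.log(max(p_acoustic, floor))
                 + context_weight * math.log(max(p_context, floor)))
        if score > best_score:
            best_command, best_score = command, score
    return best_command


# Hypothetical scores: the two commands sound alike, but the current
# system state makes "close window" the more plausible choice.
acoustic = {"close window": 0.48, "close file": 0.52}
context = {"close window": 0.70, "close file": 0.10}
print(recognize(acoustic, context))
```

In this sketch, setting `context_weight` to zero reduces the decision to the acoustic match alone, which is the situation in which similar sounding commands are most easily confused.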
The present invention provides the object and advantage of accurately recognizing spoken system and application control commands. The present invention provides accurate speech command recognition even if the spoken command is only a single word.
The states and events can include system control activity, active applications, prior commands and an event queue. Thus, the present invention provides an additional object and advantage in that the one or more context constraining parameters exist on, or can be performed by, the computer system without the need for the speech recognition system to supply additional data and utilize storage space or computer memory.
The analysis of the system states and events to identify the probable context match can be accomplished using a statistical modeling technique. In addition, past events and states can be used to modify the statistical model. Thus, the present invention affords the further object and advantage of providing a statistical model tailored to the command choices of a given speaker or set of speakers using the system.
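As a minimal sketch of a statistical model that is modified by past events and states, the following keeps a count of how often each command has been issued in each state and derives a smoothed probability from those counts. The class name `ContextModel`, the additive smoothing and the state and command names are assumptions for illustration only; other statistical modeling techniques could serve equally well.

```python
from collections import defaultdict


class ContextModel:
    """Hypothetical per-state command frequency model with additive smoothing."""

    def __init__(self, commands, smoothing=1.0):
        self.commands = list(commands)
        self.smoothing = smoothing
        # counts[state][command] = number of times command was issued in state
        self.counts = defaultdict(lambda: defaultdict(float))

    def observe(self, state, command):
        """Record a past event: `command` was executed while in `state`."""
        self.counts[state][command] += 1.0

    def probability(self, state, command):
        """Estimate P(command | state) from the recorded history."""
        seen = self.counts[state]
        total = sum(seen.values()) + self.smoothing * len(self.commands)
        return (seen.get(command, 0.0) + self.smoothing) / total


model = ContextModel(["save", "close"])
print(model.probability("editor", "save"))   # uniform before any history
model.observe("editor", "save")
model.observe("editor", "save")
print(model.probability("editor", "save"))   # shifts toward the user's habit
```

Because the counts accumulate from the commands a given speaker actually issues, the resulting probabilities drift toward that speaker's command choices over time.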
These and other objects, advantages and aspects of the invention will become apparent from the following description. In the description, reference is made to the accompanying drawings which form a part hereof, and in which there is shown a preferred embodiment of the invention. Such embodiment does not necessarily represent the full scope of the invention, and reference is made, therefore, to the claims herein for interpreting the scope of the invention.