Speech recognition systems provide computers with the ability to listen to user speech and determine what is said. Accordingly, these systems have the potential to revolutionize the way in which humans interact with computers. Current technology does not generally support unconstrained speech recognition, which would include the ability to listen to any speech in any context and transcribe it accurately. To achieve reasonable recognition accuracy and response time, current speech recognizers constrain what they listen for by using "recognition grammars", which are often simply referred to as "grammars". Grammars define the words that may be spoken by a user and the patterns in which they must be spoken.
The two major types of grammars are rule grammars, which are also known as command and control grammars or regular grammars, and dictation grammars, which are also known as statistical language models. A rule grammar specifies a set of commands a user might say. For example, a simple window control rule grammar might enable a user to speak such utterances as "open a file", "close the window", and similar commands. A dictation grammar imposes fewer restrictions on what can be said, and accordingly is typically more complex than a rule grammar. Dictation grammars are typically used for free text entry in applications such as electronic mail and word processing.
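By way of illustration only, the window control rule grammar described above might be sketched as follows. The dictionary layout, the rule names `open_file` and `close_window`, and the helper `match_rule` are hypothetical and are not taken from any particular speech recognizer; the sketch merely shows how a rule grammar restricts recognition to a fixed set of commands.

```python
# Hypothetical toy rule grammar: each rule name maps to the phrases it accepts.
WINDOW_CONTROL_GRAMMAR = {
    "open_file": {"open a file"},
    "close_window": {"close the window"},
}


def match_rule(grammar, utterance):
    """Return the name of the rule that accepts the utterance, or None."""
    # Normalize case and whitespace so that "Close  the window" still matches.
    normalized = " ".join(utterance.lower().split())
    for rule_name, phrases in grammar.items():
        if normalized in phrases:
            return rule_name
    # Anything outside the grammar is simply not recognized.
    return None
```

An utterance that is not covered by any rule yields no match, which reflects the constraint that a rule grammar places on what a user may say.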
Existing speech application development systems are often based on complex software architectures and Application Programming Interfaces (APIs) which require highly specialized programmers, and which also make integration of speech input with traditional user interface modalities, such as a mouse or a keyboard, difficult and unreliable. In many existing systems, a single grammar or small set of grammars is defined for all parts of all applications. There is a strong tendency for such monolithic grammars to become large. As a result, they are difficult to update because of unexpected interactions, and also difficult to pass from one developer to another. Furthermore, because the speech design process is separate from other parts of the development process, the two efforts may easily lose synchronization, causing unexpected errors that may be difficult to trace. Accordingly, there is a need for a system for incorporating speech recognition into application and/or system service computer programs which does not require or rely on development and maintenance of a single, monolithic recognition grammar. The system should be relatively easy to use and maintain, while still allowing for sophisticated and complex user inputs.
In accordance with principles of the invention, a system and method for incorporating speech recognition into a computer program is disclosed. The disclosed system includes a number of speech controller modules corresponding to program components within the computer program. Each speech controller module supports a speech recognition grammar having at least one rule, where the speech recognition grammar provides an interface to operations on the corresponding program component. The rules of the speech recognition grammar associate spoken commands with data and state stored in the corresponding program component, and with the functional capabilities of the program component.
Individual rules may include speakable tokens, which are portions of speech that are recognizable by a software or hardware speech recognizer. A rule may also include a reference to another rule local to the same recognition grammar, or to a rule in a different speech recognition grammar. Where the reference is to a rule in a different speech controller module, there is said to be a "link" to the other speech controller module based on this reference. In this way, the disclosed system allows rules from the same or different grammars to be combined, in order to build complex grammars that combine the functionality of multiple components.
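The combination of speakable tokens, local rule references, and references to rules in other grammars may be illustrated by the following sketch. The `Ref` class, the `expand` function, and the grammar names `calendar` and `scheduler` are hypothetical and are chosen only for illustration. Each rule is modeled as a list of alternatives, and each alternative is a sequence of tokens and references. The sketch does not guard against rules that refer to themselves.

```python
from itertools import product


class Ref:
    """Reference to a rule; naming another grammar makes it a link."""

    def __init__(self, rule, grammar=None):
        self.rule = rule
        self.grammar = grammar  # None means "same grammar as the referring rule"


def expand(grammars, grammar_name, rule_name):
    """Return every phrase a rule accepts, following local and linked references."""
    phrases = []
    for alternative in grammars[grammar_name][rule_name]:
        parts = []
        for item in alternative:
            if isinstance(item, Ref):
                target = item.grammar or grammar_name
                parts.append(expand(grammars, target, item.rule))
            else:
                parts.append([item])  # a plain speakable token
        # Join one choice from each part to form a complete phrase.
        phrases.extend(" ".join(combo) for combo in product(*parts))
    return phrases


GRAMMARS = {
    "calendar": {
        "day": [["today"], ["tomorrow"]],
    },
    "scheduler": {
        "action": [["schedule"], ["cancel"]],
        # Local reference to "action", link to "day" in the calendar grammar.
        "command": [[Ref("action"), "a meeting", Ref("day", grammar="calendar")]],
    },
}
```

Calling `expand(GRAMMARS, "scheduler", "command")` yields the four phrases formed from the two actions and the two days, although the `scheduler` grammar itself defines no day names.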
Each speech controller module may further include a list of references to rules in grammars stored in one or more other speech controller modules. Through this feature, rules defined by other speech controller modules may be conveniently referenced, as if they were local rules, by a rule in the speech controller module in which they are listed.
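One possible way to model such a list of references is sketched below. The class `ImportingController` and its `resolve` method are hypothetical; the sketch assumes that a rule is simply a list of phrases and that each list entry maps a local alias to a rule held by another controller.

```python
class ImportingController:
    """Hypothetical speech controller holding local rules and a list of imports."""

    def __init__(self, name, rules, imports=None):
        self.name = name
        self.rules = rules            # local rule name -> list of phrases
        self.imports = imports or {}  # local alias -> (other controller, rule name)

    def resolve(self, rule_name):
        """Look up a rule by name, treating imported rules as if they were local."""
        if rule_name in self.rules:
            return self.rules[rule_name]
        if rule_name in self.imports:
            other, remote_name = self.imports[rule_name]
            return other.resolve(remote_name)
        raise KeyError(f"{self.name}: unknown rule {rule_name!r}")


calendar = ImportingController("calendar", {"day": ["today", "tomorrow"]})
scheduler = ImportingController(
    "scheduler",
    {"action": ["schedule", "cancel"]},
    imports={"day": (calendar, "day")},
)
```

In this sketch, a rule in `scheduler` can refer to `day` without naming the `calendar` controller, because the import list supplies the mapping.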
In one embodiment, each speech controller module operates to dynamically enable one or more rules of its grammar, which has been loaded into the speech recognizer, in response to detecting the occurrence of an associated enabling condition. The speech recognizer, for example, activates specific enabled rules or grammars that have been loaded into the speech recognizer in response to predetermined events. The speech controller module receives a recognition result from the speech recognizer indicating that the speech recognizer has detected one or more tokens associated with an enabled rule. In response to receipt of the recognition result, the speech controller module operates to invoke a method on data within the corresponding program component, and passes the result on to other speech controller modules that are linked to the recognition rule corresponding to the result.
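The flow described above may be sketched as follows, under simplifying assumptions. The class `DispatchingController`, its method names, and the use of a plain handler function per rule are hypothetical; the speech recognizer itself is not modeled, and a recognition result is represented only by a rule name and a list of tokens.

```python
class DispatchingController:
    """Hypothetical speech controller bound to one program component."""

    def __init__(self, component, handlers):
        self.component = component
        self.handlers = handlers  # rule name -> function(component, tokens)
        self.enabled = set()      # rules currently enabled
        self.linked = {}          # rule name -> controllers linked to that rule

    def enable(self, rule_name):
        # Called by the program when the enabling condition occurs.
        self.enabled.add(rule_name)

    def disable(self, rule_name):
        self.enabled.discard(rule_name)

    def link(self, rule_name, other):
        self.linked.setdefault(rule_name, []).append(other)

    def on_result(self, rule_name, tokens):
        """Handle a recognition result; return False if the rule is not enabled."""
        if rule_name not in self.enabled:
            return False
        self._invoke(rule_name, tokens)
        # Pass the result on to controllers linked to this rule.
        for other in self.linked.get(rule_name, []):
            other._invoke(rule_name, tokens)
        return True

    def _invoke(self, rule_name, tokens):
        handler = self.handlers.get(rule_name)
        if handler is not None:
            handler(self.component, tokens)
```

A result for a rule that has not been enabled is ignored, while a result for an enabled rule both acts on the controller's own component and reaches every controller linked to that rule.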
In another embodiment, a command from the computer program, a modification of data, and/or a change of state in the corresponding program component may cause a speech controller module to dynamically modify, in part or in whole, a grammar it contains. For example, a grammar may be modified to associate one or more speakable tokens with data stored within the corresponding program component in response to such a command.
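As a minimal sketch of such dynamic modification, the following example rebuilds one rule whenever data in the corresponding component changes. The `AddressBook` component, the `contact_name` rule, and the listener mechanism are hypothetical and serve only to illustrate how speakable tokens may be associated with stored data.

```python
class AddressBook:
    """Hypothetical program component that notifies listeners when data changes."""

    def __init__(self):
        self.contacts = {}   # display name -> e-mail address
        self.listeners = []  # callbacks invoked after each modification

    def add_contact(self, name, email):
        self.contacts[name] = email
        for listener in self.listeners:
            listener()


class AddressBookSpeechController:
    """Hypothetical controller whose grammar tracks the address book contents."""

    def __init__(self, book):
        self.book = book
        self.grammar = {"contact_name": {}}  # spoken token -> display name
        book.listeners.append(self.rebuild)
        self.rebuild()

    def rebuild(self):
        # Replace the rule so that every stored contact name is speakable.
        self.grammar["contact_name"] = {
            name.lower(): name for name in self.book.contacts
        }

    def lookup(self, spoken):
        """Return the e-mail address for a spoken contact name, or None."""
        name = self.grammar["contact_name"].get(spoken.lower())
        return None if name is None else self.book.contacts[name]
```

Here a name becomes speakable only after the corresponding contact has been added to the component, so the grammar never needs to be edited by hand when the data changes.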
Thus there is provided a system for incorporating speech recognition into computer programs which does not require or rely on development and maintenance of a single, monolithic recognition grammar. The disclosed system is relatively easy to use and maintain, and allows for sophisticated and complex user inputs.