The invention relates to interactive computer systems in which a user provides commands to a target computer program executing on the computer system by way of an input device. The input device may be, for example, a keyboard, a mouse device, or a speech recognizer. For each input device, an input signal generated by the input device is translated into a form usable by the target computer program.
An interactive computer system in which the user can provide commands by speaking them may consist of a processor executing a target computer program having commands identifying functions which can be performed by the target computer program. The computer system further includes a speech recognizer for recognizing the spoken commands and for outputting command signals corresponding to the recognized commands. The speech recognizer recognizes a spoken command by measuring the value of at least one feature of an utterance during each of a series of successive time intervals to produce a series of feature signals, comparing the measured feature signals to each of a plurality of acoustic command models to generate a match score for the utterance and each acoustic command model, and outputting a command signal corresponding to the command model having the best match score.
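The recognition step described above can be sketched as follows. This is a minimal illustration only, not the recognizer of any particular system: it assumes one feature value per time interval, represents each acoustic command model as a plain template sequence, and uses a negated mean absolute difference as the match score. The names `match_score` and `recognize` and the sample models are hypothetical.

```python
from typing import Dict, Sequence


def match_score(features: Sequence[float], model: Sequence[float]) -> float:
    """Return a match score for an utterance against one model (higher is better).

    Assumption: the score is the negated mean absolute difference over the
    frames both sequences share, with a penalty of 1.0 per frame of length
    mismatch. Practical recognizers use statistical models instead.
    """
    shared = min(len(features), len(model))
    if shared == 0:
        return float("-inf")
    mean_difference = sum(abs(f - m) for f, m in zip(features, model)) / shared
    length_penalty = abs(len(features) - len(model))
    return -(mean_difference + length_penalty)


def recognize(features: Sequence[float], models: Dict[str, Sequence[float]]) -> str:
    """Return the command whose model has the best match score."""
    if not models:
        raise ValueError("at least one acoustic command model is required")
    return max(models, key=lambda name: match_score(features, models[name]))


# Hypothetical acoustic command models, one feature value per time interval.
COMMAND_MODELS = {
    "open": [1.0, 2.0, 3.0],
    "close": [3.0, 2.0, 1.0],
}
```

Under these assumptions, `recognize([1.1, 2.0, 2.8], COMMAND_MODELS)` selects `"open"`, because the measured feature signals lie closer to that template than to the template for `"close"`.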
The set of utterance models which the speech recognizer can recognize, together with the words represented by those utterance models, is referred to as the system vocabulary. The system vocabulary is finite and may, for example, range from one utterance model to thousands of utterance models. Each utterance model may represent one word, or may represent a combination of two or more words spoken continuously (without a pause between the words).
The system vocabulary may contain, for example, utterance models of all of the commands to which the target computer program is capable of responding. However, as the number of utterance models increases, the time required to perform utterance recognition using the entire system vocabulary increases, and the recognition accuracy decreases.
Generally, a target computer program has a series of active states occurring over a series of time periods. For each active state, there may be a list of active state commands identifying functions which can be performed in the active state. The active state commands may be a small subset of the system vocabulary. The translation of an uttered command to a form usable by the target computer program in one state of the target computer program may be different from the translation of the same command in another state of the target computer program.
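The state-dependent translation described above can be illustrated with a small lookup table. The sketch below is hypothetical: the state names, the command words, and the translated forms are invented for illustration and do not come from any particular target computer program.

```python
from typing import Dict, Optional

# Hypothetical active state commands and their translations, keyed by state.
# The same spoken command ("close") translates differently in each state.
TRANSLATIONS: Dict[str, Dict[str, str]] = {
    "editor": {"open": "show_open_dialog", "close": "close_document"},
    "dialog": {"ok": "confirm_dialog", "close": "dismiss_dialog"},
}


def translate(state: str, command: str) -> Optional[str]:
    """Translate a recognized command for the given active state.

    Returns None when the command is not an active state command
    of that state.
    """
    active_state_commands = TRANSLATIONS.get(state, {})
    return active_state_commands.get(command)
```

Here `translate("editor", "close")` yields `"close_document"`, while `translate("dialog", "close")` yields `"dismiss_dialog"`, and `translate("dialog", "open")` yields `None` because "open" is not an active state command of the hypothetical dialog state.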
In order to improve the speed and accuracy of the speech recognizer, it is desirable to restrict the active vocabulary of utterance models which the speech recognizer can recognize in any given time period to the active state commands identifying functions which can be performed by the target computer program in that time period. In an attempt to achieve this result, the speech recognizer may be provided with a finite state machine which duplicates the active states and transitions between active states of the target computer program.
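The benefit of restricting the active vocabulary can be sketched as follows. This is an illustrative example under stated assumptions: utterance models are plain template sequences of equal length, the match is a summed absolute difference (lower is better), and the vocabulary entries and function names are hypothetical.

```python
from typing import Dict, Iterable, Sequence

Model = Sequence[float]


def distance(features: Model, model: Model) -> float:
    """Summed absolute difference between equal-length sequences (lower is better)."""
    return sum(abs(f - m) for f, m in zip(features, model))


def restrict(system_vocabulary: Dict[str, Model],
             active_state_commands: Iterable[str]) -> Dict[str, Model]:
    """Keep only the utterance models for commands active in the current state."""
    wanted = set(active_state_commands)
    return {name: model for name, model in system_vocabulary.items() if name in wanted}


def best_match(features: Model, vocabulary: Dict[str, Model]) -> str:
    """Return the vocabulary entry closest to the measured features."""
    return min(vocabulary, key=lambda name: distance(features, vocabulary[name]))


# Hypothetical system vocabulary containing two acoustically similar entries.
SYSTEM_VOCABULARY = {
    "save": [1.0, 2.0, 3.0],
    "safe": [1.0, 2.0, 3.1],
    "open": [5.0, 5.0, 5.0],
}
```

With the utterance `[1.0, 2.0, 3.08]`, matching against the whole of `SYSTEM_VOCABULARY` selects the similar-sounding `"safe"`. If only "save" and "open" are active state commands, matching against `restrict(SYSTEM_VOCABULARY, ["save", "open"])` selects `"save"`, and fewer models need to be scored.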
In practice, it has been found impossible to build a finite state machine for the speech recognizer which exactly duplicates the active states and transitions between active states of the target computer program. The target computer program not only interacts with the user, but also interacts with data and other devices of the computer system whose states cannot be known in advance.
For example, a command to load a file will cause a computer program to make a transition to one state if the file exists, or to a different state if the file does not exist. However, the speech recognizer finite state machine must be built on a fixed assumption, either that the file exists or that it does not. If a command to load a file is spoken to the computer program using the speech recognizer, then the speech recognizer finite state machine may or may not track the computer program state correctly, depending on whether that file exists. If the speech recognizer finite state machine assumes that the file exists, but in fact the file does not exist, then the speech recognizer finite state machine will enter a state different from the state of the target computer program. As a result, the active vocabulary of the speech recognizer no longer corresponds to the active state commands of the target computer program, and the target computer program can no longer receive valid input from the speech recognizer.
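The divergence described in this example can be reproduced in a few lines. The sketch below is hypothetical: the state names (`main`, `editing`, `file_not_found`) and both functions are invented for illustration, and the recognizer's finite state machine is assumed to have been built on the assumption that the file exists.

```python
from typing import Dict, Tuple

# Hypothetical recognizer finite state machine, built in advance on the
# assumption that a loaded file always exists.
RECOGNIZER_FSM: Dict[Tuple[str, str], str] = {
    ("main", "load"): "editing",
}


def recognizer_next_state(state: str, command: str) -> str:
    """Follow the precomputed transition; stay in place if none is defined."""
    return RECOGNIZER_FSM.get((state, command), state)


def program_next_state(state: str, command: str, file_exists: bool) -> str:
    """Transition of the hypothetical target program, which depends on
    a condition (file existence) that cannot be known in advance."""
    if state == "main" and command == "load":
        return "editing" if file_exists else "file_not_found"
    return state


def in_sync(file_exists: bool) -> bool:
    """Report whether the recognizer still tracks the program after 'load'."""
    tracked = recognizer_next_state("main", "load")
    actual = program_next_state("main", "load", file_exists)
    return tracked == actual
```

Under these assumptions, `in_sync(True)` holds, whereas `in_sync(False)` does not: the recognizer believes the program is in the `editing` state while the program is actually in `file_not_found`, so the recognizer would activate the wrong set of commands.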