The present invention relates to text generating systems that generate text from a voice input and more particularly to a text generating system that generates text from a voice input and also accepts voice generated control commands with the ability to distinguish between the input and the commands.
Voice controlled text generating systems generally include a microphone for detecting speech signals. A speech signal processor transforms the detected speech signals into a representation suitable for recognition by a processor (e.g., short term spectral cross-sections). The speech signal processor transmits the processed signals to a speech event analyzer, which generates a set of recognition candidates in response to each detected speech event. A recognition candidate is a vocabulary item, stored in a system memory (not shown), that is similar to the detected speech event representing the spoken word or phrase. The system creates a candidate set that includes all of the recognition candidates; in other words, the candidate set includes all known vocabulary items that are sufficiently similar to the detected speech event for the speech event analyzer 16 to decide that there is a high degree of probability that the speech event is an instance of the vocabulary item represented by the recognition candidate.
In order to enable the system to choose the most appropriate candidate, the system assigns a recognition score to each candidate. The recognition score indicates the likelihood that the speech event is an instance of the candidate, and after processing is complete, the recognition candidate with the highest recognition score is designated the "Best Match". The system then selects the "Best Match" as the candidate representing the chosen vocabulary item.
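The candidate set and best match selection described above can be illustrated with a minimal sketch. The names `RecognitionCandidate`, `build_candidate_set`, and `best_match`, the similarity values, and the fixed threshold are all hypothetical and chosen only for illustration; the text does not specify how similarity is computed or what counts as "sufficiently similar".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionCandidate:
    """A vocabulary item paired with its recognition score (hypothetical type)."""
    vocabulary_item: str
    score: float  # higher means the speech event is more likely this item


def build_candidate_set(similarities, threshold):
    """Keep every vocabulary item whose score meets the (assumed) threshold."""
    return [
        RecognitionCandidate(item, score)
        for item, score in similarities.items()
        if score >= threshold
    ]


def best_match(candidates):
    """Return the candidate with the highest score, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.score)


# Made-up scores for a single detected speech event.
similarities = {"write": 0.81, "right": 0.77, "rite": 0.42, "light": 0.15}
candidates = build_candidate_set(similarities, threshold=0.4)
chosen = best_match(candidates)
```

In this sketch "light" falls below the assumed threshold and is excluded from the candidate set, and "write" is designated the best match because it carries the highest score.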
After the best match candidate has been selected, the system translates the candidate and transmits the translated candidate to the application. In other words, the translation is the application input that has been designated to be sent when the candidate is chosen as the best match in a particular state of the recognizer. As a result, in theory, there is a specified translation for each combination of best match vocabulary item and recognizer state. Often, of course, a translation is simply the spelling of the best match word or phrase, but it can be any legal input to the application.
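One way to picture the translation step is as a lookup keyed by the pair of best match vocabulary item and recognizer state. The sketch below is only an illustration under stated assumptions: the table contents, the state names "dictation" and "spelling", and the function `translate` are hypothetical, and falling back to the spelling of the item is an assumption based on the remark that a translation is often simply the spelling.

```python
# Hypothetical translation table: (vocabulary item, recognizer state) -> application input.
TRANSLATIONS = {
    ("new paragraph", "dictation"): "\n\n",
    ("new paragraph", "spelling"): "new paragraph",
    ("period", "dictation"): ".",
}


def translate(best_match_item, recognizer_state, table=TRANSLATIONS):
    """Return the input to send to the application for this item and state.

    Assumption: when no entry is designated, the spelling of the item is sent.
    """
    return table.get((best_match_item, recognizer_state), best_match_item)


paragraph_break = translate("new paragraph", "dictation")
spelled_out = translate("new paragraph", "spelling")
plain_word = translate("hello", "dictation")
```

The same vocabulary item yields different application input depending on the recognizer state, while an item with no designated entry passes through as its spelling.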
In addition to accepting voice input and deriving the text corresponding to that voice input, it is also desirable to be able to control the system through the use of voice commands. In such a system, each voice command actuates the task assigned to it. This is especially important for a system designed for use by handicapped individuals or for a system designed for use by an individual who does not have free use of his/her hands because the hands are occupied with another task during use of the system. Moreover, when a text generating system is used for dictation, the person dictating usually cannot efficiently use a keyboard, and voice operated commands would greatly facilitate use of the system.
Known systems treat verbal input as typed input; in other words, they convert the speech into keystrokes. A speaker, however, supplies input to the system in word units, and verbal commands are more easily understood in terms of word units than in terms of characters. For this reason, known systems do not make effective use of vocal commands, especially commands involving backtracking through documents.
Another problem with known systems is that they assume that each input character is correct, and as a result the systems do not efficiently correct mistakenly translated verbal input.
It is therefore a principal object of the present invention to provide a system and method for generating text from a voice input that organizes and records information about system state and verbal and non-verbal input.
Another object of the present invention is to provide a system and method for generating text from a voice input that reliably and effectively implements system functions which make it possible for the user to inform the system of misrecognitions; for the system to undo the effects of said misrecognitions; for the user to control the application by referring directly to earlier events in the dictation process; and for the system to control and modify the recognition of speech, including the ability to learn from earlier misrecognitions.
A still further object of the present invention is to provide a system and method for generating text from a voice input that organizes the process of speech dictation to computerized systems, and the response of those systems, into similar structures which can be used to effectively control and modify system operation.
A further object of the present invention is to provide a system and method for generating text from a voice input that groups and organizes the various inputs in a manner that facilitates retrieval of any input.