The present invention relates to the field of voice or speech recognition and, more specifically, to speech recognition in a network.
Speech recognition is the process by which an audio input signal is received and the verbal content of the input signal is determined. The verbal content is then further processed to obtain the desired action. The verbal content can be a transcription of the speech of the input or merely a general statement of its content. Additionally, the speech recognizer itself can be located at the user terminal, as in U.S. Pat. No. 5,111,501, or in the network, as in U.S. Pat. No. 4,922,519.
Because errors occur in the determination of verbal content, systems have been established whereby the user is asked to repeat the request if the speech recognizer is unsure of the content. In this regard, a study of customer interaction with speech recognizers is reported in “Serving Customers With Automatic Speech Recognition—Human-Factors Issues” by Wattenbarger et al., AT&T Tech. J., May/June 1993, pp. 28-41.
The present invention is directed to a speech recognizer system for use with a network, comprising: a first terminal from which an input signal is generated onto the network; speech recognizer means for estimating the verbal content of the input signal; feedback means for directing to the first terminal an output signal comprising one or more approximations of the verbal content; and confirmation means, including selection means at the first terminal, for confirming the correct approximation.
Preferably, the speech recognizer means produces digital output signals, which increases transmission speed. It is also preferred that the first terminal have a visual display on which to display the estimate, either one approximation at a time, all at once or a number therebetween. Of course, audio feedback of the estimate may be preferred in situations such as a car phone, where the user cannot easily and safely view a visual display, or at a terminal without a visual display.
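By way of illustration only, the display options described above (one approximation at a time, all at once, or a number therebetween) can be sketched as a grouping of the approximations into screens. The function name `paginate_approximations` and the parameter `per_screen` are hypothetical and do not appear in the specification; this is a minimal sketch, not a required implementation.

```python
from typing import Iterator, List


def paginate_approximations(
    approximations: List[str], per_screen: int
) -> Iterator[List[str]]:
    """Yield successive groups of approximations for the terminal display.

    per_screen is a hypothetical parameter: 1 shows one approximation at a
    time, len(approximations) shows all at once, and any value in between
    shows a number therebetween.
    """
    if per_screen < 1:
        raise ValueError("per_screen must be at least 1")
    for start in range(0, len(approximations), per_screen):
        yield approximations[start:start + per_screen]
```

For example, three approximations with `per_screen=2` would be presented as a first screen of two approximations followed by a second screen of one.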
The present invention further includes a method for speech recognition comprising the steps of placing an input signal from a first terminal onto a network, estimating the verbal content of the input signal on the network and transmitting the estimate back to the first terminal for confirmation. Of course, if the speech recognizer is certain of the speech content from the input signal, feedback of an estimate to the first terminal need not be performed.
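The method above can be illustrated by the following minimal sketch. It assumes, purely for illustration, that the recognizer attaches a confidence score between 0 and 1 to each approximation and that "certain" means the best score meets a threshold; the specification does not define how certainty is measured. The names `resolve_estimate`, `confirm` and `certainty_threshold` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: (approximation text, confidence in [0, 1]).
Approximation = Tuple[str, float]


def resolve_estimate(
    candidates: List[Approximation],
    confirm: Callable[[List[str]], Optional[int]],
    certainty_threshold: float = 0.95,  # assumed value, not from the source
) -> Optional[str]:
    """Return the verbal content, asking the first terminal only when unsure.

    confirm stands in for the feedback and selection steps: it receives the
    approximations, best first, and returns the index the user selected, or
    None if the user confirmed none of them.
    """
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_text, best_score = ranked[0]
    if best_score >= certainty_threshold:
        # Recognizer is certain: feedback to the terminal is skipped.
        return best_text
    options = [text for text, _ in ranked]
    choice = confirm(options)
    if choice is None or not 0 <= choice < len(options):
        return None
    return options[choice]
```

In this sketch, a return value of `None` indicates that no approximation was confirmed, in which case a system might ask the user to repeat the request.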
This invention also provides a method of reducing errors in speech recognition systems, comprising the steps of providing an estimate of the verbal content of an input signal to the user and receiving confirmation of a correct approximation from the user.