The present invention relates to speech recognition systems, and, more particularly, to an acoustic speech recognizer system and method.
Speech recognition systems are known which allow vocal inputs to supplement or supplant other methods for inputting data and information, for example, to computer systems. One such system is the Bell Labs Acoustic Speech Recognizer (BLASR), available from LUCENT TECHNOLOGIES, INC., which may be used to implement an Internet and/or World Wide Web browser responsive to vocal commands, as described in commonly-assigned, U.S. patent application Ser. No. 09/168,405 of Michael Brown et al., entitled WEB-BASED PLATFORM FOR INTERACTIVE VOICE RESPONSE (IVR), filed Oct. 6, 1998, which is incorporated herein by reference.
However, speech recognition systems with barge-in capabilities mix different speech signals during barge-in, which badgers the speech recognition server with meaningless speech packets and thereby increases the processing load of the client.
An acoustic speech recognizer system integrates a barge-in detector with an adaptive speech endpoint detector, which detects endpoints, that is, the initiation and termination of speech. By using continuously adapted barge-in thresholds, the system permits barge-in regardless of the intensity of conflicting output speech. Advantageously, badgering of the speech processors is avoided. The adaptive speech endpoint detector is used in speech recognition applications, such as telephone-based Internet browsers, to determine barge-in events during the processing of speech. The adaptive speech endpoint detector may also operate continuously to implement a voice-activated web browser without the need for extraneous commands such as a push-to-talk command.
More specifically, the endpointer system includes a signal energy level estimator for estimating signal levels in speech data; a noise energy level estimator for estimating noise levels in the speech data; and a barge-in detector for increasing a threshold used in comparing the signal levels and the noise levels to detect the barge-in event in the speech data corresponding to a speech prompt during speech recognition.
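By way of illustration only, the cooperation of the three elements described above may be sketched as follows. The sketch is not the disclosed implementation: the class name `BargeInEndpointer`, the function `frame_energy_db`, the exponential smoothing scheme, and every numeric constant are hypothetical choices made for this example, and the fixed threshold boost applied while a prompt is playing is a simplification of the continuously adapted thresholds referred to above. Frames are assumed to be non-empty sequences of samples normalized to the range -1 to 1.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one non-empty frame in dB, floored to avoid log10(0)."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, 1e-10))


@dataclass
class BargeInEndpointer:
    # All constants are hypothetical tuning values, not taken from the source.
    base_threshold_db: float = 6.0   # margin over noise when no prompt is playing
    barge_in_boost_db: float = 9.0   # extra margin while a speech prompt is playing
    signal_rate: float = 0.5         # fast smoothing for the signal level estimate
    noise_rate: float = 0.05         # slow upward smoothing for the noise estimate
    signal_db: Optional[float] = None
    noise_db: Optional[float] = None

    def update(self, frame: Sequence[float], prompt_active: bool) -> bool:
        """Process one frame; return True if speech (or barge-in) is detected."""
        energy = frame_energy_db(frame)
        if self.signal_db is None or self.noise_db is None:
            # The first frame only initializes both estimators.
            self.signal_db = energy
            self.noise_db = energy
            return False

        # Signal energy level estimator: follows the input quickly.
        self.signal_db += self.signal_rate * (energy - self.signal_db)

        # Noise energy level estimator: drops at once, rises slowly.
        if energy < self.noise_db:
            self.noise_db = energy
        else:
            self.noise_db += self.noise_rate * (energy - self.noise_db)

        # Barge-in detector: increase the threshold while a prompt is playing,
        # so that the prompt's own echo is less likely to be taken for speech.
        threshold = self.base_threshold_db
        if prompt_active:
            threshold += self.barge_in_boost_db
        return (self.signal_db - self.noise_db) > threshold
```

In use, a caller would invoke `update` once per audio frame and pass `prompt_active=True` for as long as the speech prompt is being played. Under the constants assumed here, a moderate prompt echo exceeds the base threshold but not the raised one, whereas substantially louder user speech exceeds both.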