A speech recognition system usually encounters various problems caused by the environment, such as background noise and the channel effect, or by speaker-related factors, such as accent and speaking rate, so that the input speech is beyond the recognition capability of the system. Prior research has proposed various improvements to the recognition capability, but with only limited results.
U.S. Pat. No. 6,272,460, “Method for Implementing a Speech Verification System for Use in a Noisy Environment”, disclosed a system including a speech verifier in the front stage of the system. As shown in FIG. 1, a speech verifier 100 includes a noise suppressor 110, a pitch detector 120, and a confidence determiner 130. The objective is to remove noise and obtain the pitch. The pitch value is translated into a time-variant confidence index for determining whether the input signal at a given time is speech. The confidence index is transferred to the recognizer to assist the recognition.
U.S. Pat. No. 6,272,461 emphasized speech detection and assistance in speech recognition for all input signals, regardless of whether the input signals are beyond the acceptable range.
Current speech recognition and dialog systems do not have the capability to sense the usage environment. This implies that the system will blindly try to recognize the speech and generate an output no matter how harsh the usage environment is and no matter how far the task is beyond the system capability. As a result, the user may receive an erroneous answer. This not only wastes system resources, but may also lead to severe outcomes.
Take the auto-attendant as an example. When a caller uses the extension number inquiry system from a noisy subway station or on a busy street, the environmental noise lowers the signal-to-noise ratio (SNR) to a level that is beyond the system capability. The system will nevertheless perform the speech recognition process and generate a wrong extension number. In the end, the caller will need to ask a customer service representative for assistance. This scenario shows both the waste of system resources and the failure to save manpower.
On the other hand, if the system can determine whether the input signal is within the recognizable range before the system starts the actual recognition process, the recognizable signals can be passed on for recognition while the unrecognizable signals can be handled with appropriate actions. In this manner, the possibility of successful speech recognition will increase.
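The pre-recognition check described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the check is a single SNR threshold and that a noise-only segment is available for comparison. The function names `estimate_snr_db` and `gate_recognition`, the 10 dB threshold, and the `recognize` and `fallback` callbacks are hypothetical and are not taken from the cited patents or from any particular recognizer.

```python
import math

# Hypothetical threshold: the lowest SNR (in dB) the recognizer is assumed to handle.
MIN_SNR_DB = 10.0


def mean_power(samples):
    """Return the mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def estimate_snr_db(segment, noise_segment):
    """Crude SNR estimate: power of the input segment over power of a noise-only segment.

    The input segment still contains noise, so this is only a rough approximation.
    """
    segment_power = mean_power(segment)
    noise_power = mean_power(noise_segment)
    if noise_power == 0:
        return math.inf
    if segment_power == 0:
        return -math.inf
    return 10.0 * math.log10(segment_power / noise_power)


def gate_recognition(segment, noise_segment, recognize, fallback,
                     min_snr_db=MIN_SNR_DB):
    """Pass the segment to the recognizer only if its estimated SNR is high enough.

    Otherwise call the fallback action (for example, asking the caller to
    repeat or transferring the call) with the estimated SNR.
    """
    snr_db = estimate_snr_db(segment, noise_segment)
    if snr_db >= min_snr_db:
        return recognize(segment)
    return fallback(snr_db)
```

In the auto-attendant scenario, `fallback` might prompt the caller to move to a quieter place or route the call to a representative directly, instead of returning a wrong extension number. A practical system would likely combine several measures rather than a single SNR threshold.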