1. Field of the Invention
In general, the present invention is directed to speech recognition systems. In particular, the present invention is directed to automatic detection of speech recognition errors.
2. Discussion of the Related Art
Methods for automatic speech recognition are often utilized in speech recognition systems. Applications of speech recognition systems include, for example, dictation systems and automatically operating telephone exchanges.
It is especially critical in speech recognition that the correct expressions of the correct speaker are recognized. This is problematic insofar as ambient noise that contains clear speech constituents can be interpreted by a speech recognition system as though it derived from the speaker whose speech is actually to be recognized. In order to prevent such a mix-up, a method is herewith disclosed for distinguishing the correct from the incorrect spoken language. In particular, the volume level of the speaker whose speech is to be recognized is usually clearly higher than that of the unwanted speech, which usually comes from the background. The volume level of the speaker whose speech is to be recognized can thus be used to distinguish this speech from the background noise.
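The level-based distinction described above can be illustrated with a minimal sketch. It is not the disclosed method itself. It assumes that the signal is available as frames of samples normalized to the range -1.0 to 1.0, that a reference level for the wanted speaker is already known, and that a fixed margin separates foreground from background. The function names, the reference level, and the 12 dB margin are hypothetical and chosen only for illustration.

```python
import math


def rms_level_db(frame):
    """Return the RMS level of a frame in dB relative to full scale (1.0)."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0.0:
        return float("-inf")
    # 10 * log10 of the mean square equals 20 * log10 of the RMS value.
    return 10.0 * math.log10(mean_square)


def is_foreground(frame, speaker_level_db, margin_db=12.0):
    """Hypothetical rule: accept a frame as coming from the wanted speaker
    if its level is no more than margin_db below the speaker's level."""
    return rms_level_db(frame) >= speaker_level_db - margin_db


# Synthetic frames: a loud one (wanted speaker) and a quiet one (background).
loud_frame = [0.5, -0.5] * 80
quiet_frame = [0.05, -0.05] * 80
speaker_level_db = -6.0  # assumed reference level of the wanted speaker

print(is_foreground(loud_frame, speaker_level_db))
print(is_foreground(quiet_frame, speaker_level_db))
```

In this sketch, the loud frame lies within the margin of the assumed speaker level and is accepted, while the quiet frame lies about 20 dB below it and is rejected as background.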
In previously known methods for the automatic recognition of speech, recognition errors are frequently caused by unwanted noises. A distinction is made between two types of unwanted noise: the speech of another speaker, which is in fact usually recognized correctly but is not to be assigned to the voice signal of the actual speaker, and background noise that does not represent a voice signal, such as breathing sounds, but is incorrectly recognized as speech. Unwanted noises thus represent a considerable source of error in the automatic recognition of speech.
In order to avoid such errors, speech recognition systems are trained on the speech of the individual speakers, so that the system can determine whether an acoustic signal derives from the speaker or is background noise. Speech recognition systems with frequently changing speakers, however, cannot be trained for every individual speaker. For a speech recognition system integrated in a telephone system, for example, it is impossible to carry out a training phase lasting a number of minutes for every caller before the caller can speak a message that often lasts only a fraction of a minute.