In speech recognition systems, articulated sounds, i.e. utterances, are converted into written language by interpreting the corresponding speech signal. Misinterpretations, usually referred to as recognition errors, occur frequently when state-of-the-art speech recognition systems are used in a noisy environment. Ambient noise superimposed on an input speech signal either modifies the characteristics of the input signal or may mistakenly be interpreted as a phoneme by the speech recogniser.
In order to detect whether misrecognitions occur, so-called confidence measures are used. A confidence measure judges the reliability with which a word or sub-word corresponds to a particular part of a signal. The word or sub-word is then accepted or rejected by the recognition process on the basis of the confidence measure calculated for it.
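The accept/reject decision described above can be illustrated by a minimal sketch. It assumes, purely for illustration, that each word hypothesis carries a confidence value between 0 and 1 and that a fixed threshold is applied; the names `Hypothesis`, `accept` and `split_by_confidence` and the threshold value are hypothetical and are not taken from any particular recogniser.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A recognised word or sub-word with its confidence measure."""
    word: str
    confidence: float  # assumed to lie in [0, 1]; higher means more reliable


def accept(hyp: Hypothesis, threshold: float = 0.5) -> bool:
    """Accept the hypothesis if its confidence reaches the threshold."""
    return hyp.confidence >= threshold


def split_by_confidence(hyps, threshold: float = 0.5):
    """Return (accepted words, rejected words) for a list of hypotheses."""
    accepted = [h.word for h in hyps if accept(h, threshold)]
    rejected = [h.word for h in hyps if not accept(h, threshold)]
    return accepted, rejected


if __name__ == "__main__":
    # Illustrative values only, not measured data.
    hyps = [Hypothesis("open", 0.91), Hypothesis("door", 0.34)]
    print(split_by_confidence(hyps))
```

In practice the threshold would be tuned to balance false acceptances against false rejections; the value 0.5 here is an arbitrary placeholder.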
As many different expressions sound very similar, there are often several possible alternatives for interpreting a certain utterance. To decide on one in particular, a confidence measure is calculated, e.g. as the likelihood with which a certain expression corresponds to the respective utterance. This is usually accomplished by some form of statistical hypothesis testing. These processes are usually very complicated, particularly as a phoneme can undergo certain acoustic variations under the influence of neighbouring phonemes, an effect known as coarticulation.
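One simple way of turning the likelihoods of competing alternatives into a confidence value may be sketched as follows. The sketch assumes that an acoustic model has already supplied a log-likelihood for each alternative and that all alternatives are equally probable a priori; it normalises the likelihoods into posterior probabilities and selects the most probable alternative. This is only one of many possible forms of likelihood-based testing, and the function names and example values are hypothetical.

```python
import math


def posterior_confidences(log_likelihoods):
    """Normalise per-alternative log-likelihoods into posteriors.

    Assumes equal prior probability for every alternative. The maximum is
    subtracted before exponentiating to avoid numerical underflow.
    """
    m = max(log_likelihoods.values())
    log_norm = m + math.log(
        sum(math.exp(v - m) for v in log_likelihoods.values())
    )
    return {k: math.exp(v - log_norm) for k, v in log_likelihoods.items()}


def best_alternative(log_likelihoods):
    """Return the most probable alternative and its posterior confidence."""
    post = posterior_confidences(log_likelihoods)
    best = max(post, key=post.get)
    return best, post[best]


if __name__ == "__main__":
    # Illustrative log-likelihoods for two similar-sounding expressions.
    scores = {"recognise speech": -10.0, "wreck a nice beach": -12.0}
    print(best_alternative(scores))
```

A confidence close to 1 indicates that the selected alternative clearly dominates its competitors, whereas a value near 1/N for N alternatives indicates that the utterance is ambiguous.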
Non-speech events, such as the above-mentioned ambient noise superimposed on a speech signal, also result in an acoustic variation of the speech signal. Correctly identifying the word or sub-word that is the written equivalent of the speech signal is therefore an elaborate task which has not yet been brought to a satisfactory solution.
It is therefore an object of the present invention to propose a system for improving the detection of recognition errors in a speech recognition system.