1. Technical Field
The invention relates to the field of automatic speech recognition and more particularly to generating a reference adapted to an individual speaker.
2. Discussion of the Prior Art
During automatic speech recognition, a spoken utterance is analyzed and compared with one or more already existing references. If the spoken utterance matches an existing reference, a corresponding recognition result is output. The recognition result can be, e.g., a pointer which identifies the existing reference matched by the spoken utterance.
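The matching step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that utterances and references are fixed-length feature vectors, that a match is decided by Euclidean distance against a threshold, and that the recognition result is the index of the matched reference. The names `recognize` and `threshold` are hypothetical and are not taken from any cited document.

```python
import math


def recognize(utterance, references, threshold):
    """Return the index of the closest reference, or None if none matches.

    Assumption: utterance and references are equal-length feature vectors,
    and a match means the Euclidean distance is at most `threshold`.
    """
    best_index, best_distance = None, float("inf")
    for index, reference in enumerate(references):
        d = math.dist(utterance, reference)
        if d < best_distance:
            best_index, best_distance = index, d
    # The returned index plays the role of the pointer to the matched reference.
    return best_index if best_distance <= threshold else None


example_references = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
print(recognize([0.9, 1.2], example_references, threshold=0.5))    # 1
print(recognize([10.0, 10.0], example_references, threshold=0.5))  # None
```

Practical recognizers typically compare sequences of feature vectors rather than single vectors; the single-vector comparison here only keeps the sketch short.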
The references used for automatic speech recognition can be both speaker dependent and speaker independent. Speaker independent references can be created, e.g., by averaging utterances of a large number of different speakers in a training process. Speaker dependent references for an individual speaker, i.e., references which are personalized in accordance with an individual speaker's speaking habits, can be obtained by means of an individual training process. In order to keep the training effort for speaker dependent references low, it is preferable to use a single word spoken in isolation for each speaker dependent reference to be trained. The fact that the training utterances are spoken in isolation leads to problems for connected word recognition because fluently spoken utterances differ from utterances spoken in isolation due to coarticulation effects. These coarticulation effects degrade the accuracy of automatic speech recognition if speaker dependent references which were trained in isolation are used for recognition of connected words. Moreover, even if connected words have been trained, a user's voice may change, e.g. due to a change in health, which also degrades the accuracy of automatic speech recognition based on speaker dependent references. The accuracy of automatic speech recognition is generally even lower if speaker independent references are used, especially when the utterances are spoken in a heavy dialect or with a foreign accent. The accuracy of automatic speech recognition is also influenced by the speaker's acoustic environment, e.g. the presence of background noise or the use of a so-called hands-free set.
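The creation of a speaker independent reference by averaging, as mentioned above, can be sketched as follows. The sketch assumes that each training utterance has already been reduced to a feature vector of the same length; the function name `average_reference` is hypothetical.

```python
def average_reference(training_utterances):
    """Average equal-length feature vectors element-wise into one reference.

    Assumption: every training utterance is a list of floats of equal length.
    """
    if not training_utterances:
        raise ValueError("at least one training utterance is required")
    count = len(training_utterances)
    # zip(*...) groups the i-th feature of every utterance together.
    return [sum(values) / count for values in zip(*training_utterances)]


# Three speakers uttering the same word (illustrative values only).
speaker_utterances = [[1.0, 2.0], [3.0, 4.0], [2.0, 6.0]]
print(average_reference(speaker_utterances))  # [2.0, 4.0]
```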
In order to improve the recognition results of automatic speech recognition, speaker adaptation is used. Speaker adaptation allows individual speaker characteristics to be incorporated in both speaker dependent and speaker independent references. A method and a device for continuously updating existing references are known from WO 95/09416. The method and the device described in WO 95/09416 allow existing references to be adapted to changes in a speaker's voice and to changing background noise. An adaptation of an existing reference in accordance with a spoken utterance takes place each time a recognition result which corresponds to an existing reference is obtained, i.e., each time a spoken utterance is recognized.
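Continuous adaptation of this general kind, in which a reference is updated each time an utterance is recognized, can be sketched as follows. This is not the specific method of WO 95/09416, whose details are not reproduced here; the weighted-average update, the distance threshold, and the names `recognize_and_adapt` and `weight` are assumptions made only for illustration.

```python
import math


def recognize_and_adapt(utterance, references, threshold, weight=0.25):
    """Recognize an utterance and adapt the matched reference in place.

    Assumptions: `references` is a non-empty list of equal-length feature
    vectors, matching uses Euclidean distance, and adaptation is a weighted
    average that moves the reference a fraction `weight` toward the utterance.
    """
    distances = [math.dist(utterance, ref) for ref in references]
    index = min(range(len(distances)), key=distances.__getitem__)
    if distances[index] > threshold:
        return None  # not recognized, so no adaptation takes place
    references[index] = [
        (1.0 - weight) * r + weight * u
        for r, u in zip(references[index], utterance)
    ]
    return index


example_references = [[0.0, 0.0], [4.0, 4.0]]
print(recognize_and_adapt([1.0, 0.0], example_references, threshold=2.0, weight=0.5))  # 0
print(example_references[0])  # [0.5, 0.0]
```

In such a sketch every recognized utterance shifts the matched reference, so an atypical utterance that is still recognized pulls the reference toward itself.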
It has been found that speaker adaptation of existing references generally improves the accuracy of automatic speech recognition. However, the accuracy of automatic speech recognition using continuously adapted references generally fluctuates. This means that the recognition accuracy does not continuously improve with each adaptation process. On the contrary, the recognition accuracy may temporarily decrease.
There is, therefore, a need for a method and a device for generating an adapted reference for automatic speech recognition which is less prone to a deterioration of the recognition accuracy.