Speaker-independent continuous speech recognition is ideal for man/machine communication. However, state-of-the-art modeling techniques still limit the decoding accuracy of such systems. An inherent difficulty in statistical modeling of speaker-independent continuous speech is that the spectral variations of each phone unit come not only from allophone contextual dependency, but also from the acoustic and phonologic characteristics of individual speakers. These speaker variation factors make speaker-independent models less effective than speaker-dependent ones in recognizing individual speakers' speech.
In order to improve speaker-independent continuous speech recognition, it is of great interest to incorporate efficient learning mechanisms into speech recognizers, so that speaker adaptation can be accomplished while a user uses the recognizer and decoding accuracy can be gradually improved toward that of speaker-dependent recognizers.
In the parent application, of which this application is a continuation-in-part, a speaker adaptation technique based on the decomposition of spectral variation sources is disclosed. The technique has achieved significant error reductions for a speaker-independent continuous speech recognition system, where the adaptation requires short calibration speech from both the training and test speakers. The current work extends this adaptation technique into the paradigm of self-learning adaptation, i.e., no adaptation speech is explicitly required from the speaker; instead, the spectral characteristics of a speaker are learned via statistical methods from the incoming speech utterances of the speaker during his normal usage of the recognizer.