Speaker adaptation technology has received increasing attention in recent years. In this technology, a speaker independent (SI) codebook is modified with data from a specific speaker to obtain a speaker adapted (SA) codebook that improves recognition performance.
When enough training data are available for a given speaker, a speaker dependent (SD) codebook may be obtained by applying a traditional training method to that speaker's data. Since the SD codebook reflects the speech characteristics of the current speaker well, good performance is usually achieved. In some cases, however, the speaker data are insufficient to train a robust SD model, and speaker adaptation is required to avoid under-training. Whereas training an SD codebook needs a large amount of data, speaker adaptation requires only a small amount of data to improve performance.
In speaker adaptation, adaptation data are used to adjust the SI codebook so that it matches the characteristics of the current speaker. Since an SI codebook obtained by a traditional training method is inevitably affected by the characteristics of its training set, the effect of adaptation may be less pronounced when the training set is mismatched with the adaptation data. The more speaker independent the original codebook is, the more quickly speaker adaptation approaches the characteristics of the current speaker. To this end, codebook training is combined with speaker adaptation so that separate models are built for the SI codebook and for the characteristics of each speaker in the training set, thus yielding an SI codebook with improved speaker independence.
Currently, speaker adaptation is mainly realised in two manners. The first manner is speaker adaptation at the feature layer. The main idea is to construct a transformation from the feature parameters of the speech signal: speaker-dependent features are transformed into speaker independent features, which are then input into the speaker independent model for recognition, thus realising speaker adaptation. The second manner is speaker adaptation at the model layer. The speech data of the speaker are used to adjust the speaker independent model, so that a different acoustic model is obtained for each speaker by adaptation and used for recognition, thus realising speaker adaptation.
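The two manners can be contrasted in a minimal sketch. Everything in it is an illustrative assumption rather than a method taken from the text: the first manner (transforming the features) is reduced to a simple mean offset, the second manner (adjusting the model) is reduced to interpolating each codeword with the adaptation frames nearest to it, and the names `feature_layer_adapt`, `model_layer_adapt`, `si_mean` and the weight `tau` are hypothetical.

```python
# Illustrative sketch only: the mean-offset transform and the interpolation
# weight `tau` are assumptions, not techniques named in the text.
from math import dist


def nearest(codebook, frame):
    """Return the index of the codeword closest to `frame` (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: dist(codebook[i], frame))


def mean_vector(frames):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def feature_layer_adapt(frames, si_mean):
    """First manner: shift the speaker's frames so that their mean matches
    the SI training mean. The SI codebook itself is left untouched."""
    spk_mean = mean_vector(frames)
    offset = [m - s for m, s in zip(si_mean, spk_mean)]
    return [[x + o for x, o in zip(f, offset)] for f in frames]


def model_layer_adapt(codebook, frames, tau=2.0):
    """Second manner: move each SI codeword toward the adaptation frames
    assigned to it. Codewords with no assigned frames stay unchanged."""
    assigned = [[] for _ in codebook]
    for f in frames:
        assigned[nearest(codebook, f)].append(f)
    adapted = []
    for word, group in zip(codebook, assigned):
        n = len(group)
        if n == 0:
            adapted.append(list(word))
            continue
        sums = [sum(col) for col in zip(*group)]
        # Larger tau keeps the result closer to the original SI codeword.
        adapted.append([(tau * w + s) / (tau + n) for w, s in zip(word, sums)])
    return adapted
```

In this sketch the feature-layer function returns modified frames to be recognised against the unchanged SI codebook, whereas the model-layer function returns a new codebook for the current speaker. The weight `tau` controls how strongly a small amount of adaptation data is allowed to move the codewords.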
However, the speaker adaptation procedure described above is complex and usually requires two decoding passes. It therefore takes a relatively long time and has low efficiency. Moreover, the mismatch between the limited speech data available and the large number of parameters that the adaptation must estimate may result in poor performance.