Speaker-dependent systems, such as speaker verification and speaker-dependent speech recognition systems, are trained by the specific user who will be using the system. During the training process, speech models are created. These systems are usually capable of achieving a relatively high rate of recognition. The rate of recognition is determined according to the number of instances of accepting a spoken word that should have been rejected or rejecting a spoken word that should have been accepted. However, the voice of the user changes over time, and the rate of recognition of the system may therefore decrease below an acceptable level.
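By way of illustration only, the two kinds of error described above may be counted as in the following minimal sketch. The function name `error_rates` and the trial format (a pair of flags per spoken word) are assumptions made for this example and do not appear in the original text.

```python
def error_rates(trials):
    """Compute false-acceptance and false-rejection rates.

    trials: list of (should_accept, was_accepted) boolean pairs,
    one pair per spoken word presented to the system (hypothetical format).
    """
    false_accepts = sum(1 for should, did in trials if did and not should)
    false_rejects = sum(1 for should, did in trials if should and not did)
    impostor_trials = sum(1 for should, _ in trials if not should)
    genuine_trials = sum(1 for should, _ in trials if should)
    # Guard against division by zero when one class of trial is absent.
    far = false_accepts / impostor_trials if impostor_trials else 0.0
    frr = false_rejects / genuine_trials if genuine_trials else 0.0
    return far, frr


# Made-up trial outcomes, for illustration only.
trials = [
    (True, True), (True, False), (True, True),      # words that should be accepted
    (False, False), (False, True), (False, False),  # words that should be rejected
]
far, frr = error_rates(trials)
```

A drift in the user's voice would typically show up as a rising false-rejection rate on words that should be accepted.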
Speaker adaptation refers to the process of adapting the speaker-dependent speech models obtained from the user so that they more accurately model the changes in the user's voice. Two types of models may be used in speaker verification and speech recognition systems: stochastic models, such as the Hidden Markov Model (HMM), and template models, such as dynamic time warping (DTW).
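For context, template matching with DTW may be sketched as follows. This is a textbook formulation under simplifying assumptions, not a description of any particular system: the sequences are one-dimensional, the local distance is the absolute difference, and the name `dtw_distance` is chosen for this example. In practice each element would be a feature vector.

```python
def dtw_distance(a, b):
    """Accumulated DTW distance between two 1-D sequences.

    Uses the basic step pattern (match, insertion, deletion) with
    absolute difference as the local distance (an assumption).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A time-stretched copy of the template aligns at zero cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

The reference template here is a stored sequence rather than a set of statistical parameters, which is why the comparison reduces to a path search over a cost table.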
In the HMM method, the continuous changes in the user's voice may be taken into consideration by adapting the HMM speech models using maximum a posteriori (MAP) adaptation. In contrast, conventional DTW processes do not perform adaptation of the DTW speech models (reference templates) due to the non-statistical nature of the DTW method.
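As an illustrative sketch only, the commonly used MAP update of a Gaussian mean interpolates between the prior (trained) mean and the new speech data. The text does not state which HMM parameters are adapted or how, so the function `map_adapt_mean`, the relevance factor `tau`, and the per-frame `weights` (state occupation probabilities) are all assumptions of this example.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0, weights=None):
    """MAP re-estimate of one Gaussian mean vector.

    prior_mean: mean from the original training (list of floats).
    frames: new feature vectors from the user (list of lists).
    tau: assumed relevance factor; larger values trust the prior more.
    weights: assumed per-frame occupation probabilities (default 1.0 each).
    """
    if weights is None:
        weights = [1.0] * len(frames)
    dim = len(prior_mean)
    occupancy = sum(weights)
    # Occupancy-weighted sum of the new observations, per dimension.
    weighted_sum = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# With tau equal to the amount of new data, the result is the midpoint
# between the prior mean and the mean of the new frames.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], tau=2.0)
```

With no new frames the prior mean is returned unchanged, and as more adaptation data accumulates the estimate moves toward the mean of the new data. A DTW reference template has no such prior distribution to update.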
It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.