Automatic speech recognition (ASR) systems undergo some level of pre-training, in which the system is taught to associate speech with words. The pre-training is performed using speech from one or more speakers. Rarely do the training conditions, such as the speakers or the environment, match the conditions under which the system is actually used. Such speaker and environment mismatches between training and test degrade ASR performance dramatically, making it difficult to deploy speech recognition technology in many applications.
It is possible to adapt a system to operate under different conditions by modelling a relationship between the training conditions and the "real-use" conditions. Such modelling has been achieved by compensating for differences between the environments and, separately, compensating for differences between the speech of the user and the speech used to pre-train the system.