1. Technical Field
The present disclosure relates to automatic speech recognition and more specifically to personalization of acoustic models for automatic speech recognition.
2. Introduction
Speech recognition applications rely on speech recognition models. Often, a generic speech model is used to recognize speech from multiple users. However, a single canonical model that represents all speakers generically is not well suited to many individuals in a given population. Individual speakers diverge from such a generic speech model in subtle and not so subtle ways. Thus, one possible approach is complete personalization, or providing a personal speech recognition model for each speaker. Current automatic speech recognition systems often associate the personalized speech model with a specific device, such as a cellular phone. For example, an automatic speech recognition system can identify the cellular phone by using caller ID, and then retrieve the personalized speech model associated with that cellular phone.
However, in reality the device, such as the cellular phone, is shared among family members and friends. Additionally, people use such device in a variety of different ways in many varying environments resulting in a wide variety of background noise types and usage modes. For example, the microphone can be near the mouth with dominant mouth noises, or far from the mouth with dominant background noises. In such varying environments, oftentimes the personalized speech model can perform much worse than generic speaker independent model.
Furthermore, current acoustic model adaptation algorithms assume consistent conditions in the adaptation data as well as the recognition data. The data collected from a communication device varies over time, but generally the conditions that affect speech recognition include: different user, how close or far the mouth is from the communication device, background noise, and different recording venue such as office, home, or vehicle. Conventional adaptation approaches do not work in such situations as the dominant condition would dominate the adaptation resulting in an inappropriate model for many conditions.