1. Technical Field
The present application relates to speech recognition and, more specifically, to personalized speech recognition.
2. Introduction
Speech recognition applications rely on speech recognition models. Often, a generic speech model is used to recognize speech from multiple users. However, a single canonical model that represents all speakers generically is not well suited to many individuals in a given population. Individual speakers diverge from such a generic speech model in subtle and not-so-subtle ways. Thus, one possible approach is complete personalization, that is, providing a personal speech recognition model for each speaker. However, this solution has several flaws. Complete personalization for every speaker consumes prohibitive amounts of resources, such as processing power, storage, and bandwidth. Further, not every speaker requires a personal speech recognition model. A generic model may adequately serve many speakers whose speech is close to the generic model, so it is wasteful to provide a personal speech recognition model for those speakers. Another problem with personal speech recognition models for all users is that each model must be generated through a training phase with each individual speaker. For these reasons, speech recognition personalization is typically either viable with only a very small user base or not viable at all.