1. Technical Field
The present disclosure relates to speech processing and more specifically to combining speech recognition models for a specific domain in place of creating a new model for the specific domain.
2. Introduction
When recognizing speech, speech recognition models help to narrow the focus to a particular speech recognition domain. Models for different domains help a speech recognizer to deal with specific types of statements, a specific vocabulary, and so forth. In a perfect world where unlimited storage, bandwidth, processing power, time, and other resources are available, a speech recognizer would have access to a customized speech recognition model for every possible domain of interest in order to achieve optimal speech recognition accuracy for that domain. However, this approach requires a staggering amount of computing resources as well as domain-specific data, which is usually unavailable or very expensive to collect. Speech recognition with models from a close but different domain can provide some useful results, but is not optimal because of differences between the actual domain of the speech and the domain of the model used to recognize the speech.
Further, if a sufficient amount of domain-specific data is available, then a domain-specific model can be built. Often, however, the available domain-specific data is insufficient, and additional data is too expensive to gather or produce. The challenge is to provide a customized model with as little domain-specific data as possible. Speech recognition models have been merged in the past, but only inflexibly, for a single application or speech recognition domain.