Automatic speech recognition (ASR) often requires large datasets of well-maintained, annotated utterances to train the models that accurately identify the words that users speak. Gathering such datasets is often time-consuming, and maintaining them requires large amounts of computer storage space.
Further, individual ASR systems are frequently trained for a single domain (such as a given user's voice, a given compression codec, a given microphone setup, or a certain environment), and a new model needs to be trained for each domain to accurately interpret speech received within that domain. This requires gathering and storing ever more and ever larger datasets to create accurate models for identifying speech in different domains, which in turn lengthens training times for the machine-learning programs used for speech recognition.
Additionally, creating a separate model for each domain requires the ability to prepare for many domains, potentially thousands or millions when considering the diversity of individuals, the environments in which they speak, their accents, and so on. What is needed is the ability to leverage models from existing, reliable domains to train and operate models for other domains.