Intelligent automated assistants (or virtual assistants) provide an intuitive interface between users and electronic devices. These assistants can allow users to interact with devices or systems using natural language in spoken and/or text forms. For example, a user can access the services of an electronic device by providing a spoken user input in natural language form to a virtual assistant associated with the electronic device. The virtual assistant can perform natural language processing on the spoken user input to infer the user's intent and operationalize the user's intent into tasks. The tasks can then be performed by executing one or more functions of the electronic device, and a relevant output can be returned to the user in natural language form.
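For purposes of illustration only, the flow described above (natural language input, intent inference, task execution, and natural language output) can be sketched as follows. The names `Intent`, `infer_intent`, `DEVICE_FUNCTIONS`, and `respond` are hypothetical and are not drawn from any particular assistant. The prefix matching shown here merely stands in for natural language processing, which in practice would be far more sophisticated.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Intent:
    """Hypothetical container for an inferred user intent."""
    name: str
    argument: str


def infer_intent(utterance: str) -> Intent:
    """Toy stand-in for natural language processing: match fixed prefixes."""
    stripped = utterance.strip()
    lowered = stripped.lower()
    if lowered.startswith("call "):
        return Intent("call", stripped[len("call "):])
    if lowered.startswith("search for "):
        return Intent("search", stripped[len("search for "):])
    return Intent("unknown", stripped)


# Hypothetical device functions, keyed by intent name. Each returns the
# natural language output that would be presented to the user.
DEVICE_FUNCTIONS: Dict[str, Callable[[str], str]] = {
    "call": lambda who: f"Calling {who}.",
    "search": lambda query: f"Here are results for {query}.",
}


def respond(utterance: str) -> str:
    """Infer the intent, execute the matching function, return the output."""
    intent = infer_intent(utterance)
    handler = DEVICE_FUNCTIONS.get(intent.name)
    if handler is None:
        return "Sorry, I did not understand that."
    return handler(intent.argument)


if __name__ == "__main__":
    print(respond("Call Alice"))
    print(respond("Search for pizza"))
```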
In support of virtual assistants, speech-to-text transcription (e.g., dictation), and other speech applications, automatic speech recognition (ASR) systems are used to interpret user speech. These recognizers are expected to handle a wide variety of speech input, including a variety of different types of spoken requests for virtual assistants. Examples include speech and spoken requests related to web searches, knowledge questions, sending text messages, posting to social media networks, and the like. In addition, it is desirable that virtual assistants be sympathetic and fun to talk with, which can depend on having relevant and current knowledge.
Virtual assistant and speech transcription services, however, can become outdated as relevant language and knowledge change. ASR systems and natural language understanding (NLU) systems can work well for predetermined training language, but ASR systems can have limited and relatively static vocabularies, while NLU systems can be limited by expected word patterns. These systems can thus be ill-equipped to handle new names, words, phrases, requests, and the like as they are encountered, or to handle fluctuations in popular terms. Updating the systems to accommodate changing language can also be tedious and slow. As such, system utility can be impaired, and the user experience can suffer.
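As a minimal sketch of the static vocabulary limitation described above, the following example flags words in a transcript that fall outside a fixed word list. The vocabulary contents, the function names, and the example word "zorblatt" are assumptions made solely for illustration. An actual recognizer would not expose its vocabulary in this form.

```python
# Hypothetical fixed vocabulary, standing in for a recognizer's static lexicon.
STATIC_VOCABULARY = {"send", "a", "message", "to", "play", "the", "song"}


def out_of_vocabulary(transcript, vocabulary):
    """Return the lowercased words of the transcript missing from the vocabulary."""
    return [word for word in transcript.lower().split() if word not in vocabulary]


def oov_rate(transcript, vocabulary):
    """Return the fraction of transcript words missing from the vocabulary."""
    words = transcript.lower().split()
    if not words:
        return 0.0
    return len(out_of_vocabulary(transcript, vocabulary)) / len(words)


if __name__ == "__main__":
    request = "play the song Zorblatt"
    print(out_of_vocabulary(request, STATIC_VOCABULARY))
    print(oov_rate(request, STATIC_VOCABULARY))
```

In this sketch, a newly popular name is simply absent from the word list, so a system relying on that list alone would have no way to recognize it until the list is updated.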
Accordingly, without identifying and accommodating changes in relevant names, words, phrases, requests, and the like, speech recognizers can suffer poor recognition accuracy, which can limit speech recognition utility and negatively impact the user experience.