Intelligent automated assistants (or virtual assistants) provide an intuitive interface between users and electronic devices. These assistants can allow users to interact with devices or systems using natural language in spoken and/or text forms. For example, a user can access the services of an electronic device by providing a spoken user input in natural language form to a virtual assistant associated with the electronic device. The virtual assistant can perform natural language processing on the spoken user input to infer the user's intent and operationalize the user's intent into tasks. The tasks can then be performed by executing one or more functions of the electronic device, and a relevant output can be returned to the user in natural language form.
In support of virtual assistants and other speech applications, automatic speech recognition (ASR) systems are used to interpret user speech. Some ASR systems are based on the weighted finite state transducer (WFST) approach. Many such WFST systems, however, include static grammars that fail to support language changes, the introduction of new words, personalization for particular speakers, or the like. In virtual assistant applications, as well as other speech recognition applications, utility and recognition accuracy can be highly dependent on how well an ASR system accommodates such dynamic changes in grammars. In particular, utility and accuracy can be impaired if the system cannot quickly and efficiently modify its underlying recognition grammars at runtime to support such dynamic grammars.
Accordingly, without adequate support for dynamic grammars, WFST-based ASR systems can suffer from poor recognition accuracy, which can limit speech recognition utility and negatively impact the user experience.