Speech recognition systems, or automatic speech recognizers, have become increasingly important as more and more computer-based devices use speech recognition to receive commands from a user in order to perform some action, to convert speech into text for dictation applications, or even to hold conversations with a user in which information is exchanged in one or both directions. Such systems may be speaker-dependent, where the system is trained by having the user repeat words, or speaker-independent, where anyone may provide words that are immediately recognized. Some systems also may be configured to understand a fixed set of single-word commands or short phrases, such as a mobile phone that understands the terms “call” or “answer”, or an exercise wrist-band that understands the word “start” to start a timer, for example. Other systems may have an extensive vocabulary, such as for voice-activated search engines.
Thus, automatic speech recognition (ASR) is desirable for wearables, smartphones, and other small devices. Due to the computational complexity of ASR, however, many ASR systems for small devices, and especially those with large vocabularies, are server-based such that the computations are performed remotely from the device, which can result in a significant delay and/or significant battery usage due to communication via Wi-Fi or other wireless communication methods. Other ASR systems have on-board vocabularies and computation ability. In these cases, small audio devices such as wearables or smartphones often have very limited temporary memory capacity to hold the vocabularies used by a decoding transducer such as a weighted finite state transducer (WFST). Specifically, ASR on small devices, or even on larger devices such as servers with dedicated memories, is often restricted to a relatively limited standard vocabulary and grammar that is accessible to, or placed on, all devices using a certain ASR application. Typically, the memory capacity is too small to add customized dynamic vocabularies that are unique to a specific device, along with the WFSTs needed to use those dynamic vocabularies. Such desired dynamic vocabularies may be in the form of lists of names, phone numbers, email addresses, or other information from a contact list on a device, or music or video descriptions from media applications on the device, and so forth.