Speech recognition systems, or automatic speech recognizers, have become increasingly important as more and more computer-based devices use speech recognition to receive commands from a user in order to perform some action, to convert speech into text for dictation applications, or even to hold conversations with a user where information is exchanged in one or both directions. Such systems may be speaker-dependent, where the system is trained by having the user repeat words, or speaker-independent, where anyone may speak words that are immediately recognized. Some systems also may be configured to understand a fixed set of single-word commands or short phrases, such as a mobile phone that understands the terms “call” or “answer”, or an exercise wrist-band that understands the word “start” to start a timer, for example. Other systems may have an extensive vocabulary, such as for voice-activated search engines.
Thus, automatic speech recognition (ASR) is desirable for wearables, smartphones, and other small devices. Due to the computational complexity of ASR, however, many ASR systems for small devices, and especially those with large vocabularies, are server-based such that the computations are performed remotely from the device, which can result in a significant delay and/or significant battery usage due to communication via Wi-Fi or other wireless communication methods. Other ASR systems have on-board computation ability. In these cases, small audio devices such as wearables or smartphones often have very limited temporary memory capacity to hold the vocabularies used by a decoding transducer such as, by one example, a weighted finite state transducer (WFST). Specifically, ASR on small devices is often restricted to a limited vocabulary because the memory capacity is too small to hold the WFSTs necessary for large-vocabulary speech recognition.