Speech recognition systems can largely be classified into two types: a rule-based system that utilizes a small vocabulary and produces highly accurate results, or an open-ended, statistical system that utilizes a vast vocabulary and provides breadth of recognition at the expense of accuracy.
Speech recognition has become a useful tool on smaller form devices, such as mobile phones, tablet computers, wearable devices (e.g., smart watches), portable media players, etc. Users may use a speech recognition system with various types of applications to perform actions, obtain answers to questions, receive recommendations, etc. For example, a user may speak a command to launch a text messaging application, speak a text message, and then speak a command to send the text message. Speech recognition on such devices may be constrained by hardware, software, and/or processing/memory capabilities. Accordingly, smaller form devices may comprise a rule-based speech recognition system rather than a large-vocabulary model that allows for open-ended speech, because of the amount of memory and processing power that a large-vocabulary model may consume.
A large-vocabulary speech recognition system may be hosted separately from the device, for example, on a remote server. Some smaller form devices may rely on a network-based large-vocabulary speech recognition system to perform recognition; however, access to a network may not always be available, and hosting a large-vocabulary system on a smaller form device may not be feasible from a computational perspective.
Oftentimes, there exists a need to combine the advantages of both types of speech recognition systems, for example, where one portion of a spoken utterance from a user may need to be matched with high accuracy, and another portion of the spoken utterance may need to be matched less specifically. It is with respect to these and other considerations that the present invention has been made.