Computing devices can be used to process a user's spoken commands, requests, and other utterances into written transcriptions. In a common application, a user can speak into a microphone of a computing device, and an automatic speech recognition module executing on the computing device can process the audio input and determine what the user said. Additional modules executing on the computing device can process the transcription of the utterance to determine what the user meant and/or perform some action based on the utterance.
Automatic speech recognition systems typically include an acoustic model and a language model. The acoustic model is used to generate hypotheses regarding which subword units (e.g., phonemes) correspond to an utterance based on the acoustic features of the utterance. The language model is used to determine which of the hypotheses generated using the acoustic model is the most likely transcription of the utterance based on lexical features of the language in which the utterance is spoken.
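The interplay between the two models can be illustrated with a minimal sketch. It simplifies the description above by assuming the acoustic model has already produced word-level candidate transcriptions with log-likelihood scores, rather than subword units. The candidate list, the bigram table, every probability value, and the names `language_model_score` and `best_transcription` are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
import math

# Hypothetical acoustic-model output: candidate transcriptions paired with
# log-likelihoods given the audio. All values are invented for illustration.
ACOUSTIC_HYPOTHESES = [
    ("wreck a nice beach", -4.1),
    ("recognize speech", -4.3),
]

# Toy bigram language model: log P(word | previous word). Invented values.
BIGRAM_LOGPROB = {
    ("<s>", "recognize"): math.log(0.02),
    ("recognize", "speech"): math.log(0.30),
    ("<s>", "wreck"): math.log(0.001),
    ("wreck", "a"): math.log(0.05),
    ("a", "nice"): math.log(0.01),
    ("nice", "beach"): math.log(0.02),
}
UNSEEN_LOGPROB = math.log(1e-6)  # crude floor for bigrams not in the table


def language_model_score(text):
    """Sum bigram log-probabilities over the word sequence."""
    words = ["<s>"] + text.split()
    return sum(
        BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
        for pair in zip(words, words[1:])
    )


def best_transcription(hypotheses, lm_weight=1.0):
    """Pick the hypothesis with the highest combined log score."""
    best = max(
        hypotheses,
        key=lambda h: h[1] + lm_weight * language_model_score(h[0]),
    )
    return best[0]


if __name__ == "__main__":
    print(best_transcription(ACOUSTIC_HYPOTHESES))
```

In this toy setup the acoustically stronger candidate loses once the language model score is added, which is the role the language model plays in the description above. Setting `lm_weight=0.0` disables the language model and leaves the acoustic score alone to decide.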
Some speech recognition systems are configured to spot particular keywords in a user utterance, and recognition of such a keyword can trigger other actions. For example, an automatic speech recognition system may process utterance audio and generate a transcript of the utterance. The system can then determine whether the transcript includes a keyword that the system is configured to spot. If the keyword is found in the transcript, another application or process may be initiated.
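The transcript-based flow described above can be sketched as follows. This is only one possible reading: it assumes the transcript is already available as a plain string and that a whole-word, case-insensitive match counts as spotting the keyword. The function names `spot_keyword` and `handle_transcript` and the callback are hypothetical.

```python
import re


def spot_keyword(transcript, keyword):
    """Return True if the keyword occurs in the transcript as whole words."""
    # re.escape guards against keywords containing regex metacharacters;
    # \b anchors keep "play" from matching inside "display".
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, transcript, flags=re.IGNORECASE) is not None


def handle_transcript(transcript, keyword, on_spotted):
    """Invoke the callback only when the keyword is spotted."""
    if spot_keyword(transcript, keyword):
        return on_spotted(transcript)
    return None


if __name__ == "__main__":
    # Hypothetical action standing in for "another application or process".
    def start_music(text):
        return "music player started"

    print(handle_transcript("Please play music now", "play music", start_music))
```

Matching on word boundaries is a design choice for this sketch; a system could equally compare tokenized words or apply a confidence threshold before initiating the follow-on process.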