Automatic speech recognition is an area of technology that transforms the lexical content of human speech into an input form (e.g., a character string) that can be read by computers. The process of automatic speech recognition typically includes several operations: generating a language model that contains a plurality of words in a corpus, training an acoustic model to create statistical representations of one or more contrastive units of sound (called “phonemes” or simply “phones”) that make up each word in the corpus, building a decoding network using the language model and the acoustic model, and finally decoding human speech.
Recognition of speech commands is a specific application of automatic speech recognition technology. It allows a user to input commands by speaking a phrase (e.g., into a microphone) rather than interacting with a device through a conventional physical user input apparatus, such as a mouse, keyboard, or touch screen. The decoding network translates the spoken phrase into an input form and attempts to match the input form to an input command. When the input form is recognized as a command, the device triggers an operation corresponding to the command.
As an example, a device may have a wake-up operation in which the device transitions from a “sleep mode” (e.g., a power save mode, or a screen-saver mode) to an “active mode” of normal use. Any of several user inputs may suffice to trigger the wake-up operation, such as a mouse click and/or one or more speech commands corresponding to the wake-up operation (e.g., “wake-up” or “turn on”). When a user speaks the words “turn on,” the device triggers the wake-up operation.
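The matching step described above can be illustrated with a minimal sketch. It assumes that the decoding network has already produced the input form as a character string; the names `Device`, `normalize`, `build_command_table`, and `handle_input_form` are hypothetical and chosen only for illustration, and exact string lookup stands in for whatever matching a real system would perform.

```python
from typing import Callable, Dict


def normalize(text: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


class Device:
    """Hypothetical device with a sleep mode and an active mode."""

    def __init__(self) -> None:
        self.mode = "sleep"

    def wake_up(self) -> None:
        # The wake-up operation: transition to the active mode.
        self.mode = "active"


def build_command_table(device: Device) -> Dict[str, Callable[[], None]]:
    """Map each recognized command phrase to the operation it triggers."""
    return {
        normalize("wake-up"): device.wake_up,
        normalize("turn on"): device.wake_up,
    }


def handle_input_form(
    input_form: str, commands: Dict[str, Callable[[], None]]
) -> bool:
    """Trigger the matching operation; return False if nothing matches."""
    operation = commands.get(normalize(input_form))
    if operation is None:
        return False  # the input form is not a known command
    operation()
    return True


device = Device()
commands = build_command_table(device)
handle_input_form("what time is it", commands)  # no match, device stays asleep
handle_input_form("Turn on", commands)          # match, device becomes active
```

In this sketch an input form that matches no entry in the table is simply ignored, which corresponds to the device remaining in its current mode.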
The quality of a speech command recognition system is often measured by two metrics: a false acceptance rate and a false rejection rate. The false acceptance rate measures the rate at which audio input received by the device is mistakenly interpreted as a speech command (e.g., when no such command has been uttered). The false rejection rate measures the rate at which the device fails to recognize speech commands that have been delivered. Using conventional methods of speech command recognition, the false acceptance rate is generally unacceptably high, especially in noisy environments (e.g., environments with a high level of background noise). This leads to frustration on the part of the user and to underutilization of speech command recognition systems.
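The two metrics can be computed as in the following minimal sketch. It assumes one common convention, in which each rate is a per-trial proportion over labeled audio inputs; other conventions exist (for example, false acceptances per hour of audio). The function name `error_rates` and the trial format are hypothetical.

```python
from typing import Iterable, Tuple


def error_rates(trials: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (false_acceptance_rate, false_rejection_rate).

    Each trial is a pair (command_spoken, command_accepted):
      command_spoken   -- True if the audio actually contained a command
      command_accepted -- True if the device interpreted it as a command
    """
    false_accepts = 0
    false_rejects = 0
    non_command_trials = 0
    command_trials = 0

    for command_spoken, command_accepted in trials:
        if command_spoken:
            command_trials += 1
            if not command_accepted:
                false_rejects += 1  # a delivered command was missed
        else:
            non_command_trials += 1
            if command_accepted:
                false_accepts += 1  # non-command audio was accepted

    # Guard against division by zero when a class has no trials.
    far = false_accepts / non_command_trials if non_command_trials else 0.0
    frr = false_rejects / command_trials if command_trials else 0.0
    return far, frr


# Two command utterances (one missed) and four non-command inputs
# (one wrongly accepted).
trials = [
    (True, True), (True, False),
    (False, False), (False, True), (False, False), (False, False),
]
far, frr = error_rates(trials)
print(f"FAR = {far:.2f}, FRR = {frr:.2f}")  # FAR = 0.25, FRR = 0.50
```

Under this convention the two rates have different denominators: the false acceptance rate is taken over inputs that contain no command, and the false rejection rate over inputs that do.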