Computing devices can be used to detect particular words or phrases in audio data (e.g., “wake” words or other keywords or phrases spoken to initiate interaction with the computing device). In a typical implementation, a computing device can continuously monitor an input stream or receive a batch of input data. This input stream may be, for example, an audio stream from a microphone. The computing device can determine whether a portion of the input stream is likely to contain information corresponding to the word or phrase to be detected. For example, the computing device can make a preliminary determination as to whether a particular portion of the input stream includes any speech at all. Once the computing device has determined that the portion likely includes speech, it can perform further processing, such as automatic speech recognition (“ASR”), to determine which words or phrases are present in the input stream and, in turn, whether the particular word or phrase to be detected is present. In some cases, the computing device may detect evidence of a particular word or phrase in an input stream without actually decoding or recognizing specific words or phrases using ASR. For example, the computing device may use a classifier that accepts input features derived from the input stream and determines whether the input stream is likely to include the word or phrase.
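The two-stage flow described above (a cheap preliminary speech check, followed by a classifier over features derived from the input) can be illustrated with a minimal sketch. It assumes a hypothetical energy-based speech gate, two hypothetical features (frame energy and zero-crossing rate), and a logistic classifier with hand-supplied weights; none of these names or choices come from the description above, and a practical detector would use learned acoustic features and a trained model.

```python
import math
from typing import Sequence

Frame = Sequence[float]  # one window of audio samples


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of a frame, used here as a crude speech-presence cue."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def likely_contains_speech(frame: Frame, energy_threshold: float = 0.01) -> bool:
    """Stage 1 (hypothetical): cheap preliminary check for any speech at all."""
    return frame_energy(frame) >= energy_threshold


def extract_features(frame: Frame) -> list[float]:
    """Hypothetical features derived from the input: energy and zero-crossing rate."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / max(len(frame) - 1, 1)
    return [frame_energy(frame), zcr]


def keyword_score(features: list[float], weights: list[float], bias: float) -> float:
    """Stage 2 (hypothetical): logistic classifier estimating P(keyword | features)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def detect(frames: list[Frame], weights: list[float], bias: float,
           score_threshold: float = 0.5) -> list[int]:
    """Return the indices of frames that pass both stages."""
    hits = []
    for i, frame in enumerate(frames):
        if not likely_contains_speech(frame):
            continue  # skip the more expensive stage for apparent silence
        if keyword_score(extract_features(frame), weights, bias) >= score_threshold:
            hits.append(i)
    return hits


silence = [0.0] * 160
loud = [0.5, -0.5] * 80
print(detect([silence, loud], weights=[10.0, 1.0], bias=-2.0))  # → [1]
```

The gate keeps the classifier from running on frames with negligible energy, mirroring the idea that later processing is only worthwhile once the preliminary check suggests speech is present.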
A user's experience with the above detection system can be characterized in terms of performance latencies and detection errors. Detection errors can include false acceptances and false rejections. False acceptances occur when the detection system erroneously hypothesizes from the input data that the user is trying to initiate communication with the computing device. False rejections occur when the detection system fails to respond to user communication directed at the computing device. Detection systems may use a confidence score when detecting the target item, object, or pattern. Higher confidence in the accuracy of the detection or rejection can be reflected by a higher confidence score, while lower confidence in the accuracy of the detection or rejection can be reflected by a lower confidence score.
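To make the relationship between confidence scores and the two error types concrete, the following sketch counts false acceptances and false rejections over labeled trials at a given decision threshold. It assumes, as a simplification not stated above, that the detector accepts whenever its confidence that the keyword is present meets a fixed threshold; the `Trial` type and `error_counts` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    score: float           # detector confidence that the keyword was spoken
    keyword_present: bool  # ground-truth label for this portion of input


def error_counts(trials: list[Trial], threshold: float) -> tuple[int, int]:
    """Return (false_acceptances, false_rejections) at the given threshold."""
    # False acceptance: detector accepts although no keyword was spoken.
    fa = sum(1 for t in trials if t.score >= threshold and not t.keyword_present)
    # False rejection: detector rejects although the keyword was spoken.
    fr = sum(1 for t in trials if t.score < threshold and t.keyword_present)
    return fa, fr


trials = [Trial(0.9, True), Trial(0.6, False), Trial(0.4, True), Trial(0.1, False)]
for th in (0.3, 0.5, 0.7):
    print(th, error_counts(trials, th))
# 0.3 (1, 0)
# 0.5 (1, 1)
# 0.7 (0, 1)
```

In this toy data, raising the threshold removes the false acceptance but introduces a false rejection, which illustrates the general trade-off between the two error types when a single score is thresholded.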