Computing devices can be used to recognize faces, voices, handwriting, and other objects, patterns, and the like. In a typical implementation, a computing device can continuously monitor a particular input stream (e.g., a video stream from a video camera or an audio stream from a microphone), or receive a batch of similar input data. The computing device can determine whether a portion of the input is likely to contain information corresponding to the target item, object, or pattern to be detected. For example, the computing device can determine whether a particular portion of the input stream is likely to include any face, any speech, or any handwriting at all. Once this preliminary determination has been made, the computing device can then perform other processing or cause other processing to be performed. For example, the computing device may perform recognition of which particular face, voice, or other target is present in the input, rather than merely detecting that some face, voice, or other target is present in the input.
A user experience with such a detection system can be defined in terms of performance latencies and detection errors, such as false positives and false negatives. Detection systems may use the concept of a confidence score when detecting the target item, object, or pattern. Higher confidence in the detection can be reflected by a higher confidence score, while lower confidence in the detection can be reflected by a lower confidence score. The detection system may use a confidence score threshold to determine when the target has been detected and additional processing should therefore occur.