1. Field of the Invention
The invention generally relates to a device for the detection of the start and end of a segment containing speech within an input audio signal which contains both speech segments and nonspeech noise or background segments.
2. Description of Related Art
Detection of speech in real time is a necessary component for many devices, including but not limited to voice-activated tape recorders, answering machines, automatic speech recognizers, and processors for removing speech from music. Many of these applications have noise inseparably mixed with the speech. Detection of speech requires a more sophisticated speech detection capability than provided by conventional devices that simply detect when energy level rises above or falls below a preset threshold.
In the field of automatic speech recognition, the speech detection component is most critical. In practice, more speech recognition errors arise from errors in speech detection than from errors in pattern matching, which is commonly used to determine the content of the speech signal. One proposed solution is to use a word spotting technique, in which the recognizer is always listening for a particular word. However, if word spotting is not preceded by speech detection, the overall error rate can be high.
Many speech detection devices are based on a certain parameter of the input, such as energy, pitch, and zero crossings. The performance of the speech detector depends heavily on the robustness of that parameter to background noise. For real time speech detection, the parameters must be quickly extracted from the signal.