A device may detect the presence of a voice component in an input signal that includes both a voice component and a noise component. The voice component may include, for example, sound generated by a person speaking, music, or other transient sounds (e.g., the rustle of paper). The noise component may arise, for example, from background noise (e.g., constantly present background sounds such as fan noise, road noise, and the like).
When a device detects that a voice component is present in an input signal, another circuit may process the input signal. Such a detection scheme may be applied in several areas, such as voice-activated recording in recording devices, or speech recognition, where a detection function precedes a recognition function. For example, in a recording device, a detector device may detect the presence of a voice component in an input signal, and a recording circuit may record the input signal on a medium when the detector device determines that a voice component is present in the input signal.
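As a minimal sketch of the detect-then-record arrangement, the Python below assumes that the detector device emits one boolean flag per input sample (True when it judges a voice component present). It models the recording circuit as copying out each contiguous run of flagged samples. The names `voiced_segments` and `record` are hypothetical and are not taken from any particular device.

```python
from typing import List, Optional, Sequence, Tuple


def voiced_segments(flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index ranges where the detector flag is True."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, voiced in enumerate(flags):
        if voiced and start is None:
            start = i                          # detector turned on: begin recording
        elif not voiced and start is not None:
            segments.append((start, i))        # detector turned off: stop recording
            start = None
    if start is not None:
        segments.append((start, len(flags)))   # still recording when the input ends
    return segments


def record(samples: Sequence[float], flags: Sequence[bool]) -> List[List[float]]:
    """Hypothetical recording circuit: keep only the samples of each voiced segment."""
    return [list(samples[s:e]) for s, e in voiced_segments(flags)]


if __name__ == "__main__":
    samples = [0.0, 0.9, 0.8, 0.1, 0.7]
    flags = [False, True, True, False, True]
    print(voiced_segments(flags))  # [(1, 3), (4, 5)]
    print(record(samples, flags))  # [[0.9, 0.8], [0.7]]
```

In this model, a single brief dropout of the flag inside a word splits the recording into two segments and discards the samples in between.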
Envelope-based signal detection is one prior-art scheme for determining the presence of a voice component in an input signal, as illustrated in FIG. 1. FIG. 1 is a graph of input signal 40 (solid line) plotted along amplitude and time axes, together with a corresponding envelope signal 20 (dashed line). If envelope signal 20 is at a level greater than threshold level 10, a detector device may indicate that a voice component is present in input signal 40.
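A digital approximation of the envelope-plus-threshold scheme is sketched below. It assumes full-wave rectification followed by a one-pole smoothing filter with separate attack (rising) and release (falling) coefficients, a common stand-in for an analog envelope detector. The function names and coefficient values are illustrative assumptions, not values from FIG. 1.

```python
from typing import List, Sequence


def envelope(x: Sequence[float], attack: float, release: float) -> List[float]:
    """One-pole envelope follower.

    attack, release: smoothing coefficients in [0, 1); 0 tracks instantly,
    values closer to 1 respond more slowly.
    """
    env: List[float] = []
    e = 0.0
    for sample in x:
        level = abs(sample)                        # full-wave rectification
        coeff = attack if level > e else release   # rising edge uses attack, falling uses release
        e = coeff * e + (1.0 - coeff) * level      # one-pole low-pass update
        env.append(e)
    return env


def detect(x: Sequence[float], threshold: float,
           attack: float = 0.0, release: float = 0.99) -> List[bool]:
    """Indicate voice wherever the envelope exceeds the threshold level."""
    return [e > threshold for e in envelope(x, attack, release)]


if __name__ == "__main__":
    x = [0.0, 1.0, 0.0, 0.0]
    print(envelope(x, attack=0.0, release=0.5))               # [0.0, 1.0, 0.5, 0.25]
    print(detect(x, threshold=0.3, attack=0.0, release=0.5))  # [False, True, True, False]
```

Because the release coefficient shapes only the falling edge, the envelope stays above the threshold for a few samples after the input drops.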
Input signal 40 may be characterized by periods of voice (illustrated during T2, T4, T6, T8, and T10 of FIG. 1), silence (illustrated during T1, T9, and T11), and non-silence gaps (T3, T5, and T7). Voice periods may correspond to time periods during which a voice component is present, for example, when a person is speaking. Silence periods may be defined as the absence of audible sound (other than background noise) as experienced by a person or a recording instrument, and may correspond to time periods when the speaker is in fact not speaking.
Non-silence gaps are short periods without a voice component, which may occur naturally between words or even within a word spoken by a person. Non-silence gaps may last on the order of a fraction of a millisecond to a few milliseconds, whereas silence periods may be much longer. During both silence and non-silence gap periods (T1, T3, T5, T7, T9, and T11), FIG. 1 shows only the noise component. During voice periods, by contrast, input signal 40 may include a voice component superimposed on the noise component.
It may be required that envelope signal 20 remain at a high level during non-silence gap periods, so that a detector device indicates that voice is present during those gaps as well. Input signal 40 may then be recorded (or otherwise processed) during non-silence gap periods too, which may result in accurate reproduction of the voice captured in input signal 40. Without recording the non-silence gaps, the reproduced audio may be inaccurate and sound unnatural.
To generate an envelope signal 20 that remains at a high level during non-silence gap periods, a prior detector device may use components such as analog filters. As is well known in the art, an envelope signal 20 generated by such detector devices may decay gradually in response to sudden reductions in the instantaneous level of input signal 40. Thus, during periods T3 and T5, envelope signal 20 remains high, and the detector device may indicate that a voice component is present during those periods.
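The gradual decay can be quantified under a simplifying assumption. Suppose the analog filter behaves like a first-order RC release with time constant tau, and the input falls to zero (ignoring the noise floor). Then the envelope follows E(t) = E0 * exp(-t / tau) and stays above a threshold theta for t_hold = tau * ln(E0 / theta). A gap is bridged only if it is shorter than t_hold. The helper names `hold_time` and `gap_is_bridged` and the numbers below are illustrative.

```python
import math


def hold_time(peak: float, threshold: float, tau: float) -> float:
    """Time for an exponentially decaying envelope to fall from `peak` to `threshold`.

    Assumes E(t) = peak * exp(-t / tau) once the input drops to zero.
    Returns 0.0 if the envelope never exceeded the threshold.
    """
    if peak <= threshold:
        return 0.0
    return tau * math.log(peak / threshold)


def gap_is_bridged(gap: float, peak: float, threshold: float, tau: float) -> bool:
    """True if the envelope is still above threshold at the end of a gap of length `gap`."""
    return gap < hold_time(peak, threshold, tau)


if __name__ == "__main__":
    tau = 5e-3  # assumed 5 ms release time constant
    print(round(hold_time(4.0, 1.0, tau) * 1e3, 2))  # 6.93 (ms)
    print(gap_is_bridged(3e-3, 4.0, 1.0, tau))       # True
    print(gap_is_bridged(10e-3, 4.0, 1.0, tau))      # False
```

With these assumed values, a 3 ms gap within a word is bridged, but any low-level interval longer than about 6.9 ms is not, regardless of whether it is a gap or true silence.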
However, the rate of decay may not be accurately matched to the durations of silence periods and non-silence gaps. If the decay is made too fast, non-silence gaps are detected as silence, as illustrated during time T9. If the decay is made too slow, silence periods may be missed and misidentified as voice periods.
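The trade-off can be reproduced with a synthetic signal under stated assumptions. The signal consists of a constant-amplitude "voice" burst of 20 samples, a 10-sample non-silence gap, a second 20-sample burst, and then 100 samples of silence, with noise omitted and a threshold of 0.5. The detector is a one-pole envelope follower with instant attack and exponential release, used as a stand-in for the analog filter. All names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def detect(x: Sequence[float], threshold: float, release: float) -> List[bool]:
    """Instant-attack, exponential-release envelope compared against a threshold."""
    flags: List[bool] = []
    e = 0.0
    for sample in x:
        level = abs(sample)
        # Jump up to the input level at once; otherwise decay toward it.
        e = level if level > e else release * e + (1.0 - release) * level
        flags.append(e > threshold)
    return flags


def gap_vs_silence(release: float) -> Tuple[bool, int]:
    """Return (gap bridged?, number of silence samples wrongly flagged as voice)."""
    voice, gap, silence = [1.0] * 20, [0.0] * 10, [0.0] * 100
    x = voice + gap + voice + silence
    flags = detect(x, threshold=0.5, release=release)
    gap_flags = flags[20:30]        # the non-silence gap
    silence_flags = flags[50:150]   # the true silence period
    return all(gap_flags), sum(silence_flags)


if __name__ == "__main__":
    print(gap_vs_silence(0.8))   # (False, 3): the gap is mistaken for silence
    print(gap_vs_silence(0.99))  # (True, 68): the first 68 silence samples are mistaken for voice
```

Because the envelope decays identically after each burst, any release coefficient slow enough to bridge the 10-sample gap necessarily keeps at least the first 10 samples of the following silence flagged as voice.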
Moreover, because such a detector device may not respond quickly to changes in input signal 40, envelope signal 20 may not rise to a sufficiently high level immediately when a voice component appears in input signal 40. As illustrated at input samples 70 of FIG. 1, envelope signal 20 may remain below threshold level 10 for a short duration, and the detector device may accordingly fail to indicate the presence of the voice component in input signal 40.
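The onset miss can be estimated under an assumed model. If the envelope starts at zero and rises with a one-pole attack coefficient a toward a constant input level A, then after n samples E_n = A * (1 - a**n). The detector therefore stays silent until a**n < 1 - theta / A. The sketch below counts the missed onset samples by direct simulation. The function name and the example values (A = 1, theta = 0.5, a = 0.9) are illustrative assumptions.

```python
from typing import Optional


def missed_onset_samples(level: float, threshold: float, attack: float) -> Optional[int]:
    """Count voice samples missed before the envelope first exceeds the threshold.

    Assumes the envelope starts at 0 and rises as e <- attack*e + (1-attack)*level.
    Returns None if the steady-state level never exceeds the threshold.
    """
    if not 0.0 <= attack < 1.0:
        raise ValueError("attack must be in [0, 1)")
    if level <= threshold:
        return None
    e, missed = 0.0, 0
    while True:
        e = attack * e + (1.0 - attack) * level  # one-pole rise toward `level`
        if e > threshold:
            return missed
        missed += 1  # detector output is still False for this voice sample


if __name__ == "__main__":
    print(missed_onset_samples(1.0, 0.5, 0.9))   # 6
    print(missed_onset_samples(1.0, 0.5, 0.99))  # 68
    print(missed_onset_samples(1.0, 0.5, 0.0))   # 0 (instant attack)
```

At an assumed 8 kHz sample rate, 6 missed samples correspond to 0.75 ms and 68 to 8.5 ms of truncated speech onset.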
Because of such misses, audible voice reproduced from input signal 40 may be of unacceptable quality, as the leading portion of a word or words may be truncated. To avoid or minimize such misses, either threshold level 10 may be lowered or a detector may be designed to respond more quickly to changes in input signal 40. However, either change may cause background noise to be falsely detected as voice.
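The cost of lowering the threshold can be illustrated by running an envelope detector over noise-only input and measuring how often it reports voice. The sketch assumes zero-mean Gaussian background noise with unit standard deviation, generated with Python's `random.Random(...).gauss` and seeded for repeatability. It also assumes an instant-attack, exponential-release envelope, and the threshold values are arbitrary examples. Because the envelope itself does not depend on the threshold, the false-detection rate can only grow as the threshold is lowered.

```python
import random
from typing import List, Sequence


def envelope(x: Sequence[float], release: float) -> List[float]:
    """Instant-attack, exponential-release envelope follower."""
    env: List[float] = []
    e = 0.0
    for sample in x:
        level = abs(sample)
        e = level if level > e else release * e + (1.0 - release) * level
        env.append(e)
    return env


def false_detection_rate(noise: Sequence[float], threshold: float,
                         release: float = 0.99) -> float:
    """Fraction of noise-only samples that the detector would flag as voice."""
    env = envelope(noise, release)
    return sum(e > threshold for e in env) / len(env)


if __name__ == "__main__":
    rng = random.Random(0)                                # fixed seed for repeatability
    noise = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # assumed background noise, sigma = 1
    for threshold in (4.0, 3.0, 2.0, 1.0):
        print(threshold, false_detection_rate(noise, threshold))
```

This sketch already uses an instant attack, the fastest possible response. A slower attack would tend to keep brief noise peaks from lifting the envelope as high, which is why speeding up the response likewise tends to raise false detections.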