A voice-activated switch provides a signal indicative of the presence of human speech in an audio signal. That signal can be used to activate a tape recorder, a transmitter, or a variety of other audio devices that process human speech.
Human speech contains both voiced (vowel) sounds, which are formed using the vocal cords, and unvoiced (consonant) sounds, which are formed without using the vocal cords. Audio signals containing voiced sounds are characterized by predominant signal components at the resonant frequencies of the vocal cords, called the "formant frequencies". Human vocal cords resonate at a first formant frequency between 250 and 750 Hz. The presence of human speech in a sound signal can therefore be detected by detecting the presence of resonant formant frequency components.
One way to detect the predominance of particular frequency components in a signal is the well-known technique of autocorrelation, in which the signal is multiplied by a time-delayed version of itself. The delay is set to the period corresponding to the frequency of interest. U.S. Pat. No. 4,959,865 to Stettiner et al. discloses using thirty-six separate autocorrelation lags to detect voiced speech and non-speech tones. Stettiner teaches examining the periodicity of the peaks of the thirty-six autocorrelation bins to detect the presence of predominant frequency components at frequencies between fifty and five hundred Hz. However, computing thirty-six autocorrelations requires a relatively large amount of processing bandwidth and may therefore be undesirable in applications where that bandwidth is not available.
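The autocorrelation technique described above can be illustrated with a minimal Python sketch of a single lag. It is not Stettiner's thirty-six-lag method. The function names, the 8000 Hz sample rate, the 500 Hz test tone, and the normalization by signal energy are all assumptions chosen for illustration.

```python
import math


def lag_for_frequency(frequency_hz, sample_rate_hz):
    """Return the delay, in samples, equal to one period of frequency_hz."""
    return round(sample_rate_hz / frequency_hz)


def normalized_autocorrelation(samples, lag):
    """Multiply a signal by a copy of itself delayed by `lag` samples.

    The summed products are divided by the signal energy, so the result
    lies roughly in [-1, 1]. Values near 1 suggest a predominant
    component whose period is `lag` samples.
    """
    if lag <= 0 or lag >= len(samples):
        raise ValueError("lag must be between 1 and len(samples) - 1")
    head = samples[:-lag]   # the signal itself
    tail = samples[lag:]    # the time-delayed version
    product_sum = sum(a * b for a, b in zip(head, tail))
    energy = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return product_sum / energy if energy > 0 else 0.0


# Assumed test conditions: 0.1 s of a pure 500 Hz tone sampled at 8000 Hz.
SAMPLE_RATE_HZ = 8000
TONE_HZ = 500
tone = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ) for n in range(800)]

lag = lag_for_frequency(TONE_HZ, SAMPLE_RATE_HZ)       # one full period
matched = normalized_autocorrelation(tone, lag)        # delay equals the period
mismatched = normalized_autocorrelation(tone, lag // 2)  # delay is half a period

print(f"lag={lag} matched={matched:.3f} mismatched={mismatched:.3f}")
```

With the delay equal to one period, the delayed copy lines up with the original and the result approaches 1. With a half-period delay, the copy is inverted and the result approaches -1. Each additional frequency of interest needs its own lag and its own pass over the samples, which is why a bank of thirty-six lags is comparatively expensive.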