Human speech can be considered to be made up of two major components: voiced sounds, typified by vowels, wherein the vocal cords are active, and unvoiced sounds, where the sound is generated by a constriction or manipulation of the breath channel. Voiced sounds have a quasi-periodic spectral structure of relatively long duration, while unvoiced sounds are often shorter in duration, broadband, and noise-like in their spectral distribution. Most of the speech energy is contained in the voiced portion of the speech signal. In speech processing it is often desirable to extract the voiced portion of a single talker's signal from a composite speech signal or from a high-noise environment. The circuitry needed to accomplish this task includes a bank of band-pass filters whose pass bands coincide with the harmonics of the instantaneous voice pitch. The instantaneous voice pitch varies significantly within the speech of a single talker and varies even more widely between different talkers. Consequently, a set of band-pass filters at fixed frequencies cannot meet the requirements. What is needed is a filter set that can be steered dynamically to the correct frequencies, along with a control signal that represents the pitch of the voice signal to be processed.
The present system measures the voice fundamental frequency and uses this measurement to electrically control a number of narrow-band tracking filters, each of which passes the narrow band of frequencies containing one of the voice pitch harmonics. The outputs of these individual filters are then summed to give the voiced speech portion of one talker's voice signal.
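The measure-then-steer arrangement described above can be illustrated with a short digital sketch. This is not the circuitry of the present system. It is a minimal software analogue under stated assumptions: the pitch is estimated from the strongest autocorrelation lag, each tracking filter is modeled as a second-order digital band-pass section, and the pitch is held constant over the block being filtered. The function names, the 10 Hz bandwidth, the three-harmonic limit, and the 60 to 400 Hz search range are all hypothetical choices made for illustration.

```python
import math


def estimate_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Return a pitch estimate in Hz from the strongest autocorrelation lag.

    Assumption: a plain, unnormalized autocorrelation search stands in for
    whatever pitch-measuring circuit the real system uses.
    """
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag


def bandpass(signal, fs, f_center, bandwidth):
    """Second-order band-pass section with unity gain at f_center.

    Coefficients follow the widely used audio-EQ "cookbook" band-pass form;
    this is an illustrative stand-in for one narrow-band tracking filter.
    """
    w0 = 2.0 * math.pi * f_center / fs
    q = f_center / bandwidth
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out


def extract_voiced(signal, fs, f0, n_harmonics=3, bandwidth=10.0):
    """Sum band-pass outputs centered on the first n_harmonics multiples of f0."""
    out = [0.0] * len(signal)
    for k in range(1, n_harmonics + 1):
        f_center = k * f0
        if f_center >= fs / 2.0:  # stay below the Nyquist frequency
            break
        band = bandpass(signal, fs, f_center, bandwidth)
        out = [acc + b for acc, b in zip(out, band)]
    return out
```

In use, `estimate_pitch` would be applied to a short frame of the input and its result passed as `f0` to `extract_voiced`. A tracking version would repeat the estimate frame by frame and recompute the filter coefficients each time, which this sketch omits. Two limitations of the sketch are worth noting: the integer-lag search quantizes the pitch estimate (to roughly 2 Hz near 125 Hz at an 8 kHz sampling rate), and narrow filters need several tens of milliseconds to settle after the input starts.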