1. Field of the Invention
The present invention relates to a method and an apparatus for detecting a pitch in an input voice signal by using spectral auto-correlation.
2. Description of Related Art
In fields of voice signal processing such as speech recognition, voice synthesis, and voice analysis, it is important to accurately extract a basic (fundamental) frequency, i.e., a pitch. Accurate extraction of the basic frequency may enhance recognition accuracy by reducing speaker dependence in speech recognition, and may make it easier to alter or maintain naturalness and personality in voice synthesis. Additionally, voice analysis synchronized with the pitch may allow a correct vocal tract parameter to be obtained, from which the effects of the glottis are removed.
For the above reasons, a variety of methods for detecting a pitch in a voice signal have been proposed. These conventional methods may be divided into time domain detection methods, frequency domain detection methods, and time-frequency hybrid domain detection methods.
The time domain detection method, such as parallel processing, the average magnitude difference function (AMDF), and the auto-correlation method (ACM), is a technique that extracts a pitch by decision logic after emphasizing the periodicity of a waveform. Being performed mostly in the time domain, this method may require only simple operations such as addition, subtraction, and comparison logic, without requiring a domain conversion. However, when a frame spans a transition between phonemes, pitch detection may be difficult due to excessive level variations within the frame and fluctuations in the pitch cycle, and the detection may also be strongly influenced by formants. In particular, in the case of a noise-mixed voice, the complicated decision logic needed for pitch detection may increase extraction errors.
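As a minimal sketch of the AMDF and ACM techniques named above, the following Python example estimates a pitch period from a single frame. It is an illustration only, not the method of the present invention. The function names, the 8 kHz sample rate, the 200 Hz test tone, and the lag search range are all assumptions chosen for the example.

```python
import math

def amdf(frame, lag):
    """Average magnitude difference between the frame and a copy shifted by `lag`."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def autocorrelation(frame, lag):
    """Mean product of the frame and a copy shifted by `lag`."""
    n = len(frame) - lag
    return sum(frame[i] * frame[i + lag] for i in range(n)) / n

def estimate_pitch_period(frame, min_lag, max_lag, use_amdf=True):
    """Return the lag (in samples) that best matches the periodicity of the frame.

    AMDF looks for the deepest valley; auto-correlation looks for the highest peak.
    """
    lags = range(min_lag, max_lag + 1)
    if use_amdf:
        return min(lags, key=lambda k: amdf(frame, k))
    return max(lags, key=lambda k: autocorrelation(frame, k))

# Hypothetical test signal: a 200 Hz tone plus its second harmonic at 8 kHz,
# so the true pitch period is 8000 / 200 = 40 samples.
SAMPLE_RATE = 8000
F0 = 200.0
frame = [
    math.sin(2 * math.pi * F0 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 2 * F0 * n / SAMPLE_RATE)
    for n in range(400)
]

# The search range 20..70 samples is an assumption that excludes lag 80,
# where a second, equally good match would appear.
period_amdf = estimate_pitch_period(frame, 20, 70, use_amdf=True)
period_acm = estimate_pitch_period(frame, 20, 70, use_amdf=False)
pitch_hz = SAMPLE_RATE / period_amdf
```

The decision logic here is a bare minimum or maximum search over a clean, stationary signal. With level variations, formant influence, or noise, a valley or peak at the wrong lag can win, which is why practical time domain detectors need the more complicated decision logic mentioned above.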
The frequency domain detection method is a technique that extracts the basic frequency of voicing by measuring the interval between harmonics in a speech spectrum. A harmonics analysis technique, a lifter technique, a comb-filtering technique, etc., have been proposed as such methods. Generally, a spectrum is obtained on a frame-by-frame basis. Therefore, even if a phoneme transition or variation, or a background noise, appears, this method may not be greatly affected, since such effects may be averaged out over the frame. However, calculations may become complicated because a conversion to the frequency domain is required for processing. Also, if the number of points of a Fast Fourier Transform (FFT) is increased to raise the precision of the basic frequency, the required calculation time increases and the method becomes less sensitive to variations in the signal.
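The trade-off between FFT size and precision can be illustrated with simple arithmetic. The sketch below assumes an 8 kHz sample rate, a frame as long as the FFT (no zero-padding), and a radix-2 FFT whose butterfly count, (N/2)·log2(N), is used as a rough proxy for calculation time. The function names are hypothetical.

```python
def fft_bin_spacing_hz(sample_rate_hz, fft_points):
    """Frequency spacing between adjacent FFT bins."""
    return sample_rate_hz / fft_points

def radix2_butterfly_count(fft_points):
    """Butterfly operations in a radix-2 FFT: (N/2) * log2(N), N a power of two."""
    log2_n = fft_points.bit_length() - 1
    return (fft_points // 2) * log2_n

def frame_duration_ms(sample_rate_hz, fft_points):
    """Time span covered by one frame when the frame length equals the FFT size."""
    return 1000.0 * fft_points / sample_rate_hz

SAMPLE_RATE = 8000  # assumed sample rate in Hz
for n_points in (256, 1024, 4096):
    print(
        n_points,
        fft_bin_spacing_hz(SAMPLE_RATE, n_points),
        radix2_butterfly_count(n_points),
        frame_duration_ms(SAMPLE_RATE, n_points),
    )
```

Under these assumptions, going from 256 to 4096 points narrows the bin spacing from 31.25 Hz to about 1.95 Hz, but raises the butterfly count from 1024 to 24576 and stretches the frame from 32 ms to 512 ms. A frame that long averages over many pitch cycles, so short-term pitch changes are smoothed out.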
The time-frequency hybrid domain detection method combines the merits of the aforementioned methods, that is, the short calculation time and high pitch precision of the time domain detection method and the ability of the frequency domain detection method to accurately extract a pitch despite a background noise or a phoneme variation. This hybrid method, which includes, for example, a cepstrum technique and a spectrum comparison technique, may introduce errors when operating between the time and frequency domains, thus unfavorably influencing pitch extraction. Also, the use of both the time and frequency domains may complicate the calculation process.