1. Field of the Invention
The present invention relates to a method and an apparatus for detecting pitch in input voice signals by using subharmonic-to-harmonic ratio.
2. Description of Related Art
In the field of voice signal processing such as speech recognition, voice synthesis, and analysis, it is important to exactly extract the basic frequency, i.e. the pitch cycle. The exact extraction of the basic frequency may not only enhance recognition accuracy through reduced speaker-dependent speech recognition, but easily alter or maintain naturalness and personality in voice synthesis. Additionally, voice analysis synchronized with a pitch may allow for obtaining a correct vocal track parameter from which effects of glottis are removed.
For the above reasons, a variety of ways of implementing a pitch detection in a voice signal have been proposed in the art. Such conventional proposals may be divided into a time domain detection method, a frequency domain detection method, and a time-frequency hybrid domain detection method.
The time domain detection method, such as parallel processing, average magnitude difference function (AMDF), and auto-correlation method (ACM), is a technique to extract a pitch by decision logic after emphasizing periodicity of a waveform. Being performed mostly in a time domain, this method may require only a simple operation such as addition, subtraction, and comparison logic without requiring a domain conversion. However, when a phoneme ranges over a transition region, pitch detection may be difficult due to excessive variations of a level in a frame and fluctuations in a pitch cycle, and also may be much influenced by formant. Especially, in the case of a noise-mixed voice, a complicated decision logic for the pitch detection may increase unfavorable errors in extraction.
The frequency domain detection method is a technique to extract a basic frequency of voicing by measuring a harmonics interval in a speech spectrum. A harmonics analysis technique, a lifter technique, a comb-filtering technique, etc., have been proposed as such methods. Generally, spectrum is obtained according to a frame unit. So, even if a transition or variation of a phoneme or a background noise appears, this method may be not much affected since it may average out. However, calculations may become complicated because a conversion to a frequency domain is required for processing. Also, if pointers of a Fast Fourier Transform (FFT) increase in number to raise the precision of the basic frequency, a calculation time required is increased while being insensitive to variation characteristics.
The time-frequency hybrid domain detection method combines the merits of the aforementioned methods, that is, a short calculation time and high precision of the pitch in the time domain detection method and the ability to exactly extract pitch despite a background noise or a phoneme variation in the frequency domain detection method. This hybrid method, for example, includes a cepstrum technique and a spectrum comparison technique, may invite errors while performed between time and frequency domains, thus unfavorably influencing pitch extraction. Also, a double use of the time and frequency domains may create a complicated calculation process.