1. Field of the Invention
The present invention relates to pitch detection, and more particularly, to a method and apparatus for detecting a pitch by decomposing voice data into even symmetrical components and then obtaining segment correlation values.
2. Description of the Related Art
In voice signal processing fields such as voice recognition, synthesis, and analysis, it is important to accurately detect the fundamental frequency, or equivalently the pitch period. If the fundamental frequency of a voice signal can be accurately detected, speaker-dependent effects in voice recognition can be reduced so that recognition accuracy is raised, and when voice is synthesized, naturalness and individual characteristics can be easily modified or maintained. In addition, in voice analysis, if the voice is analyzed in synchronization with the pitch, accurate vocal tract parameters from which the effect of the glottis is removed can be obtained.
Pitch detection is therefore an important part of voice signal processing, and a variety of pitch detection methods have been suggested. These methods can be broken down into time domain detection, frequency domain detection, and time-frequency hybrid domain detection.
Time domain detection emphasizes the periodicity of a waveform and then detects a pitch by decision logic; it includes the parallel processing method, the average magnitude difference function (hereinafter referred to as AMDF), and the auto-correlation method (hereinafter referred to as ACM). Because these methods are performed in the time domain, no domain transformation is needed and only simple operations such as addition, subtraction, and comparison are required. However, when a phoneme stretches over a transition interval, the signal power level within a frame changes severely and the pitch period changes. Accordingly, pitch detection in that interval is difficult and is influenced by formants. In particular, when voice is mixed with noise, the decision logic for pitch detection becomes complicated and detection errors increase. More specifically, the ACM is highly prone to pitch determination errors, including mistaking a first formant for a pitch, pitch doubling, and pitch halving.
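As an illustration only, the ACM and AMDF mentioned above can be sketched as follows. The function names, the 8 kHz sampling rate, the lag search range, and the synthetic two-harmonic test frame are assumptions made for this sketch and are not taken from the invention. The ACM picks the lag at which the frame best matches a shifted copy of itself, while the AMDF picks the lag at which the average absolute difference is smallest.

```python
import math


def autocorrelation(frame, lag):
    """Sum of products of the frame with a copy of itself shifted by `lag`."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def amdf(frame, lag):
    """Average magnitude difference between the frame and its shifted copy."""
    count = len(frame) - lag
    return sum(abs(frame[n] - frame[n + lag]) for n in range(count)) / count


def pitch_by_acm(frame, lag_min, lag_max):
    """Return the lag in [lag_min, lag_max] with the largest autocorrelation."""
    return max(range(lag_min, lag_max + 1),
               key=lambda lag: autocorrelation(frame, lag))


def pitch_by_amdf(frame, lag_min, lag_max):
    """Return the lag in [lag_min, lag_max] with the smallest AMDF value."""
    return min(range(lag_min, lag_max + 1),
               key=lambda lag: amdf(frame, lag))


# Hypothetical test frame: 40 ms at an assumed 8 kHz sampling rate,
# a 100 Hz fundamental (period of 80 samples) plus its second harmonic.
SAMPLE_RATE_HZ = 8000
TRUE_PERIOD = 80
frame = [
    math.sin(2 * math.pi * n / TRUE_PERIOD)
    + 0.5 * math.sin(4 * math.pi * n / TRUE_PERIOD)
    for n in range(320)
]

acm_period = pitch_by_acm(frame, 20, 120)
amdf_period = pitch_by_amdf(frame, 20, 120)
print("ACM period:", acm_period, "samples")
print("AMDF period:", amdf_period, "samples")
print("Fundamental frequency:", SAMPLE_RATE_HZ / acm_period, "Hz")
```

On a clean, stationary frame such as this one, both functions are expected to agree. The errors described above arise when the search range admits multiples or fractions of the true period, or when noise and formant structure produce competing peaks.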
Frequency domain detection detects the fundamental frequency of voiced sound by measuring the harmonic intervals of a voice spectrum; a harmonic analysis method, a Lifter method, and a Comb-filtering method have been suggested. Since a spectrum is generally obtained within a frame with a duration of 20 to 40 ms, even if a phoneme transition or background noise occurs within the frame, the influence is not great. However, the detection requires a transform to the frequency domain, and therefore the calculation is complicated. If the number of FFT points is increased in order to raise the accuracy of the fundamental frequency, the processing time increases proportionately and it becomes difficult to accurately detect changing characteristics.
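The trade-off between frequency accuracy and processing load can be made concrete with a short calculation. This is a rough sketch under stated assumptions: the 8 kHz sampling rate is hypothetical, the N log2 N figure is the usual order-of-magnitude estimate for a radix-2 FFT rather than a measured cost, and the window duration applies only if the extra points come from a longer analysis window rather than from zero padding.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumed sampling rate, not specified in the text


def bin_spacing_hz(sample_rate_hz, fft_points):
    """Spacing between adjacent FFT bins: the finest frequency step resolved."""
    return sample_rate_hz / fft_points


def window_duration_ms(sample_rate_hz, fft_points):
    """Duration of a window holding `fft_points` samples (no zero padding)."""
    return 1000.0 * fft_points / sample_rate_hz


def rough_fft_cost(fft_points):
    """Order-of-magnitude operation count for a radix-2 FFT: N * log2(N)."""
    return fft_points * math.log2(fft_points)


for fft_points in (256, 512, 1024, 2048):
    print(
        fft_points, "points:",
        bin_spacing_hz(SAMPLE_RATE_HZ, fft_points), "Hz per bin,",
        window_duration_ms(SAMPLE_RATE_HZ, fft_points), "ms window,",
        rough_fft_cost(fft_points), "operations (rough)",
    )
```

Doubling the number of points halves the bin spacing, but it also doubles the time span the window covers, which is why changes in the signal within that span become harder to track.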
Time-frequency hybrid domain detection combines the advantages of the two approaches: the short calculation time and pitch accuracy of time domain detection, and the ability of frequency domain detection to obtain a pitch accurately despite background noise or phoneme change. It includes the Cepstrum method and the spectrum comparison method. However, in these methods, the time domain and the frequency domain are visited alternately, so errors increase and can affect pitch detection accuracy. In addition, since the time and frequency domains are applied at the same time, the calculation is complicated.
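A minimal sketch of the Cepstrum method shows how the two domains are visited in turn: the frame is transformed to the frequency domain, the log magnitude is taken, and the result is transformed back to a time-like axis on which the pitch period appears as a peak. The function names, the magnitude floor, the search range, and the idealized pulse-train frame are assumptions for illustration; a naive DFT is used only to keep the example self-contained.

```python
import cmath
import math


def dft(values):
    """Naive O(N^2) discrete Fourier transform, adequate for a short frame."""
    n_points = len(values)
    # Reduce k*n modulo N so the phase stays within one turn (less rounding).
    twiddle = [cmath.exp(-2j * cmath.pi * m / n_points) for m in range(n_points)]
    return [
        sum(values[n] * twiddle[(k * n) % n_points] for n in range(n_points))
        for k in range(n_points)
    ]


def real_cepstrum(frame, floor=1e-9):
    """Inverse transform of the log magnitude spectrum of `frame`.

    `floor` (an assumed value) avoids taking the log of zero.
    """
    log_magnitude = [math.log(max(abs(c), floor)) for c in dft(frame)]
    n_points = len(frame)
    # The log magnitude of a real frame is real and symmetric, so the real
    # part of a forward DFT divided by N equals the inverse DFT.
    return [c.real / n_points for c in dft(log_magnitude)]


def pitch_period_by_cepstrum(frame, lag_min, lag_max):
    """Return the index in [lag_min, lag_max] of the largest cepstral value."""
    cepstrum = real_cepstrum(frame)
    return max(range(lag_min, lag_max + 1), key=lambda q: cepstrum[q])


# Hypothetical, idealized frame: one unit pulse every 50 samples.
TRUE_PERIOD = 50
frame = [1.0 if n % TRUE_PERIOD == 0 else 0.0 for n in range(400)]

period = pitch_period_by_cepstrum(frame, 20, 80)
print("Cepstral pitch period:", period, "samples")
```

Each transform and the logarithm in between add computation and numerical error, which is consistent with the drawbacks noted above.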