1. Field of the Invention
The present invention relates to speech signal processing and, more particularly, to a pitch determination apparatus and method for use in a low-bit-rate voice coder, a voice recognition apparatus, and the like.
2. Description of the Related Art
In human voice production, pitch arises from the periodic opening and closing of the vocal cords. Pitch is an important parameter in voice modeling and is used in, for example, voice coders (also called vocoders or voice codecs), voice recognition, and voice transformation.
In a low-bit-rate voice coder, an error in pitch determination significantly degrades the quality of speech communication. In these applications it is therefore very important to select an accurate pitch determination method.
Generally, a pitch determination error is one of three kinds: pitch doubling, pitch halving, or a first formant error. In pitch doubling, an original pitch T is erroneously determined to be a multiple such as 2T, 3T, 4T, . . . In pitch halving, an original pitch T is erroneously determined to be a fraction such as T/2, T/4, T/8, . . . A first formant error occurs when the autocorrelation value of the first formant is greater than the autocorrelation value of the pitch.
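The doubling and halving cases above can be illustrated with a short sketch. The function name, the tolerance, and the return labels are hypothetical and are introduced only for illustration; the default factors are limited to the values listed above (2T, 3T, 4T and T/2, T/4, T/8). A first formant error cannot be recognized from the pitch values alone, so it falls under "other" here.

```python
def classify_pitch_error(true_pitch, estimated_pitch, tolerance=1.0,
                         multiples=(2, 3, 4), divisors=(2, 4, 8)):
    """Label an estimated pitch relative to the true pitch T (both in samples).

    Hypothetical helper: `tolerance` is an assumed matching window in samples.
    """
    if abs(estimated_pitch - true_pitch) <= tolerance:
        return "correct"
    # Pitch doubling: the estimate lands near a multiple of T.
    for k in multiples:
        if abs(estimated_pitch - k * true_pitch) <= tolerance:
            return "doubling"
    # Pitch halving: the estimate lands near a fraction of T.
    for k in divisors:
        if abs(estimated_pitch - true_pitch / k) <= tolerance:
            return "halving"
    # Anything else, including a first formant error.
    return "other"


for estimate in (31, 62, 93, 15.5, 45):
    print(estimate, classify_pitch_error(31, estimate))
```

With an assumed true pitch of 31, estimates of 62 and 93 are labeled as doubling and an estimate of 15.5 as halving.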
FIG. 1 shows a widely used conventional pitch determination method that uses autocorrelation in the time domain.
However, this conventional pitch determination method frequently produces pitch doubling errors.
For example, when the input voice is as shown in FIG. 5A, the autocorrelation values are as shown in FIG. 5B. Although the original voice pitch is 31, the autocorrelation method is prone to a pitch determination error because the correlation values at candidate pitches 31, 62, and 93 are all large.
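The ambiguity among candidates 31, 62, and 93 can be reproduced with a minimal sketch. It does not use the signal of FIG. 5A; the frame below is a synthetic, perfectly periodic pulse train, and the frame length, pulse shape, lag range, and the choice of a normalized autocorrelation are all assumptions made for illustration rather than details of the conventional method of FIG. 1.

```python
import math

PERIOD = 31       # true pitch in samples, as in the example above
NUM_PERIODS = 10  # assumed frame length: 10 periods = 310 samples

# Assumed excitation: a decaying pulse repeated every PERIOD samples.
one_period = [0.7 ** i for i in range(PERIOD)]
frame = one_period * NUM_PERIODS


def normalized_autocorrelation(x, lag):
    """Correlation between x[n] and x[n + lag], scaled to the range [-1, 1]."""
    a = x[:len(x) - lag]
    b = x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


# Assumed candidate lag range of 20 to 100 samples.
scores = {lag: normalized_autocorrelation(frame, lag) for lag in range(20, 101)}

for lag in (31, 45, 62, 93):
    print(lag, round(scores[lag], 4))
```

For this synthetic frame, the scores at lags 31, 62, and 93 are all essentially 1, while a lag such as 45 that is not a multiple of the pitch scores far lower. Because the three multiples are nearly indistinguishable, a small disturbance in a real signal can make a peak-picking rule select 62 or 93 instead of 31.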
Accordingly, the conventional autocorrelation-based pitch determination method has a high pitch determination error rate, which significantly degrades the tone quality of a voice coder. In particular, when background noise is mixed into the input voice, pitch determination errors degrade the tone quality even further.