This application claims the priority of Korean Patent Application No. 2002-61787, filed on 10 Oct. 2002, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
1. Field of the Invention
The present invention relates to a method for improving an open-loop pitch estimation device used in a speech COder/DECoder (CODEC) and an apparatus using the method, and more particularly, to a method of estimating a pitch by using the ratio of a maximum peak to a candidate for the maximum peak of an autocorrelation function of a perceptual-weighting-filtered speech signal, and an apparatus using the method.
2. Description of the Related Art
In a general code excited linear prediction (CELP) type speech CODEC, a linear prediction coefficient (LPC) representing a spectral envelope, a pitch representing periodic characteristics, and a fixed codebook parameter for modeling a residual signal of an LPC analysis filter are extracted from an input speech signal. A speech signal is then reconstructed by using the extracted information.
FIG. 1 is a block diagram of a general encoder of the CELP type CODEC. Referring to FIG. 1, a pre-processing unit 101 performs general pre-processing, namely band-pass filtering and pre-emphasizing an input speech signal. An LPC analyzing/quantizing unit 102 calculates a linear prediction (LP) coefficient and quantizes the LP coefficient for transmission. The signal input to a synthesis filter 103 is modeled by a fixed codebook 104 and an adaptive codebook 105. A pitch estimation unit 106 searches the adaptive codebook 105 for the lag whose signal is most similar to the perceptual-weighting-filtered signal, and the lag found by the pitch estimation unit 106 is called a pitch. Since the search of the adaptive codebook 105 requires a large number of calculations, an approximate pitch is first calculated through an open-loop search, and then the adaptive codebook 105 is searched only for lags in the neighborhood of the approximate pitch. A fixed codebook estimation unit 107 obtains the fixed codebook index most adequate for modeling the residual signal of the LPC analysis filter from which pitch information has been removed. After the fixed codebook index and the pitch lag are estimated, a gain of each codebook is calculated and quantized by a gain quantizing unit 109 for transmission.
FIG. 2 is a block diagram of a decoder of a CELP type speech CODEC. In the decoder, the speech signal is reconstructed from the parameters extracted in the encoder. An excitation signal, reproduced by using a fixed codebook 201 and an adaptive codebook 202 that are the same as those used in the encoder, passes through a synthesis filter 203 to synthesize a speech signal. Here, the quality of the synthesized speech is enhanced by a post-processing filter 204, which reflects human perceptual characteristics.
In general, the pitch estimation unit 106 includes an open-loop pitch estimation device and a closed-loop pitch estimation device. In the open-loop pitch estimation device, the lag having the maximum autocorrelation of the weighted speech signal is selected as a pitch. Here, errors may occur in which a multiple or a sub-multiple of the actual pitch lag is selected as the pitch. In particular, a multiple of the actual pitch lag is frequently selected. In the closed-loop pitch estimation device, the pitch is estimated by an analysis-by-synthesis algorithm only for the lags in the neighborhood of the pitch estimated in the open-loop pitch estimation device. Therefore, if a multiple or a sub-multiple of the actual lag is selected as the pitch, namely, if an error is made in the open-loop search, the error cannot be corrected in the closed-loop search, and the quality of the synthesized speech is degraded. Accordingly, the open-loop pitch estimation device should estimate a pitch by a simple method that requires a small number of calculations, and should not select a multiple or a sub-multiple of the actual lag as the pitch.
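The open-loop selection described above can be illustrated with a minimal Python sketch. It is not taken from any particular CODEC: the function names, the lag range of 20 to 143 samples, and the toy signal are assumptions made only for illustration.

```python
def autocorrelation(signal, lag):
    """R(lag) = sum of signal[n] * signal[n - lag] over the buffer."""
    return sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))


def open_loop_pitch(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(signal, lag))


# Toy stand-in for a weighted speech signal (an assumption, not real speech):
# a short pulse shape repeated every 40 samples.
PERIOD = 40
shape = [8, 4, 2, 1] + [0] * (PERIOD - 4)
weighted_speech = [shape[n % PERIOD] for n in range(240)]

print(open_loop_pitch(weighted_speech, 20, 143))  # prints 40
```

In this toy signal the autocorrelation also has large peaks at lags 80 and 120. On real speech, noise or normalization can make one of those peaks the largest, which is how a multiple of the actual pitch lag comes to be selected.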
In order to reduce errors in the open-loop pitch estimation device, many algorithms have been suggested and used. The open-loop search in conventional speech CODECs is conducted in the following two ways.
In the open-loop pitch estimation device applied in the ITU-T G.729 and the GSM EFR, the search range is divided into three sections. The maximum of the correlation function is found in each of the three sections and is then normalized by the energy. The winner among the three normalized maximum correlations is selected by favoring the lags in the lower sections. However, this algorithm does not work well for both female and male speakers. The pitch lag of a male speaker is generally larger than that of a female speaker. Because lower lags are favored, this algorithm may cause sub-multiple errors for male speakers.
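The three-section search described above can be sketched as follows. This is a simplified illustration, not the normative G.729 or GSM EFR procedure: the section boundaries, the 0.85 favoring factor, the buffer layout, and all function names are assumptions chosen for the example.

```python
import math


def correlation(buf, start, length, lag):
    """Correlate the frame buf[start:start+length] with itself delayed by lag."""
    return sum(buf[n] * buf[n - lag] for n in range(start, start + length))


def normalized_correlation(buf, start, length, lag):
    """Correlation divided by the square root of the delayed-signal energy."""
    energy = sum(buf[n - lag] ** 2 for n in range(start, start + length))
    if energy == 0:
        return 0.0
    return correlation(buf, start, length, lag) / math.sqrt(energy)


def three_section_pitch(buf, start, length,
                        sections=((20, 39), (40, 79), (80, 143)),
                        favor=0.85):
    """Pick one candidate per section, then favor candidates in lower sections."""
    candidates = []
    for low, high in sections:
        best_lag = max(range(low, high + 1),
                       key=lambda lag: correlation(buf, start, length, lag))
        candidates.append(
            (best_lag, normalized_correlation(buf, start, length, best_lag)))

    # Start from the highest section; a lower section wins whenever its
    # normalized maximum reaches `favor` times the current winner.
    pitch, best = candidates[-1]
    for lag, value in reversed(candidates[:-1]):
        if value >= favor * best:
            pitch, best = lag, value
    return pitch


# Toy buffer (an assumption): 143 past samples followed by an 80-sample frame,
# built from a pulse shape that repeats every 40 samples.
shape = [8, 4, 2, 1] + [0] * 36
buf = [shape[n % 40] for n in range(143 + 80)]

print(three_section_pitch(buf, start=143, length=80))  # prints 40
```

Because a lower-section candidate wins even when its normalized correlation is somewhat smaller, a speaker whose true lag lies in the highest section can be assigned a shorter lag from a lower section.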
In AMR-WB, which has been selected as a new standard wideband speech CODEC by the Third Generation Partnership Project (3GPP) and the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), a pitch estimation algorithm using the pitch of a previous frame is used. The pitch estimation device in this standard wideband speech CODEC applies a weight that favors the autocorrelation function at low lags. If the current frame is determined to be a voiced frame, a weight is also applied to the autocorrelation function at lags in the neighborhood of the pitch of the previous frame. Here, the pitch of the previous frame is determined by median filtering the pitches of the previous 5 frames. This method of estimating a pitch depends on the correctness of the pitch of the previous frame, and if the pitch of the previous frame is a multiple of the pitch of the current frame, an error can occur. For example, if the pitch of the previous frame is a multiple of the actual pitch of the current frame in the neighborhood of a transition area, the autocorrelation function has peaks at every multiple of the actual pitch, and the weight is applied to the autocorrelation function value at the multiple lag of the actual pitch. Thus, the multiple lag is estimated as the pitch.
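The failure case described above can be reproduced with a small sketch. The weighting functions below are hypothetical and are not the weights defined in the AMR-WB standard; the slope, boost, neighborhood width, lag range, and function names are assumptions chosen only to show the mechanism.

```python
def correlation(buf, start, length, lag):
    """Correlate the frame buf[start:start+length] with itself delayed by lag."""
    return sum(buf[n] * buf[n - lag] for n in range(start, start + length))


def weighted_open_loop_pitch(buf, start, length, previous_pitches, voiced,
                             min_lag=20, max_lag=143,
                             low_lag_slope=0.3,    # hypothetical low-lag emphasis
                             neighbor_boost=0.5,   # hypothetical boost near old pitch
                             neighbor_width=10):   # hypothetical neighborhood, samples
    """Maximize a weighted correlation that favors low lags and, for voiced
    frames, lags near the median of the previous pitches."""
    history = sorted(previous_pitches[-5:])
    old_pitch = history[len(history) // 2]  # median of up to 5 previous pitches

    def weight(lag):
        w = 1.0 - low_lag_slope * (lag - min_lag) / (max_lag - min_lag)
        if voiced:
            closeness = max(0.0, 1.0 - abs(lag - old_pitch) / neighbor_width)
            w *= 1.0 + neighbor_boost * closeness
        return w

    return max(range(min_lag, max_lag + 1),
               key=lambda lag: weight(lag) * correlation(buf, start, length, lag))


# Toy buffer (an assumption): the actual period is 40 samples, but the
# previous frames were mostly estimated at the multiple lag 80.
shape = [8, 4, 2, 1] + [0] * 36
buf = [shape[n % 40] for n in range(143 + 80)]
history = [80, 80, 40, 80, 80]

print(weighted_open_loop_pitch(buf, 143, 80, history, voiced=False))  # prints 40
print(weighted_open_loop_pitch(buf, 143, 80, history, voiced=True))   # prints 80
```

With the neighborhood weight switched off, the low-lag emphasis selects the actual lag of 40. With it switched on, the peak at the multiple lag 80 is boosted because it lies next to the median of the previous pitches, so the earlier error propagates to the current frame.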