Digital speech compression systems typically require estimation of the fundamental frequency of an input signal. The fundamental frequency $f_0$ is usually estimated in terms of the pitch period $\tau_0$ (otherwise known as "lag"). The two are related by the expression
$$\tau_0 = \frac{f_s}{f_0}$$
where the sampling frequency $f_s$ is commonly 8000 Hz for telephone grade applications.
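As a quick illustration of the relation between lag and fundamental frequency, the following sketch converts in both directions at the 8000 Hz sampling rate mentioned above. The function names are hypothetical and chosen only for this example.

```python
SAMPLING_RATE_HZ = 8000.0  # telephone-grade sampling frequency from the text


def lag_to_frequency(lag_samples, fs=SAMPLING_RATE_HZ):
    """Return the fundamental frequency (Hz) for a pitch period given in samples."""
    if lag_samples <= 0:
        raise ValueError("lag must be positive")
    return fs / lag_samples


def frequency_to_lag(f0_hz, fs=SAMPLING_RATE_HZ):
    """Return the pitch period (samples, possibly fractional) for a frequency in Hz."""
    if f0_hz <= 0:
        raise ValueError("frequency must be positive")
    return fs / f0_hz


# A lag of 80 samples at 8000 Hz corresponds to a 100 Hz fundamental.
print(lag_to_frequency(80))
print(frequency_to_lag(400.0))
```

Note that a frequency rarely maps to a whole number of samples, which is why integer-only lags limit the resolution of the estimate.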
Since a speech signal is generally non-stationary, it is partitioned into finite length vectors called frames (e.g., 10 to 40 ms), each of which is presumed to be quasi-stationary. The parameters describing the speech signal are then updated at the associated frame length intervals. The original Code Excited Linear Prediction (CELP) algorithm further updates the pitch period information (using what is called Long Term Prediction, or LTP) on shorter subframe intervals, thus allowing smoother transitions from frame to frame. It was also noted that although $\tau_0$ could be estimated using open-loop methods, far better performance was achieved using the closed-loop approach. Closed-loop methods exhaustively search all possible values of $\tau_0$ (typically integer values from 20 to 147) on a subframe basis and choose the value that satisfies some minimum error criterion.
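The exhaustive closed-loop search can be sketched as follows. This is a simplified illustration, not the search of any particular codec: it assumes the error criterion is the squared error between the target subframe and a gain-scaled copy of the past excitation, and it omits the weighted synthesis filtering a real CELP coder would apply. The names `delayed_vector` and `closed_loop_lag` are hypothetical.

```python
import math

MIN_LAG, MAX_LAG = 20, 147  # integer lag range quoted in the text


def delayed_vector(history, lag, length):
    """Past excitation delayed by `lag`, repeated periodically if lag < length."""
    start = len(history) - lag
    return [history[start + (n % lag)] for n in range(length)]


def closed_loop_lag(history, target, min_lag=MIN_LAG, max_lag=MAX_LAG):
    """Try every integer lag and return the one with the smallest squared error.

    With the optimal gain, minimizing the squared error is equivalent to
    maximizing corr^2 / energy, so that score is maximized here.
    """
    if len(history) < max_lag:
        raise ValueError("history must hold at least max_lag samples")
    best_lag, best_score = None, -1.0
    for lag in range(min_lag, max_lag + 1):
        y = delayed_vector(history, lag, len(target))
        energy = sum(v * v for v in y)
        corr = sum(t * v for t, v in zip(target, y))
        if energy == 0.0 or corr <= 0.0:
            continue  # assumption: only positive LTP gains are considered
        score = corr * corr / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic signal with a true period of 80 samples (illustrative data only).
signal = [math.sin(2 * math.pi * n / 80) + 0.5 * math.sin(4 * math.pi * n / 80)
          for n in range(240)]
print(closed_loop_lag(signal[:200], signal[200:]))
```

The cost of this loop, repeated for every subframe, is what motivates the hybrid open-loop and closed-loop strategies discussed next.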
An enhancement to this method involves allowing $\tau_0$ to take on fractional values. An example of a practical implementation of this method can be found in the GSM half rate speech coder, and is shown in FIG. 1. Here, lags within the range of 21 to 22 2/3 are allowed 1/3 sample resolution, lags within the range of 23 to 34 5/6 are allowed 1/6 sample resolution, and so on. In order to keep the search complexity low, a combination of open-loop and closed-loop methods is used. The open-loop method involves generating an integer lag candidate list using an autocorrelation peak picking algorithm. The closed-loop method then searches the allowable lags in the neighborhood of the integer lag candidates for the optimal fractional lag value. Furthermore, the lags for subframes 2, 3, and 4 are coded based on the difference from the previous subframe. This allows the lag information to be coded using fewer bits since there is a high intraframe correlation of the lag parameter. Even so, the GSM HR codec uses a total of 8 + (3 × 4) = 20 bits every 20 ms (1.0 kbps) to convey the pitch period information.
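The open-loop stage described above can be illustrated with a generic autocorrelation peak picker. This is a sketch under stated assumptions, not the GSM half rate algorithm itself: the frame length, the upper lag bound of 142, the candidate count, and the function names are all illustrative choices.

```python
import math


def normalized_autocorr(signal, lag, frame_len):
    """Normalized correlation between the last frame and the same frame `lag` samples earlier."""
    start = len(signal) - frame_len
    cur = signal[start:]
    past = signal[start - lag:len(signal) - lag]
    num = sum(a * b for a, b in zip(cur, past))
    den = math.sqrt(sum(a * a for a in cur) * sum(b * b for b in past))
    return num / den if den > 0.0 else 0.0


def open_loop_candidates(signal, frame_len=160, min_lag=21, max_lag=142,
                         max_candidates=3):
    """Return integer lags at positive local maxima of the autocorrelation, strongest first."""
    if len(signal) < frame_len + max_lag + 1:
        raise ValueError("signal too short for the requested lag range")
    # Evaluate one extra lag on each side so every lag in range has two neighbors.
    r = {lag: normalized_autocorr(signal, lag, frame_len)
         for lag in range(min_lag - 1, max_lag + 2)}
    peaks = [lag for lag in range(min_lag, max_lag + 1)
             if r[lag] > 0.0 and r[lag] > r[lag - 1] and r[lag] >= r[lag + 1]]
    peaks.sort(key=lambda lag: r[lag], reverse=True)
    return peaks[:max_candidates]


# Synthetic signal with a 50-sample period (illustrative data only).
signal = [math.sin(2 * math.pi * n / 50) + 0.3 * math.sin(4 * math.pi * n / 50)
          for n in range(320)]
print(open_loop_candidates(signal))
```

Multiples of the true period also produce strong peaks, which is one reason a candidate list, rather than a single lag, is handed to the closed-loop fractional search.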
In an effort to reduce the bit rate of the pitch period information, an interpolation strategy was developed that allows the pitch information to be coded only once per frame (using only 7 bits, or 350 bps), rather than with the usual subframe resolution. This technique is known as relaxed CELP (or RCELP), and is the basis for the recently adopted enhanced variable rate codec (EVRC) standard for Code Division Multiple Access (CDMA) wireless telephone systems. The basic principle is as follows.
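The bit-rate figures quoted for the lag information follow directly from the bits spent per 20 ms frame. The short sketch below reproduces that arithmetic; the function name is hypothetical.

```python
FRAME_MS = 20  # frame duration implied by the figures in the text


def side_info_rate_bps(bits_per_frame, frame_ms=FRAME_MS):
    """Bit rate (bits per second) of a parameter sent once per frame."""
    return bits_per_frame * 1000 / frame_ms


gsm_hr_lag_bits = 8 + 3 * 4  # 8 bits for subframe 1, 4-bit deltas for subframes 2 to 4
rcelp_lag_bits = 7           # one lag value per frame

print(side_info_rate_bps(gsm_hr_lag_bits))  # 1000.0 bps, i.e., 1.0 kbps
print(side_info_rate_bps(rcelp_lag_bits))   # 350.0 bps
```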
The pitch period is estimated for the analysis window centered at the end of the current frame. The lag (delay) contour is then generated, which consists of a linear interpolation of the past frame's lag to the current frame's lag. The linear prediction (LP) residual signal is then modified by means of sophisticated polyphase filtering and shifting techniques, which are designed such that the 1/8-sample interpolation boundaries are not crossed during perceptually critical instances in the waveform. The primary reason for this residual modification process is to account for errors introduced by the open-loop integer lag estimation process. For example, if the integer lag is estimated to be 32 samples when in fact the true lag is 32.5 samples, the 0.5-sample error accumulates over the five pitch cycles in a 160-sample frame, so the residual waveform can be in conflict with the estimated lag by as many as 2.5 samples in a single frame. This can severely degrade the performance of the LTP. The RCELP algorithm accounts for this by shifting the residual waveform during perceptually insignificant (i.e., low-energy) instances in the residual waveform to match the delay contour. In the event that there are no such opportunities for shifting, the shift count is accumulated and reserved for use during the next frame. By modifying the residual waveform to match the estimated delay contour, the effectiveness of the LTP is preserved, and the coding gain is maintained. In addition, the associated perceptual degradations due to the residual modification are claimed to be insignificant.
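Two ideas from this paragraph can be sketched numerically: the linearly interpolated delay contour and the way a small lag error accumulates over a frame. This is a minimal illustration that assumes a simple per-sample linear interpolation; the exact interpolation and shifting rules of the EVRC standard are not reproduced here, and the function names are hypothetical.

```python
def delay_contour(prev_lag, cur_lag, frame_len=160):
    """Per-sample lag values moving linearly from prev_lag to cur_lag.

    The contour reaches cur_lag at the last sample of the frame.
    """
    step = (cur_lag - prev_lag) / frame_len
    return [prev_lag + step * (n + 1) for n in range(frame_len)]


def accumulated_mismatch(estimated_lag, true_lag, frame_len=160):
    """Drift (in samples) between the waveform and the assumed lag over one frame.

    Each pitch cycle contributes |true_lag - estimated_lag| samples of drift.
    """
    cycles_per_frame = frame_len / estimated_lag
    return cycles_per_frame * abs(true_lag - estimated_lag)


contour = delay_contour(32.0, 34.0)
print(contour[0], contour[-1])

# The example from the text: estimated lag 32, true lag 32.5, 160-sample frame.
print(accumulated_mismatch(32, 32.5))  # 2.5
```

The second function makes the motivation for residual shifting concrete: a half-sample estimation error is small per cycle but grows to several samples by the end of the frame.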
While this last claim may be true for medium bit rate coders such as the EVRC full rate mode (i.e., 8.5 kbps), it is less apparent for the EVRC half rate mode, which operates at 4.0 kbps. This is because of the relative ability of the fixed codebooks to model the associated inverse error signal. That is, if coding distortions are introduced by inefficiencies in the LTP, and those distortions can be effectively modeled by the fixed codebook, then the net effect is that the distortion will be canceled. So, while the EVRC full rate mode allocates 120 of 170 bits per frame for fixed codebook gain and shape, the half rate mode can afford only 42 of 80 bits per frame for the same purpose. This results in a disproportionate performance degradation due, in part, to the fixed codebook's inability to model the coding distortion introduced by the LTP.
Therefore, there is a need for an improved method of open-loop pitch estimation that provides subsample resolution.