Efforts to produce better speech quality at lower coding rates have stimulated the development of numerous block-based coding algorithms. The basic strategy in block-based coding is to buffer the data into blocks of equal length and to code each block separately in accordance with the statistics it exhibits. The motivation for developing blockwise coders comes from a fundamental result of source coding theory which suggests that better performance is always achieved by coding data in blocks (or vectors) instead of scalars. Indeed, block-based speech coders have demonstrated performance better than other classes of coders, particularly at rates of 16 kilobits per second and below. An example of such a coder is presented in our prior U.S. patent application Ser. No. 798,174, filed Nov. 14, 1985.
One artifact of block-based coders, however, is framing noise caused by discontinuities at the block boundaries. These discontinuities comprise all variations in amplitude and phase representation of spectral components between successive blocks. This noise, which contaminates the entire speech spectrum, is particularly audible in sustained high-energy, high-pitched speech (female voiced speech). The noise spectral components falling around the speech harmonics are partially masked and are less audible than the ones falling in the interharmonic gaps. As a result, the larger the interharmonic gaps, or the higher the pitch, the more audible is the framing noise. Also, due to the "modulation" process underlying the noise generation, the larger the speech amplitude, the more audible is the framing noise.
The use of block tapering and overlapping can, to some extent, help subdue framing noise, particularly its low frequency components; and the larger the overlap, the better are the results. This method, however, is limited in its application and performance since it requires an increase in the coding rate proportional to the size of the overlap.
A more effective approach, initially applied to enhance speech degraded by additive white noise, is comb filtering of the noisy signal. This approach is based on the observation that waveforms of voiced sound are periodic with a period that corresponds to the fundamental (pitch) frequency. A comb filtering operation adjusts itself to the temporal variations in pitch frequency and passes only the harmonics of speech while filtering out spectral components in the frequency regions between harmonics. The magnitude frequency response of a comb filter is illustrated in FIG. 1. The approach can in principle reduce the amount of audible noise with minimal distortion to speech.
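The shape of such a response can be checked numerically. The following is a minimal sketch, not taken from the source: the sampling rate, the pitch period, the three-tap coefficient set and the helper name `comb_response` are all illustrative assumptions. It evaluates the magnitude of a weighted sum of pitch-spaced taps at the pitch harmonics and midway between them.

```python
import cmath
import math

def comb_response(coeffs, lb, n_p, f, fs):
    """Magnitude of H(f) = sum_i a_i * exp(j*2*pi*f*i*n_p/fs), i = -lb..lf.

    coeffs lists a_{-lb}, ..., a_0, ..., a_{lf}; n_p is the pitch period in samples.
    """
    omega = 2.0 * math.pi * f / fs
    h = sum(a * cmath.exp(1j * omega * (k - lb) * n_p)
            for k, a in enumerate(coeffs))
    return abs(h)

# All values below are assumptions chosen for illustration only.
fs = 8000.0                  # sampling rate in Hz
n_p = 40                     # pitch period in samples, so f_p = fs / n_p = 200 Hz
coeffs = [0.25, 0.5, 0.25]   # a_{-1}, a_0, a_{+1}
f_p = fs / n_p

for k in range(1, 4):
    on_harmonic = comb_response(coeffs, 1, n_p, k * f_p, fs)
    between = comb_response(coeffs, 1, n_p, (k - 0.5) * f_p, fs)
    print(f"harmonic {k}: |H| = {on_harmonic:.3f}, midway below it: |H| = {between:.3f}")
```

With these assumed coefficients the response reduces to 0.5 + 0.5 cos(2 pi f N.sub.p / fs), so the "teeth" sit at multiples of the pitch frequency and the nulls fall midway between them.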
An example of a speech pattern is illustrated in FIG. 2. It can be seen that the speech has a period P of N.sub.p samples which is termed the pitch period of the speech. The pitch period P determines the fundamental frequency f.sub.p = 1/P of FIG. 1. The speech waveform varies slowly through successive pitch periods; thus, there is a high correlation between a sample within one pitch period and corresponding samples in pitch periods which precede and succeed the pitch period of interest. With voiced speech, therefore, the sample X(n) will be very close in magnitude to the samples X(n-iN.sub.p) and X(n+iN.sub.p) where i is an integer. Any noise in the waveform, however, is not likely to be synchronous with pitch and is thus not expected to be correlated in corresponding samples of adjacent pitch periods. Digital comb filtering is based on the concept that, with a high correlation between periods of speech, noise can be deemphasized by summing corresponding samples of adjacent pitch periods. With perfect correlation, averaging of the corresponding samples provides the best filter response. However, where correlation is less than perfect, as can be expected, greater weight is given to the sample of interest X(n) than to the corresponding samples of adjacent pitch periods.
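The trade-off between plain averaging and unequal weighting can be made concrete with a short calculation. As illustrative assumptions not stated in the text, suppose the noise in the corresponding samples has zero mean, a common variance \sigma^2, and is mutually uncorrelated, and suppose the weights a_i sum to one so that a perfectly periodic speech component passes unchanged.

```latex
\begin{aligned}
Y(n) &= \sum_i a_i\, X(n + i N_p), \qquad \sum_i a_i = 1, \\
\operatorname{Var}\bigl[\text{noise in } Y(n)\bigr] &= \sigma^2 \sum_i a_i^2 , \\
a = \left(\tfrac13, \tfrac13, \tfrac13\right) &\;\Rightarrow\; \tfrac{1}{3}\,\sigma^2 , \qquad
a = \left(\tfrac14, \tfrac12, \tfrac14\right) \;\Rightarrow\; \tfrac{3}{8}\,\sigma^2 .
\end{aligned}
```

Under these assumptions equal weights minimize the sum of squared coefficients and hence the residual noise, which matches the perfect-correlation case. Emphasizing the sample of interest gives up a little noise reduction (3/8 rather than 1/3 of the input noise power in this example) in exchange for less smearing when successive pitch periods differ.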
The adaptive comb filtering operation can be described by:

$$Y(n) = \sum_{i=-LB}^{LF} a_i \, X(n + i N_p)$$

where X(n) is the noisy input signal, Y(n) is the filtered output signal, N.sub.p is the number of samples in a pitch period, a.sub.i is the set of filter coefficients, LB is the number of periods considered backward and LF is the number of periods considered forward. The order of the filter is LB+LF. In past implementations of the comb filter approach, filter coefficients are fixed while the pitch period is adjusted once every pitch period. Therefore, the adaptation period as well as the filter processing segment are a pitch period long (N.sub.p samples). In the frequency domain, this pitch adaptation amounts to aligning the "teeth" of the comb filter to the harmonics of speech once every pitch period.
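The weighted sum over pitch-spaced samples described above can be sketched in a few lines. This is a minimal illustration rather than any cited implementation: the function name `comb_filter`, the edge handling (out-of-range terms are dropped and the remaining weights renormalized), and the test signals are assumptions. For brevity the pitch period is held fixed over the whole signal instead of being updated once every pitch period.

```python
import math

def comb_filter(x, n_p, coeffs, lb):
    """Return y[n] = sum over i = -lb..lf of a_i * x[n + i*n_p].

    coeffs lists a_{-lb}, ..., a_0, ..., a_{lf}; lf is inferred from its length.
    Assumption: terms that fall outside the signal are dropped and the
    remaining weights are renormalized, so no zero padding is needed.
    """
    lf = len(coeffs) - 1 - lb
    y = []
    for n in range(len(x)):
        acc = 0.0
        weight = 0.0
        for i in range(-lb, lf + 1):
            m = n + i * n_p
            if 0 <= m < len(x):
                a = coeffs[i + lb]
                acc += a * x[m]
                weight += a
        # Fall back to the unfiltered sample if no usable weight remains.
        y.append(acc / weight if weight else x[n])
    return y

# Illustrative values only: pitch period of 40 samples, three-tap filter.
n_p = 40
coeffs = [0.25, 0.5, 0.25]   # a_{-1}, a_0, a_{+1}
length = 400

# A perfectly periodic "voiced" signal made of two pitch harmonics.
voiced = [math.sin(2 * math.pi * n / n_p) + 0.5 * math.sin(4 * math.pi * n / n_p)
          for n in range(length)]
# A tone midway between DC and the first harmonic (period of 2 * n_p samples).
tone = [math.sin(2 * math.pi * n / (2 * n_p)) for n in range(length)]

y_voiced = comb_filter(voiced, n_p, coeffs, lb=1)
y_tone = comb_filter(tone, n_p, coeffs, lb=1)

print(max(abs(a - b) for a, b in zip(y_voiced, voiced)))   # distortion of the periodic signal
print(max(abs(v) for v in y_tone[n_p:-n_p]))               # residual of the interharmonic tone
```

In this sketch the periodic signal passes essentially unchanged, while the interharmonic tone is cancelled away from the signal edges, where all three taps are available.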
In another past implementation, a modified comb filter has been proposed to reduce discontinuities attributed to the pitch-synchronous adaptation when pitch varies. To that end, filter coefficients within each speech processing segment (N.sub.p samples) are weighted so that the amount of filtering is gradually increased over the first half of the segment and then gradually decreased over the second half of the segment. A symmetrical weighting smooths the transition and guarantees continuity between successive pitch periods. Again, pitch is updated in a pitch-synchronous mode. However, despite increased complexity, the performance of this filter is at best comparable to the performance of the basic adaptive comb filter.
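One way to picture such a position-dependent weighting is sketched below. The text does not specify the weighting function, so the triangular taper, the rule that the center coefficient absorbs whatever is removed from the side coefficients, and the name `tapered_coeffs` are all hypothetical choices made only for illustration.

```python
def tapered_coeffs(coeffs, lb, m, n_p):
    """Coefficients for position m (0 <= m < n_p) inside one processing segment.

    Hypothetical weighting: w rises linearly from 0 at the segment start to 1
    at the midpoint and falls back toward 0 at the end. Side coefficients are
    scaled by w, and the center coefficient a_0 (index lb) absorbs the
    remainder so that the coefficient sum is unchanged.
    """
    half = n_p / 2.0
    w = 1.0 - abs(m - half) / half
    total = sum(coeffs)
    tapered = [w * a for a in coeffs]
    side_sum = sum(tapered) - tapered[lb]
    tapered[lb] = total - side_sum
    return tapered

# Illustrative values only.
base = [0.25, 0.5, 0.25]   # a_{-1}, a_0, a_{+1}
n_p = 40

for m in (0, 10, 20, 30):
    print(m, tapered_coeffs(base, 1, m, n_p))
```

Under this assumed taper the filter is a pure pass-through at the segment boundary and reaches the full comb coefficients at mid-segment, so a change in pitch period between segments takes effect where little filtering is being applied.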