One characteristic of Analysis-by-Synthesis (A-by-S) speech coders, which typically use the Mean Square Error (MSE) minimization criterion, is that as the bit rate is reduced, the error matching at higher frequencies becomes less efficient, and consequently the MSE criterion tends to emphasize signal modeling at lower frequencies. The training procedure for optimizing excitation codebooks, when used, likewise tends to emphasize lower frequencies and attenuate higher frequencies in the trained codevectors, with the effect becoming more pronounced as the excitation codebook size is decreased. The perceived effect on reconstructed speech is that it becomes increasingly muffled as the bit rate is reduced. One solution to this problem is described in the 3GPP2 Document “Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB) Service Options 62 and 63 for Spread Spectrum Systems,” in the context of an algebraic excitation codebook. The solution involves the use of a shaping filter, formulated as a preemphasis filter for the excitation codebook, described by:
$$H_{FCB\_shape}(z) = 1 - \mu z^{-1}, \quad 0 \le \mu \le 0.5$$
where μ is selected based on the degree of periodicity at the previous subframe: when the periodicity is high, a value of μ close to 0.5 is selected. This imposes a high-pass characteristic on the excitation codebook vector being evaluated, and thereby on the excitation codebook vector that is ultimately selected. The MSE criterion is used to select a vector from the excitation codebook which has been adaptively shaped as described.
While the above technique does mitigate, to a degree, the attenuation of high frequencies in the coded signal, it does not necessarily optimize the MSE criterion. However, the resulting reconstructed speech sounds more similar to the target input speech, which is why the shaping is employed despite its effect on MSE.
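The preemphasis shaping described above can be illustrated with a minimal sketch. It assumes zero filter memory at the start of each codevector, and the function names are hypothetical. In particular, `select_mu` is only an illustrative linear mapping from a periodicity measure to μ; the cited 3GPP2 document defines the actual selection rule, which is not reproduced here.

```python
import math


def shape_codevector(codevector, mu):
    """Apply H(z) = 1 - mu * z^-1 to one excitation codevector.

    Assumption: the filter memory is zero at the start of the codevector.
    """
    if not 0.0 <= mu <= 0.5:
        raise ValueError("mu must lie in [0, 0.5]")
    shaped = []
    prev = 0.0  # assumed zero initial filter state
    for x in codevector:
        shaped.append(x - mu * prev)
        prev = x
    return shaped


def magnitude_response(mu, omega):
    """Return |H(e^{j*omega})| for H(z) = 1 - mu * z^-1, omega in radians/sample."""
    return math.sqrt(1.0 + mu * mu - 2.0 * mu * math.cos(omega))


def select_mu(periodicity):
    """Hypothetical mapping from a periodicity measure in [0, 1] to mu.

    This is NOT the rule from the cited document; it only mirrors the stated
    behavior that high periodicity leads to a value of mu close to 0.5.
    """
    clamped = min(max(periodicity, 0.0), 1.0)
    return 0.5 * clamped


if __name__ == "__main__":
    mu = select_mu(0.9)
    print(shape_codevector([1.0, 0.0, 0.0, -1.0], mu))
    # Gain at DC versus gain at the Nyquist frequency shows the high-pass tilt.
    print(magnitude_response(mu, 0.0), magnitude_response(mu, math.pi))
```

For μ = 0.5 the magnitude response rises from 0.5 at zero frequency to 1.5 at the Nyquist frequency, which is the high-pass characteristic imposed on the codevectors before the MSE search.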
In European Patent EP 1 141 946 B1, titled “Coded Enhancement Feature for Improved Performance in a Coding Communication Signals”, Hagen and Kleijn propose a method for reducing the distance between the target signal and the coded signal. They compute, in the frequency domain, a transfer function which, when applied to the reconstructed signal, results in the reconstructed signal exactly matching the input signal. In practice, this transfer function is simplified (as explained in EP 1 141 946 B1) prior to being explicitly quantized, so as to reduce the amount of information in need of quantization, and is then conveyed from the encoder to the decoder via a communication channel. The simplification, followed by quantization, of the transfer function prevents exact signal reconstruction from being achieved. The quantized transfer function constitutes the encoded enhancement information, and is explicitly transmitted. This points to one drawback of EP 1 141 946 B1 when applied to the task of enhancing the performance of a selected speech coder. Because the enhancement information is explicitly modeled as a transfer function between the input target signal and the reconstructed (coded) signal, and because input speech typically is not available at the decoder, the transfer function may need to be simplified and must then be explicitly quantized and conveyed to the decoder. Consequently, this approach incurs a cost in bandwidth for providing the enhancement information to the decoder.
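The idea of a frequency-domain transfer function between the reconstructed signal and the target can be sketched as follows. This is an illustration under stated assumptions, not the method of EP 1 141 946 B1: the per-band magnitude averaging and uniform quantizer below are hypothetical stand-ins for the simplification and quantization steps, whose actual form is defined in that patent. All function names are likewise hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def exact_transfer_function(target, reconstructed, eps=1e-12):
    """Per-bin ratio S(k) / R(k) of target to reconstructed spectra.

    Bins where the reconstructed spectrum is (near) zero are set to 0,
    since no finite gain can recover the target there.
    """
    s_spec = dft(target)
    r_spec = dft(reconstructed)
    return [s / r if abs(r) > eps else 0j for s, r in zip(s_spec, r_spec)]


def band_gains(transfer, band_size):
    """Hypothetical simplification: mean magnitude of H over groups of bins.

    Only the non-redundant half of the spectrum (real input) is used,
    and phase is discarded, so far fewer values remain to be quantized.
    """
    half = transfer[: len(transfer) // 2 + 1]
    bands = [half[i:i + band_size] for i in range(0, len(half), band_size)]
    return [sum(abs(h) for h in band) / len(band) for band in bands]


def quantize(gains, step):
    """Hypothetical uniform scalar quantizer for the band gains."""
    return [round(g / step) * step for g in gains]


if __name__ == "__main__":
    target = [1.0, 2.0, 0.5, -1.0, 0.0, 0.5, -0.5, 1.5]
    coded = [0.9, 1.8, 0.7, -0.8, 0.1, 0.4, -0.6, 1.2]
    h = exact_transfer_function(target, coded)
    # Applying the unsimplified H to the coded spectrum recovers the target spectrum.
    residual = max(abs(hk * rk - sk) for hk, rk, sk in zip(h, dft(coded), dft(target)))
    print(residual)
    print(quantize(band_gains(h, 3), 0.25))
```

The unsimplified transfer function reproduces the target spectrum to within floating-point error, while the simplified and quantized gains are what would have to be transmitted. This is the bandwidth cost noted above, and it is also why exact reconstruction is no longer achieved.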