Electronic signal generation is pervasive in electronic and electrical technology. When an electrical signal is used to emulate, transmit, or reproduce a real-world quantity, the quality of the signal is important. For example, speech is often received via a microphone or other sound transducer and transformed into an electrical representation or signal. In addition to the noise introduced as an artifact of this transformation, further artificial noise may be introduced into the signal during transmission, coding, and/or decoding. Such noise is often audible to humans and may in fact dominate a reproduced speech signal to the point of distracting or annoying the listener.
Speech coders, particularly those operating at low bit rates, tend to introduce quantization noise that may be audible and thereby impair the quality of the recovered speech. A postfilter is generally used to mask noise in coded speech signals by enhancing the formants and fine structure of such signals. Typically, noise in strong formant regions of a signal is inaudible, whereas noise in valley regions between two adjacent formants is perceptible because the signal-to-noise ratio (SNR) in valley regions is low. The SNR in the valley regions may be even lower in the context of a low-bit-rate codec, since the prevailing linear prediction (LP) modeling methods represent the peaks more accurately than the valleys, and the available bits are insufficient to adequately represent the signal in the valleys. Thus, it is desirable that a speech postfilter attenuate the valleys while preserving the peaks in order to reduce the audible noise level.
Juin-Hwey Chen et al. have proposed an adaptive postfiltering algorithm consisting of a pole-zero long-term postfilter cascaded with a short-term postfilter. The short-term postfilter is derived from the parameters of the LP model in such a way that it attenuates the noise in the spectral valleys. These parameters are commonly referred to as linear predictive coding coefficients, LPC coefficients, or LPC parameters. Additionally, Wang et al. introduced a frequency domain adaptive postfiltering algorithm to suppress noise in spectral valleys. The aforementioned postfiltering algorithms reduce noise without introducing substantial spectral distortion, but they are not effective at reducing the perceptible noise in shallow, rather than deep, valleys between formants, especially in the context of low-bit-rate coders such as those operating below 8 kbps. A primary explanation for this drawback is that the frequency response of the postfilter itself does not adequately follow the detailed fine structure of the spectral envelope, leading to the masking of shallow valleys between closely spaced formants.
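By way of illustration only, a short-term postfilter of the kind described above may be sketched as follows. The sketch assumes the widely used pole-zero form H(z) = A(z/g_num) / A(z/g_den), where A(z) = 1 + a_1 z^-1 + ... + a_p z^-p is the LP inverse filter and 0 < g_num < g_den < 1. The function names, the coefficient convention, and the default factors 0.5 and 0.8 are assumptions made for this example and are not taken from the cited work.

```python
def bandwidth_expand(lpc, gamma):
    """Return the coefficients of A(z/gamma).

    lpc is [1, a_1, ..., a_p] for A(z) = 1 + sum_k a_k z^-k (assumed convention).
    Scaling a_k by gamma**k moves the roots toward the origin, which
    broadens and flattens the spectral peaks.
    """
    return [a * gamma ** k for k, a in enumerate(lpc)]


def pole_zero_filter(b, a, x):
    """Direct-form IIR filter with zero initial state.

    y[n] = (sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k]) / a[0]
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def short_term_postfilter(lpc, x, g_num=0.5, g_den=0.8):
    """Apply H(z) = A(z/g_num) / A(z/g_den) to the samples x.

    The default factors are illustrative assumptions, not tuned values.
    """
    num = bandwidth_expand(lpc, g_num)
    den = bandwidth_expand(lpc, g_den)
    return pole_zero_filter(num, den, x)
```

Because the numerator and denominator are both smoothed copies of the same LP envelope, the response of such a filter follows only the broad shape of the spectrum, which is consistent with the limitation noted above for shallow valleys between closely spaced formants.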
A typical early time domain LPC postfiltering architecture is illustrated in FIG. 1. An input bit stream, perhaps transmitted from an encoder, is received at decoder 100. A bit-stream decoder 110 associated with decoder 100 decodes the incoming bit stream. This step separates the bit stream into its logical components or virtual channel contents. For example, the bit-stream decoder 110 separates LPC coefficients from a coded excitation signal for linear prediction-based codecs. The decoded LPC coefficients are transmitted to a formant filter 131, which is the first stage of a time domain postfilter 130. A synthesized speech signal produced by a speech synthesizer 120 is input to the formant filter 131, followed by a pitch filter 132 in which the harmonic pitch structure of the signal is enhanced. Cascaded with the pitch filter 132, a tilt compensation module 133 is generally provided to remove the background tilt of the formant filter and thereby avoid undesirable distortion introduced by the postfilter. Finally, a gain control is applied to the signal in gain controller 134 to eliminate discontinuity of signal power between adjacent frames.
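The cascade of stages 131 through 134 may be sketched, for one frame, as shown below. This is a minimal illustration under stated assumptions rather than the architecture of FIG. 1 itself: the pitch filter is modeled as a simple feed-forward comb, the tilt compensation as a first-order filter 1 - mu z^-1, and the gain control as an energy match with a smoothed gain. All function names, parameter values, and the `formant_filter` callable are hypothetical, and filter memory is not carried across frames.

```python
import math


def pitch_filter(x, lag, gain=0.3):
    """Comb filter stand-in: y[n] = (x[n] + gain * x[n - lag]) / (1 + gain)."""
    return [(x[n] + gain * (x[n - lag] if n >= lag else 0.0)) / (1.0 + gain)
            for n in range(len(x))]


def tilt_compensation(x, mu=0.5):
    """First-order filter 1 - mu z^-1, offsetting a low-pass spectral tilt."""
    return [x[n] - mu * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def gain_control(reference, filtered, prev_gain=1.0, smoothing=0.9):
    """Scale `filtered` toward the energy of `reference`.

    The gain is updated sample by sample, starting from the previous frame's
    gain, so that signal power does not jump at the frame boundary.
    Returns the scaled samples and the final gain for use in the next frame.
    """
    e_ref = sum(v * v for v in reference)
    e_out = sum(v * v for v in filtered)
    target = math.sqrt(e_ref / e_out) if e_out > 0.0 else 1.0
    g = prev_gain
    out = []
    for v in filtered:
        g = smoothing * g + (1.0 - smoothing) * target
        out.append(g * v)
    return out, g


def postfilter_frame(synth, formant_filter, lag, prev_gain=1.0):
    """Run one frame through formant, pitch, tilt, and gain stages in order.

    `formant_filter` is a caller-supplied callable (hypothetical) that maps
    a list of samples to a list of samples using the decoded LPC coefficients.
    """
    y = formant_filter(synth)
    y = pitch_filter(y, lag)
    y = tilt_compensation(y)
    return gain_control(synth, y, prev_gain)
```

In use, the gain returned for one frame would be passed as `prev_gain` when processing the next frame, so that the applied gain evolves continuously across the frame boundary.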
The frequency response of the postfilter architecture represented in prior speech postfiltering systems does not adequately follow the detailed fine structure of the speech spectrum nor does it always adequately resolve the spectral envelope peaks and valleys.