1. Field of the Invention
The present invention relates to methods for performing overlap-add in speech and audio coding to ensure a smooth transition from one segment to the next.
2. Background Art
Overlap-add is used extensively in speech and audio coding to ensure a smooth transition from one segment to the next. Most recent audio codecs (MPEG-1 Layer 3, AC3, AAC) employ a modified discrete cosine transform (MDCT) with 50% overlap between successive transform windows. During transmission, compressed frames of speech or audio may be lost or too corrupted to be used. In this case, the decoder must attempt to conceal the effects of the lost frame. In order to avoid discontinuities and ensure a smooth energy profile, the concealed waveform section is often overlap-added with the bordering received signal (the last good frame before concealment and/or the first good frame after concealment). In the case of concealing frame loss with codecs employing overlap between successive frames (as in the audio codecs mentioned above), the concealed waveform may be combined with the overlapped portions of the bordering received frames.
A general overlap-add of two signals can be defined by:

$$s(n) = s_{out}(n) \cdot w_{out}(n) + s_{in}(n) \cdot w_{in}(n), \qquad n = 0, \ldots, N-1$$

where $s_{out}$ is the signal to be faded out, $s_{in}$ is the signal to be faded in, $w_{out}$ is the fade-out window, $w_{in}$ is the fade-in window, and $N$ is the overlap-add window length.
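By way of illustration only, the general overlap-add defined above may be sketched in Python as follows. The function name `overlap_add`, the linear ramp windows, and the constant test segments are assumptions introduced for this sketch and are not taken from any particular codec.

```python
def overlap_add(s_out, s_in, w_out, w_in):
    """Return s(n) = s_out(n)*w_out(n) + s_in(n)*w_in(n) for n = 0..N-1."""
    n_len = len(s_out)
    if not (len(s_in) == len(w_out) == len(w_in) == n_len):
        raise ValueError("all four sequences must have the same length N")
    return [so * wo + si * wi
            for so, si, wo, wi in zip(s_out, s_in, w_out, w_in)]


# Illustrative use with an assumed linear ramp (not prescribed by the text).
N = 5
w_in = [n / (N - 1) for n in range(N)]   # fade-in window: 0.0 up to 1.0
w_out = [1.0 - w for w in w_in]          # fade-out window: 1.0 down to 0.0
s_out = [1.0] * N                        # hypothetical outgoing segment
s_in = [-1.0] * N                        # hypothetical incoming segment
s = overlap_add(s_out, s_in, w_out, w_in)
```

With these inputs the output moves from the outgoing segment's value at n = 0 to the incoming segment's value at n = N−1, which is the transition behavior the windows are meant to provide.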
Consider two signals whose cross correlation is 1 ($s_{in}(n) = \alpha \cdot s_{out}(n)$). One example is when the signals are identical (hence $\alpha = 1$). In this case, the overlap-add operation should yield $s(n) = s_{out}(n) = s_{in}(n)$, which implies that:

$$w_{out}(n) + w_{in}(n) = 1, \qquad n = 0, \ldots, N-1$$
Now consider two signals whose cross-correlation is zero. In this case, the overlap-add operation should give a smooth energy transition. As an example, consider

$$E[s_{out}^2(n)] = E[s_{in}^2(n)], \qquad E[s_{in}(n) \cdot s_{out}(n)] = 0$$

In this case, the overlap-add should yield $E[s^2(n)] = E[s_{out}^2(n)] = E[s_{in}^2(n)]$. Taking the general overlap-add equation above, squaring both sides, taking the expected value, and simplifying given the above conditions yields:

$$E[s^2(n)] = E[s_{in}^2(n)] \cdot \left(w_{out}^2(n) + w_{in}^2(n)\right)$$

which implies that

$$w_{out}^2(n) + w_{in}^2(n) = 1, \qquad n = 0, \ldots, N-1.$$
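For reference, the simplification described above may be written out step by step. The sketch assumes only that the windows are deterministic, so that they can be taken outside the expectation; the cross term then vanishes because the cross-correlation is zero, and the two remaining energies are equal by assumption.

$$
\begin{aligned}
E[s^2(n)] &= E\!\left[\left(s_{out}(n)\,w_{out}(n) + s_{in}(n)\,w_{in}(n)\right)^2\right] \\
&= w_{out}^2(n)\,E[s_{out}^2(n)] + 2\,w_{out}(n)\,w_{in}(n)\,E[s_{in}(n)\,s_{out}(n)] + w_{in}^2(n)\,E[s_{in}^2(n)] \\
&= w_{out}^2(n)\,E[s_{in}^2(n)] + 0 + w_{in}^2(n)\,E[s_{in}^2(n)] \\
&= E[s_{in}^2(n)] \cdot \left(w_{out}^2(n) + w_{in}^2(n)\right)
\end{aligned}
$$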
As can be seen, the optimal overlap window for correlated and uncorrelated signals is different. If the optimal window for uncorrelated signals is used for correlated signals, it can be shown (again assuming $s_{in}(n) = s_{out}(n)$) that:

$$s(n) = s_{in}(n) \cdot \sqrt{1 + 2\,w_{in}(n)\,w_{out}(n)}$$

In this case, the signal amplitude is modulated by a window-dependent term. Likewise, if the optimal window for correlated signals is used for uncorrelated signals, it can be shown that:

$$E[s^2(n)] = E[s_{in}^2(n)] \cdot \left[1 - 2\,w_{in}(n)\,w_{out}(n)\right]$$

Here, the energy is modulated by a window-dependent term. The greatest attenuation occurs when $w_{in}(n) = w_{out}(n) = 0.5$, resulting in a 3 dB attenuation of the output signal energy.
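The two mismatch cases above can be checked numerically. The following Python sketch is illustrative only: the linear ramp (satisfying the sum-to-one condition) and the quarter-cycle sine/cosine pair (satisfying the sum-of-squares condition) are assumed example windows, and all variable names are hypothetical.

```python
import math

N = 9  # odd length, so the midpoint sample has w_in equal to w_out

# Window pair for correlated signals: w_out + w_in = 1 (assumed linear ramp).
lin_in = [n / (N - 1) for n in range(N)]
lin_out = [1.0 - w for w in lin_in]

# Window pair for uncorrelated signals: w_out^2 + w_in^2 = 1
# (assumed quarter-cycle sine/cosine).
pow_in = [math.sin(0.5 * math.pi * n / (N - 1)) for n in range(N)]
pow_out = [math.cos(0.5 * math.pi * n / (N - 1)) for n in range(N)]

# Mismatch 1: identical signals, window designed for uncorrelated signals.
# s(n) = s_in(n) * (w_in + w_out), so the amplitude gain is w_in + w_out.
amp_gain = [wi + wo for wi, wo in zip(pow_in, pow_out)]
amp_gain_formula = [math.sqrt(1.0 + 2.0 * wi * wo)
                    for wi, wo in zip(pow_in, pow_out)]

# Mismatch 2: uncorrelated equal-energy signals, window designed for
# correlated signals. The energy gain is w_in^2 + w_out^2.
energy_gain = [wi * wi + wo * wo for wi, wo in zip(lin_in, lin_out)]
energy_gain_formula = [1.0 - 2.0 * wi * wo
                       for wi, wo in zip(lin_in, lin_out)]

# Worst-case energy dip, in dB, for mismatch 2.
worst_energy_db = 10.0 * math.log10(min(energy_gain))
```

Under these assumptions, `min(energy_gain)` is 0.5 at the window midpoint, so `worst_energy_db` is about −3.01 dB, in agreement with the 3 dB attenuation stated above. In the first mismatch case, `amp_gain` peaks at √2 at the midpoint for this particular window pair.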
When $s_{in}$ and $s_{out}$ are overlapped signals from a codec, as in the audio codecs mentioned above, the two signals have a high cross correlation, regardless of whether the original signal itself is correlated. In this case, a window with the property above for correlated signals is used exclusively. However, in applications such as frame loss concealment, some overlap-add is often required to maintain a smooth transition between the concealed waveform and the adjacent received signals. Depending on the properties of the neighboring signal, the cross correlation can vary. In speech, for example, periodic waveform extrapolation is a method used to conceal the lost frame during “voiced” speech. In this case, the overlapping signals generally have a high cross correlation. During “unvoiced” speech, however, the waveform is more random or noise-like, and some form of colored random noise is generally used for concealment, in which case the cross correlation is very low. In other areas of speech, the signal is a mix, containing both a long-term (pitch) periodic component and a noise-like component. Using a single overlap window will cause audible distortion whenever the window properties do not match the signal properties.