1. Field of the Invention
The present invention relates to digital communication systems. More particularly, the present invention relates to the enhancement of audio quality when portions of a bit stream representing an audio signal are lost within the context of a digital communications system.
2. Background Art
In audio coding (sometimes called “audio compression”), a coder encodes an input audio signal into a compressed digital bit stream for transmission or storage, and a decoder decodes the transmitted or stored bit stream into an output audio signal. The combination of the coder and the decoder is called a codec. The compressed bit stream is usually partitioned into frames. When the decoder decodes the bit stream, certain frames of the compressed bit stream may be deemed “lost” and thus not available for the normal decoding operation. This frame loss may be due to late or dropped packets in a packet transmission system, or to severely corrupted frames in a wireless transmission system. Frame loss may even occur in audio storage applications for a variety of reasons.
When frame loss occurs, the decoder needs to perform special operations to try to conceal the quality-degrading effects of the lost frames; otherwise, the output audio quality may degrade severely. These special operations at the decoder have been given various names, such as “frame loss concealment (FLC)”, “frame erasure concealment (FEC)”, or “packet loss concealment (PLC)”. These names are used interchangeably herein.
One of the simplest and most common FLC techniques consists of repeating the bit stream of the last good frame preceding the lost frame, and decoding the repeated bit stream normally as if it were the received bit stream for the lost frame. This scheme is commonly called the “Frame Repeat” method. If the audio codec uses instantaneous quantization such as Pulse Code Modulation (PCM) without any overlap-add operation, then the application of such a frame repeat method will generally cause waveform discontinuities at the frame boundaries, which will give rise to audible artifacts that sound like some sort of “clicks”.
On the other hand, modern audio codecs typically perform frequency-domain transforms, such as Fast Fourier Transform (FFT) or Modified Discrete Cosine Transform (MDCT), and such transforms are typically performed on a windowed version of the input signal, wherein adjacent windows are to some extent overlapping. The corresponding audio decoders typically synthesize the output audio signals by using an overlap-add technique that is well-known in the art. With such modern audio codecs, the frame repeat FLC method generally will not cause waveform discontinuities at the frame boundaries, because the overlap-add operation gradually transitions between one piece of waveform and the next overlapping piece of waveform, thus smoothing out waveform discontinuities at the frame boundaries.
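The contrast between the two cases above can be illustrated with a minimal sketch. It is not taken from any particular codec: the frame length, overlap length, signal period, triangular cross-fade weights, and all function names are assumptions chosen for illustration, and a real transform codec would apply its own synthesis window instead.

```python
import math

def sine(n_samples, period, start=0):
    """Samples of a unit sine wave with the given period (in samples)."""
    return [math.sin(2 * math.pi * (start + n) / period) for n in range(n_samples)]

def max_jump(x):
    """Largest absolute difference between consecutive samples."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))

def repeat_hard(segment, frame):
    """Frame repeat with no overlap: play the frame, then the same frame again."""
    return segment[:frame] + segment[:frame]

def repeat_overlap_add(segment, frame, overlap):
    """Frame repeat with overlap-add (hypothetical triangular weights):
    the tail of the original segment fades out while the head of the
    repeated copy fades in over `overlap` samples."""
    fade = [
        (1 - i / overlap) * segment[frame + i] + (i / overlap) * segment[i]
        for i in range(overlap)
    ]
    return segment[:frame] + fade + segment[overlap:frame]

# Assumed illustration values: 80-sample frames, 40-sample overlap,
# and a 37-sample period so the repeat is not phase-aligned.
FRAME, OVERLAP, PERIOD = 80, 40, 37
segment = sine(FRAME + OVERLAP, PERIOD)
hard = repeat_hard(segment, FRAME)
smooth = repeat_overlap_add(segment, FRAME, OVERLAP)

print("largest sample-to-sample jump, hard repeat:", round(max_jump(hard), 3))
print("largest sample-to-sample jump, overlap-add:", round(max_jump(smooth), 3))
```

In this sketch the hard repeat produces a jump at the frame boundary several times larger than any step in the underlying sine wave, while the cross-faded version stays close to the normal step size.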
Even though the frame repeat method will not cause waveform discontinuities when used with audio codecs that employ overlap-add synthesis at the decoder, it can still result in audible distortion for certain types of audio signals, especially signals that are nearly periodic, such as the vowel portions of speech signals (voiced speech). This is understandable, since the waveform repeated at the frame rate is generally not aligned or “in phase” with the original input waveform in the lost frame. When the frame repeat method overlaps two such “out-of-phase” waveforms and adds them together, the resulting output signal usually includes some audible disturbance that makes the output signal sound a little “busy” and not as “clean” as the original signal. Therefore, the frame repeat method generally performs poorly for nearly periodic signals such as voiced speech.
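One simplified way to see the effect of out-of-phase overlap-add is to consider two unit sine waves of the same frequency that are cross-faded with weights (1 - w) and w. The sketch below computes the amplitude of their weighted sum as a function of the phase offset between them. The linear weights and the function name are assumptions for illustration only; actual codecs use their own window shapes, and real voiced speech is only approximately sinusoidal.

```python
import math

def crossfade_amplitude(w, phase_offset):
    """Amplitude of (1 - w) * sin(t) + w * sin(t + phase_offset).

    Follows from adding two phasors of lengths (1 - w) and w that are
    separated by `phase_offset` radians.
    """
    squared = (1 - w) ** 2 + w ** 2 + 2 * w * (1 - w) * math.cos(phase_offset)
    return math.sqrt(max(0.0, squared))  # guard against tiny negative rounding

# Amplitude at the midpoint of the cross-fade (w = 0.5) for several offsets.
for offset in (0.0, math.pi / 2, math.pi):
    print(round(offset, 3), round(crossfade_amplitude(0.5, offset), 3))
```

When the two waveforms are in phase, the amplitude is preserved throughout the cross-fade. As the phase offset grows, the amplitude dips in the middle of the overlap region, and at an offset of half a period the two waveforms cancel at the midpoint, which is consistent with the disturbance described above for nearly periodic signals.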
Surprisingly, when used with audio codecs employing overlap-add synthesis at the decoder (which include most of the modern audio codec standards), the frame repeat FLC method has been found to work well for a large variety of audio signals that are “busy-sounding” and far from periodic. This is because such busy-sounding audio signals have no well-defined “phase”, and the disturbance resulting from out-of-phase overlap-add is not nearly as pronounced as in the case of nearly periodic signals. Any residual “disturbance” in the output audio signal is likely to be “buried” by the busy sounds in the audio signal. For such audio signals, it is perceptually quite difficult to detect the distortion caused by the frame repeat FLC method.
In contrast to the simple frame repeat FLC method, at the other extreme there is another class of FLC methods that use sophisticated signal processing algorithms to extrapolate waveforms based on previously-received good frames in order to fill the waveform gaps corresponding to the lost frames. Many of these FLC methods perform periodic waveform extrapolation (PWE) when the decoded waveform corresponding to the good frames that preceded the current lost frame is deemed to be roughly periodic. For non-periodic signals, these methods use various other techniques to extrapolate the waveform. Examples of this class of PWE-based FLC methods include, but are not limited to, the method proposed by Goodman, et al. in “Waveform Substitution Techniques for Recovering Missing Speech Segments in Packet Voice Communications”, IEEE Transactions on Acoustics, Speech, and Signal Processing, December 1986, pp. 1440-1448, the PLC method of ITU-T Recommendation G.711 Appendix I developed by D. Kapilow, and the method developed by J.-H. Chen as described in U.S. patent application Ser. No. 11/234,291, filed Sep. 26, 2005 and entitled “Packet Loss Concealment for Block-Independent Speech Codecs”. Each of these documents is incorporated by reference herein in its entirety.
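The general idea of periodic waveform extrapolation can be sketched as follows. This is a generic, minimal illustration and not the method of any of the documents cited above: the normalized-correlation period search, the lag range, and the function names are all assumptions, and practical methods add steps such as periodicity decisions, gain control, and overlap-add at the boundaries.

```python
import math

def estimate_period(history, min_lag, max_lag):
    """Pick the lag in [min_lag, max_lag] whose delayed copy best matches
    the most recent `max_lag` samples (normalized correlation).
    Requires len(history) >= 2 * max_lag."""
    n = len(history)
    recent = history[n - max_lag:]
    best_lag, best_score = min_lag, -float("inf")
    for lag in range(min_lag, max_lag + 1):
        delayed = history[n - max_lag - lag:n - lag]
        num = sum(a * b for a, b in zip(recent, delayed))
        den = math.sqrt(sum(a * a for a in recent) * sum(b * b for b in delayed))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def extrapolate_periodic(history, period, n_samples):
    """Fill a gap of `n_samples` by repeatedly copying the sample that
    lies one period earlier."""
    out = list(history)
    for _ in range(n_samples):
        out.append(out[-period])
    return out[len(history):]

# Assumed test signal: a sine wave with a 40-sample period.
history = [math.sin(2 * math.pi * n / 40) for n in range(200)]
period = estimate_period(history, min_lag=20, max_lag=60)
concealed = extrapolate_periodic(history, period, n_samples=80)
```

For a perfectly periodic input such as this one, the extrapolated samples continue the waveform in phase across the gap. For a signal that is not actually periodic, the same copying step imposes a periodicity that the original did not have.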
This class of PWE-based FLC methods is usually tuned for speech signals, and thus these methods usually work quite well for speech. However, when applied to general audio signals such as music, while they still work, these methods tend to have more problems and audible distortion. One of the most common problems is that for busy-sounding music signals, the periodic waveform extrapolation of these techniques often causes some “buzz” sounds, because the periodically extrapolated waveform is more periodic than the original waveform corresponding to the lost frames.
To summarize, when used with audio codecs employing overlap-add synthesis in the decoder, the frame repeat FLC method works well for most music signals but performs poorly for speech. On the other hand, PWE-based FLC methods work well for speech but often produce an audible “buzz” for busy, non-periodic music signals. However, in many applications, such as the sound tracks of movies and of television and radio programs, the audio signal frequently changes between pure speech, pure music, and speech mixed with music. In such cases, using either the frame repeat method or a PWE-based FLC method alone will cause performance problems for at least some portions of the audio signal.
What is needed, therefore, is an FLC technique that works well for both speech and music. Ideally, the desired FLC method should be “universal” such that it works well for any kind of audio signal, but at the very least, it should work well for both speech and music, since speech and music are the dominant types of audio signals in sound tracks for movies, TV, and radio. The present invention addresses this problem and can achieve good performance for both speech and music signals.