1. Field of the Invention
The present invention relates to digital communication systems. More particularly, the present invention relates to the enhancement of audio quality when portions of a bit stream representing an audio signal are lost within the context of a digital communications system.
2. Background Art
In audio coding (sometimes called “audio compression”), a coder encodes an input audio signal into a compressed digital bit stream for transmission or storage, and a decoder decodes the transmitted or stored bit stream into an output audio signal. The combination of the coder and the decoder is called a codec. The compressed bit stream is usually partitioned into frames. When the decoder decodes the bit stream, certain frames of the compressed bit stream may be deemed “lost” and thus not available for the normal decoding operation. This frame loss may be due to late or dropped packets in a packet transmission system or to severely corrupted frames in a wireless transmission system. Frame loss may even occur in audio storage applications for a variety of reasons.
When frame loss occurs, the decoder needs to perform special operations to try to conceal the quality-degrading effects of the lost frames; otherwise, the output audio quality may degrade severely. These special operations at the decoder have been given various names, such as “frame loss concealment (FLC)”, “frame erasure concealment (FEC)”, or “packet loss concealment (PLC)”. These names are used interchangeably herein.
One of the simplest and most common FLC techniques consists of repeating the bit stream of the last good frame preceding the lost frame, and decoding the repeated bit stream normally as if it were the received bit stream for the lost frame. This scheme is commonly called the “Frame Repeat” method. If the audio codec performs instantaneous quantization (such as Pulse Code Modulation (PCM)) without any overlap-add operation, the application of such a frame repeat method will generally cause waveform discontinuities at the frame boundaries. These waveform discontinuities will give rise to undesired audible artifacts that may be perceived as “clicks” by the listener.
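To make the discontinuity concrete, the following Python sketch conceals one lost frame of a sampled sinusoid by plain frame repetition with no overlap-add. It then compares the largest sample-to-sample jump before and after concealment. The frame length, the period, and the helper names (`frame_repeat`, `max_step`) are hypothetical choices for illustration only. The period is deliberately not a divisor of the frame length, so the repeated frame is out of step with the true waveform.

```python
import math

FRAME_LEN = 80   # hypothetical frame size in samples (e.g., 10 ms at 8 kHz)
PERIOD = 57.3    # hypothetical waveform period in samples; not a divisor of FRAME_LEN

def frame_repeat(signal, lost_frame):
    """Conceal one lost frame by copying the previous (good) frame, with no overlap-add."""
    out = list(signal)
    start = lost_frame * FRAME_LEN
    out[start:start + FRAME_LEN] = signal[start - FRAME_LEN:start]
    return out

def max_step(x):
    """Largest sample-to-sample jump: a crude detector for waveform discontinuities."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))

# A nearly periodic test signal (a pure sinusoid), five frames long.
sig = [math.sin(2 * math.pi * n / PERIOD) for n in range(5 * FRAME_LEN)]
concealed = frame_repeat(sig, lost_frame=2)
print(f"max jump, original:  {max_step(sig):.3f}")
print(f"max jump, concealed: {max_step(concealed):.3f}")
```

With these assumed values, the original sinusoid never changes by more than about 0.11 between adjacent samples. The concealed signal, by contrast, jumps by more than 1.0 at the boundaries of the repeated frame, and a step of that size is what the listener hears as a click.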
On the other hand, modern audio codecs typically perform frequency-domain transforms, such as the Fast Fourier Transform (FFT) or the Modified Discrete Cosine Transform (MDCT), and such transforms are typically performed on a windowed version of the input signal, wherein adjacent windows partially overlap. The corresponding audio decoders typically synthesize the output audio signals by using an overlap-add technique that is well known in the art. When used with such modern audio codecs, the frame repeat FLC method generally will not cause waveform discontinuities at the frame boundaries, because the overlap-add operation gradually transitions between one piece of waveform and the next overlapping piece of waveform, thus smoothing out waveform discontinuities at the frame boundaries.
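The smoothing effect of overlap-add can be illustrated with a simplified time-domain model. This model is assumed here for illustration and is not taken from any particular codec. Each frame contributes a windowed segment of 2N samples, and adjacent segments overlap by N samples. The window is a hypothetical sine-squared shape chosen so that the overlapping halves sum to one. A lost frame is concealed by reusing the previous frame's segment. A real MDCT decoder would repeat transform coefficients rather than time samples, and its output would also contain time-domain aliasing terms that this sketch ignores.

```python
import math

N = 80          # hypothetical hop size (frame length) in samples
PERIOD = 57.3   # hypothetical waveform period in samples

def window(n):
    """Sine-squared window of length 2N; window(n) + window(n + N) sums to 1 (up to rounding)."""
    return math.sin(math.pi * (n + 0.5) / (2 * N)) ** 2

def windowed_segments(x, num_frames):
    """One windowed 2N-sample segment per frame; adjacent segments overlap by N samples."""
    return [[window(n) * x[k * N + n] for n in range(2 * N)] for k in range(num_frames)]

def overlap_add(segments):
    """Synthesize the output by summing the overlapping segments."""
    out = [0.0] * ((len(segments) + 1) * N)
    for k, seg in enumerate(segments):
        for n, v in enumerate(seg):
            out[k * N + n] += v
    return out

def max_step(x):
    """Largest sample-to-sample jump in the output."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))

K = 6
x = [math.sin(2 * math.pi * n / PERIOD) for n in range((K + 1) * N)]
segs = windowed_segments(x, K)
segs[3] = list(segs[2])   # frame 3 is lost: repeat the last good frame's segment
y = overlap_add(segs)
print(f"max jump after frame repeat with overlap-add: {max_step(y):.3f}")
```

Here every boundary of the concealed region is a gradual cross-fade, so the output has no step much larger than the sinusoid's own sample-to-sample change. The repeated content is nevertheless still out of phase with the true waveform.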
Even though the frame repeat method will not cause waveform discontinuities when used with audio codecs that employ overlap-add synthesis at the decoder, it can still produce audible distortion for certain types of audio signals, especially nearly periodic signals such as the vowel portions of speech signals (voiced speech). This is because the waveform repeated at the frame rate is generally not aligned or “in phase” with the original input waveform in the lost frame. When the frame repeat method overlaps two such “out-of-phase” waveforms and adds them together, the resulting output signal usually includes an audible disturbance that makes it sound somewhat “busy” and not as “clean” as the original signal. Therefore, the frame repeat method generally performs poorly for nearly periodic signals such as voiced speech.
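An idealized calculation shows why out-of-phase overlap-add is audible. Assume, for illustration only, that in the overlap region the true waveform and the repeated waveform are unit-amplitude sinusoids of the same frequency ω that differ in phase by φ. Assume also that the overlap-add weights are complementary and both equal 1/2 at the middle of the overlap. At that point the output is

```latex
\frac{1}{2}\sin(\omega n) + \frac{1}{2}\sin(\omega n + \phi)
  = \cos\!\left(\frac{\phi}{2}\right)\sin\!\left(\omega n + \frac{\phi}{2}\right)
```

so the amplitude falls from 1 to |cos(φ/2)|. In-phase repetition (φ = 0) leaves the sinusoid unchanged. When the repeated waveform is half a period out of phase (φ = π), however, the two waveforms cancel completely at the middle of the overlap. An amplitude dip of this kind is one contributor to the disturbance described above.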
Surprisingly, however, when used with audio codecs employing overlap-add synthesis at the decoder (which include most modern audio codec standards), the frame repeat FLC method has been found to work well for a large variety of audio signals that are “busy-sounding” and far from periodic. This is because such busy-sounding audio signals have no well-defined “phase”, so the disturbance resulting from out-of-phase overlap-add is not nearly as pronounced as it is for nearly periodic signals. In other words, any residual disturbance in the output audio signal is likely to be masked by the busy sounds in the audio signal itself. For such audio signals, it is actually quite difficult to perceive the distortion caused by the frame repeat FLC method.
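One common way to quantify whether a signal has a well-defined period, and therefore a well-defined “phase”, is the normalized autocorrelation maximized over a range of candidate lags. The Python sketch below is an illustrative assumption, not part of any cited method. The lag range, window length, and function names are hypothetical, and white noise stands in for busy-sounding audio.

```python
import math
import random

def normalized_autocorr(x, lag, win):
    """Normalized correlation between the last `win` samples of x and the span `lag` samples earlier."""
    a = x[len(x) - win:]
    b = x[len(x) - win - lag:len(x) - lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0

def periodicity(x, min_lag=20, max_lag=200, win=600):
    """Best normalized correlation over candidate lags; near 1.0 means nearly periodic.
    Requires len(x) >= win + max_lag."""
    return max(normalized_autocorr(x, lag, win) for lag in range(min_lag, max_lag + 1))

voiced = [math.sin(2 * math.pi * n / 57.3) for n in range(800)]  # nearly periodic, like voiced speech
rng = random.Random(0)
busy = [rng.gauss(0.0, 1.0) for _ in range(800)]                 # white noise as a stand-in for busy audio
print(f"periodicity(voiced) = {periodicity(voiced):.2f}")
print(f"periodicity(busy)   = {periodicity(busy):.2f}")
```

For the sinusoid the measure is close to 1. For the noise it stays far below 1 at every lag, consistent with the observation that a busy signal has no well-defined phase with which a repeated frame could be out of step.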
In contrast to the simple frame repeat FLC method, at the other extreme there is another class of FLC methods that use sophisticated signal processing algorithms to try to extrapolate waveforms based on previously-received good frames to fill the waveform gaps corresponding to the lost frames. Many of these FLC methods perform periodic waveform extrapolation (PWE) when the decoded waveform corresponding to the good frames that preceded the current lost frame is deemed to be roughly periodic. For non-periodic signals, these methods use other techniques to extrapolate the waveform. Examples of this class of PWE-based FLC methods include, but are not limited to, the method proposed by Goodman et al. in “Waveform Substitution Techniques for Recovering Missing Speech Segments in Packet Voice Communications”, IEEE Transactions on Acoustics, Speech, and Signal Processing, December 1986, pp. 1440-1448, the PLC method of ITU-T Recommendation G.711 Appendix I developed by D. Kapilow, and the method developed by J.-H. Chen as described in U.S. patent application Ser. No. 11/234,291, filed Sep. 26, 2005 and entitled “Packet Loss Concealment for Block-Independent Speech Codecs”. Each of these documents is incorporated by reference herein in its entirety.
This class of PWE-based FLC methods is usually tuned for speech signals, and thus these methods usually work quite well for speech. However, when applied to general audio signals such as music, these methods do not perform as well and tend to generate more audible distortion. One of the most common problems is that for busy-sounding music signals, the use of periodic waveform extrapolation often generates a “buzzing” sound. This occurs because the periodically-extrapolated waveform is more periodic than the original waveform corresponding to the lost frames.
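The periodic waveform extrapolation idea, and the reason it can buzz on busy signals, can be sketched in Python as follows. This is a bare-bones illustration under stated assumptions. It is not the method of Goodman et al., ITU-T G.711 Appendix I, or the Chen application, which add refinements omitted here. The period is taken as the lag with the highest normalized autocorrelation over a hypothetical search range, and the gap is filled by repeating the last cycle of the good history. The helper names are hypothetical, and white noise stands in for busy-sounding music.

```python
import math
import random

def normalized_autocorr(x, lag, win):
    """Normalized correlation between the last `win` samples of x and the span `lag` samples earlier."""
    a = x[len(x) - win:]
    b = x[len(x) - win - lag:len(x) - lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0

def estimate_period(history, min_lag=20, max_lag=200, win=200):
    """Pick the lag with the highest normalized autocorrelation (hypothetical search range)."""
    best_lag, best_corr = min_lag, -2.0
    for lag in range(min_lag, max_lag + 1):
        c = normalized_autocorr(history, lag, win)
        if c > best_corr:
            best_lag, best_corr = lag, c
    return best_lag

def periodic_extrapolate(history, gap_len):
    """Fill gap_len samples by repeating the last estimated cycle of the good history."""
    period = estimate_period(history)
    last_cycle = history[-period:]
    return [last_cycle[i % period] for i in range(gap_len)], period

# Nearly periodic input: a two-harmonic waveform with a 50-sample period.
cycle = [math.sin(2 * math.pi * i / 50) + 0.5 * math.sin(4 * math.pi * i / 50) for i in range(50)]
voiced = [cycle[n % 50] for n in range(400)]
fill, period = periodic_extrapolate(voiced, 160)
print("voiced: estimated period =", period)

# Busy input: white noise (an assumed stand-in for busy music).
rng = random.Random(1)
busy = [rng.gauss(0.0, 1.0) for _ in range(400)]
fill_busy, p = periodic_extrapolate(busy, 400)
print(f"noise history periodicity at lag {p}: {normalized_autocorr(busy, p, 200):.2f}")
print(f"extrapolated fill periodicity at lag {p}: {normalized_autocorr(fill_busy, p, 400 - p):.2f}")
```

For the exactly periodic input, the estimated period is 50 samples and the fill continues the waveform without error. For the noise input, the history correlates only weakly with itself at the chosen lag. The fill, however, is an exact repetition of one chunk, so its correlation at that lag is 1. In other words, the extrapolated waveform is far more periodic than the original, which is the source of the “buzzing” described above.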
To summarize, when used with audio codecs employing overlap-add synthesis in the decoder, the frame repeat FLC method works well for most music signals but performs poorly for speech signals. On the other hand, PWE-based FLC methods work well for speech signals but often produce an audible “buzzing” for busy, non-periodic music signals. However, many audio signals, such as those associated with movie soundtracks, television, and radio programs, frequently change between pure speech, pure music, and a combination of speech and music. Consequently, using either a frame repeat or a PWE-based FLC method alone will result in performance problems for at least some portion(s) of the audio signal.
What is needed, therefore, is an FLC technique that works well for both speech and music. Ideally, the desired FLC method should be “universal” in that it works well for any kind of audio signal, but at the very least, it should work well for both speech and music, since speech and music are the dominant types of audio signals in movie, television, and radio soundtracks. The present invention addresses this problem and can achieve good performance for both speech and music signals.
It is noted that the classification-based frame loss concealment system of the present invention is an improvement over the classification-based frame loss concealment system described in co-owned, co-pending U.S. patent application Ser. No. 11/285,311 to Chen, filed Nov. 23, 2005, and entitled “Classification-Based Frame Loss Concealment for Audio Signals,” the entirety of which is incorporated by reference herein.