1. Field of the Invention
The present invention relates to digital communication systems. More particularly, the present invention relates to the enhancement of speech or audio quality when portions of a bit stream representing a speech signal are lost within the context of a digital communication system.
2. Background Art
In speech coding (sometimes called “voice compression”), a coder encodes an input speech or audio signal into a digital bit stream for transmission. A decoder decodes the bit stream into an output speech signal. The combination of the coder and the decoder is called a codec. The transmitted bit stream is usually partitioned into segments called frames, and in packet transmission networks, each transmitted packet may contain one or more frames of a compressed bit stream. In wireless or packet networks, sometimes the transmitted frames or packets are erased or lost. This condition is called frame erasure in wireless networks and packet loss in packet networks. When this condition occurs, to avoid substantial degradation in output speech quality, the decoder needs to perform frame erasure concealment (FEC) or packet loss concealment (PLC) to try to conceal the quality-degrading effects of the lost frames.
For a PLC or FEC algorithm, the packet loss and frame erasure amount to the same thing: certain transmitted frames are not available for decoding, so the PLC or FEC algorithm needs to generate a waveform to fill up the waveform gap corresponding to the lost frames and thus conceal the otherwise degrading effects of the frame loss. Because the terms FLC and PLC generally refer to the same kind of technique, they can be used interchangeably. Thus, for the sake of convenience, the term “packet loss concealment,” or PLC, is used herein to refer to both.
When a frame of transmitted voice data is lost, conventional PLC methods usually extrapolate the missing waveform based on only a waveform that precedes the lost frame in the audio signal. If the waveform extrapolation is performed properly, there will usually be no audible distortion during the lost frame (also referred to herein as a “bad” frame). Audible distortion usually occurs, however, during the first good frame or first few good frames immediately following a frame erasure or packet loss, where the extrapolated waveform needs to somehow merge with the normally-decoded waveform corresponding to the first good frame(s). What often happens is that the extrapolated waveform can be out of phase with respect to the normally-decoded waveform after a frame erasure or packet loss. Although the use of an overlap-add method will reduce waveform discontinuity, it cannot fix the problem of destructive interference between the extrapolated waveform and the normally-decoded waveform after a frame erasure or packet loss if the two waveforms are out of phase. This is the main source of the audible distortion in conventional PLC systems.