Transmission of audio, such as voice and music, by digital techniques has become widespread, particularly in long-distance telephony, packet-switched telephony such as Voice over IP (also called VoIP, where IP denotes Internet Protocol), and digital radio telephony such as cellular telephony. Such proliferation has created interest in reducing the amount of information used to transfer a voice communication over a transmission channel while maintaining the perceived quality of the reconstructed speech. For example, it is desirable to make the best use of available wireless system bandwidth. One way to use system bandwidth efficiently is to employ signal compression techniques. For wireless systems that carry speech signals, speech compression (or “speech coding”) techniques are commonly employed for this purpose.
Devices that are configured to compress speech by extracting parameters that relate to a model of human speech generation are often called vocoders, “audio coders,” or “speech coders.” An audio coder generally includes an encoder and a decoder. The encoder typically divides the incoming speech signal (a digital signal representing audio information) into segments of time called “frames,” analyzes each frame to extract certain relevant parameters, and quantizes the parameters into an encoded frame. The encoded frames are transmitted over a transmission channel (e.g., a wired or wireless network connection) to a receiver that includes a decoder. The decoder receives and processes encoded frames, dequantizes them to produce the parameters, and recreates speech frames using the dequantized parameters.
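The encoder and decoder steps described above may be illustrated by the following minimal Python sketch. It is a toy example under stated assumptions and not an actual vocoder: the single "parameter" is the mean energy of the frame, samples are assumed to lie in the range -1.0 to 1.0, and all function names, the 8-bit quantizer width, and the 160-sample frame length used in the usage note are hypothetical choices made only for illustration.

```python
def split_into_frames(samples, frame_len):
    """Divide a sample sequence into consecutive frames; a partial tail is dropped."""
    count = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(count)]


def quantize(value, lo, hi, bits):
    """Uniform scalar quantizer: map a value in [lo, hi] to an integer index."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)


def dequantize(index, lo, hi, bits):
    """Inverse of quantize: map an integer index back to a value in [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + (index / levels) * (hi - lo)


def encode_frame(frame, bits=8):
    """Toy encoder: extract one parameter (mean energy) and quantize it."""
    energy = sum(s * s for s in frame) / len(frame)
    return quantize(energy, 0.0, 1.0, bits)


def decode_frame(code, frame_len, bits=8):
    """Toy decoder: dequantize the energy and rebuild a constant frame with that energy."""
    energy = dequantize(code, 0.0, 1.0, bits)
    amplitude = energy ** 0.5
    return [amplitude] * frame_len
```

As a usage example, `split_into_frames(samples, 160)` followed by `encode_frame` on each frame yields one small integer per frame, and `decode_frame(code, 160)` rebuilds a frame of the same length. A practical coder would extract many parameters per frame (for example, spectral and pitch information) rather than a single energy value.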
In a typical conversation, each speaker is silent for about sixty percent of the time. Speech encoders are usually configured to distinguish frames of the speech signal that contain speech (“active frames”) from frames of the speech signal that contain only silence or background noise (“inactive frames”). Such an encoder may be configured to use different coding modes and/or rates to encode active and inactive frames. For example, speech encoders are typically configured to use fewer bits to encode an inactive frame than to encode an active frame. A speech coder may use a lower bit rate for inactive frames to support transfer of the speech signal at a lower average bit rate with little to no perceived loss of quality.
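One simple way to distinguish active frames from inactive frames may be sketched as follows. The sketch assumes an energy threshold as the decision rule, which is only one of many possible approaches; the function names and the threshold value are hypothetical, and samples are assumed to lie in the range -1.0 to 1.0.

```python
def frame_energy(frame):
    """Mean squared amplitude of the samples in one frame."""
    return sum(s * s for s in frame) / len(frame)


def classify_frame(frame, threshold=1e-4):
    """Label a frame 'active' if its energy exceeds the (assumed) threshold."""
    return "active" if frame_energy(frame) > threshold else "inactive"


def select_coding_mode(frame, threshold=1e-4):
    """Pick a higher-rate mode for active frames and a lower-rate mode otherwise."""
    if classify_frame(frame, threshold) == "active":
        return "high_rate"
    return "low_rate"
```

A fixed threshold is the simplest possible choice. A practical detector would typically need to adapt to the level of the background noise, because a threshold that works in a quiet room may label every frame as active in a noisy one.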
Examples of bit rates used to encode active frames include 171 bits per frame, 80 bits per frame, and 40 bits per frame. An example of a bit rate used to encode inactive frames is 16 bits per frame. In the context of cellular telephony systems (especially systems that are compliant with Interim Standard (IS)-95 as promulgated by the Telecommunications Industry Association, Arlington, Va., or a similar industry standard), these four bit rates (171, 80, 40, and 16 bits per frame) are also referred to as “full rate,” “half rate,” “quarter rate,” and “eighth rate,” respectively.
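The effect of these rates on the average bit rate may be illustrated with a short calculation. The sketch below rests on several assumptions that do not come from this description: every active frame is coded at full rate, every inactive frame at eighth rate, forty percent of frames are active (taken from the sixty-percent silence figure for a single speaker), and each frame spans 20 milliseconds. The names are hypothetical.

```python
# Bits per frame for the four named rates.
BITS_PER_FRAME = {"full": 171, "half": 80, "quarter": 40, "eighth": 16}

# Assumption for illustration only: 20 ms frames.
FRAME_SECONDS = 0.020


def average_bits_per_frame(rate_fractions):
    """Weighted mean of bits per frame; rate_fractions maps rate name to fraction of frames."""
    total = sum(rate_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(BITS_PER_FRAME[rate] * frac for rate, frac in rate_fractions.items())


# Assumed mix: 40% of frames active (full rate), 60% inactive (eighth rate).
avg_bits = average_bits_per_frame({"full": 0.4, "eighth": 0.6})
avg_bps = avg_bits / FRAME_SECONDS
full_bps = BITS_PER_FRAME["full"] / FRAME_SECONDS
print(avg_bits, avg_bps, full_bps)
```

Under these assumptions the average is about 78 bits per frame, which is less than half of the 171 bits per frame that would be needed if every frame were coded at full rate.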
Many communication systems that employ speech coders, such as cellular telephone and satellite communications systems, rely on wireless channels to communicate information. In the course of communicating such information, a wireless transmission channel can suffer from several sources of error, such as multipath fading. Errors in transmission may lead to unrecoverable corruption of a frame, also called “frame erasure.” In a typical cellular telephone system, frame erasure occurs at a rate of one to three percent and may even reach or exceed five percent.
The problem of packet loss in packet-switched networks that employ audio coding arrangements (e.g., VoIP) is very similar to frame erasure in the wireless context. That is, due to packet loss, an audio decoder may fail to receive a frame or may receive a frame having a significant number of bit errors. In either case, the audio decoder is presented with the same problem: the need to produce a decoded audio frame despite the loss of compressed speech information. For purposes of this description, the term “frame erasure” may be deemed to include “packet loss.”
Frame erasure may be detected at the decoder according to a failure of a check function, such as a CRC (cyclic redundancy check) function or other error detection function that uses, e.g., one or more checksums and/or parity bits. Such a function is typically performed by a channel decoder (e.g., in a multiplex sublayer), which may also perform tasks such as convolutional decoding and/or de-interleaving. In a typical decoder, a frame-error detector sets a frame erasure flag upon receiving an indication of an uncorrectable error in a frame. The decoder may be configured to select a frame erasure recovery module to process a frame for which the frame erasure flag is set.
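The detection and dispatch steps described above may be sketched as follows. This is a minimal illustration under stated assumptions: the 32-bit CRC from the Python standard library (`zlib.crc32`) stands in for whatever check function a given channel decoder actually uses, the check value is assumed to be appended to the end of each frame, and the function names and the two callback parameters are hypothetical.

```python
import zlib

CRC_BYTES = 4  # size of the appended check value in this sketch


def attach_crc(payload: bytes) -> bytes:
    """Transmitter side: append a CRC-32 of the payload to the frame."""
    return payload + zlib.crc32(payload).to_bytes(CRC_BYTES, "big")


def frame_erased(received: bytes) -> bool:
    """Receiver side: return True (erasure flag set) if the check function fails."""
    if len(received) < CRC_BYTES:
        return True  # too short to contain a check value at all
    payload, crc = received[:-CRC_BYTES], received[-CRC_BYTES:]
    return zlib.crc32(payload).to_bytes(CRC_BYTES, "big") != crc


def process_frame(received: bytes, decode_good, recover_erasure):
    """Route the frame to normal decoding or to the frame erasure recovery module."""
    if frame_erased(received):
        return recover_erasure()
    return decode_good(received[:-CRC_BYTES])
```

For example, a frame built with `attach_crc(b"abc")` passes the check, while the same frame with any single bit flipped causes `frame_erased` to return True, so that `process_frame` calls the recovery callback instead of the normal decoder.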