Transmission of audio signals, such as voice and music, by digital techniques has become widespread, particularly in long distance telephony, packet-switched telephony such as Voice over IP (also called VoIP, where IP denotes Internet Protocol), and digital radio telephony such as cellular telephony. Such proliferation has created interest in reducing the amount of information used to transfer a voice communication over a transmission channel while maintaining the perceived quality of the reconstructed speech. For example, it is desirable to make the best use of available wireless system bandwidth. One way to use system bandwidth efficiently is to employ signal compression techniques. For wireless systems which carry speech signals, speech compression (or “speech coding”) techniques are commonly employed for this purpose.
Devices that are configured to compress speech by extracting parameters that relate to a model of human speech generation are often called vocoders, “audio coders,” or “speech coders.” (These three terms are used interchangeably herein.) A speech coder generally includes an encoder and a decoder. The encoder typically divides the incoming speech signal (a digital signal representing audio information) into segments of time called “frames,” analyzes each frame to extract certain relevant parameters, and quantizes the parameters into an encoded frame. The encoded frames are transmitted over a transmission channel (i.e., a wired or wireless network connection) to a receiver that includes a decoder. The decoder receives and processes encoded frames, dequantizes them to produce the parameters, and recreates speech frames using the dequantized parameters.
In a typical conversation, each speaker is silent for about sixty percent of the time. Speech encoders are usually configured to distinguish frames of the speech signal that contain speech (“active frames”) from frames of the speech signal that contain only silence or background noise (“inactive frames”). Such an encoder may be configured to use different coding modes and/or rates to encode active and inactive frames. For example, speech encoders are typically configured to use fewer bits to encode an inactive frame than to encode an active frame. A speech coder may use a lower bit rate for inactive frames to support transfer of the speech signal at a lower average bit rate with little to no perceived loss of quality.
Examples of bit rates used to encode active frames include 171 bits per frame, eighty bits per frame, and forty bits per frame. Examples of bit rates used to encode inactive frames include sixteen bits per frame. In the context of cellular telephony systems (especially systems that are compliant with Interim Standard (IS)-95 as promulgated by the Telecommunications Industry Association, Arlington, Va., or a similar industry standard), these four bit rates are also referred to as “full rate,” “half rate,” “quarter rate,” and “eighth rate,” respectively.