Transmission of voice by digital techniques has become widespread, particularly in long distance telephony, packet-switched telephony such as Voice over IP (also called VoIP, where IP denotes Internet Protocol), and digital radio telephony such as cellular telephony. Such proliferation has created interest in reducing the amount of information used to transfer a voice communication over a transmission channel while maintaining the perceived quality of the reconstructed speech.
Devices that are configured to compress speech by extracting parameters that relate to a model of human speech generation are called “speech coders.” A speech coder (also called a speech codec or vocoder) generally includes a speech encoder and a speech decoder. The speech encoder typically divides the incoming speech signal (a digital signal representing audio information) into segments of time called “frames,” analyzes each frame to extract certain relevant parameters, and quantizes the parameters into an encoded frame. The encoded frames are transmitted over a transmission channel (i.e., a wired or wireless network connection) to a receiver that includes a speech decoder. The speech decoder receives and processes encoded frames, dequantizes them to produce the parameters, and recreates speech frames using the dequantized parameters.
Speech encoders are usually configured to distinguish frames of the speech signal that contain speech (“active frames”) from frames of the speech signal that contain only silence or background noise (“inactive frames”). A speech encoder may be configured to use different coding modes and/or rates to encode active and inactive frames. For example, speech encoders are typically configured to use fewer bits to encode an inactive frame than to encode an active frame. A speech coder may use a lower bit rate for inactive frames, and/or different bit rates for different types of active frames, to support transfer of the speech signal at a lower average bit rate with little to no perceived loss of quality.
Voice communications over the public switched telephone network (PSTN) have traditionally been limited in bandwidth to the frequency range of 300-3400 kilohertz (kHz). More recent networks for voice communications, such as networks that use cellular telephony and/or VoIP, may not have the same bandwidth limits, and it may be desirable for apparatus using such networks to have the ability to transmit and receive voice communications that include a wideband frequency range. For example, it may be desirable for such apparatus to support an audio frequency range that extends down to 50 Hz and/or up to 7 or 8 kHz. It may also be desirable for such apparatus to support other applications, such as high-quality audio or audio/video conferencing, delivery of multimedia services such as music and/or television, etc., that may have audio speech content in ranges outside the traditional PSTN limits.
Extension of the range supported by a speech coder into higher frequencies may improve intelligibility. For example, the information in a speech signal that differentiates fricatives such as ‘s’ and ‘f’ is largely in the high frequencies. Highband extension may also improve other qualities of the decoded speech signal, such as presence. For example, even a voiced vowel may have spectral energy far above the PSTN frequency range.