Demand for efficient digital narrowband and wideband speech coding techniques with a good trade-off between subjective quality and bit rate is increasing in application areas such as teleconferencing, multimedia, and wireless communications. Until recently, telephone bandwidth constrained to the range of 200-3400 Hz has mainly been used in speech coding applications. However, wideband speech applications provide increased intelligibility and naturalness in communication compared to the conventional telephone bandwidth. A bandwidth in the range of 50-7000 Hz has been found sufficient for delivering good quality, giving an impression of face-to-face communication. For general audio signals, this bandwidth gives an acceptable subjective quality, but one that is still lower than the quality of FM radio or CD, which operate in the ranges of 20-16000 Hz and 20-20000 Hz, respectively.
A speech coder converts a speech signal into a digital bit stream which is transmitted over a communication channel or stored in a storage medium. The speech signal is digitized, that is, sampled and quantized, usually with 16 bits per sample. The speech coder has the role of representing these digital samples with a smaller number of bits while maintaining a good subjective quality of speech. The speech decoder or synthesizer operates on the transmitted or stored bit stream and converts it back to a speech signal.
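To give a sense of scale for the digitized signal that the coder must compress, the following minimal sketch computes the uncoded bit rate at 16 bits per sample. The sampling rates of 8 kHz (narrowband) and 16 kHz (wideband) are assumptions chosen to cover the bandwidths mentioned above; they are not stated in the text, and the function name is hypothetical.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int = 16) -> int:
    """Return the uncoded bit rate in bit/s for uniformly quantized samples."""
    return sample_rate_hz * bits_per_sample


# Assumed sampling rates (not given in the text): 8 kHz and 16 kHz.
for label, rate_hz in (("narrowband", 8000), ("wideband", 16000)):
    print(label, pcm_bit_rate(rate_hz) / 1000, "kbit/s")
```

Under these assumptions the uncoded wideband signal needs 256 kbit/s, which is the figure a speech coder has to reduce by more than an order of magnitude.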
Code-Excited Linear Prediction (CELP) coding is one of the best prior art techniques for achieving a good compromise between subjective quality and bit rate. This coding technique constitutes the basis of several speech coding standards in both wireless and wireline applications. In CELP coding, the sampled speech signal is processed in successive blocks of N samples, usually called frames, where N is a predetermined number typically corresponding to 10-30 ms. A linear prediction (LP) filter is computed and transmitted every frame. The computation of the LP filter typically needs a look-ahead, i.e., a 5-15 ms speech segment from the subsequent frame. The N-sample frame is divided into smaller blocks called subframes. Usually the number of subframes in a frame is three (3) or four (4), resulting in 4-10 ms subframes. In each subframe, an excitation signal is usually obtained from two components: the past excitation and the innovative, fixed-codebook excitation. The component formed from the past excitation is often referred to as the adaptive codebook or pitch excitation. The parameters characterizing the excitation signal are coded and transmitted to the decoder, where the reconstructed excitation signal is used as the input of the LP filter.
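The decoder-side reconstruction described above can be sketched for a single subframe as follows. This is a simplified illustration, not the procedure of any particular standard: it assumes an integer pitch lag with no fractional interpolation, a plain periodic extension of the past excitation when the lag is shorter than the subframe, and the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p for the LP filter, with synthesis through 1/A(z). All function and parameter names are hypothetical.

```python
from typing import List, Sequence


def adaptive_codebook_vector(past_exc: Sequence[float], pitch_lag: int, n: int) -> List[float]:
    """Copy the past excitation delayed by pitch_lag samples.

    If pitch_lag is shorter than the subframe, the vector is extended periodically.
    """
    if not 1 <= pitch_lag <= len(past_exc):
        raise ValueError("pitch_lag must lie within the past excitation buffer")
    vec: List[float] = []
    for i in range(n):
        if i < pitch_lag:
            vec.append(past_exc[len(past_exc) - pitch_lag + i])
        else:
            vec.append(vec[i - pitch_lag])
    return vec


def build_excitation(past_exc: Sequence[float], pitch_lag: int, pitch_gain: float,
                     code_vector: Sequence[float], code_gain: float) -> List[float]:
    """Sum the scaled adaptive (pitch) and fixed-codebook (innovative) contributions."""
    adaptive = adaptive_codebook_vector(past_exc, pitch_lag, len(code_vector))
    return [pitch_gain * a + code_gain * c for a, c in zip(adaptive, code_vector)]


def lp_synthesis(excitation: Sequence[float], lp_coeffs: Sequence[float],
                 memory: Sequence[float]) -> List[float]:
    """Filter the excitation through 1/A(z).

    lp_coeffs holds a_1..a_p; memory holds the last p output samples, most recent last.
    """
    p = len(lp_coeffs)
    if len(memory) != p:
        raise ValueError("memory must hold exactly p samples")
    state = list(memory)
    out: List[float] = []
    for e in excitation:
        s = e - sum(lp_coeffs[k] * state[-1 - k] for k in range(p))
        out.append(s)
        state.append(s)
        state.pop(0)  # keep only the last p output samples
    return out


# Toy 4-sample subframe with a first-order LP filter A(z) = 1 - 0.5 z^-1.
exc = build_excitation([0.0, 0.0, 1.0, 0.0], pitch_lag=4, pitch_gain=0.5,
                       code_vector=[1.0, 0.0, 0.0, 0.0], code_gain=2.0)
speech = lp_synthesis(exc, lp_coeffs=[-0.5], memory=[0.0])
print(exc, speech)
```

In a real codec the lag, gains, and codebook index would be decoded from the bit stream, and the newly built excitation would be appended to the past-excitation buffer for use in the next subframe.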
In wireless systems using Code Division Multiple Access (CDMA) technology, the use of source-controlled Variable Bit Rate (VBR) speech coding significantly improves the capacity of the system. In source-controlled VBR coding, the codec operates at several bit rates, and a rate selection module is used to determine the bit rate used for coding each speech frame based on the nature of the speech frame (e.g. voiced, unvoiced, transient, background noise, etc.). The goal is to attain the best speech quality at a given average bit rate, also referred to as Average Data Rate (ADR). The codec can operate in different modes by tuning the rate selection module to attain a different ADR in each mode, where codec performance improves with increasing ADR. This provides the codec with a mechanism for trading off speech quality against system capacity. In CDMA systems (e.g. CDMA-one and CDMA2000), typically 4 bit rates are used, referred to as Full-Rate (FR), Half-Rate (HR), Quarter-Rate (QR), and Eighth-Rate (ER). In these systems, two rate sets are supported, referred to as Rate Set I and Rate Set II. In Rate Set II, a variable-rate codec with a rate selection mechanism operates at source-coding bit rates of 13.3 (FR), 6.2 (HR), 2.7 (QR), and 1.0 (ER) kbit/s, corresponding to gross bit rates of 14.4, 7.2, 3.6, and 1.8 kbit/s (with some bits added for error detection).
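The relationship between the Rate Set II bit rates, the per-frame bit budget, and the ADR can be illustrated with a short sketch. The bit rates are those quoted above; the 20 ms frame length and the example sequence of per-frame rate decisions are assumptions made only for illustration, and the helper names are hypothetical.

```python
from typing import Sequence

# Rate Set II bit rates in kbit/s, as quoted in the text.
SOURCE_RATE_KBPS = {"FR": 13.3, "HR": 6.2, "QR": 2.7, "ER": 1.0}
GROSS_RATE_KBPS = {"FR": 14.4, "HR": 7.2, "QR": 3.6, "ER": 1.8}

FRAME_MS = 20  # assumed frame length; the text only gives 10-30 ms as typical


def bits_per_frame(rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """kbit/s multiplied by ms gives bits; round to absorb floating-point error."""
    return round(rate_kbps * frame_ms)


def average_data_rate(frame_rates: Sequence[str]) -> float:
    """ADR in kbit/s: the mean source bit rate over a sequence of frame decisions."""
    return sum(SOURCE_RATE_KBPS[r] for r in frame_rates) / len(frame_rates)


for name in ("FR", "HR", "QR", "ER"):
    source_bits = bits_per_frame(SOURCE_RATE_KBPS[name])
    gross_bits = bits_per_frame(GROSS_RATE_KBPS[name])
    print(name, source_bits, "source bits,", gross_bits - source_bits, "overhead bits")

# Hypothetical decisions for four consecutive frames.
print(average_data_rate(["FR", "HR", "ER", "ER"]))
```

Tuning the rate selection module so that it picks FR more or less often shifts this average, which is how the different operating modes reach different ADRs.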
In CDMA systems, half-rate can be imposed instead of full-rate in some speech frames in order to send in-band signaling information (called dim-and-burst signaling). The use of half-rate as a maximum bit rate can also be imposed by the system during bad channel conditions (such as near the cell boundaries) in order to improve the codec robustness. This is referred to as half-rate max. Typically, in VBR coding, half-rate is used when the frame is stationary voiced or stationary unvoiced. A different codec structure is used for each of these two signal types: in the unvoiced case, a CELP model without the pitch codebook is used, and in the voiced case, signal modification is used to enhance the periodicity and reduce the number of bits for the pitch indices. Full-rate is used for onsets, transient frames, and mixed voiced frames (a typical CELP model is usually used). When the rate selection module chooses to encode a frame as a full-rate frame and the system imposes half-rate instead, speech quality is degraded, since the half-rate modes are not capable of efficiently encoding onsets and transient signals.
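The interaction between the rate selection module and a system-imposed half-rate can be sketched as below. The mapping from frame class to rate follows the typical assignment described above; the class labels, the function name, and the returned "degraded" flag are hypothetical and serve only to mark the frames for which the forced half-rate is a poor fit.

```python
from typing import Tuple

FULL_RATE = "FR"
HALF_RATE = "HR"

# Typical assignment described in the text (class labels are hypothetical).
CLASS_TO_RATE = {
    "stationary_voiced": HALF_RATE,
    "stationary_unvoiced": HALF_RATE,
    "onset": FULL_RATE,
    "transient": FULL_RATE,
    "mixed_voiced": FULL_RATE,
}


def select_rate(frame_class: str, system_forces_half_rate: bool = False) -> Tuple[str, bool]:
    """Return (rate actually used, whether the frame is expected to be degraded).

    system_forces_half_rate models dim-and-burst signaling or half-rate max.
    """
    preferred = CLASS_TO_RATE[frame_class]
    if system_forces_half_rate and preferred == FULL_RATE:
        # The frame wanted full-rate but must be coded with a half-rate mode.
        return HALF_RATE, True
    return preferred, False


print(select_rate("onset"))
print(select_rate("onset", system_forces_half_rate=True))
print(select_rate("stationary_voiced", system_forces_half_rate=True))
```

Frames that the selector would have coded at half-rate anyway are unaffected by the system request; only onsets, transients, and mixed voiced frames are flagged.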
A wideband codec known as the Adaptive Multi-Rate WideBand (AMR-WB) speech codec was recently selected by the ITU-T (International Telecommunication Union - Telecommunication Standardization Sector) for several wideband speech telephony applications and services, and by 3GPP (Third Generation Partnership Project) for GSM and W-CDMA third generation wireless systems. The AMR-WB codec comprises nine (9) bit rates in the range from 6.6 to 23.85 kbit/s. Designing an AMR-WB-based source-controlled VBR codec for the CDMA2000 system has the advantage of enabling interoperation between CDMA2000 and other systems using the AMR-WB codec. The AMR-WB bit rate of 12.65 kbit/s is the closest rate that can fit in the 13.3 kbit/s full-rate of Rate Set II. This rate can be used as the common rate between a CDMA2000 wideband VBR codec and AMR-WB to enable interoperability without the need for transcoding (which degrades the speech quality). A half-rate at 6.2 kbit/s has to be added to the CDMA2000 VBR wideband solution to enable efficient operation in the Rate Set II framework. The codec can then operate in a few CDMA2000-specific modes and comprises a mode for enabling interoperability with systems using the AMR-WB codec. However, in a cross-system tandem-free operation call between CDMA2000 and another system using AMR-WB, the CDMA2000 system can force the use of half-rate as explained earlier (such as in dim-and-burst signaling). Since the AMR-WB codec does not recognize the 6.2 kbit/s half-rate of the CDMA2000 wideband codec, forced half-rate frames are interpreted as erased frames. This adversely affects the performance of the connection.
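The choice of 12.65 kbit/s as the common rate, and the erasure problem caused by forced half-rate frames, can be illustrated with a small sketch. The text gives only the number of AMR-WB rates and their range; the nine individual values listed below are taken from the AMR-WB specification rather than from the text. The function names and the "decoded"/"erased" labels are hypothetical.

```python
from typing import Sequence

# The nine AMR-WB bit rates in kbit/s (individual values from the AMR-WB specification).
AMR_WB_RATES_KBPS = (6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85)

RATE_SET_II_FULL_RATE_KBPS = 13.3
CDMA2000_HALF_RATE_KBPS = 6.2


def highest_rate_that_fits(rates: Sequence[float], budget_kbps: float) -> float:
    """Return the largest codec rate that does not exceed the channel budget."""
    fitting = [r for r in rates if r <= budget_kbps]
    if not fitting:
        raise ValueError("no rate fits within the budget")
    return max(fitting)


def amr_wb_side_result(rate_kbps: float) -> str:
    """Model an AMR-WB decoder: frames at unknown rates are treated as erased."""
    return "decoded" if rate_kbps in AMR_WB_RATES_KBPS else "erased"


common_rate = highest_rate_that_fits(AMR_WB_RATES_KBPS, RATE_SET_II_FULL_RATE_KBPS)
print(common_rate)                                  # 12.65
print(amr_wb_side_result(common_rate))              # decoded
print(amr_wb_side_result(CDMA2000_HALF_RATE_KBPS))  # erased
```

The last line shows the situation described above: a frame forced to the 6.2 kbit/s half-rate on the CDMA2000 side has no counterpart among the AMR-WB rates, so the far-end decoder can only handle it as a lost frame.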