1. Field of the Invention
The present invention relates in general to signal coding and, more particularly, to variable bit rate speech coding.
2. Background
Speech coding is traditionally driven by considerations of bandwidth and efficiency. As a result, modern communication systems typically implement various speech coding and compression techniques to reduce bandwidth requirements and to achieve higher transmission efficiency.
One typical speech coding scheme is Pulse Code Modulation (“PCM”), which converts speech signals into digital form and is widely used by telephone companies in their T1 circuits. Every minute of the day, millions of telephone conversations, as well as data transmissions via modems, are converted into digital form via PCM for transport over high-speed intercity trunks. PCM samples the analog waveform 8,000 times per second and converts each sample into an 8-bit number, resulting in a 64 kbps data stream. The PCM technique has been adopted by the International Telecommunication Union (“ITU”) in the G.711 standard, which defines a single-rate coding method at 64 kbps.
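The 64 kbps figure follows directly from the sampling rate and the sample width. As a minimal sketch of that arithmetic (the function name `pcm_bit_rate_bps` is illustrative only and is not drawn from any standard):

```python
def pcm_bit_rate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Return the bit rate of an uncompressed PCM stream in bits per second."""
    return sample_rate_hz * bits_per_sample


# Telephone-band PCM as described above: 8,000 samples/s, 8 bits per sample.
print(pcm_bit_rate_bps(8_000, 8))  # 64000
```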
Another technique adopted by the ITU is Adaptive Differential PCM (“ADPCM”), which also converts analog sound, such as speech, into digital form. With this technique, the difference between successive samples is coded in lieu of an absolute measurement at each sample point. ADPCM can dynamically switch the coding scale to compensate for variations in amplitude. The ITU standards that utilize this technique include G.721 (32 kbps), G.722 (64 kbps), G.723 (24 kbps and 40 kbps), G.726 (16 kbps, 24 kbps, 32 kbps and 40 kbps) and G.727 (16 kbps, 24 kbps, 32 kbps and 40 kbps).
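The idea of coding differences with a scale that adapts to amplitude can be illustrated with a toy coder. The sketch below is not the algorithm of G.726 or of any other ITU standard; the function names, the initial step size, the number of quantizer levels, and the doubling and halving rule are all assumptions chosen for readability.

```python
def _adapt(step: int, code: int, levels: int) -> int:
    """Grow the step after large codes, shrink it after small ones (assumed rule)."""
    if abs(code) >= levels - 1:
        return min(step * 2, 1024)
    if abs(code) <= 1:
        return max(step // 2, 1)
    return step


def encode_differences(samples, step=4, levels=7):
    """Code each sample as a clamped, scaled difference from the running prediction."""
    codes, predicted = [], 0
    for sample in samples:
        diff = sample - predicted
        code = max(-levels, min(levels, round(diff / step)))
        codes.append(code)
        predicted += code * step          # track what the decoder will reconstruct
        step = _adapt(step, code, levels)
    return codes


def decode_differences(codes, step=4, levels=7):
    """Rebuild samples by accumulating the scaled differences with the same adaptation."""
    samples, predicted = [], 0
    for code in codes:
        predicted += code * step
        samples.append(predicted)
        step = _adapt(step, code, levels)
    return samples


codes = encode_differences([100] * 8)
print(codes)
print(decode_differences(codes))
```

Because the encoder and decoder apply the same adaptation rule to the same code sequence, the decoder needs no side information about the current scale.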
More recently, the ITU adopted the Code Excited Linear Prediction (“CELP”) technique in the G.729 family, comprising the main body and Annex A (8 kbps), Annex B (0 kbps and 1.5 kbps), Annex D (6.4 kbps), Annex E (11.2 kbps), and Annex I (0 kbps, 1.5 kbps, 6.4 kbps, 8 kbps and 11.2 kbps), which achieves high compression ratios along with toll-quality narrow-band (telephone band) audio. A similar method is used in G.723.1 (5.3 kbps and 6.4 kbps). A method called Low-Delay CELP (“LD-CELP”) is used in the G.728 (16 kbps) standard and provides near toll-quality audio by using a smaller sample size that is processed faster, resulting in lower delays.
As noted above, the G.723, G.726, G.727, G.729 Annex I and G.723.1 standards define a multi-rate capability for speech data transfer. Today, network providers such as AT&T, MCI or Sprint take advantage of these multiple rates by controlling data bit rates according to predetermined factors, such as the time of day or a particular usage of the network. For example, the network providers may decide to save network bandwidth during business hours by limiting the data bit rate to 6.4 kbps. After business hours, however, the network providers may increase the data bit rate to 11.2 kbps. In addition, the network providers may allocate certain lines for high quality speech data transfer during specific hours.
FIG. 1 illustrates a typical system 100 used by the network providers for implementing the above schemes. As shown, system 100 includes a plurality of speech encoders 1, 2, ..., n, enumerated as modules 130, 140, ..., 150, respectively. In one embodiment, system 100 may be ITU G.729 Annex I compatible, and speech encoder 130 may encode at 6.4 kbps, speech encoder 140 may encode at 8.0 kbps and speech encoder 150 may encode at 11.2 kbps.
As shown in FIG. 1, encoder selector 112 is positioned by the network controller 120. As stated above, the selector 112 is positioned in accordance with predetermined factors under the control of the network provider. For example, the network controller 120 may decide to use the speech encoder 150 at a data bit rate of 11.2 kbps after business hours, or from 2:00 p.m. to 4:00 p.m. when communication channel 160 is utilized for music broadcast, which requires high data rates to preserve the speech quality. On the other hand, the network controller 120 may position the encoder selector 112 so as to select the speech encoder 130 at a data bit rate of 6.4 kbps for voice communications from 4:00 p.m. to 8:00 p.m.
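The schedule-driven positioning of selector 112 can be sketched as a simple lookup on the time of day. The sketch below is illustrative only: the names `SCHEDULE`, `DEFAULT_ENCODER` and `select_encoder` are hypothetical, the two time windows are taken from the example above, and the fallback to encoder 140 at 8.0 kbps outside those windows is an assumption, not something FIG. 1 specifies.

```python
from datetime import time

# (window start, window end, encoder module, bit rate in kbps), per the example above.
SCHEDULE = [
    (time(14, 0), time(16, 0), 150, 11.2),  # music broadcast window
    (time(16, 0), time(20, 0), 130, 6.4),   # voice communications window
]

# Assumed fallback outside the listed windows (not specified in the description).
DEFAULT_ENCODER = (140, 8.0)


def select_encoder(now: time):
    """Return (encoder module, kbps) chosen purely from the clock time."""
    for start, end, module, rate_kbps in SCHEDULE:
        if start <= now < end:
            return module, rate_kbps
    return DEFAULT_ENCODER


print(select_encoder(time(15, 0)))
print(select_encoder(time(17, 30)))
```

Note that `select_encoder` takes only the clock time as input; nothing about the signal being encoded influences the choice.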
While such traditional multi-rate speech encoders have been successfully implemented in digital communication systems, they are restricted in use and application. Such systems are inflexible, since data bit rates are set based on predetermined factors that may or may not hold true. As a result, too little or too much of the network bandwidth may be used for a given speech signal. For example, high quality speech, such as music, may be transmitted on a communication channel selected to transmit at low data rates, thus degrading its quality. On the other hand, a high data rate communication channel may be wasted if only low quality speech, such as voice, which does not require a high bandwidth, is transmitted.
Accordingly, there is an intense need in the art for a flexible speech encoder that can efficiently utilize the bandwidth of a given communication channel. Furthermore, there is a strong need in the industry for a speech encoder system that can combine various speech encoding schemes while maintaining interoperability with the existing speech decoders and standards.