In the 1980s, traditional speech encoding and decoding methods were all based on pulse code modulation (PCM). For example, G.711 is a speech codec based entirely on PCM, and G.722 is a speech codec based on adaptive differential pulse code modulation (ADPCM), where ADPCM is an improved form of PCM. PCM is usually applied to narrowband or wideband signals. Because human speech is likewise concentrated in the narrowband or wideband range, PCM achieves a good speech encoding and decoding effect.
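The PCM principle referred to above can be illustrated with a minimal sketch. It assumes the continuous mu-law companding curve with mu = 255 followed by a uniform 8-bit quantizer; the actual G.711 codec uses a segmented (piecewise-linear) approximation and its own bit layout, which are not reproduced here. The function names `mulaw_compress`, `mulaw_expand`, `encode`, and `decode` are hypothetical and chosen for illustration only.

```python
import math

MU = 255.0  # companding constant of the mu-law curve (assumed for this sketch)

def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto the companded domain [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode(x: float, bits: int = 8) -> int:
    """Compand a sample, then quantize it uniformly to one of 2**bits codes."""
    levels = 2 ** bits
    y = mulaw_compress(max(-1.0, min(1.0, x)))
    return min(levels - 1, int((y + 1.0) / 2.0 * levels))

def decode(code: int, bits: int = 8) -> float:
    """Reconstruct a sample from the midpoint of its quantization cell."""
    levels = 2 ** bits
    y = (code + 0.5) / levels * 2.0 - 1.0
    return mulaw_expand(y)

# Narrowband speech sampled at 8000 Hz with 8 bits per sample.
bitrate = 8000 * 8  # 64000 bit/s
```

Because the quantization cells are finer near zero, small-amplitude samples are reconstructed with a smaller absolute error than large-amplitude ones, which suits the amplitude distribution of speech.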
With the development of network technologies and the increase in network bandwidth and transmission rates, users impose higher requirements on the quality of speech/audio in communication. A growing number of communication standardization organizations are researching technologies for encoding, decoding, and transmitting wideband, ultra-wideband, and even full-band and stereo speech/audio signals. To remain compatible with the traditional speech encoding and decoding methods, most bandwidth extension standards extend the bandwidth on the basis of the original narrowband or wideband single-channel codec; examples include G.711.1, the wideband extension of G.711 standardized by the International Telecommunication Union (ITU), and the combined ultra-wideband stereo extension project G.711.1/G.722. In such an extended codec, the traditional narrowband or wideband encoding and decoding method is referred to as the core layer.
The above extension method is compatible with the traditional encoding and decoding methods, but it also introduces some problems. Because the core layer usually uses a simple PCM encoding and decoding method, its encoding and decoding quality is poor; to ensure the quality of the entire wideband signal, the extension method must therefore also enhance the encoding and decoding quality of the core layer. In the prior art, the methods for enhancing the encoding and decoding quality of the core layer fall into the following two types:
In the first type, no extra bits are added, and the core layer is enhanced by pre-processing (such as noise shaping) or post-processing. The merit of this method is that it consumes no extra bits, but its application scope is limited: for most traditional codecs, it does not achieve a good enhancement effect.
In the second type, the traditional core layer encoding and decoding method is left unchanged, and sufficient scalar or vector quantization bits are added to improve the precision of core layer encoding, thereby enhancing the core layer quality. The demerit of this method is that it requires a large number of extra bits. If the core layer is a PCM-based scalar quantizer, enhancing each sample point consumes 2 bits, which greatly increases the burden on the extended codec. In many cases sufficient bits are not available, and therefore the enhancement quality of the core layer cannot be ensured.
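The cost of the second type of method can be made concrete with a minimal sketch. It rests on assumptions that are not part of the description above: a uniform 8-bit scalar quantizer stands in for the PCM core layer, the sampling rate is 8000 Hz, and the enhancement simply refines the core residual with a 2-bit uniform quantizer. All function names are hypothetical.

```python
import math
import random

def uniform_quantize(x, bits):
    """Mid-rise uniform quantizer on [-1, 1); returns the reconstructed value."""
    step = 2.0 / (2 ** bits)
    half = 2 ** (bits - 1)
    idx = max(-half, min(half - 1, math.floor(x / step)))
    return (idx + 0.5) * step

def enhanced_quantize(x, core_bits, extra_bits):
    """Core quantizer plus extra bits that subdivide the chosen core cell."""
    step = 2.0 / (2 ** core_bits)
    core = uniform_quantize(x, core_bits)
    sub = step / (2 ** extra_bits)  # width of one refined sub-cell
    idx = math.floor((x - core + step / 2.0) / sub)
    idx = max(0, min(2 ** extra_bits - 1, idx))
    return core - step / 2.0 + (idx + 0.5) * sub

def snr_db(signal, recon):
    """Signal-to-noise ratio of a reconstruction, in decibels."""
    sig = sum(s * s for s in signal)
    err = sum((s - r) ** 2 for s, r in zip(signal, recon))
    return 10.0 * math.log10(sig / err)

def extra_bitrate(sample_rate_hz, extra_bits):
    """Additional bits per second spent on the enhancement."""
    return sample_rate_hz * extra_bits

rng = random.Random(0)  # fixed seed so the experiment is repeatable
signal = [rng.uniform(-0.9, 0.9) for _ in range(4000)]
core_snr = snr_db(signal, [uniform_quantize(s, 8) for s in signal])
enh_snr = snr_db(signal, [enhanced_quantize(s, 8, 2) for s in signal])
gain_db = enh_snr - core_snr        # quality gained by the extra bits
overhead = extra_bitrate(8000, 2)   # 16000 bit/s on top of the core layer
```

Under these assumptions, 2 extra bits per sample add 16000 bit/s to a 64000 bit/s core layer, a 25% increase, in exchange for roughly 6 dB of signal-to-noise ratio per added bit. This illustrates why the bit budget of the extended codec is often insufficient for this kind of enhancement.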