At present, communication systems place increasing importance on audio quality. Therefore, music quality needs to be improved as much as possible during coding and decoding while voice quality is maintained. Music signals usually carry far richer information than voice signals, so the traditional voice CELP (Code Excited Linear Prediction) coding mode is not suitable for coding music signals. Generally, a transform coding mode is used to process music signals in the frequency domain to improve their coding quality. However, how to use the limited coding bits to code information efficiently remains a hot research topic in the field of audio coding.
Current audio coding technology generally uses an FFT (Fast Fourier Transform) or an MDCT (Modified Discrete Cosine Transform) to transform time domain signals to the frequency domain, and then codes the frequency domain signals. At a low bit rate, the limited number of bits available for quantization is insufficient to quantize all of the audio signals. Therefore, the BWE (Bandwidth Extension) technology and the spectrum overlay technology may generally be used.
At the coding end, input time domain signals are first transformed to the frequency domain, and a sub-band normalization factor, that is, envelope information of the spectrum, is extracted in the frequency domain. The spectrum is normalized by using the quantized sub-band normalization factor to obtain normalized spectrum information. Finally, the bit allocation for each sub-band is determined, and the normalized spectrum is quantized. In this manner, the audio signals are coded into quantized envelope information and normalized spectrum information, and bit streams are then output.
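The coding-end steps above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the coding scheme of any particular codec: the frequency domain spectrum is assumed to be already available (the transform itself is omitted), the envelope is taken as the RMS value of each sub-band, and the 3 dB envelope step, the uniform scalar quantizer, and the names band_norms, quantize_norm and encode_frame are hypothetical choices made for illustration.

```python
import math

def band_norms(spectrum, band_edges):
    """RMS value of each sub-band, used here as the sub-band normalization factor."""
    norms = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = spectrum[lo:hi]
        norms.append(math.sqrt(sum(c * c for c in band) / len(band)))
    return norms

def quantize_norm(norm, step_db=3.0):
    """Quantize a normalization factor on a log scale (assumed 3 dB step).

    Returns the quantization index and the dequantized factor.
    """
    level_db = 20.0 * math.log10(max(norm, 1e-9))  # guard against log of zero
    index = round(level_db / step_db)
    return index, 10.0 ** (index * step_db / 20.0)

def encode_frame(spectrum, band_edges, bits_per_band, clip=3.0):
    """Code one frame into quantized envelope and normalized spectrum information.

    bits_per_band[b] == 0 means sub-band b receives no bits and is not coded;
    otherwise at least 2 bits per coefficient are assumed.
    """
    norm_idx, coeffs = [], []
    norms = band_norms(spectrum, band_edges)
    for b, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        index, q_norm = quantize_norm(norms[b])
        norm_idx.append(index)
        bits = bits_per_band[b]
        if bits == 0:
            coeffs.append(None)  # sub-band left uncoded
            continue
        max_level = 2 ** (bits - 1) - 1  # symmetric quantizer that includes zero
        step = clip / max_level
        band_q = []
        for c in spectrum[lo:hi]:
            # Normalize with the *quantized* factor, then quantize and clamp.
            q = round((c / q_norm) / step)
            band_q.append(max(-max_level, min(max_level, q)))
        coeffs.append(band_q)
    return {"norm_idx": norm_idx, "coeffs": coeffs}

frame = encode_frame(
    [4.0, -4.0, 4.0, -4.0, 0.1, -0.1, 0.1, -0.1],
    band_edges=[0, 4, 8],
    bits_per_band=[3, 0],
)
```

In this example the low sub-band is coded with 3 bits per coefficient, while the high sub-band receives no bits and contributes only its envelope index to the bit stream.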
The process at the decoding end is the inverse of that at the coding end. During low-rate coding, the coding end is incapable of coding all frequency bands, so at the decoding end the bandwidth extension technology is required to recover the frequency bands that were not coded at the coding end. Meanwhile, many zero frequency points may be produced in the coded sub-bands due to the limitation of the quantizer, so a noise filling module is needed to improve performance. Finally, the decoded sub-band normalization factor is applied to the decoded normalized spectrum coefficients to obtain reconstructed spectrum coefficients, and an inverse transform is performed to output time domain audio signals.
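The decoding-end steps above can likewise be sketched, again only as an illustration under assumptions. Here bandwidth extension is approximated by copying the normalized shape of the nearest lower coded sub-band, noise filling uses uniformly distributed random values, and the inverse transform is omitted. The function name decode_frame, the noise level, and the 3 dB envelope step are hypothetical; practical decoders use more elaborate methods.

```python
import random

def decode_frame(norm_idx, coeffs, band_edges,
                 step_db=3.0, step=1.0, noise_level=0.25, seed=0):
    """Reconstruct spectrum coefficients from envelope indices and quantized bands.

    coeffs[b] is a list of quantizer indices for a coded sub-band, or None
    for a sub-band that was not coded at the coding end.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    spectrum = [0.0] * band_edges[-1]
    last_coded = None  # normalized shape of the most recent coded sub-band
    for b, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        width = hi - lo
        if coeffs[b] is not None:
            # Dequantize; fill zero frequency points with low-level noise.
            shape = [
                q * step if q != 0 else rng.uniform(-noise_level, noise_level)
                for q in coeffs[b]
            ]
            last_coded = shape
        elif last_coded is not None:
            # Assumed bandwidth extension: reuse the shape of a lower sub-band.
            shape = [last_coded[i % len(last_coded)] for i in range(width)]
        else:
            shape = [0.0] * width  # nothing available to extend from
        # Apply the decoded sub-band normalization factor.
        gain = 10.0 ** (norm_idx[b] * step_db / 20.0)
        for i, value in enumerate(shape):
            spectrum[lo + i] = gain * value
    return spectrum

reconstructed = decode_frame(
    norm_idx=[4, -7],
    coeffs=[[1, 0, -1, 0], None],
    band_edges=[0, 4, 8],
)
```

The reconstructed coefficients would then be passed to the inverse transform to obtain the time domain audio signal.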
However, during the coding process, a high-frequency harmonic may be allocated only some dispersed bits for coding. In this case, the distribution of bits along the time axis is not continuous, and consequently the high-frequency harmonic reconstructed during decoding is not smooth and contains interruptions. This produces considerable noise and degrades the quality of the reconstructed audio.
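The discontinuity described above can be made concrete with a small sketch. The per-frame bit counts below are invented purely for illustration, and the helper names coded_flags and count_interruptions are hypothetical; the sketch only counts how often a high-frequency sub-band that was coded in one frame receives no bits in the next frame.

```python
def coded_flags(bits_per_frame):
    """True for each frame in which the sub-band received at least one bit."""
    return [bits > 0 for bits in bits_per_frame]

def count_interruptions(bits_per_frame):
    """Number of transitions from a coded frame to an uncoded frame."""
    flags = coded_flags(bits_per_frame)
    return sum(1 for prev, cur in zip(flags, flags[1:]) if prev and not cur)

# Illustrative allocation for one high-frequency sub-band over six frames.
dispersed = [6, 0, 5, 0, 0, 4]
interruptions = count_interruptions(dispersed)
```

Each such transition corresponds to a point on the time axis at which the reconstructed harmonic is interrupted.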