In the field of digital communications, voice, picture, audio, and video transmission is required in an extremely wide range of applications, such as phone calls, audio and video conferences, broadcast television, and multimedia entertainment. Audio and video compression and encoding technologies were developed to reduce the resources occupied when audio and video signals are stored or transmitted. Many technical branches have emerged as these technologies developed. Among them, technologies that transform a signal from the time domain to the frequency domain before encoding and processing it are widely applied because of their good compression characteristics; such technologies are also referred to as domain transformation encoding technologies.
Increasing emphasis is placed on audio quality in communication transmission, so the quality of music signals needs to be improved as much as possible while voice quality is ensured. Meanwhile, an audio signal carries an extremely rich amount of information, so the code excited linear prediction (CELP) encoding mode used for conventional voice cannot be adopted. Instead, an audio signal is generally processed with a domain transformation encoding technology that transforms the time domain signal into a frequency domain signal, thereby improving the encoding quality of the audio signal.
In existing audio encoding technologies, a transformation technology such as a fast Fourier transform (FFT), a modified discrete cosine transform (MDCT), or a discrete cosine transform (DCT) is generally adopted to transform a high frequency band signal in an audio signal from a time domain signal to a frequency domain signal, and the frequency domain signal is then encoded.
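For illustration only, the following is a minimal pure-Python sketch of the MDCT, one of the transforms named above. It assumes a sine window, 50% frame overlap, and a 2/N scaling on the inverse; the function names are hypothetical, and the direct O(N²) sums are chosen for readability rather than speed.

```python
import math

def sine_window(two_n):
    """Sine window of length 2N; it satisfies w[n]^2 + w[n + N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]

def mdct(frame):
    """Forward MDCT: 2N time samples (windowed here) -> N frequency coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    w = sine_window(two_n)
    return [
        sum(
            w[n] * frame[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n)
        )
        for k in range(n_half)
    ]

def imdct(coeffs):
    """Inverse MDCT with 2/N scaling, followed by the same sine synthesis window."""
    n_half = len(coeffs)
    two_n = 2 * n_half
    w = sine_window(two_n)
    return [
        w[n] * (2.0 / n_half)
        * sum(
            coeffs[k]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half)
        )
        for n in range(two_n)
    ]
```

Passing two frames that overlap by N samples through `mdct` and `imdct` and adding the overlapping halves returns the original N samples in the overlap region (time-domain aliasing cancellation), which is why the MDCT can be used frame by frame without blocking artifacts.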
At a low bit rate, the limited quantization bits cannot quantize all of the audio signals to be quantized. Therefore, an encoding device uses most of the bits to precisely quantize the relatively important low frequency band signals in the audio signals (that is, the quantization parameters of the low frequency band signals occupy most of the bits), and uses only a few bits to coarsely quantize and encode the high frequency band signals in the audio signals to obtain frequency envelopes of the high frequency band signals. The frequency envelopes of the high frequency band signals and the quantization parameters of the low frequency band signals are then sent to a decoding device in the form of a bitstream. The quantization parameters of the low frequency band signals may include excitation signals and frequency envelopes. When the low frequency band signals are quantized, they may likewise first be transformed from time domain signals to frequency domain signals, and the frequency domain signals are then quantized and encoded into excitation signals.
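A minimal sketch of this uneven bit split, written under hypothetical assumptions that the text does not state: the low band is quantized with a fine uniform step, and the high band is reduced to one RMS value per fixed-width subband, quantized coarsely as a rounded log2 index. The names `encode_frame`, `split_bin`, `low_step`, and `band_width` are illustrative only; entropy coding and real band layouts are not modeled.

```python
import math

def encode_frame(spectrum, split_bin, low_step, band_width):
    """Hypothetical low-bit-rate split: fine low band, coarse high-band envelope."""
    # Low band: uniform scalar quantization with a small step (spends many bits).
    low_indices = [round(x / low_step) for x in spectrum[:split_bin]]

    # High band: keep only a frequency envelope, one RMS value per subband,
    # quantized coarsely as a rounded log2 index (spends few bits).
    high = spectrum[split_bin:]
    envelope_indices = []
    for start in range(0, len(high), band_width):
        band = high[start:start + band_width]
        rms = math.sqrt(sum(x * x for x in band) / len(band))
        envelope_indices.append(round(math.log2(max(rms, 1e-9))))
    return low_indices, envelope_indices

def decode_envelope(envelope_indices):
    """Map the coarse log2 indices back to linear RMS values."""
    return [2.0 ** i for i in envelope_indices]
```

With `spectrum = [1.0] * 8 + [2.0] * 4 + [0.5] * 4`, `split_bin = 8`, `low_step = 0.25`, and `band_width = 4`, the low band becomes eight indices of 4, while the entire high band collapses to the two envelope indices 1 and -1.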
Generally, the decoding device may restore the low frequency band signals according to their quantization parameters in the received bitstream, and then acquire the excitation signals of the low frequency band signals from the restored low frequency band signals. It may then predict the excitation signals of the high frequency band signals from the excitation signals of the low frequency band signals using a bandwidth extension (BWE) technology and a spectrum filling technology, and modify the predicted excitation signals of the high frequency band signals according to the frequency envelopes of the high frequency band signals carried in the bitstream, to obtain the predicted high frequency band signals. The high frequency band signals obtained in this way are frequency domain signals.
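One possible reading of the "modify according to the frequency envelopes" step is a per-subband gain, sketched below. It assumes, hypothetically, that each envelope value is the linear RMS of a fixed-width subband; the function name and parameters are illustrative and not taken from any particular codec.

```python
import math

def shape_with_envelope(excitation, envelope, band_width):
    """Scale each subband of a predicted excitation so its RMS matches the envelope."""
    shaped = []
    for b, target_rms in enumerate(envelope):
        band = excitation[b * band_width:(b + 1) * band_width]
        rms = math.sqrt(sum(x * x for x in band) / len(band))
        # A silent predicted band cannot be scaled up, so it stays at zero.
        gain = target_rms / rms if rms > 0 else 0.0
        shaped.extend(gain * x for x in band)
    return shaped
```

The predicted excitation supplies only the fine structure (signs and relative magnitudes within each subband), while the transmitted envelope sets the energy of each subband.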
In the BWE technology, the highest frequency bin to which a bit is allocated may be the highest frequency bin for which an excitation signal is decoded; that is, no excitation signal is decoded on any frequency bin above it. The frequency band above the highest frequency bin to which a bit is allocated may be referred to as a high frequency band, and the frequency band below it may be referred to as a low frequency band. The excitation signal of a high frequency band signal may be predicted from the excitation signal of a low frequency band signal as follows: with the highest frequency bin to which a bit is allocated used as the center, the excitation signal of the low frequency band signal below that bin is copied into a high frequency band above that bin whose bandwidth equals the bandwidth of the low frequency band signal, and the copied signal is used as the excitation signal of the high frequency band signal.
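The copy rule can be sketched as follows. The sketch assumes, hypothetically, that `boundary` is the index just above the highest frequency bin to which a bit is allocated, so bins below `boundary` carry decoded excitation. It models the copy as a plain upward shift by `width` bins; the text does not fix the ordering inside the copied band, so a mirrored copy about the boundary would be an equally valid reading.

```python
def predict_high_excitation(excitation, boundary, width):
    """Shift-copy bins [boundary - width, boundary) into [boundary, boundary + width)."""
    if width > boundary:
        raise ValueError("source band would extend below bin 0")
    predicted = list(excitation)
    # Grow the output if the copied high band runs past the input length.
    if len(predicted) < boundary + width:
        predicted.extend([0.0] * (boundary + width - len(predicted)))
    for i in range(width):
        predicted[boundary + i] = excitation[boundary - width + i]
    return predicted
```

When `width` equals `boundary`, the whole decoded low band is copied, and the predicted high band spans bins `boundary` to `2 * boundary - 1`.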
In the process of implementing the present disclosure, the inventor found at least the following problem in the prior art. When the foregoing prior-art method for predicting a bandwidth extension frequency band signal predicts the excitation signal of a high frequency band signal from the excitation signal of a low frequency band signal, excitation signals of different low frequency band signals may be copied into the same high frequency band in different frames. This causes discontinuity of the excitation signal and reduces the quality of the predicted bandwidth extension frequency band signal, thereby reducing the auditory quality of the audio signal.
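A small arithmetic sketch of how this can happen, under stated assumptions: the whole low band `[0, boundary)` is shift-copied up by `boundary` bins, where `boundary` is the index just above the highest frequency bin to which a bit is allocated, and that highest bin may change from frame to frame as bit allocation changes. The per-frame boundary values below are hypothetical.

```python
def copy_source_bin(target_bin, boundary):
    """Low-band bin copied into target_bin when [0, boundary) is shifted up by boundary."""
    if not boundary <= target_bin < 2 * boundary:
        raise ValueError("target bin is not covered by the copied band")
    return target_bin - boundary

# Hypothetical per-frame boundaries (bit allocation differs between frames).
frame_boundaries = [16, 20, 24]
# Which low-band bin feeds high-band bin 24 in each frame.
sources = [copy_source_bin(24, b) for b in frame_boundaries]
```

Here the same high-band bin 24 receives excitation from low-band bins 8, 4, and 0 in three consecutive frames, so its excitation can jump between frames even when the underlying audio changes smoothly.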