In audio and voice processing, a bandwidth expansion technology has emerged in which the frequency range of a sound signal (for example, an audio signal or a voice signal) is expanded, mainly in the bands that contain useful information or affect the sound effect. The bandwidth expansion technology has developed rapidly in recent years and is commercially applied in several fields, for example, to enhance the sound effect of a woofer and to enhance the high frequencies of audio and voice.
In the bandwidth expansion technology, at an encoding end, a core encoder generally encodes a low band input signal with higher accuracy, and another encoder encodes, at a lower bit rate, a high band input signal that the core encoder does not encode. Therefore, in many cases, the high band input signal may be regarded as a separate signal to be encoded. The process of the common bandwidth expansion method in the prior art is as follows:
The encoding end receives the high band input signal, performs time envelope calculation and spectral envelope calculation on the signal to obtain a time envelope and a spectral envelope respectively, quantizes and multiplexes the time envelope and the spectral envelope, and then transmits them to a decoding end. At the decoding end, the demultiplexed time envelope and spectral envelope are decoded, an excitation signal of the high band is generated according to parameters of the core encoder at the encoding end, and the excitation signal is then shaped by using the decoded time envelope and spectral envelope to obtain a high band output signal.
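The flow described above may be illustrated by the following minimal sketch, which covers only the time envelope path; the spectral envelope would be handled analogously in the frequency domain and is omitted here. Everything in the sketch is an assumption made for illustration and is not taken from the prior art method itself: the time envelope is taken as the RMS value of each subframe, quantization is uniform in the dB domain, the multiplexed payload is represented as a plain list of indices, and the excitation signal is a placeholder rather than one derived from core encoder parameters. The function names and the parameters `num_subframes` and `step_db` are hypothetical.

```python
import math


def time_envelope(frame, num_subframes):
    """RMS value of each equal-length subframe (assumed envelope definition)."""
    n = len(frame) // num_subframes
    return [
        math.sqrt(sum(x * x for x in frame[i * n:(i + 1) * n]) / n)
        for i in range(num_subframes)
    ]


def quantize_db(values, step_db=1.5, floor=1e-6):
    """Uniform scalar quantization in the dB domain; returns integer indices."""
    return [round(20.0 * math.log10(max(v, floor)) / step_db) for v in values]


def dequantize_db(indices, step_db=1.5):
    """Inverse of quantize_db, up to the quantization error."""
    return [10.0 ** (i * step_db / 20.0) for i in indices]


def shape(excitation, envelope):
    """Scale each subframe of the excitation so its RMS equals the envelope value."""
    n = len(excitation) // len(envelope)
    out = []
    for i, target in enumerate(envelope):
        seg = excitation[i * n:(i + 1) * n]
        rms = math.sqrt(sum(x * x for x in seg) / n)
        gain = target / rms if rms > 0.0 else 0.0
        out.extend(x * gain for x in seg)
    return out


def encode_high_band(frame, num_subframes=4):
    """Encoding end: calculate the time envelope and quantize it."""
    return quantize_db(time_envelope(frame, num_subframes))


def decode_high_band(indices, excitation):
    """Decoding end: decode the envelope and shape the excitation with it."""
    return shape(excitation, dequantize_db(indices))


# Hypothetical high band frame: a decaying sinusoid of 64 samples.
frame = [math.sin(0.9 * k) * math.exp(-k / 32.0) for k in range(64)]
# Placeholder excitation; a real decoder would derive it from core encoder parameters.
excitation = [1.0 if k % 2 == 0 else -1.0 for k in range(64)]

payload = encode_high_band(frame)
output = decode_high_band(payload, excitation)
```

In this sketch the high band output signal follows the time envelope of the input to within half a quantization step, while its fine structure comes entirely from the excitation signal.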
During the research and implementation of the prior art, the inventors found that the prior art has the following problems.
In the prior art, the mode for calculating and quantizing the time envelope and the spectral envelope of the high band input signal is fixed, so the encoder must be set in advance to a mode applicable to a certain type of input signal, such as a mode applicable to a voice type signal. In this case, although the mode is beneficial for encoding a voice type signal, the encoding effect for an audio type signal is relatively poor. Furthermore, the types distinguished in the prior art are classified only at a macroscopic level, and more specific subdivided types within the voice type signal are not distinguished. For example, a transient type or a harmonic type is not considered. Therefore, encoding cannot be adapted to further subdivided types of the input signal, and better encoding effects cannot be achieved.
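The effect of a fixed mode on a subdivided signal type may be illustrated by the following hypothetical sketch. It assumes that the time envelope is the RMS value of each subframe and compares a fixed mode with one envelope value per frame against a mode with four values per frame on a transient frame. The frame contents, the function names, and the error measure are assumptions chosen for illustration and are not taken from the prior art.

```python
import math


def subframe_rms(frame, num_subframes):
    """RMS value of each equal-length subframe (assumed envelope definition)."""
    n = len(frame) // num_subframes
    return [
        math.sqrt(sum(x * x for x in frame[i * n:(i + 1) * n]) / n)
        for i in range(num_subframes)
    ]


def expand(envelope, frame_len):
    """Hold each envelope value constant across the samples of its subframe."""
    n = frame_len // len(envelope)
    return [v for v in envelope for _ in range(n)]


def envelope_error(frame, env):
    """Squared error between the sample magnitudes and the held envelope."""
    return sum((abs(x) - e) ** 2 for x, e in zip(frame, env))


# Hypothetical transient frame: silence followed by a short burst.
frame = [0.0] * 48 + [1.0] * 16

# Fixed mode with a single envelope value for the whole frame.
coarse = expand(subframe_rms(frame, 1), len(frame))
# Mode with four envelope values, which resolves the onset of the burst.
fine = expand(subframe_rms(frame, 4), len(frame))

coarse_error = envelope_error(frame, coarse)
fine_error = envelope_error(frame, fine)
```

Under these assumptions the single-value envelope spreads the energy of the burst over the whole frame, whereas the four-value envelope tracks the onset, which is one way in which a mode fixed in advance can fit a transient type signal poorly.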