In most traditional voice codecs, the bandwidth of the voice signal is narrow, and only a few voice codecs support a wide bandwidth. With the development of network technology, network transmission rates have increased and the demand for wideband codecs has grown. Preferably, the bandwidth of the voice codec extends to ultra-wideband (50 Hz-14000 Hz) and fullband (20 Hz-20000 Hz).
In order to make the wideband voice codec compatible with the traditional voice codec, a voice codec may be divided into a plurality of layers. The following description takes a voice codec with two layers as an example.
First, the voice codec including two layers separates the input signals into higher-band signals and lower-band signals with an analysis Quadrature-Mirror Filterbank at the coding side. The lower-band signal is input into a lower-band coder for coding and the higher-band signal is input into a higher-band coder for coding. The obtained lower-band data and higher-band data are synthesized into a bitstream via a bitstream multiplexer and the bitstream is sent out. The lower-band signal refers to a signal whose frequency is in the lower band of the bandwidth for the signal and the higher-band signal refers to a signal whose frequency is in the higher band of the bandwidth for the signal. For example, when the bandwidth of an input signal is 50 Hz-7000 Hz, the bandwidth of the lower-band signal may be 50 Hz-4000 Hz and the bandwidth of the higher-band signal may be 4000 Hz-7000 Hz. The decoding is implemented at the decoding side. The bitstream is divided into a lower-band bitstream and a higher-band bitstream, and the lower-band bitstream and the higher-band bitstream are input into the lower-band decoder and the higher-band decoder for decoding, respectively. Thus, the lower-band signal and the higher-band signal are obtained. The lower-band signal and the higher-band signal are synthesized into the voice signal to be output with a synthesis Quadrature-Mirror Filterbank.
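The band split and recombination described above can be illustrated with a minimal sketch. It assumes the shortest possible quadrature-mirror filter pair (the 2-tap Haar pair), chosen only because it fits in a few lines; the text does not specify the filters, and a practical codec would use much longer ones. The function names qmf_analysis and qmf_synthesis are hypothetical.

```python
import math

# Assumption: 2-tap Haar filters stand in for the codec's real QMF filters.
_C = 1.0 / math.sqrt(2.0)


def qmf_analysis(x):
    """Split an even-length signal into lower-band and higher-band
    signals, each at half the input sampling rate."""
    pairs = range(len(x) // 2)
    low = [_C * (x[2 * k] + x[2 * k + 1]) for k in pairs]   # low-pass + decimate
    high = [_C * (x[2 * k] - x[2 * k + 1]) for k in pairs]  # high-pass + decimate
    return low, high


def qmf_synthesis(low, high):
    """Recombine the two band signals into one full-rate signal."""
    out = []
    for l, h in zip(low, high):
        out.append(_C * (l + h))  # even-indexed output sample
        out.append(_C * (l - h))  # odd-indexed output sample
    return out
```

With this filter pair, synthesis exactly undoes analysis when neither band is altered. In the codec described here, each band passes through its own coder and decoder between the two filterbanks.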
At present, Voice over IP (VoIP) and wireless network voice applications are becoming increasingly popular. Voice transmission requires small data packets to be delivered reliably and in real time. When a voice frame is lost during transmission, there is no time to resend it. Similarly, if a voice frame takes a long route and does not arrive by the time it is to be played, it is equivalent to a lost frame. Thus, in a voice system, a voice frame that does not arrive, or does not arrive in time, may be considered a lost frame.
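The lost-frame rule above can be sketched as a simple check at the receiver. This is only an illustration: the fixed playout schedule, the millisecond timestamps and the names is_lost and classify_frames are assumptions, not part of the described system.

```python
def is_lost(arrival_ms, playout_ms):
    """A frame counts as lost if it never arrived (None) or if it
    arrived after the moment it was due to be played."""
    return arrival_ms is None or arrival_ms > playout_ms


def classify_frames(arrivals_ms, first_playout_ms, frame_ms):
    """Flag each frame as lost or not.

    Assumption: frame i is due for playout at
    first_playout_ms + i * frame_ms.
    """
    return [
        is_lost(arrival, first_playout_ms + i * frame_ms)
        for i, arrival in enumerate(arrivals_ms)
    ]
```

For example, with 20 ms frames and playout starting at 60 ms, a third frame arriving at 130 ms misses its 100 ms deadline and is treated exactly like a frame that never arrived.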
If no processing is performed on the lost frame, the voice is interrupted and the voice quality is greatly degraded. Thus, frame erasure concealment is required for the lost frame. In other words, the lost voice data are estimated and the estimated data are used to replace the lost data, so that better voice quality is obtained in a frame-loss environment. For a voice codec that divides the input signal into a higher-band signal and a lower-band signal, frame erasure concealment is performed on the lower-band signal and the higher-band signal separately, and the higher-band signal and the lower-band signal obtained after the frame erasure concealment are synthesized by the synthesis Quadrature-Mirror Filterbank into the voice signal to be output.
Frame erasure concealment methods include insertion methods, interpolation methods and regeneration methods.
The insertion methods include splicing, silence replacement, noise replacement and previous frame repetition.
The interpolation methods include waveform replacement, pitch repetition and time domain waveform revision.
The regeneration methods include coder parameter interpolation and model-based regeneration.
The model-based regeneration method provides the best voice quality and has the highest algorithm complexity, while the previous frame repetition method provides good voice quality at a modest algorithm complexity.
Because the lower-band signal affects the voice quality more than the higher-band signal does, a frame erasure concealment algorithm with a high complexity and a high voice quality (for example, the pitch repetition, the time domain waveform revision, the coder parameter interpolation or the model-based regeneration method) is used for the lower-band signal, and a frame erasure concealment algorithm with a low complexity and a lower voice quality is used for the higher-band signal. Thus, a compromise between the voice quality and the complexity is achieved.
In the speech decoder of the prior art, the pitch repetition method is used for frame erasure concealment of the lower-band signal, while the previous frame repetition and attenuation method is used for frame erasure concealment of the higher-band signal.
The formula for recovering the higher-band signal based on the previous frame repetition and attenuation method is as follows:

$$s_{hb}(n) = s_{hb}(n-N) \cdot \alpha, \quad n = 0, \ldots, N-1$$

In the formula, $s_{hb}(n)$, $n = 0, \ldots, N-1$ represents the recovered higher-band signal of the lost frame, and $N$ represents the number of samples in a frame; the attenuation coefficient $\alpha$ is a nonnegative number ranging from 0 to 1. The attenuation coefficient $\alpha$ may be a constant such as 0.8 or a variable which changes adaptively according to the number of continuously lost packets. For example, the first lost frame is multiplied by a larger attenuation coefficient such as 0.9, while the second lost frame and the following frames are multiplied by a smaller attenuation coefficient such as 0.7.
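The previous frame repetition and attenuation method can be sketched as follows. This is a minimal illustration using the example coefficients 0.9 and 0.7 from the text. The function names are hypothetical, and it is assumed that, for consecutive losses, the frame being repeated is the previously output frame, which may itself have been concealed.

```python
def attenuation(lost_count, first=0.9, later=0.7):
    """Adaptive attenuation coefficient: a larger value for the first
    lost frame, a smaller one for every following lost frame."""
    return first if lost_count == 1 else later


def conceal_higher_band(prev_frame, lost_count):
    """Recover a lost higher-band frame of N samples as the previous
    frame (the samples N positions earlier) scaled by the coefficient.

    prev_frame: the N samples output for the preceding frame
    lost_count: 1 for the first consecutive lost frame, 2 for the second, ...
    """
    alpha = attenuation(lost_count)
    return [sample * alpha for sample in prev_frame]
```

Under the stated assumption, two consecutive losses attenuate the last good frame by 0.9 and then by a further 0.7, so the concealed signal decays progressively.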
In the process of realizing the invention, the inventor found that when the signal has a strong periodicity, the higher-band signal cannot be recovered correctly. When the lower-band signal and the higher-band signal have a consistent periodicity, the original periodicity of the higher-band signal is destroyed when the frame erasure concealment is performed on the higher-band signal with the prior art. Thus, the quality of the voice signal output from the speech decoder is lowered.
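One way to see the loss of periodicity is the following sketch. It rests on an assumption the text does not state: the frame length N is generally not an integer multiple of the pitch period. A toy sawtooth stands in for a periodic higher-band signal, and attenuation is left out so that only the effect of repetition is visible. All names are hypothetical.

```python
def periodic(n, period):
    """Toy periodic signal (an integer sawtooth) standing in for a
    strongly periodic higher-band signal."""
    return n % period


def repetition_mismatch(period, frame_len):
    """Count the samples in the lost frame where repeating the previous
    frame differs from the true periodic continuation."""
    lost = range(frame_len, 2 * frame_len)
    true_frame = [periodic(n, period) for n in lost]
    repeated = [periodic(n - frame_len, period) for n in lost]
    return sum(1 for t, r in zip(true_frame, repeated) if t != r)


# Frame length is a whole number of periods: repetition preserves the period.
aligned = repetition_mismatch(period=20, frame_len=40)
# Frame length is not a whole number of periods: the phase jumps at the
# frame boundary and the repeated frame disagrees with the true signal.
misaligned = repetition_mismatch(period=30, frame_len=40)
```

In the aligned case the repeated frame matches the true continuation. In the misaligned case the phase of the repeated frame is shifted against the true signal, which is one plausible reading of the defect described above.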