Most traditional voice codecs handle voice signals of low bandwidth, and only a few support a wide bandwidth. However, as network technology develops and transmission rates increase, the demand for wideband codecs has grown. It is desirable for the bandwidth of a voice codec to extend to ultra-wideband (50 Hz-14000 Hz) and full band (20 Hz-20000 Hz).
To keep a wideband voice codec compatible with traditional voice codecs, the codec may be divided into a plurality of layers. The following description takes a two-layer voice codec as an example.
At the coding side, a two-layer voice codec first uses an analysis Quadrature-Mirror Filterbank (QMF) to split the input signal into a higher-band signal and a lower-band signal. The lower-band signal is coded by a lower-band coder and the higher-band signal by a higher-band coder. A bitstream multiplexer combines the resulting lower-band data and higher-band data into a single bitstream, which is then transmitted.
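The analysis/synthesis filterbank pair can be illustrated with the shortest possible QMF pair, the 2-tap (Haar) filters h0 = [1, 1]/sqrt(2) and h1 = [1, -1]/sqrt(2), for which h1[n] = (-1)^n h0[n]. This is a minimal sketch for illustration only: the function names are hypothetical, and practical codecs use much longer QMF filters because the 2-tap pair separates the two bands poorly. It does show the two properties the description relies on: each band is decimated to half the sample rate, and synthesis recovers the input from the two bands.

```python
import math
from typing import List, Tuple

# Hypothetical 2-tap (Haar) QMF pair, written in polyphase form:
#   low-pass  h0 = [1,  1] / sqrt(2)
#   high-pass h1 = [1, -1] / sqrt(2)   (h1[n] = (-1)**n * h0[n])
SCALE = 1.0 / math.sqrt(2.0)


def qmf_analysis(x: List[float]) -> Tuple[List[float], List[float]]:
    """Filter x with h0 and h1 and keep every second output sample."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    pairs = [(x[2 * k], x[2 * k + 1]) for k in range(len(x) // 2)]
    lower = [SCALE * (b + a) for a, b in pairs]   # low-pass branch
    higher = [SCALE * (b - a) for a, b in pairs]  # high-pass branch
    return lower, higher


def qmf_synthesis(lower: List[float], higher: List[float]) -> List[float]:
    """Upsample both bands and recombine them; exact inverse of qmf_analysis."""
    if len(lower) != len(higher):
        raise ValueError("bands must have equal length")
    out: List[float] = []
    for lo, hi in zip(lower, higher):
        out.append(SCALE * (lo - hi))  # recovers x[2k]
        out.append(SCALE * (lo + hi))  # recovers x[2k + 1]
    return out
```

For example, a constant input produces an all-zero higher band, because the high-pass branch responds only to sample-to-sample change.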
The lower-band signal is the part of the signal whose frequencies lie in the lower portion of its bandwidth, and the higher-band signal is the part whose frequencies lie in the upper portion. For example, when the bandwidth of an input signal is 50 Hz-7000 Hz, the lower-band signal may span 50 Hz-4000 Hz and the higher-band signal 4000 Hz-7000 Hz. At the decoding side, the bitstream is divided into a lower-band bitstream and a higher-band bitstream, which are decoded by the lower-band decoder and the higher-band decoder, respectively, to obtain the lower-band signal and the higher-band signal. A synthesis Quadrature-Mirror Filterbank then combines the two into the output voice signal.
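The split of the bitstream into lower-band and higher-band parts can be sketched with a hypothetical frame layout: a 2-byte big-endian length field for the lower-band payload, followed by the lower-band bytes and then the higher-band bytes. This layout and the function names are assumptions for illustration, not the format of any particular codec. A layered format of this kind also reflects the compatibility goal of the layered design: a decoder that implements only the lower layer can decode the lower-band payload and ignore the rest.

```python
import struct
from typing import Tuple

# Hypothetical frame layout (not from any standard):
#   [2-byte big-endian lower-band length][lower-band bytes][higher-band bytes]


def multiplex(lower: bytes, higher: bytes) -> bytes:
    """Combine the two layer payloads into one frame of the bitstream."""
    if len(lower) > 0xFFFF:
        raise ValueError("lower-band payload too long for a 16-bit length field")
    return struct.pack(">H", len(lower)) + lower + higher


def demultiplex(frame: bytes) -> Tuple[bytes, bytes]:
    """Split one frame back into (lower-band payload, higher-band payload)."""
    if len(frame) < 2:
        raise ValueError("frame too short")
    (n_lower,) = struct.unpack(">H", frame[:2])
    if 2 + n_lower > len(frame):
        raise ValueError("truncated lower-band payload")
    return frame[2:2 + n_lower], frame[2 + n_lower:]
```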
Voice over IP (VoIP) and voice over wireless networks have become increasingly popular. Such voice transmission requires small data packets to be delivered reliably and in real time. When a voice frame is lost during transmission, there is no time to resend it. Similarly, if a voice frame travels over a long route and cannot reach the decoder by the time it is to be played, it is equivalent to a lost frame. Thus, in a voice system, a voice frame that does not reach the decoder, or does not reach it in time, may be considered a lost frame.
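The rule that a late frame is equivalent to a lost frame can be sketched as a simple classification against playout deadlines. This sketch assumes fixed, precomputed deadlines; real receivers usually adapt them with a jitter buffer, and the function name and arguments are hypothetical.

```python
from typing import Dict, List


def classify_frames(arrivals: Dict[int, float], playout_times: List[float]) -> List[bool]:
    """Return one flag per frame: True if the frame must be treated as lost.

    arrivals maps frame index -> arrival time in seconds; frames that never
    arrived are simply absent. playout_times[i] is the deadline by which
    frame i must reach the decoder.
    """
    lost = []
    for i, deadline in enumerate(playout_times):
        t = arrivals.get(i)
        # Never arrived, or arrived after its playout deadline: both count as lost.
        lost.append(t is None or t > deadline)
    return lost
```

The decoder would then run frame erasure concealment for every frame flagged True.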
If no processing is performed on a lost frame, the voice signal becomes intermittent and the voice quality is severely degraded. Lost frames therefore require frame erasure concealment: the lost voice data are estimated and the estimates replace the lost data, which yields better voice quality when frames are lost. For a voice codec that divides the input signal into a higher-band signal and a lower-band signal, concealment is performed on the lower-band and higher-band signals separately, and the concealed higher-band and lower-band signals are then combined by the synthesis Quadrature-Mirror Filterbank into the output voice signal.
Frame erasure concealment methods include insertion methods, interpolation methods and regeneration methods.
Insertion methods include splicing, silence replacement, noise replacement and previous frame repetition.
Interpolation methods include waveform replacement, pitch repetition and time domain waveform revision.
Regeneration methods include coder parameter interpolation and model-based regeneration.
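The insertion methods listed above can be sketched as functions that each produce a replacement for one lost frame from the last good frame. This is an illustrative sketch with hypothetical function names. Scaling the substitute noise to the RMS level of the previous frame is an assumption, and splicing is shown as producing no samples at all, so that the audio on either side of the gap is joined directly.

```python
import math
import random
from typing import List


def splicing(prev_frame: List[float]) -> List[float]:
    # Insert nothing: the audio before and after the gap is joined directly.
    return []


def silence_replacement(prev_frame: List[float]) -> List[float]:
    # Fill the gap with zeros, one frame long.
    return [0.0] * len(prev_frame)


def noise_replacement(prev_frame: List[float], rng: random.Random) -> List[float]:
    # Assumption: Gaussian noise scaled to the RMS level of the last good frame.
    if not prev_frame:
        return []
    rms = math.sqrt(sum(s * s for s in prev_frame) / len(prev_frame))
    return [rng.gauss(0.0, rms) for _ in prev_frame]


def previous_frame_repetition(prev_frame: List[float]) -> List[float]:
    # Copy the last good frame unchanged.
    return list(prev_frame)
```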
Among these, model-based regeneration gives the best voice quality but has the highest algorithmic complexity, whereas previous frame repetition gives good voice quality at modest complexity.
Because the lower-band signal affects voice quality more than the higher-band signal does, a frame erasure concealment algorithm with high complexity and high voice quality (for example, pitch repetition, time domain waveform revision, coder parameter interpolation or model-based regeneration) is used for the lower-band signal, while an algorithm with low complexity and lower voice quality is used for the higher-band signal. This achieves a trade-off between voice quality and complexity.
In a prior-art speech decoder, pitch repetition is used to conceal lost frames in the lower-band signal, while previous frame repetition with attenuation is used for the higher-band signal.
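Pitch repetition for the lower band can be sketched in two steps: estimate the pitch period of the recently decoded history, then fill the lost frame by repeating the last pitch period. This is a minimal sketch, not the prior-art decoder's actual algorithm. The normalized-autocorrelation pitch estimator, the lag search range and the function names are assumptions, and practical concealers typically also smooth the frame boundaries and attenuate long losses.

```python
import math
from typing import List


def estimate_pitch(history: List[float], min_lag: int, max_lag: int) -> int:
    """Return the lag in [min_lag, max_lag] with the highest normalized autocorrelation."""
    if not 1 <= min_lag <= max_lag < len(history):
        raise ValueError("lag range must satisfy 1 <= min_lag <= max_lag < len(history)")
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        a, b = history[lag:], history[:-lag]   # signal and its lag-delayed copy
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:                  # strict '>' keeps the shortest lag on ties
            best_lag, best_score = lag, score
    return best_lag


def pitch_repetition(history: List[float], pitch: int, frame_len: int) -> List[float]:
    """Fill a lost frame by periodically repeating the last `pitch` samples of history."""
    if not 1 <= pitch <= len(history):
        raise ValueError("pitch must be between 1 and len(history)")
    last_period = history[-pitch:]
    return [last_period[n % pitch] for n in range(frame_len)]
```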
The formula for recovering the higher-band signal by previous frame repetition with attenuation is:

$$s_{hb}(n) = s_{hb}(n-N)\cdot\alpha, \quad n = 0, \ldots, N-1$$
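The formula translates directly into code when the N samples of the preceding frame are stored in order, so that the n-th stored sample is s_hb(n - N). This is a minimal sketch with a constant attenuation coefficient; the function name is hypothetical.

```python
from typing import List


def repeat_and_attenuate(prev_frame: List[float], alpha: float) -> List[float]:
    """Recover a lost higher-band frame as s_hb(n) = s_hb(n - N) * alpha, n = 0..N-1.

    prev_frame holds the N samples s_hb(-N), ..., s_hb(-1) of the preceding frame,
    so prev_frame[n] is exactly s_hb(n - N).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [s * alpha for s in prev_frame]
```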
Here $s_{hb}(n)$, $n = 0, \ldots, N-1$, is the recovered higher-band signal of the lost frame, $N$ is the number of samples per frame, and the attenuation coefficient $\alpha$ is a number in the range 0 to 1. $\alpha$ may be a constant such as 0.8, or it may adapt to the number of consecutively lost packets: for example, the first lost frame is multiplied by a larger coefficient such as 0.9, and the second and subsequent lost frames by a smaller coefficient such as 0.7.
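The adaptive variant can be sketched over a sequence of frames in which lost frames are marked None. This sketch assumes that, because the formula scales $s_{hb}(n-N)$, the second lost frame is scaled from the already-attenuated first recovered frame, so the gains compound (0.9, then 0.9 × 0.7 = 0.63 relative to the last good frame). The function name, the list-of-optional-frames interface and the parameter names are hypothetical.

```python
from typing import List, Optional


def conceal_higher_band(frames: List[Optional[List[float]]],
                        first_alpha: float = 0.9,
                        later_alpha: float = 0.7) -> List[List[float]]:
    """Replace each None (lost frame) by the previous output frame times an adaptive alpha.

    The first frame of a loss burst uses first_alpha; the second and later ones use
    later_alpha. Each recovered frame is scaled from the previous *output* frame.
    """
    out: List[List[float]] = []
    run = 0  # number of consecutive lost frames so far
    for frame in frames:
        if frame is not None:
            out.append(list(frame))
            run = 0
            continue
        if not out:
            raise ValueError("cannot conceal before the first good frame")
        run += 1
        alpha = first_alpha if run == 1 else later_alpha
        out.append([s * alpha for s in out[-1]])
    return out
```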
In the course of making the invention, the inventor found that when the signal is strongly periodic, the higher-band signal cannot be recovered correctly. When the lower-band and higher-band signals share a consistent periodicity, concealing the higher-band signal with the prior-art codec destroys its original periodicity. As a result, the quality of the voice signal output by the speech decoder is degraded.
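One way to see how frame repetition can disturb periodicity is a toy comparison on a strongly periodic signal whose pitch period does not divide the frame length. The sketch below is illustrative only and is not taken from the source: the sinusoidal test signal, the values N = 10 and P = 7, and the helper names are assumptions, and attenuation is switched off (alpha = 1) to isolate the phase effect. Repeating the previous frame shifts the waveform by N samples, which is out of phase with the pitch period, while repeating the last pitch period continues the waveform in phase.

```python
import math
from typing import List

N = 10  # hypothetical frame length in samples
P = 7   # hypothetical pitch period in samples; N is not a multiple of P


def periodic_signal(length: int, period: int) -> List[float]:
    """A sinusoid that repeats every `period` samples."""
    return [math.sin(2.0 * math.pi * n / period) for n in range(length)]


def max_error(a: List[float], b: List[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


x = periodic_signal(3 * N, P)
history = x[:2 * N]        # frames 0 and 1 were received
true_frame = x[2 * N:]     # frame 2 is lost; this is what it should contain

# Previous frame repetition with alpha = 1: s(n) = s(n - N).
frame_repeat = history[-N:]
# Repetition of the last pitch period: s(n) = s(n - P).
pitch_repeat = [history[-P + (n % P)] for n in range(N)]

print(f"frame repetition error: {max_error(frame_repeat, true_frame):.3f}")
print(f"pitch repetition error: {max_error(pitch_repeat, true_frame):.3g}")
```

With these values the repeated frame is shifted by 10 samples, i.e. 3 samples out of phase with the 7-sample period, so its error is large, while the pitch-period repetition matches the true frame up to floating-point rounding.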