In a real-time voice communication system, for example a VoIP (Voice over IP) system, the transmission of voice data must be both real-time and reliable. Because the network is inherently unreliable, data packets may be lost, or may fail to reach the destination in time, during transmission from the sending end to the receiving end. Both situations are treated by the receiving end as network packet loss. Network packet loss is unavoidable, and it is one of the most important factors influencing voice call quality. Therefore, a robust packet loss concealment method is needed to recover lost data packets in a real-time communication system, so that good call quality is still obtained under the condition of network packet loss.
In the existing real-time voice communication technology, at the sending end, an encoder divides a wideband voice signal into a high sub-band and a low sub-band, encodes the two sub-bands respectively using ADPCM (Adaptive Differential Pulse Code Modulation), and sends them together to the receiving end via the network. At the receiving end, the two sub-bands are decoded respectively by the ADPCM decoder, and the final signal is then synthesized using a QMF (Quadrature Mirror Filter) synthesis filter.
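The sub-band split and recombination described above can be illustrated with a minimal sketch. This is a hypothetical two-tap (Haar-like) QMF bank written for clarity, not the longer prototype filters used by real wideband codecs; the function names are illustrative only.

```python
import numpy as np

def qmf_analysis(x):
    """Split a full-rate signal into low and high sub-bands,
    each decimated to half the sampling rate."""
    x = np.asarray(x, dtype=float)
    even = x[0::2]
    odd = x[1::2]
    low = (even + odd) / 2.0   # lowpass branch
    high = (even - odd) / 2.0  # highpass branch (mirror of the lowpass)
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two half-rate sub-bands into a full-rate signal.
    For this simple filter pair the reconstruction is exact."""
    y = np.empty(2 * len(low))
    y[0::2] = low + high
    y[1::2] = low - high
    return y
```

In the codec described above, each sub-band would be ADPCM-encoded between the analysis and synthesis steps; here they are passed through unchanged to show perfect reconstruction of the filter bank itself.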
Different Packet Loss Concealment (PLC) methods are adopted for the two sub-bands. For the low band signal, when no packet is lost, the reconstructed signal is not changed by cross-fading. When a packet is lost, for the first lost frame, the history signal (in the present application document, the history signal is the voice signal preceding the lost frame) is analyzed using a short-term predictor and a long-term predictor, and voice classification information is extracted. The lost frame signal is then reconstructed using an LPC (Linear Predictive Coding) based pitch repetition method together with the predictors and the classification information. The status of the ADPCM decoder is also updated synchronously until a good frame is received. In addition, not only the signal corresponding to the lost frame needs to be generated, but also a section of signal used for cross-fading. In this way, once a good frame is received, cross-fading is performed between the good frame signal and that section of signal. Note that this kind of cross-fading only happens when the receiving end receives the first good frame after losing a frame.
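The pitch-repetition concealment and the subsequent cross-fade can be sketched as follows. This is a simplified illustration under assumed parameters (a known pitch period and a fixed fade length), not the standardized algorithm; the helper names are hypothetical.

```python
import numpy as np

def conceal_frame(history, pitch_period, frame_len):
    """Fill a lost frame by repeating the last pitch cycle of the
    history signal (simplified pitch repetition, no LPC shaping)."""
    cycle = history[-pitch_period:]
    reps = int(np.ceil(frame_len / pitch_period))
    return np.tile(cycle, reps)[:frame_len]

def cross_fade(concealed_tail, good_frame, fade_len):
    """Blend the extra concealed section into the first good frame
    received after a loss, using a linear fade."""
    out = np.array(good_frame, dtype=float)
    w = np.linspace(0.0, 1.0, fade_len)  # 0 -> concealed, 1 -> good
    out[:fade_len] = (1.0 - w) * concealed_tail[:fade_len] + w * out[:fade_len]
    return out
```

Note that, as stated above, `cross_fade` is only invoked on the first good frame after a loss; frames received during error-free operation are passed through unchanged.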
During the process of realizing the present invention, the inventor found at least the following problems in the prior art: the energy of the synthesized signal is controlled by a static self-adaptive attenuation factor. Although the defined attenuation factor changes gradually, its attenuation speed, i.e., the value of the attenuation factor, is the same for any voice of the same classification. However, human voices vary widely. If the attenuation factor does not match the characteristics of a particular voice, uncomfortable noise appears in the reconstructed signal, particularly at the end of steady vowels. A static self-adaptive attenuation factor cannot adapt to the characteristics of various human voices.
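The problem described above can be made concrete with a sketch of a static per-class attenuation schedule. The class names and decay values below are assumptions chosen for illustration, not values taken from any standard: the point is only that every voice of the same class receives the same fixed decay, whatever the actual signal does.

```python
import numpy as np

# Hypothetical static schedule: one fixed decay rate per voice class,
# applied identically to every signal of that class (the prior-art
# behavior criticized in the text).
CLASS_DECAY = {"voiced": 0.999, "unvoiced": 0.99, "transient": 0.95}

def static_gain_envelope(n_samples, voice_class):
    """Per-sample gain applied to the concealed signal: a fixed
    geometric decay selected only by the voice class."""
    a = CLASS_DECAY[voice_class]
    return a ** np.arange(n_samples)
```

If the real speaker's energy decays faster than this schedule, the concealed signal remains too loud and audible artifacts appear, notably at the end of steady vowels; if it decays more slowly, the concealment fades out prematurely. Either mismatch degrades the reconstructed signal.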
The situation shown in FIG. 1 is taken as an example, wherein T0 is the pitch period of the history signal. The upper signal is the original signal, i.e., a schematic waveform under the situation with no packet loss. The lower, dashed signal is the signal synthesized according to the prior art. As can be seen from the figure, the synthesized signal does not decay at the same speed as the original signal. If the same pitch cycle is repeated too many times, the synthesized signal produces obvious music noise, so that it differs greatly from the desired signal.