In a real-time voice communication system, such as a VoIP (Voice over IP) system, voice data is required to be transmitted timely and reliably. However, because of the inherent unreliability of the network itself, during transmission from a transmitter to a receiver, a data packet may be dropped or may fail to arrive at the destination in time. Both situations are regarded as network packet loss by the receiver. Network packet loss is unavoidable, and is one of the principal factors affecting the quality of voice communication. Therefore, in a real-time voice communication system, a robust packet loss concealment method is needed to restore a lost data packet and to maintain good voice communication quality when network packet loss occurs.
In prior real-time voice communication technologies, at the transmitter, a coder divides a broadband voice signal into two sub-bands, a high band and a low band, encodes the two sub-bands respectively using Adaptive Differential Pulse Code Modulation (ADPCM), and sends the two encoded sub-bands to the receiver via the network. At the receiver, the two sub-bands are decoded respectively by an ADPCM decoder, and are synthesized into a final signal by a Quadrature Mirror Filter (QMF).
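The two-band analysis and synthesis described above can be sketched as follows. This is a minimal illustration of the quadrature-mirror principle only, not the filter bank of any particular codec: the function names are hypothetical, and a 2-tap (Haar) prototype filter is assumed so that the sketch stays short, whereas a real coder uses a much longer prototype.

```python
import numpy as np

def qmf_analysis(x, h0):
    """Split x into a low band and a high band, each at half the sample rate.

    h0 is the low-pass prototype; the high-pass filter is derived by the
    quadrature-mirror relation h1[n] = (-1)^n * h0[n].
    """
    h1 = h0 * (-1.0) ** np.arange(len(h0))
    low = np.convolve(x, h0)[::2]   # low-pass filter, then decimate by 2
    high = np.convolve(x, h1)[::2]  # high-pass filter, then decimate by 2
    return low, high

def qmf_synthesis(low, high, h0):
    """Recombine the two sub-bands into a full-band output signal."""
    h1 = h0 * (-1.0) ** np.arange(len(h0))
    u0 = np.zeros(2 * len(low));  u0[::2] = low    # upsample low band by 2
    u1 = np.zeros(2 * len(high)); u1[::2] = high   # upsample high band by 2
    # g0 = h0, g1 = -h1 cancels the aliasing introduced by decimation
    return np.convolve(u0, h0) - np.convolve(u1, h1)

# With the 2-tap Haar prototype, the analysis/synthesis chain reconstructs
# the input exactly, delayed by one sample.
a = 1.0 / np.sqrt(2.0)
h0 = np.array([a, a])
x = np.array([1.0, 2.0, 3.0, 4.0])
low, high = qmf_analysis(x, h0)
y = qmf_synthesis(low, high, h0)
# y[1:5] equals x (one-sample delay)
```

In the system described above, each sub-band would additionally pass through an ADPCM encoder and decoder between the analysis and synthesis stages.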
For the two different sub-bands, different Packet Loss Concealment (PLC) methods are used. For the low-band signal, when there is no packet loss, the reconstructed signal is not changed by cross-fading. When there is packet loss, a short-term predictor and a long-term predictor are used to analyze the past signal (in the present application, the past signal means the voice signal before a lost frame), and voice class information is extracted. The signal of the lost frame is then reconstructed by a Linear Predictive Coding (LPC) method based on pitch repetition, using the predictors and the voice class information. The state of the ADPCM decoder should be updated synchronously until a good frame appears. In addition, not only the signal corresponding to the lost frame but also a signal for cross-fading should be generated; once a good frame is received, cross-fading can be performed between the signal of the good frame and said generated signal. It should be noted that cross-fading only happens when the receiver receives a good frame after a frame loss.
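The pitch-repetition and cross-fading steps described above can be sketched as follows. This is a deliberately simplified illustration under assumed parameters: the pitch lag is estimated by raw autocorrelation over the history buffer, the lost frame is filled by directly repeating the last pitch period (omitting the LPC short-term/long-term predictors and voice classification of the actual method), and the cross-fade is a plain linear ramp into the first good frame. All names and lag ranges are hypothetical.

```python
import numpy as np

def estimate_pitch(history, min_lag, max_lag):
    """Return the lag (in samples) that best matches the most recent samples."""
    seg = history[-min_lag:]                  # most recent min_lag samples
    best_lag, best = min_lag, -np.inf
    for lag in range(min_lag, max_lag + 1):
        cand = history[-lag - min_lag:-lag]   # same-length segment, lag samples back
        c = np.dot(seg, cand) / (np.linalg.norm(cand) + 1e-9)
        if c > best:
            best, best_lag = c, lag
    return best_lag

def conceal_frame(history, frame_len, min_lag=32, max_lag=120):
    """Fill a lost frame by repeating the last pitch period of the past signal."""
    lag = estimate_pitch(history, min_lag, max_lag)
    period = history[-lag:]
    reps = int(np.ceil(frame_len / lag))
    return np.tile(period, reps)[:frame_len]

def cross_fade(synth_tail, good_frame, fade_len):
    """Fade from the synthesized continuation into the newly decoded good frame."""
    w = np.linspace(0.0, 1.0, fade_len)
    out = good_frame.astype(float).copy()
    out[:fade_len] = (1.0 - w) * synth_tail[:fade_len] + w * good_frame[:fade_len]
    return out
```

As the section notes, the cross-fade is applied only to the first good frame after a loss; during loss-free operation the decoded signal passes through unchanged.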
During the process of implementing the present invention, the inventor has found the following problems in the prior art: the reconstructed signal of the lost frame is synthesized using the past signal, so its waveform and energy, even at the end of the synthesized signal, are more similar to the signal in the history buffer, namely the signal before the lost frame, than to the newly decoded signal. This may cause a sudden change in waveform or energy of the synthesized signal at the joint between the lost frame and the first frame following the lost frame. The sudden change is shown in FIG. 1. FIG. 1 comprises three frames of signal, separated by two vertical lines. Frame N is a lost frame, and the other two frames are good frames. The upper signal corresponds to the original signal, in which none of the three frames is lost in transmission. The middle dashed line corresponds to a signal synthesized by using the frames N−1, N−2 and so on before frame N. The signal in the bottom row corresponds to the signal synthesized by the prior art. From FIG. 1, it can be seen that a sudden energy change exists at the transition between frame N and frame N+1 of the final output signal, especially at the end of a voiced segment and with longer frames. Moreover, repeating the same pitch repetition signal too many times can result in musical noise.
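The energy discontinuity at the frame joint can be quantified, for illustration, as the ratio of short-term RMS energy just after the joint to that just before it. This measure and its segment length are hypothetical, introduced here only to make the problem in FIG. 1 concrete, and are not part of any prior-art method:

```python
import numpy as np

def boundary_energy_jump(synth_frame, next_frame, seg=40):
    """Ratio of RMS energy just after the frame joint to RMS just before it.

    A ratio far from 1.0 indicates the kind of sudden energy change seen
    at the transition between frame N and frame N+1 in FIG. 1.
    """
    rms_end = np.sqrt(np.mean(synth_frame[-seg:] ** 2))    # tail of concealed frame
    rms_start = np.sqrt(np.mean(next_frame[:seg] ** 2))    # head of first good frame
    return rms_start / (rms_end + 1e-12)

# A concealed frame that has decayed to low amplitude, followed by a
# full-amplitude good frame, yields a ratio far above 1.0.
```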