With the development of network technologies, a growing number of applications, such as Voice over IP (VoIP), transmit voice packets over a packet-switched network to provide real-time voice communication. However, packet-switched networks were not originally designed for applications that require real-time communication, and they are not fully reliable. During transmission, data packets may be lost; in addition, packets that arrive at the receiver after their playout time are discarded by the receiver. Both cases are treated as packet loss. Packet loss is a serious obstacle to meeting the real-time and voice-quality requirements of VoIP. The VoIP receiver is responsible for decoding the voice packets sent by the sender into playable voice signals. If a packet is lost and no compensation is made, the voice signal becomes discontinuous and noise occurs, which degrades voice quality. Therefore, a real-time communication system requires a robust packet loss concealment solution that recovers the lost packets and preserves communication quality when some packets are lost in the network.
Currently, the common packet loss concealment technology is based on pitch repetition. For example, the packet loss concealment solution in Appendix I to the voice compression standard G.711 formulated by the ITU is based on pitch waveform substitution. Pitch waveform substitution compensates for lost audio frames at the receiver. The history signals that precede the lost frame are used to calculate their pitch period T0, and then the segment of length T0 that immediately precedes the lost frame is copied repeatedly to reconstruct the signals of the lost frame. As shown in FIG. 1, frame 2 is a lost frame, the frame length is N, and frame 1 and frame 3 are complete frames. It is assumed that the pitch period of the history signals (the signals of frame 1 and those before frame 1) is T0, and that the last pitch period of the history signals occupies interval 1. The signals in interval 1 may be copied to frame 2 repeatedly until frame 2 is full, thereby reconstructing the signals of the lost frame. In FIG. 1, the signals of two pitch periods need to be copied to fill the lost frame.
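The pitch repetition described above can be sketched as follows. This is a minimal illustration, not the procedure of G.711 Appendix I: the function names `estimate_pitch_period` and `conceal_by_pitch_repetition`, the normalized-correlation pitch search, the search range, and the synthetic test signal are all assumptions made for the example.

```python
import math

def estimate_pitch_period(history, t_min, t_max, window):
    """Return the lag in [t_min, t_max] whose delayed copy best matches
    the most recent `window` samples (normalized correlation).
    Assumption: this search is one common choice, not the standard's."""
    recent = history[-window:]
    best_lag, best_score = t_min, float("-inf")
    for lag in range(t_min, t_max + 1):
        delayed = history[-window - lag:-lag]
        num = sum(a * b for a, b in zip(recent, delayed))
        den = math.sqrt(sum(a * a for a in recent) * sum(b * b for b in delayed))
        score = num / den if den > 0 else float("-inf")
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def conceal_by_pitch_repetition(history, frame_len, t0):
    """Fill a lost frame by repeating the last t0 samples of the history."""
    last_period = history[-t0:]
    return [last_period[i % t0] for i in range(frame_len)]

# Hypothetical test signal: strictly periodic with a period of 20 samples.
PERIOD = 20
signal = [math.sin(2 * math.pi * t / PERIOD) + 0.3 * math.sin(4 * math.pi * t / PERIOD)
          for t in range(250)]
history, lost_frame = signal[:200], signal[200:]   # frame length N = 50

t0 = estimate_pitch_period(history, t_min=10, t_max=35, window=40)
concealed = conceal_by_pitch_repetition(history, frame_len=len(lost_frame), t0=t0)
max_err = max(abs(a - b) for a, b in zip(concealed, lost_frame))
print("estimated T0:", t0, "max error:", max_err)
```

For a strictly periodic signal the repeated pitch period reproduces the lost frame almost exactly; real speech is only quasi-periodic, so the match is approximate and degrades as the lost segment grows longer.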
However, if the signals of the last pitch period in the history signals are directly repeated as the signals of the lost frame, an abrupt waveform change occurs at the joint of two pitch periods. To ensure smoothness at the joint, the signals in the last T0/4 of the history buffer generally undergo cross-fading (cross attenuation) before the signals of the last pitch period in the history buffer are used to fill the lost frame. As shown in FIG. 2, the applied window is a simple triangular window. The rising window corresponds to the dashed line with an upward gradient in FIG. 2, and the falling window corresponds to the dashed line with a downward gradient in FIG. 2. The T0/4 samples immediately prior to the last pitch period T0 in the history buffer are multiplied by the rising window. The last T0/4 samples in the buffer are multiplied by the falling window, and the two windowed segments are added together (overlap-add). The resulting samples then replace the last T0/4 samples of the history buffer, which ensures a smooth transition at the joint of two adjacent pitch periods during pitch repetition.
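The overlap-add step above can be sketched as follows. The function name `smooth_history_tail` and the exact triangular weights are assumptions for illustration; the text only states that the window is a simple triangular window.

```python
def smooth_history_tail(history, t0):
    """Cross-fade the last t0/4 samples of the history buffer with the
    t0/4 samples that precede the last pitch period.
    Assumption: linear weights that sum to 1 at every sample."""
    quarter = t0 // 4
    n = len(history)
    out = list(history)
    for i in range(quarter):
        rising = (i + 1) / (quarter + 1)      # weight of the earlier segment
        falling = 1.0 - rising                # weight of the buffer tail
        before_period = history[n - t0 - quarter + i]
        tail = history[n - quarter + i]
        out[n - quarter + i] = rising * before_period + falling * tail
    return out

# Hypothetical non-periodic history (a ramp) with an assumed pitch period of 20.
history = [float(t) for t in range(100)]
T0 = 20
smoothed = smooth_history_tail(history, T0)

# The repeated pitch period starts with history[-T0]; compare the jump at the joint.
jump_before = abs(history[-1] - history[-T0])
jump_after = abs(smoothed[-1] - history[-T0])
print("jump at joint before:", jump_before, "after:", jump_after)
```

After smoothing, the end of the history buffer leads into the sample that precedes the repeated pitch period, so the first repeated sample follows it with a much smaller discontinuity.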
In voice communication, when the Discrete Cosine Transform (DCT) is applied to wideband audio coding, a block boundary effect occurs because the impulse response of the band-pass filter has a finite length, and considerable noise results. The Modified Discrete Cosine Transform (MDCT) overcomes this defect.
MDCT uses Time Domain Aliasing Cancellation (TDAC) to reduce the boundary effect. For an input sequence x[n], to obtain the MDCT coefficient of a block of 2N samples, the MDCT combines the N samples of the current frame with the N samples of the adjacent preceding frame to constitute a sequence of 2N samples, and then defines a window function h[n] of 2N samples, which fulfills:

$$h[n]^2 + h[n+N]^2 = 1 \tag{1}$$
For example, h[n] may be defined simply as a sine window:
$$h[n] = \sin\left(\frac{n}{2N}\pi\right) \tag{2}$$
which results in a 50% overlap of the data between adjacent windows. The MDCT coefficient of x[n] is X[k], and the Inverse Modified Discrete Cosine Transform (IMDCT) coefficient corresponding to X[k] is Y[n], which are separately defined as:
$$X[k] = \sum_{n=0}^{2N-1} x[n] \cdot h[n] \cdot \cos\left[\frac{(2k+1)\pi}{2N} \cdot (n + n_0)\right] \tag{3}$$

$$Y[n] = \frac{2}{N} \cdot \sum_{k=0}^{N-1} X[k] \cdot \cos\left[\frac{(2k+1)\pi}{2N} \cdot (n + n_0)\right] \tag{4}$$
In the formulas above,
$$k = 0, \ldots, N-1, \quad n = 0, \ldots, 2N-1, \quad n_0 = \frac{N+1}{2}.$$
Therefore, the reconstructed signal y[n] may be obtained by applying TDAC to Y[n] and Y′[n] according to the following formula:

$$y[n] = h[n+N] \cdot Y'[n+N] + h[n] \cdot Y[n], \quad n = 0, \ldots, N-1 \tag{5}$$
In the formula above, Y′[n] represents an IMDCT coefficient that is prior to and adjacent to Y[n].
On the encoder side, the encoder performs MDCT on the original voice signal according to formula (3) to obtain X[k], encodes X[k], and sends it to the decoder side. On the decoder side, after receiving the MDCT coefficient from the encoder, the decoder performs IMDCT on the received X[k] according to formula (4) to obtain Y[n], namely, the IMDCT coefficient corresponding to X[k].
For brevity of description, it is assumed that the IMDCT coefficient obtained after the decoder performs IMDCT for the currently received X[k] is Y[n], n=0, . . . , 2N−1, and the IMDCT coefficient prior to and adjacent to Y[n] is Y′[n], n=0, . . . , 2N−1. Taking FIG. 3 as an example, based on the foregoing assumption, the IMDCT coefficient corresponding to frame F0 and frame F1 is IMDCT1, expressed as Y′[n], n=0, . . . , 2N−1; the IMDCT coefficient corresponding to frame F1 and F2 is IMDCT2, expressed as Y[n], n=0, . . . , 2N−1. On the decoder side, the decoder substitutes Y[n], n=0, . . . , 2N−1 and Y′[n], n=0, . . . , 2N−1 into formula (5) to obtain the reconstructed signal y[n].
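The encoder and decoder steps of formulas (3), (4) and (5) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a codec implementation: the function names, the frame length N = 8, and the test signal are hypothetical. The sketch also uses the half-sample-offset sine window sin((n + 1/2)π/(2N)), which is an assumption that differs slightly from formula (2); it fulfills formula (1) and is symmetric, which this sketch relies on for the aliasing terms to cancel exactly.

```python
import math

def sine_window(N):
    # Assumption: half-sample-offset sine window (symmetric, fulfills formula (1)).
    return [math.sin((n + 0.5) * math.pi / (2 * N)) for n in range(2 * N)]

def mdct(block, h):
    """Formula (3): 2N windowed samples -> N coefficients X[k]."""
    N = len(block) // 2
    n0 = (N + 1) / 2
    return [sum(block[n] * h[n] * math.cos((2 * k + 1) * math.pi / (2 * N) * (n + n0))
                for n in range(2 * N))
            for k in range(N)]

def imdct(X):
    """Formula (4): N coefficients X[k] -> 2N aliased samples Y[n]."""
    N = len(X)
    n0 = (N + 1) / 2
    return [2.0 / N * sum(X[k] * math.cos((2 * k + 1) * math.pi / (2 * N) * (n + n0))
                          for k in range(N))
            for n in range(2 * N)]

def tdac(Y_prev, Y_cur, h):
    """Formula (5): overlap-add of two adjacent IMDCT outputs -> N samples."""
    N = len(Y_cur) // 2
    return [h[n + N] * Y_prev[n + N] + h[n] * Y_cur[n] for n in range(N)]

# Hypothetical signal covering frames F0, F1, F2 (N samples each).
N = 8
signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(3 * N)]
h = sine_window(N)

Y_prev = imdct(mdct(signal[0:2 * N], h))    # IMDCT1, covers F0 and F1
Y_cur = imdct(mdct(signal[N:3 * N], h))     # IMDCT2, covers F1 and F2
y = tdac(Y_prev, Y_cur, h)                  # reconstruction of F1

max_err = max(abs(a - b) for a, b in zip(y, signal[N:2 * N]))
print("max reconstruction error for F1:", max_err)
```

Each IMDCT output on its own still contains time-domain aliasing; only the windowed sum of the two overlapping halves recovers frame F1, which is why the loss of a single MDCT coefficient affects two frames.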
When an MDCT coefficient is lost, as shown in FIG. 4, the decoder receives MDCT3 corresponding to frame F2 and frame F3 and MDCT5 corresponding to frame F4 and frame F5, but fails to receive MDCT4 corresponding to frame F3 and frame F4. Consequently, the decoder cannot obtain IMDCT4 according to formula (4). For frame F3 and frame F4, the decoder has only the part of IMDCT3 that corresponds to F3 and the part of IMDCT5 that corresponds to F4, and it is unable to completely recover the signals of frame F3 and frame F4 by using IMDCT3 and IMDCT5 alone.
In the process of developing the present invention, the inventor finds the following: The prior art uses the decoded signals of frame F2 and the frames prior to F2 to generate the signals of the lost frames, and completely discards the part of the received IMDCT3 that corresponds to frame F3 and the part of the received IMDCT5 that corresponds to frame F4. However, according to the definitions of MDCT and IMDCT in formula (3) and formula (4), and in light of formula (5), these two parts include useful information. Moreover, supposing that the frame length is N samples, once n consecutive MDCT coefficients are lost, the number of samples of the affected signals is (n+1)*N. The more MDCT coefficients are lost, the worse the quality of the recovered signals and the user experience become, and the more the Quality of Service (QoS) deteriorates.
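The (n+1)*N relationship can be checked with a small sketch. It assumes the indexing of FIG. 4, in which MDCT coefficient i covers frames F(i−1) and F(i); the function name `affected_frames` and the frame length of 160 samples are hypothetical.

```python
def affected_frames(lost_blocks):
    """Return the sorted frame indices that cannot be fully reconstructed.
    Assumption: MDCT block i spans frames F(i-1) and F(i), as in FIG. 4."""
    frames = set()
    for i in lost_blocks:
        frames.update((i - 1, i))
    return sorted(frames)

N = 160  # hypothetical frame length in samples

single = affected_frames([4])          # only MDCT4 is lost
burst = affected_frames([4, 5, 6])     # three consecutive coefficients are lost

print(single, len(single) * N)   # [3, 4] 320
print(burst, len(burst) * N)     # [3, 4, 5, 6] 640
```

Because adjacent blocks overlap by one frame, a burst of n consecutive losses touches n+1 frames rather than 2n.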