The Internet Protocol (IP) network was originally designed to carry data streams that consist of large packets and do not require real-time or reliable transmission. Voice streams, by contrast, consist of small packets that must be delivered reliably and in real time. When a voice packet is lost in transit, there is not enough time to retransmit it. Likewise, a voice packet that travels a long route and fails to reach its destination by its required playout time is useless. Therefore, in a Voice over IP (VoIP) system, voice packets that fail to reach the intended destination in a timely manner are regarded as having been “lost”.
Packet loss in a network is the primary cause of deterioration of voice Quality of Service (QoS) in network transmission. Without effective technologies for recovering or hiding lost voice packets, even the best designed and managed IP network is incapable of providing good toll call services. Well-designed packet loss solutions improve voice transmission quality significantly.
A prior-art solution for hiding lost voice frames is as follows: the voice signals of the frame prior to the lost frame and the frame subsequent to the lost frame are transformed to the frequency domain; the amplitude values of the frequency-domain coefficients of the prior frame and the subsequent frame are then interpolated; finally, the interpolated frequency-domain coefficients are transformed back to the time domain. This solution is detailed below with reference to FIG. 1. In FIG. 1, the waveform 101 is a time-domain voice signal that includes three frames in total, one of which is lost. As shown in FIG. 2, the prior-art solution for hiding lost voice frames includes the following blocks:
Block 201: Fourier transformation is performed for the time-domain signal of the frame prior to the lost frame through the following formula:
$$X(n1,k) = \sum_{m=-\infty}^{\infty} x(m) \cdot w(n1 - m) \cdot e^{-j\frac{2\pi}{N}km} \tag{1}$$
In formula (1), x(m) is the time-domain voice signal; N is the frame length; w is an analysis window (also known as a transformation window), which is zero outside the interval [0, N−1] and is preferably a triangular window to reduce the calculation load; n1 is the position of the end sample point of the prior frame signal; X(n1,k) is the frequency-domain coefficient obtained from the Fourier transformation of the prior frame; k is the discrete frequency, with values 0, 1, . . . , N−1, and the relation between k and the angular frequency is
$$\omega = \frac{2\pi}{N} k.$$
After the time-domain signal of the prior frame undergoes the Fourier transformation, the frequency-domain coefficient X(n1,k) is a complex number and can be expressed in terms of amplitude and phase:
$$X(n1,k) = A_{n1,k}\, e^{j\theta_{n1,k}} \tag{2}$$
In formula (2), $A_{n1,k}$ is the amplitude value of the kth frequency, and $\theta_{n1,k}$ is the phase of the kth frequency.
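Block 201 can be illustrated with a minimal Python sketch of formulas (1) and (2). The function names, the particular triangular window shape, and the test signal are assumptions made for illustration; they are not taken from the prior-art solution itself.

```python
import cmath
import math

def triangular_window(N):
    """Symmetric triangular window of length N (one common shape; an assumption)."""
    half = (N - 1) / 2.0
    return [1.0 - abs(i - half) / (N / 2.0) for i in range(N)]

def windowed_dft(x, n1, N, w):
    """Formula (1): X(n1,k) = sum_m x(m) * w(n1 - m) * exp(-j*2*pi*k*m/N).

    Because w is zero outside [0, N-1], only m = n1-N+1 .. n1 contributes.
    """
    if n1 < N - 1 or n1 >= len(x):
        raise ValueError("n1 must satisfy N-1 <= n1 < len(x)")
    X = []
    for k in range(N):
        acc = 0j
        for m in range(n1 - N + 1, n1 + 1):
            acc += x[m] * w[n1 - m] * cmath.exp(-2j * math.pi * k * m / N)
        X.append(acc)
    return X

def to_polar(X):
    """Formula (2): split each coefficient into amplitude A and phase theta."""
    return [abs(c) for c in X], [cmath.phase(c) for c in X]

# Illustrative input: one frame of a cosine at discrete frequency k = 2.
N = 8
x = [math.cos(2 * math.pi * 2 * m / N) for m in range(N)]
X = windowed_dft(x, n1=N - 1, N=N, w=triangular_window(N))
A, theta = to_polar(X)
```

The same routine applies to Block 202 by passing n2, the end sample point of the subsequent frame, in place of n1.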
Block 202: Fourier transformation is performed for the time-domain signal of the frame subsequent to the lost frame through the following formula:
$$X(n2,k) = \sum_{m=-\infty}^{\infty} x(m) \cdot w(n2 - m) \cdot e^{-j\frac{2\pi}{N}km} \tag{3}$$
In formula (3), n2 is the position of the end sample point of the subsequent frame signal.
After the time-domain signal of the subsequent frame undergoes the Fourier transformation, the frequency-domain coefficient X(n2,k) is a complex number and can be expressed in terms of amplitude and phase:
$$X(n2,k) = A_{n2,k}\, e^{j\theta_{n2,k}} \tag{4}$$
Block 203: Interpolation is performed between the amplitude value of the frequency-domain coefficient of the prior frame and the amplitude value of the frequency-domain coefficient of the subsequent frame to obtain reconstructed signals through the following formula:
$$A_{p,k} = A_{n1,k} + \left(A_{n2,k} - A_{n1,k}\right) \cdot \frac{p}{PP + 1} \tag{5}$$
In formula (5), $A_{p,k}$ is the amplitude value of the frequency-domain coefficient of a reconstructed signal obtained after interpolation; the range of p is 1, 2, . . . , PP; and PP is the quantity of interpolated values (namely, the quantity of reconstructed signals), which may be calculated through the following formula:
$$PP = (lostNum + 1) \cdot N / S - 1 \tag{6}$$
In formula (6), S is the interval (namely, the window offset) of the sample points of the reconstructed signal frequency-domain coefficient mapped onto the time domain after the interpolation, and is generally N/2. In the formula, lostNum is the quantity of lost frames.
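As a worked illustration of formulas (5) and (6), the following Python sketch computes PP and the interpolated amplitudes. The function names and sample values are hypothetical, and the sketch assumes that S divides (lostNum + 1)·N evenly so that PP is an integer.

```python
def num_interpolated(lost_num, N, S):
    """Formula (6): PP = (lostNum + 1) * N / S - 1 (assumes exact division)."""
    return (lost_num + 1) * N // S - 1

def interpolate_amplitudes(A1, A2, PP):
    """Formula (5): A[p][k] = A1[k] + (A2[k] - A1[k]) * p / (PP + 1), p = 1..PP."""
    return [
        [a1 + (a2 - a1) * p / (PP + 1) for a1, a2 in zip(A1, A2)]
        for p in range(1, PP + 1)
    ]

# One lost frame, frame length 160, window offset S = N/2 = 80.
PP = num_interpolated(lost_num=1, N=160, S=80)              # 3
A_interp = interpolate_amplitudes([0.0, 4.0], [4.0, 0.0], PP)
# A_interp == [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
```

With one lost frame and S = N/2, three sets of amplitudes are produced, spaced evenly between the amplitudes of the prior frame and the subsequent frame.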
Block 204: Inverse Fourier transformation is performed for the frequency-domain coefficient of each reconstructed signal after the interpolation. The inverse Fourier transformation may be implemented through the following formula:
$$y_p(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(p,k) \cdot e^{j\frac{2\pi}{N}kn} \tag{7}$$
In formula (7), $y_p(n)$ is the signal after the inverse transformation; the value range of n is 0, 1, . . . , N−1; X(p,k) is the frequency-domain coefficient of a reconstructed signal after the interpolation, and is calculated through the following formula:
$$X(p,k) = A_{p,k}\, e^{j\theta_{n1,k}} \tag{8}$$
In formula (8), the amplitude is the frequency-domain amplitude obtained by interpolating between the prior frame and the subsequent frame, and the phase is the frequency-domain phase of the prior frame.
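Formulas (7) and (8) can be sketched in Python as follows. The function names and input values are illustrative assumptions. The sketch returns complex samples; when the amplitudes and phases come from a real-valued signal, the imaginary parts are numerically negligible and the real parts would be used.

```python
import cmath
import math

def reconstruct_coefficients(A_p, theta_n1):
    """Formula (8): X(p,k) = A[p][k] * exp(j * theta_n1[k]) (phase of the prior frame)."""
    return [a * cmath.exp(1j * t) for a, t in zip(A_p, theta_n1)]

def inverse_dft(X):
    """Formula (7): y_p(n) = (1/N) * sum_k X(p,k) * exp(j*2*pi*k*n/N), n = 0..N-1."""
    N = len(X)
    return [
        sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
        for n in range(N)
    ]

# Illustrative values: amplitude 4 at k = 2 and k = 6 with zero phase, N = 8.
A_p = [0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0]
theta_n1 = [0.0] * 8
y_p = inverse_dft(reconstruct_coefficients(A_p, theta_n1))
y_real = [c.real for c in y_p]   # samples of cos(pi * n / 2), up to rounding
```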
Block 205: After the inverse Fourier transformation, the obtained time-domain signals are superposed to generate the lost voice signals. If S=N/2 (the window offset is half of the frame length), the voice signal of the lost frame may be calculated through the following formula:
$$\hat{x}\bigl(n1 + (p - 1) \cdot S + l\bigr) = \frac{S}{W0} \bigl(y_p(l + N/2) + y_{p+1}(l)\bigr) \tag{9}$$
In formula (9), the value range of p is 1, 2, . . . , PP−1, and the value range of l is 1, 2, . . . , S. W0 is:
$$W0 = \sum_{n=-\infty}^{\infty} w(n) \tag{10}$$
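The superposition of formulas (9) and (10) can be sketched in Python as below. This is a rough illustration under stated assumptions: the function name is hypothetical, S is fixed at N/2, and the sketch uses a 0-based offset l = 0, . . . , S−1 into each $y_p$ so that every index stays within 0, . . . , N−1. The sum in formula (10) is taken over the N window samples, since w is zero elsewhere.

```python
def overlap_add(y, N, w):
    """Superpose adjacent reconstructed signals per formula (9), with S = N/2.

    y is a list of PP real-valued signals of length N. The result holds
    (PP - 1) * S samples that follow the end sample n1 of the prior frame.
    Indexing is 0-based here (an assumption of this sketch).
    """
    S = N // 2
    W0 = sum(w)                      # formula (10); w is zero outside [0, N-1]
    out = []
    for p in range(len(y) - 1):      # p = 1 .. PP-1 in the text
        for l in range(S):
            out.append(S / W0 * (y[p][l + S] + y[p + 1][l]))
    return out

# Small illustrative case: N = 4, rectangular window, PP = 3 signals.
y = [[0, 0, 2, 4], [6, 8, 10, 12], [14, 16, 0, 0]]
lost = overlap_add(y, N=4, w=[1.0, 1.0, 1.0, 1.0])
# lost == [4.0, 6.0, 12.0, 14.0]
```

With PP = 3 and S = N/2, the output covers (PP − 1)·S = N samples, that is, exactly one lost frame.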
The waveform 102 in FIG. 1 shows the Fourier transformation and interpolation for the frame prior to the lost frame and the frame subsequent to the lost frame. The solid-line triangular window at the front is the analysis window applied to the time-domain signal of the prior frame, and the solid-line triangular window at the rear is the analysis window applied to the time-domain signal of the subsequent frame. The three dotted-line triangular windows mark the positions of the time-domain signals generated after the interpolation. The waveform 103 in FIG. 1 is the waveform generated after the prior-art solution for hiding lost packets is applied.
As seen from the prior-art process of hiding lost packets and the waveform 103 in FIG. 1, the window offset is set to half of the frame length, the window length is the frame length, and each frame corresponds to two window offsets. Consequently, reconstructed signals of two periods are generated, and the period of the generated voice signals may be inconsistent with that of the actual signals. Moreover, the position of the Fourier transformation window is set without taking the phase of the prior frame or the subsequent frame into consideration; therefore, the phase of each reconstructed signal does not match the phase of the prior frame or the subsequent frame.