1. Field of the Invention
The present invention relates generally to speech coding. More particularly, the present invention relates to pitch prediction for concealing lost packets.
2. Background Art
Subscribers use speech quality as the benchmark for assessing the overall quality of a telephone network. Gateway VoIP (Voice over Internet Protocol or Packet Network) devices, which are placed at the edge of the packet network, perform the task of encoding speech signals (speech compression), packetizing the encoded speech into data packets, and transmitting the data packets over the packet network to remote VoIP devices. Conversely, such remote VoIP devices perform the task of receiving the data packets over the packet network, depacketizing the data packets to retrieve the encoded speech and decoding (speech decompression) the encoded speech to regenerate the original speech signals.
Packet loss over the packet network is a major source of speech impairments in VoIP applications. Such loss can occur for a variety of reasons, such as packets being discarded in the packet network due to congestion or dropped at the gateway due to late arrival. Packet loss can have a substantial impact on perceived speech quality. In modern codecs, concealment algorithms are used to alleviate the effects of packet loss on perceived speech quality. For example, when a loss occurs, the speech decoder derives the parameters for the lost frame from the parameters of previous frames to conceal the loss. The loss also affects the subsequent frames, because the decoder takes a finite time to resynchronize its state to that of the encoder. Recent research has shown that for some codecs (e.g. G.729) packet loss concealment (PLC) works well for a single frame loss, but not for consecutive or burst losses. Further, the effectiveness of a concealment algorithm depends on which part of the speech is lost (e.g. voiced or unvoiced). For example, it has been shown that concealment for G.729 works well for unvoiced frames, but not for voiced frames.
When a packet loss occurs, one of the most important parameters to be recovered or reconstructed is the pitch lag parameter, which represents the pitch period, and thus the fundamental frequency, of the speech (active-voice) signal. Traditional packet loss concealment algorithms either copy the previous pitch lag parameter for the lost frame or repeatedly add one (1) to the immediately previous pitch lag parameter. In other words, if a number of frames have been lost, either all the lost frames use the same pitch lag parameter from the last good frame, or the first lost frame duplicates the pitch lag parameter from the last good frame and each subsequent lost frame adds one (1) to the immediately previous pitch lag parameter, which has itself been reconstructed.
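The two conventional strategies described above can be sketched as follows. This is a minimal illustration only; the function names `conceal_repeat` and `conceal_increment` and the sample lag value are hypothetical and are not taken from any particular codec.

```python
def conceal_repeat(last_good_lag, num_lost):
    """Every lost frame reuses the pitch lag of the last good frame."""
    return [last_good_lag] * num_lost


def conceal_increment(last_good_lag, num_lost):
    """The first lost frame copies the last good pitch lag; each later
    lost frame adds one (1) to the previously reconstructed lag."""
    lags = []
    previous = last_good_lag
    for frame in range(num_lost):
        if frame > 0:
            previous = previous + 1
        lags.append(previous)
    return lags


# Hypothetical example: last good pitch lag of 50 samples, three lost frames.
repeated = conceal_repeat(50, 3)
incremented = conceal_increment(50, 3)
print(repeated)
print(incremented)
```

In both cases the reconstructed values depend only on the last good frame, so neither strategy follows the direction in which the pitch track was moving before the loss.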
FIG. 1 illustrates a conventional approach for pitch lag prediction used by conventional packet loss concealment algorithms. As shown, pitch lags 120-129 are the true pitch lags on pitch track 110. FIG. 1 also shows a situation where a number of frames have been lost due to packet loss. Conventional pitch lag prediction algorithms duplicate or copy the pitch lag parameter from the last good frame, i.e. pitch lag 125 is copied as pitch lag 130 for the first lost frame. Further, pitch lag 130 is copied as pitch lag 131 for the next lost frame, which is then copied as pitch lag 132 for the next lost frame, and so on. As a result, it can be seen from FIG. 1 that pitch lags 130-132 fall considerably outside of pitch track 110, and there is a considerable gap between the next good pitch lag 129 and reconstructed pitch lag 132, when compared to the distance between lost pitch lag 128 and pitch lag 129. Because pitch lags 130-132 are the same as pitch lag 125, they do not create a perceptible difference for a listener at that juncture; however, the considerable gap between reconstructed pitch lag 132 and pitch lag 129 creates a click sound that is perceptually very unpleasant to the listener.
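The discontinuity at the first good frame after a burst loss can be illustrated numerically. The sketch below assumes a steadily rising pitch track with invented lag values; the numbers are not read from FIG. 1, and the helper name `repeat_last_good` is hypothetical.

```python
# Hypothetical pitch lags (in samples) for ten consecutive frames.
# These values are invented for illustration only.
true_lags = [40, 42, 44, 46, 48, 50, 53, 56, 59, 62]
lost_frames = {6, 7, 8}  # three consecutive frames lost after frame 5


def repeat_last_good(lags, lost):
    """Conventional concealment: each lost frame reuses the most recent
    pitch lag that was actually received."""
    decoded = []
    last_good = None
    for index, lag in enumerate(lags):
        if index in lost:
            if last_good is None:
                raise ValueError("no good frame precedes the loss")
            decoded.append(last_good)
        else:
            decoded.append(lag)
            last_good = lag
    return decoded


decoded = repeat_last_good(true_lags, lost_frames)
recovery = max(lost_frames) + 1  # index of the first good frame after the loss

# Jump in pitch lag when good frames resume, with and without the loss.
jump_concealed = abs(decoded[recovery] - decoded[recovery - 1])
jump_true = abs(true_lags[recovery] - true_lags[recovery - 1])
print(decoded)
print(jump_concealed, jump_true)
```

With these assumed values the concealed lags stay flat while the true track keeps rising, so the step at recovery is several times larger than the step the original signal would have had at the same point.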
Accordingly, there is a strong need in the art for packet loss concealment systems and methods that can offer superior speech quality by efficiently predicting pitch lags for lost frames that are more in line with the pitch track.