In a packet-switched network, a packet of data often traverses several network nodes as it crosses the network in “hops.” Each packet has a header that contains destination address information for the entire packet. Because each packet carries its own destination address, packets may travel independently of one another and occasionally become delayed or misdirected from the primary data stream. If delayed, the packets may arrive out of order. The packets are not merely delayed relative to the source, but also exhibit delay jitter. Delay jitter is variability in packet delay, or variation in the timing of packets relative to each other, due to buffering within nodes in the same routing path and to differing delays and/or numbers of hops in different routing paths. Packets may even be lost outright and never reach their destination.
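The notion of delay jitter described above can be illustrated with a short sketch. It assumes one simple measure, the mean absolute change in one-way delay between consecutive packets; other definitions of jitter exist, and the function names and timestamps below are hypothetical, chosen only for illustration.

```python
from statistics import mean


def transit_delays(send_times_ms, recv_times_ms):
    """One-way delay of each packet: arrival time minus send time."""
    return [r - s for s, r in zip(send_times_ms, recv_times_ms)]


def mean_delay_jitter(delays_ms):
    """Mean absolute change in delay between consecutive packets.

    This is one illustrative jitter measure, not the only one in use.
    """
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return mean(diffs) if diffs else 0.0


# Hypothetical timestamps: packets sent every 20 ms, arriving unevenly.
send = [0, 20, 40, 60, 80]
recv = [50, 72, 95, 108, 131]

delays = transit_delays(send, recv)   # [50, 52, 55, 48, 51]
jitter = mean_delay_jitter(delays)    # (2 + 3 + 7 + 3) / 4 = 3.75 ms
```

All five packets are delayed by about 50 ms, yet the delay differs from packet to packet; it is this variation, rather than the fixed delay, that the term delay jitter refers to.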
Voice over Packet (VOP) networks and Voice over Internet Protocol (VOIP) networks are far more sensitive to delay jitter than, for example, transfers of text data files. Delay jitter is one of the important factors that cause packet loss in a network. Packet loss can produce interruptions, clicks, pops, hisses and blurring of the sound and/or images as perceived by the user, unless the delay jitter problem can be ameliorated or obviated. Packets that are not literally lost, but are substantially delayed when received, may nonetheless have to be discarded at the destination because they have lost their usefulness at the receiving end. Thus, packets that are discarded, as well as those that are literally lost, are all called “lost packets.” Packet loss is a common source of distortion in VOIP.
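The following sketch illustrates why late packets count as “lost packets.” It assumes a fixed playout deadline at the receiver; the function name, the deadline value, and the sample delays are hypothetical and serve only to show the bookkeeping.

```python
def effective_loss_ratio(arrival_delays_ms, playout_deadline_ms):
    """Fraction of packets that are unusable at the receiver.

    arrival_delays_ms holds one entry per transmitted packet: its delay in
    milliseconds, or None if the packet never arrived. A packet that
    arrives after the (assumed) playout deadline is discarded and is
    therefore counted together with the packets that were literally lost.
    """
    total = len(arrival_delays_ms)
    if total == 0:
        return 0.0
    lost = sum(
        1 for d in arrival_delays_ms
        if d is None or d > playout_deadline_ms
    )
    return lost / total


# Hypothetical trace: one packet never arrives, one arrives too late.
delays = [40, 45, None, 130, 50, 42, 48, 41, 44, 47]
ratio = effective_loss_ratio(delays, playout_deadline_ms=100)  # 2 / 10
```

In this trace only one packet is literally lost, but the effective loss seen by the decoder is two packets out of ten.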
Packet loss degrades speech quality as perceived by a user. From an end-user's point of view, the experience of even a single click or pop during a conversation will greatly reduce the user's satisfaction with the quality of the entire conversation. This is true regardless of whether the speech quality is good or excellent most of the time during the call. Customers of telephony services will simply remember the one or two times during a call that degradation was perceived and rate the entire call as poor quality. Thus, from an end-user's point of view, even a single instance of quality degradation has a severely damaging effect on call quality. The user can rarely tolerate as much as half a second (500 milliseconds) of delay. For real-time communication some solution to the problem of packet loss is imperative, and the packet loss problem is exacerbated in heavily-loaded packet networks. Even a lightly-loaded packet network, with a packet loss ratio of perhaps 0.1%, still requires some mechanism to deal with lost packets.
Due to packet loss in a packet-switched network employing speech encoders and decoders, a speech decoder may either fail to receive a frame or receive a frame having a significant number of missing bits. In either case, the speech decoder is presented with the same essential problem—the need to synthesize speech despite the loss of compressed speech information. Both “frame erasure” and “packet loss” concern a communication channel or network problem that causes the loss of the transmitted bits.
Packet loss concealment (PLC) algorithms, also called frame loss concealment algorithms, hide losses that occur in packet networks by reconstructing the signal from the characteristics of the past signal. These algorithms reduce the clicks, pops and other artifacts that occur when a network experiences packet loss. PLC improves the overall voice quality in unreliable networks.
One standard recommendation to address this problem is the International Telecommunication Union (ITU) G.711 standard, “Pulse Code Modulation (PCM) of Voice Frequencies.” G.711 uses pulse code modulation (PCM) of voice frequencies to transmit packetized voice data over a communications network. Appendix I of G.711 describes a “high quality low-complexity algorithm for packet loss concealment with G.711.” It describes the PLC algorithms as “frame erasure concealment algorithms” that “hide transmission losses in an audio system where the input signal is encoded and packetized at a transmitter, sent over a network, and received at a receiver that decodes the packet and plays out the output.”
FIG. 1 illustrates a block flow diagram of an implementation of a receiver and decoder that uses features from ITU G.711 Appendix I. The figure shows a receiver 10 that maintains two data buffers used by a PLC module 22: a history buffer 24 and a pitch buffer 26. A data stream 12 is normally processed through the voice playout unit (VPU) 14 in the receiver 10. If there are no lost packets in packet stream 12, then the VPU 14 sends its output data stream to voice decoder 16, which decodes the voice payload from each received packet of stream 12. After the decoder 16, decoded voice data is sent through a switch 18 to and through various processes that are understood in the art to produce an audio output at audio port 20. Whether or not there is packet loss, the VPU 14 output is also saved into history buffer 24 on an ongoing basis. The history buffer 24 holds 48.75 ms worth of voice data samples, which is equivalent to 390 samples at an 8 kHz sample rate. The history buffer 24 is constantly updated with samples from the VPU 14.
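The buffer sizing above follows directly from the sample rate: 48.75 ms × 8000 samples/s = 390 samples. A minimal sketch of such a fixed-length, continuously updated buffer is given below. The class name, the method names, and the 10 ms (80 sample) frame size are assumptions made for illustration and are not taken from G.711 Appendix I or from FIG. 1.

```python
from collections import deque

SAMPLE_RATE_HZ = 8000
HISTORY_MS = 48.75
HISTORY_LEN = round(HISTORY_MS * SAMPLE_RATE_HZ / 1000)  # 390 samples


class HistoryBuffer:
    """Fixed-length buffer that always holds the most recent samples."""

    def __init__(self, length=HISTORY_LEN):
        # A deque with maxlen silently drops the oldest samples on overflow.
        self._samples = deque([0] * length, maxlen=length)

    def push_frame(self, frame):
        """Append one frame of samples, discarding the oldest ones."""
        self._samples.extend(frame)

    def snapshot(self):
        """Return a copy of the current contents, oldest sample first."""
        return list(self._samples)


history = HistoryBuffer()
for n in range(1, 6):
    # Five hypothetical 10 ms frames of 80 samples, filled with 1..5.
    history.push_frame([n] * 80)

contents = history.snapshot()  # last 390 of the 400 pushed samples
```

After 400 samples have been pushed, the buffer retains only the newest 390, so the ten oldest samples of the first frame have been discarded.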
Pitch buffer 26 is the same length as the history buffer 24 and is used as a working buffer during a period of packet loss. Pitch buffer 26 is updated from the history buffer 24 at the occurrence of the first packet loss and is maintained for the duration of consecutive losses. During the packet loss, the PLC algorithm generates a synthesized signal from the last received pitch period, with no attenuation, into the pitch buffer 26. The synthesized signal can then be added to the decoded stream from decoder 16 through switch 18 or another device for playout at audio port 20. The history buffer is updated through each loss with the synthesized output as the erasure progresses.
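The buffer handling described above can be sketched as follows. This is a simplified illustration, not the G.711 Appendix I algorithm: the pitch period is assumed to be already known and is passed in as a parameter, phase continuity across consecutive lost frames is not tracked, and all function and variable names are hypothetical.

```python
def synthesize_from_pitch(pitch_buffer, pitch_period, num_samples):
    """Repeat the last pitch period, without attenuation, to fill a gap."""
    if not 0 < pitch_period <= len(pitch_buffer):
        raise ValueError("pitch_period must fit inside the pitch buffer")
    last_period = pitch_buffer[-pitch_period:]
    return [last_period[i % pitch_period] for i in range(num_samples)]


def conceal_frame(history, pitch_period, frame_len):
    """Conceal one lost frame and return (synthesized, updated_history)."""
    # At the first loss, the pitch buffer is a copy of the history buffer.
    pitch_buffer = list(history)
    synthesized = synthesize_from_pitch(pitch_buffer, pitch_period, frame_len)
    # The history keeps its length and absorbs the synthesized output.
    updated_history = (list(history) + synthesized)[-len(history):]
    return synthesized, updated_history


# Hypothetical data: a 390-sample history and an assumed 40-sample pitch.
history = list(range(390))
synthesized, history_after = conceal_frame(history, pitch_period=40,
                                           frame_len=80)
```

Here the last 40 samples of the history (values 350 through 389) are repeated twice to fill the 80-sample gap, and the history buffer slides forward by one frame so that it ends with the synthesized output.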
The G.711 PLC algorithm adds a 3.75 ms delay, which is equivalent to 30 samples at 8 kHz. This delay is used for an Overlap Add (OLA) at the start of an erasure and at the end of the erasure. This allows the algorithm to perform smooth transitions from real to synthetically generated speech, and vice versa. The synthesized speech from the pitch buffer is continued beyond the end of the erasure, and the generated speech is then mixed with the real speech using OLA. The delay provides a smooth transition from a good frame to the first reconstructed frame. This avoids clicks in the audio caused by discontinuities between the good frames and the reconstructed frames, which are unpleasant to the listener.
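The 30-sample figure is simply 3.75 ms × 8000 samples/s. The overlap add itself can be sketched as a cross-fade over that window. The linear ramp used below is an assumption made for illustration, and the function name and sample values are hypothetical.

```python
SAMPLE_RATE_HZ = 8000
OLA_MS = 3.75
OLA_LEN = round(OLA_MS * SAMPLE_RATE_HZ / 1000)  # 30 samples


def overlap_add(fade_out, fade_in):
    """Cross-fade two equal-length segments with linear weights.

    fade_out is the signal being left (for example, synthesized speech at
    the end of an erasure) and fade_in is the signal being entered (for
    example, real speech from the first good frame).
    """
    n = len(fade_out)
    if n == 0 or n != len(fade_in):
        raise ValueError("segments must be non-empty and of equal length")
    mixed = []
    for i in range(n):
        w_in = (i + 1) / (n + 1)  # ramps up from 1/(n+1) to n/(n+1)
        mixed.append((1.0 - w_in) * fade_out[i] + w_in * fade_in[i])
    return mixed


# Hypothetical segments: a constant level of 100 fading into silence.
synthetic_tail = [100.0] * OLA_LEN
real_head = [0.0] * OLA_LEN
transition = overlap_add(synthetic_tail, real_head)
```

Because the two weights always sum to one, the mixed output moves gradually from one signal to the other instead of jumping at the frame boundary, which is the discontinuity that would otherwise be heard as a click.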
For some applications, however, the delay, memory consumption, and processing resource (e.g., MIPS) consumption associated with the G.711 Appendix I PLC algorithm are not acceptable. The G.711 Appendix I algorithm can achieve high voice quality, but it requires 3.75 ms of delay and a 48.75 ms history buffer, and it consumes approximately 1 MIPS per channel. Under G.711 Appendix I, the packet loss concealment algorithm reduces channel density by up to 30%, while actual packet losses in a stable network usually occur in less than one percent of all data transmissions. Even though a single incident of quality degradation caused by packet loss can significantly harm the call quality perceived by an end user, the prior art PLC algorithm consumes a significant number of MIPS to address a very low packet loss rate.