Embodiments of the invention relate generally to reducing memory requirements for the generation of a synthetic speech signal for packet loss concealment in a voice over packet network.
In a packet-switched network, a packet of data often traverses several network nodes, or “hops,” as it crosses the network. Each packet has a header that contains destination address information for the entire packet. Because each packet carries its own destination address, packets may travel independently of one another and occasionally become delayed or misdirected from the primary data stream. Delayed packets may arrive out of order. The packets are not merely delayed relative to the source; they also exhibit delay jitter. Delay jitter is variability in packet delay, that is, variation in the timing of packets relative to each other. It arises from buffering within nodes in the same routing path, and from differing delays and/or numbers of hops in different routing paths. Packets may even be lost outright and never reach their destination.
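Delay jitter as described above can be quantified in several ways. One widely used estimator, not drawn from this disclosure, is the RTP interarrival jitter of RFC 3550. It smooths the change in transit time between consecutive packets with a gain of 1/16. The following is a minimal sketch under that assumption. The function name `interarrival_jitter` is hypothetical, and send and receive times are assumed to be in the same units (for example, milliseconds).

```python
def interarrival_jitter(send_times, recv_times):
    """Running delay-jitter estimate in the style of RFC 3550.

    send_times[i] and recv_times[i] are the send and arrival times of
    packet i, in the same (hypothetical) time unit.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Change in one-way transit time between consecutive packets:
        # D = (R_i - R_{i-1}) - (S_i - S_{i-1})
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Exponential smoothing with gain 1/16, as in RFC 3550
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Packets sent every 20 ms with a constant 50 ms delay show no jitter.
print(interarrival_jitter([0, 20, 40, 60], [50, 70, 90, 110]))
```

A constant delay, however long, yields zero jitter. Only the variation between packets contributes.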
Voice over Packet (VOP) and Voice over Internet Protocol (VOIP) are qualitatively more sensitive to delay jitter than, for example, transfers of text data files. Unless the delay jitter problem can be ameliorated or obviated, it produces interruptions, clicks, pops, hisses, and blurring of the sound and/or images as perceived by the user. Packets that are not literally lost, but are substantially delayed when received, may nonetheless have to be discarded at the destination because they are no longer useful at the receiving end. Thus, discarded packets, as well as packets that are literally lost, are all called “lost packets.”
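The rule that late packets count as lost can be illustrated with a fixed playout deadline. This is a minimal sketch, assuming a hypothetical constant `playout_delay_ms` added to each packet's send time. A missing arrival (`None`) and an arrival after the deadline are both reported as lost. The function name `lost_packets` is illustrative only.

```python
def lost_packets(send_times, arrival_times, playout_delay_ms):
    """Return sequence numbers the receiver must treat as lost.

    arrival_times[i] is None if packet i never arrived. The fixed playout
    deadline (send time + playout_delay_ms) is a simplifying assumption.
    """
    lost = []
    for seq, (sent, arrived) in enumerate(zip(send_times, arrival_times)):
        deadline = sent + playout_delay_ms
        # A packet that never arrives and one that arrives too late
        # are indistinguishable from the decoder's point of view.
        if arrived is None or arrived > deadline:
            lost.append(seq)
    return lost


# Packet 1 never arrives; packet 2 arrives after its 100 ms deadline.
print(lost_packets([0, 20, 40, 60], [30, None, 150, 85], 60))
```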
The user can rarely tolerate as much as half a second (500 milliseconds) of delay. For real-time communication, some solution to the problem of packet loss is imperative, and the problem is exacerbated in heavily loaded packet networks. Even a lightly loaded packet network, with a packet loss ratio of perhaps 0.1%, still requires some mechanism for dealing with lost packets.
Due to packet loss in a packet-switched network employing speech encoders and decoders, a speech decoder may either fail to receive a frame or receive a frame having a significant number of missing bits. In either case, the speech decoder is presented with the same essential problem—the need to synthesize speech despite the loss of compressed speech information. Both “frame erasure” and “packet loss” concern a communication channel or network problem that causes the loss of the transmitted bits.
Packet loss concealment (PLC) algorithms, also called frame loss concealment algorithms, hide losses that occur in packet networks by reconstructing the signal from the characteristics of the past signal. These algorithms reduce the clicks, pops, and other artifacts that occur when a network experiences packet loss. PLC improves the overall voice quality in unreliable networks.
One standard recommendation addressing this problem is the International Telecommunication Union (ITU) G.711 Appendix I recommendation for a packet loss concealment algorithm (G.711), which is used together with the G.711 codec. Referring to the block flow diagram in FIG. 1, G.711 describes a receiver 10 that maintains two data buffers used by a PLC module 18: a history buffer 16 and a pitch buffer 20. A data stream 12 is normally processed through the voice playout unit (VPU) 14 in the receiver 10. If there are no lost packets in packet stream 12, the VPU 14 sends its output data stream to voice decoder 15, which decodes the voice payload from each received packet 12. The decoded voice data from decoder 15 then passes through a switch 19 to various processes, understood in the art, that produce an audio output at audio port 22. Whether or not there is packet loss, the VPU 14 output is also saved into history buffer 16 on an ongoing basis. The history buffer 16 holds 48.75 ms of voice data samples, which is equivalent to 390 samples at an 8 kHz sample rate. The history buffer 16 is constantly updated with samples from the VPU 14.
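A constantly updated history buffer of fixed length behaves like a ring buffer that retains only the most recent samples. This is a minimal sketch, assuming integer PCM samples delivered in frames. The class `HistoryBuffer` and its methods are hypothetical illustrations, not the G.711 Appendix I reference code. The 390-sample length follows from 48.75 ms × 8 samples/ms.

```python
from collections import deque

SAMPLE_RATE_HZ = 8000
HISTORY_MS = 48.75
# 48.75 ms at 8 kHz -> 390 samples
HISTORY_LEN = round(HISTORY_MS * SAMPLE_RATE_HZ / 1000)


class HistoryBuffer:
    """Hypothetical fixed-length history of the most recent output samples."""

    def __init__(self, length=HISTORY_LEN):
        # Start filled with silence; maxlen makes the deque act as a ring buffer.
        self._buf = deque([0] * length, maxlen=length)

    def push(self, samples):
        # Extending a full deque with maxlen silently drops the oldest samples.
        self._buf.extend(samples)

    def snapshot(self):
        # A copy, e.g. for initializing a pitch buffer at the first lost packet.
        return list(self._buf)


hb = HistoryBuffer()
hb.push(range(800))  # e.g. ten hypothetical 80-sample (10 ms) frames
print(len(hb.snapshot()), hb.snapshot()[0], hb.snapshot()[-1])
```

Because the buffer is sized at construction time, its memory is allocated whether or not any packet is ever lost.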
Pitch buffer 20 is the same length as the history buffer 16 and is used as a working buffer during a period of packet loss. Pitch buffer 20 is updated from the history buffer 16 at the occurrence of the first packet loss and is maintained for the duration of a run of consecutive losses. During the packet loss, the PLC algorithm generates a synthesized signal in the pitch buffer 20 from the last received pitch period, with no attenuation. This signal can then be added to the decoded stream from decoder 15 through switch 19, or another device, for playout at audio port 22. As the erasure progresses, the history buffer is updated with the synthesized output through each loss.
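Generating a synthesized signal from the last received pitch period amounts, in its simplest form, to repeating the final `pitch_period` samples of the history. This is a minimal sketch, assuming the pitch period (in samples) has already been estimated elsewhere. It omits any smoothing a full implementation may apply at period boundaries. The function `synthesize_from_pitch` is hypothetical.

```python
def synthesize_from_pitch(history, pitch_period, num_samples):
    """Repeat the last pitch period of `history`, unattenuated.

    `pitch_period` is assumed to be estimated elsewhere (in samples).
    """
    if not 0 < pitch_period <= len(history):
        raise ValueError("pitch_period must be between 1 and len(history)")
    last_period = history[-pitch_period:]
    # Tile the final pitch period to cover the erased interval.
    return [last_period[i % pitch_period] for i in range(num_samples)]


print(synthesize_from_pitch([1, 2, 3, 4, 5, 6], 3, 7))
```

The synthesized samples would then be written back into the history, so that a subsequent consecutive loss continues from the synthetic signal.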
The G.711 PLC algorithm adds a 3.75 ms delay, which is equivalent to 30 samples at 8 kHz. This delay is used for an overlap-add (OLA) at the start and at the end of an erasure, allowing the algorithm to make smooth transitions from real to synthetic speech and vice versa. At the end of an erasure, the synthesized speech from the pitch buffer is continued beyond the erasure and then mixed with the real speech using OLA. At the start of an erasure, the delay provides a smooth transition from the last good frame to the first reconstructed frame. This avoids clicks caused by discontinuities between good frames and reconstructed frames, which are unpleasant to the listener.
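An overlap-add transition of this kind can be sketched as a cross-fade over the 30-sample window. This is a minimal sketch, assuming linear (triangular) ramps whose weights sum to one. The exact window used by G.711 Appendix I is not reproduced here. The name `overlap_add` is hypothetical. At the end of an erasure, `fade_out` would be the continued synthetic speech and `fade_in` the real speech.

```python
SAMPLE_RATE_HZ = 8000
OLA_MS = 3.75
# 3.75 ms at 8 kHz -> 30 samples
OLA_LEN = round(OLA_MS * SAMPLE_RATE_HZ / 1000)


def overlap_add(fade_out, fade_in):
    """Cross-fade two equal-length segments with linear ramps (assumed window)."""
    n = len(fade_out)
    if n == 0 or n != len(fade_in):
        raise ValueError("segments must be non-empty and of equal length")
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the incoming signal rises toward 1
        out.append((1.0 - w) * fade_out[i] + w * fade_in[i])
    return out


# Fade from a synthetic segment of 1.0s into real speech of 0.0s.
mixed = overlap_add([1.0] * OLA_LEN, [0.0] * OLA_LEN)
```

Because the two weights always sum to one, a signal that is identical in both segments passes through the transition unchanged. Only the mismatch between them is smoothed.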
However, the pitch buffer 20, the history buffer 16, and the OLA all require allocations of significant memory resources, even when the buffers are idle. These allocations are in addition to the memory allocated to the receiver's voice playout unit 14. In a stable network, packet losses usually affect less than one percent of all data transmissions. Thus, the PLC is typically idle while still requiring its full memory allocation. What is needed is a technique that reduces the memory requirements of packet loss concealment algorithms by reducing the buffer and OLA memory allocations.
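The scale of this idle allocation can be estimated per channel. This is a rough sketch, assuming 16-bit samples, a pitch buffer equal in length to the 390-sample history buffer, and an OLA delay line of 30 samples. Any additional state an actual implementation keeps is ignored. The function `plc_buffer_bytes` and the constant `BYTES_PER_SAMPLE` are hypothetical.

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit linear PCM samples


def plc_buffer_bytes(channels, sample_rate_hz=8000, history_ms=48.75, ola_ms=3.75):
    """Rough PLC buffer memory for `channels` simultaneous voice channels."""
    history = round(history_ms * sample_rate_hz / 1000)  # history buffer: 390 samples
    pitch = history                                       # pitch buffer: same length
    ola = round(ola_ms * sample_rate_hz / 1000)           # OLA delay: 30 samples
    return channels * (history + pitch + ola) * BYTES_PER_SAMPLE


print(plc_buffer_bytes(1), plc_buffer_bytes(1000))
```

Under these assumptions each channel reserves about 1.6 KB for PLC. A gateway serving many channels holds that memory continuously, even though it is used during well under one percent of the traffic.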