A packet switched network is a communication network that transmits data from a sender to a receiver packaged in packets, which are routed from the sender to the receiver over a network of switching nodes connected by “data links”. Each switching node receives packets via links that connect it to other switching nodes and forwards the packets that it receives over other data links that are suitable for bringing the packets to their destinations. Any two given packets may propagate over different routes, i.e. different configurations of nodes and links, from a same sender to a same receiver. Examples of such packet switched networks are Arpanet, which was established more than thirty years ago and is the first packet switched network, and the Internet. The Internet is used today for all types of data communication and is commonly used to transmit multimedia data and for voice communication, conventionally known as Voice over Internet Protocol (VoIP).
A packet comprises a header at the beginning of the packet, a payload in the middle of the packet, and a trailer at the end of the packet. The header generally includes information related to a destination address of the packet, routing information, a sequence number that identifies the packet's position in a transmitted sequence of packets, and information regarding a size of the packet. The payload comprises data actually being communicated. The trailer typically includes error-checking data, which is used at the packet's destination to detect errors that may have occurred in the packet en route.
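The header, payload and trailer arrangement described above can be illustrated with a minimal Python sketch. The field layout (a 4-byte destination address, a 4-byte sequence number and a 2-byte payload size) and the use of CRC-32 as the error-checking data are assumptions chosen for illustration only; they do not correspond to any particular network protocol, and the names `build_packet` and `parse_packet` are hypothetical.

```python
import struct
import zlib

# Hypothetical header layout (illustrative assumption, not a real protocol):
# destination address (4 bytes), sequence number (4 bytes), payload size (2 bytes).
HEADER = struct.Struct("!4sIH")
# Hypothetical trailer: CRC-32 computed over header + payload.
TRAILER = struct.Struct("!I")


def build_packet(dest: bytes, seq: int, payload: bytes) -> bytes:
    """Assemble header + payload + trailer into one packet."""
    header = HEADER.pack(dest, seq, len(payload))
    trailer = TRAILER.pack(zlib.crc32(header + payload))
    return header + payload + trailer


def parse_packet(packet: bytes):
    """Return (dest, seq, payload), or None if the packet fails its checks."""
    if len(packet) < HEADER.size + TRAILER.size:
        return None
    dest, seq, size = HEADER.unpack_from(packet, 0)
    body_end = HEADER.size + size
    if body_end + TRAILER.size != len(packet):
        return None  # size field disagrees with the actual packet length
    (crc,) = TRAILER.unpack_from(packet, body_end)
    if zlib.crc32(packet[:body_end]) != crc:
        return None  # error detected at the destination
    return dest, seq, packet[HEADER.size:body_end]
```

In this sketch a packet whose recomputed checksum does not match its trailer is simply rejected; what a receiver does with such a packet is outside the scope of the example.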
Since packets from a same sender to a same receiver may travel via different routes, packets, which are sequentially transmitted, may arrive at their common destination, i.e. receiver, in a different order than the order in which they were transmitted. As each packet is identified by a sequence number, its processing at the receiver will be done according to the sequence number regardless of the order in which it arrived at the receiver.
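Processing according to sequence number rather than arrival order may be sketched as a small reorder buffer. The class name `ReorderBuffer` and its interface are hypothetical, and the sketch assumes that sequence numbers start at a known value and increase by one per packet without wrapping.

```python
class ReorderBuffer:
    """Release payloads in sequence-number order, whatever the arrival order."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next sequence number due for delivery
        self.pending = {}          # out-of-order packets waiting their turn

    def push(self, seq, payload):
        """Store one packet; return the payloads now deliverable in order."""
        if seq >= self.next_seq:   # ignore packets that were already delivered
            self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

For example, if packet 1 arrives before packet 0, the first call to `push` returns nothing and the second returns both payloads in their transmitted order.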
In VoIP and other voice related packet switching applications, a sender's transmitter will generally digitize an analog voice stream and group the resultant digital data in sections. The transmitter packages each section in a payload portion of a packet and sends the packet to a receiver, or a plurality of receivers, via the Internet. The receiver decodes the data in the payloads of the packets it receives and orders the data according to the sequence numbers of the packets to regenerate the voice stream. In VoIP protocols, generally, packets are required to be received at the receiver within a delay time of less than about 250 msec to about 500 msec following their transmission in order to maintain voice continuity of a reconstructed voice stream. The network generally classifies packets that do not reach their destinations within this delay as “lost packets”, ceases attempts at routing them to their destinations and discards them. Packet losses may affect intelligibility of a received voice stream if sound encoded in lost packets has a generally continuous duration, hereinafter a “discontinuity duration”, of between about 60 msec and about 100 msec. To make up for the lost packets, packet loss concealment (PLC) techniques are commonly used in VoIP and other voice related packet switching applications. PLC techniques are generally considered to be either sender based or receiver based.
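The relationship between the delay limit, lost packets and the discontinuity duration can be made concrete with a short sketch. The 250 msec default is the lower figure mentioned above; the assumption that each packet carries 20 msec of audio, and the function names `classify_lost` and `longest_gap_ms`, are illustrative and not taken from any protocol.

```python
def classify_lost(sent_ms, received_ms, max_delay_ms=250):
    """Return sorted sequence numbers treated as lost.

    sent_ms maps sequence number -> transmission time in msec.
    received_ms maps sequence number -> arrival time in msec; a packet
    that never arrived is simply absent from this mapping.
    """
    lost = []
    for seq, t_sent in sent_ms.items():
        t_recv = received_ms.get(seq)
        # Lost if it never arrived or did not arrive within the delay limit.
        if t_recv is None or t_recv - t_sent >= max_delay_ms:
            lost.append(seq)
    return sorted(lost)


def longest_gap_ms(lost, packet_ms=20):
    """Longest run of consecutive lost packets, in msec of audio.

    packet_ms (audio carried per packet) is an assumed value.
    """
    longest = run = 0
    previous = None
    for seq in sorted(lost):
        run = run + 1 if previous is not None and seq == previous + 1 else 1
        longest = max(longest, run)
        previous = seq
    return longest * packet_ms
```

Under the 20 msec assumption, three consecutive lost packets produce a 60 msec gap, which is at the lower end of the discontinuity duration range stated above.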
Sender based techniques may be classified as “active” or “passive”. Active techniques generally involve the receiver sending a message to the sender informing the sender which packets are lost, in response to which the sender retransmits the lost packets. A drawback of this technique is that the period from the moment when a “lost packet” in a voice stream is first transmitted until a replacement packet is received at the receiver often exceeds the 250-500 msec delay time required to maintain voice continuity of the voice stream.
There are generally considered to be two types of passive techniques: interleaving and forward error correction. In interleaving, the transmitter distributes bytes that encode temporally contiguous portions of an audio stream in different packets prior to transmission. As a result, loss of a single packet does not, in general, result in loss of audio data corresponding to a continuous period of time greater than that corresponding to audio data encoded in a single byte, which is generally less than the discontinuity duration. Forward error correction comprises sending additional data with each packet, often referred to as redundancy data, that is useable to reconstruct lost packets. Reed-Solomon encoding/decoding is a well-known forward error correction technique. Passive methods usually require that all data in a given data stream be received prior to processing and reconstructing lost packets. As a result, these techniques may be time consuming and may require large buffering capacity in the receiver.
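The two passive techniques may be sketched as follows. The interleaver distributes bytes round-robin over a fixed number of packets, which is one simple way of realizing the byte distribution described above. For the redundancy data, the sketch substitutes a single XOR parity packet for Reed-Solomon coding; XOR parity is a much simpler scheme that can rebuild only one lost packet per group and is used here purely for illustration. All function names are hypothetical.

```python
def interleave(data: bytes, n_packets: int):
    """Byte i of the stream is placed in packet i % n_packets."""
    return [data[k::n_packets] for k in range(n_packets)]


def deinterleave(packets, total_len: int, fill: int = 0):
    """Rebuild the stream; a lost packet (None) leaves isolated fill bytes."""
    n = len(packets)
    out = bytearray([fill]) * total_len
    for k, pkt in enumerate(packets):
        if pkt is not None:
            out[k::n] = pkt
    return bytes(out)


def xor_parity(packets):
    """Redundancy packet: byte-wise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)


def recover_one(packets, parity):
    """Rebuild the single missing packet (None) from the rest plus parity."""
    present = [p for p in packets if p is not None]
    return xor_parity(present + [parity])
```

With twelve bytes interleaved over four packets, losing one packet leaves three isolated one-byte gaps rather than one three-byte gap. Both functions need the whole group of packets before they can finish, which reflects the buffering cost noted above.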
Receiver based techniques generally take advantage of a characteristic whereby variations in an audio waveform of a voice signal are relatively small between adjacent packets. Numerous receiver-based techniques are known in the art, some of which are briefly discussed below.
    a. Silence Substitution: the method comprises replacing voice that is encoded in a lost packet with a period of silence.
    b. Packet Repetition: the method comprises replacing a lost packet with a duplicate of a packet immediately preceding the lost packet.
    c. Pitch Estimation: the method comprises determining a fundamental frequency of voice encoded in packets preceding a lost packet and duplicating the fundamental frequency during a period in which voice encoded in the missing packet would be made audible.
    d. Linear Prediction: the method comprises determining waveform parameters from a portion of an audio waveform preceding a segment of the waveform encoded in a lost packet. The lost segment is synthesized responsive to the predicted parameters using linear interpolation techniques. Optionally, a portion of the audio waveform following the lost segment may also be used to perform linear prediction.
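The first three receiver based techniques listed above may be sketched in a few lines. The sketch treats each packet as a list of audio samples and marks a lost packet with `None`. The pitch estimator uses a plain autocorrelation search over a caller-supplied lag range, which is one common way of finding a fundamental period and is an assumption here rather than the method of any cited reference. The function names are hypothetical.

```python
def conceal(frames, frame_len, method="repeat"):
    """Fill lost frames (None) by packet repetition or silence substitution."""
    out = []
    for frame in frames:
        if frame is not None:
            out.append(list(frame))
        elif method == "repeat" and out:
            out.append(list(out[-1]))      # duplicate the preceding packet
        else:
            out.append([0] * frame_len)    # period of silence
    return out


def estimate_pitch_period(samples, min_lag, max_lag):
    """Lag, in samples, that maximizes the autocorrelation of the history."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i - lag]
                    for i in range(lag, len(samples)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def pitch_fill(history, period, length):
    """Synthesize `length` samples by repeating the last pitch period."""
    cycle = history[-period:]
    return [cycle[i % period] for i in range(length)]
```

The fundamental frequency corresponds to the sample rate divided by the estimated period. Linear prediction is omitted from the sketch because it requires fitting a filter model, which would not fit in a short example.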
For convenience of presentation, a portion of an audio waveform encoded in a packet immediately preceding a lost packet is referred to as a “leading portion”. A portion encoded in a packet immediately following the lost packet is referred to as a “trailing portion”.
Typically, in replacing a missing portion of an audio waveform with a synthesized segment, the synthesized segment is matched to the leading portion of the audio waveform to provide a smooth transition between the leading portion and the synthesized segment. Generally, matching comprises overlapping and adding (OLA) a leading section of the synthesized segment with a trailing section of the leading portion so that the amplitude of the audio waveform is substantially preserved in a leading overlap region. In other matching techniques the trailing section of the leading portion is butted on to the leading section of the synthesized segment. Furthermore, several other matching techniques comprise phase matching, referred to as “synchronous overlap and add” (SOLA) techniques, wherein the leading section of a synthesized segment is overlapped with a trailing section of a leading portion of the waveform to preserve pitch as well as amplitude in the overlap region.
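A basic overlap-and-add match between the leading portion and a synthesized segment may be sketched as a linear crossfade. The linear weights, which sum to one at every sample of the overlap region so that amplitude is substantially preserved, are an illustrative choice, and the function name `overlap_add` is hypothetical. The sketch does not perform the phase alignment that distinguishes SOLA.

```python
def overlap_add(leading, synthesized, overlap):
    """Crossfade the tail of `leading` into the head of `synthesized`.

    The last `overlap` samples of `leading` fade out while the first
    `overlap` samples of `synthesized` fade in.
    """
    if not 0 < overlap <= min(len(leading), len(synthesized)):
        raise ValueError("overlap must fit inside both segments")
    head = list(leading[:-overlap])
    tail = leading[-overlap:]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # fade-in weight for the synthesized segment
        mixed.append((1 - w) * tail[i] + w * synthesized[i])
    return head + mixed + list(synthesized[overlap:])
```

Setting the overlap aside and simply concatenating the two lists would correspond to the butted matching mentioned above.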
PLC and techniques for synthesizing lost packets may be found in “Packet Loss Concealment for Voice Transmission over IP Networks”, Ejaz Mahfuz, Department of Electrical Engineering, McGill University, Montreal, Canada, September 2001, (www.tsp.ece.mcgill.ca/MMSP/Theses/2001/MahfuzT2001.pdf), “A Survey of Packet Loss Recovery Techniques for Streaming Audio”, C. Perkins, O. Hodson, V. Hardman, IEEE Network, September/October 1998, pp. 40-48, ANSI T1.521a-2000 (Annex B) “Standard for Packet Loss Concealment”, and ITU-T Recommendation G.711, Appendix I, “A High Quality Low-Complexity Algorithm for Packet Loss Concealment with G.711”, all of which are incorporated herein by reference. OLA and SOLA techniques are described in Chapter 2, “Sound modeling: signal based approaches” by Giovanni De Poli and Federico Avanzini (www.dei.unipd.it/~musical/IM06/Dispense06/2_signalmodels.pdf), incorporated herein by reference.