It is well known that network environments are migrating toward a single converged IP (Internet Protocol) network that delivers voice, video and data traffic. One critical component of a successful convergence is the transmission of voice packets over the IP network. IP networks were originally designed for transmitting data traffic that consists of relatively large packets and that does not necessarily require reliable real-time delivery. In such applications, packets can be dropped, if necessary, with relative impunity in order to alleviate network congestion. In addition, successive packets can be harmlessly routed through different paths. As a result, each packet may experience a quite different transmission delay. The resulting network characteristics are very difficult, if not impossible, to predict, but they may nonetheless be perfectly acceptable for data transmission, since dropped packets can simply be retransmitted and delay jitter (i.e., variance in delay) has a fairly insignificant effect.
Voice transmission, however, requires real-time and reliable delivery of smaller packets. The receiving end needs a steady stream of voice packets for “playback.” When a voice packet is dropped, there is no time to retransmit it. In addition, if one voice packet takes a longer route than the others and fails to arrive in time for playback, the received voice packet is in fact useless. In voice-over-IP (VoIP) applications, therefore, a voice packet is typically regarded as lost whether it fails to arrive on time or fails to be delivered at all. Such problems are found in all IP networks, regardless of how well managed or over-provisioned they may be; that is, they are not limited to the public Internet or to “mismanaged” networks.
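The rule that a late packet counts as a lost packet can be illustrated with a minimal Python sketch. The names `VoicePacket`, `is_lost` and `classify`, the 20 ms frame duration and the 60 ms initial playout delay are all assumptions chosen for illustration; none of them comes from a particular VoIP system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoicePacket:
    seq: int                      # sequence number assigned by the sender
    arrival_ms: Optional[float]   # arrival time; None means never delivered


def is_lost(packet: VoicePacket, deadline_ms: float) -> bool:
    """A packet is lost if it never arrived or arrived after its deadline."""
    return packet.arrival_ms is None or packet.arrival_ms > deadline_ms


def classify(packets: List[VoicePacket],
             first_deadline_ms: float = 60.0,   # assumed initial playout delay
             frame_ms: float = 20.0) -> List[int]:  # assumed frame duration
    """Return the sequence numbers the receiver must treat as lost."""
    lost = []
    for p in packets:
        # Each frame is due for playback one frame duration after the last.
        deadline = first_deadline_ms + p.seq * frame_ms
        if is_lost(p, deadline):
            lost.append(p.seq)
    return lost


packets = [
    VoicePacket(0, 45.0),    # deadline 60 ms: on time
    VoicePacket(1, 95.0),    # deadline 80 ms: arrived, but too late
    VoicePacket(2, None),    # never delivered
    VoicePacket(3, 110.0),   # deadline 100 + 20 = 120 ms: on time
]
lost = classify(packets)
print(lost)
```

In this sketch, packets 1 and 2 are reported identically even though only packet 2 was actually dropped by the network.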
Various prior art techniques have been suggested and/or employed to recover from or conceal the effects of lost packets. Without such efforts, even the best designed and managed IP networks would fail to deliver “toll quality” speech. In particular, many VoIP systems rely on receiver-based Packet-Loss Concealment (PLC) schemes. These may be generally classified into insertion-based, interpolation-based and regeneration-based methods.
Insertion-based PLC methods include such well-known prior art techniques as silence insertion, noise insertion and packet repetition. Silence insertion merely fills the gap (where the lost packet should have been) with silence. Although widely used, it performs quite poorly because packet loss then results in periods of silence, which, in turn, cause unpleasant clipped-speech distortion. Noise insertion, in which noise rather than silence is inserted in the gap where the lost packet should have been, produces slightly better voice quality and intelligibility than silence insertion. Packet repetition replaces each lost packet with the most recently received packet. It performs the best among insertion-based methods, but still results in audible distortions in the speech signal.
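The three insertion-based methods differ only in what they place in the gap, which the following Python sketch tries to make concrete. It assumes frames are lists of integer PCM samples and that a lost frame is marked with `None`; the frame size of 160 samples, the noise amplitude and all function names are illustrative assumptions rather than part of any standard.

```python
import random

FRAME_SIZE = 160  # assumption: 20 ms of 8 kHz audio per packet


def silence_insertion(last_good, frame_size=FRAME_SIZE):
    """Fill the gap with zero-valued samples."""
    return [0] * frame_size


def noise_insertion(last_good, frame_size=FRAME_SIZE, amplitude=50, rng=None):
    """Fill the gap with low-level uniform noise (amplitude is an assumption)."""
    rng = rng or random.Random(0)
    return [rng.randint(-amplitude, amplitude) for _ in range(frame_size)]


def packet_repetition(last_good, frame_size=FRAME_SIZE):
    """Fill the gap with a copy of the most recently received frame."""
    if last_good is None:          # nothing received yet: fall back to silence
        return [0] * frame_size
    return list(last_good)


def conceal(frames, fill):
    """Replace every lost frame (None) using the given fill strategy."""
    out, last_good = [], None
    for frame in frames:
        if frame is None:
            out.append(fill(last_good))
        else:
            out.append(frame)
            last_good = frame      # remember the latest frame actually received
    return out


frames = [[100] * FRAME_SIZE, None, [200] * FRAME_SIZE]
silenced = conceal(frames, silence_insertion)
noisy = conceal(frames, noise_insertion)
repeated = conceal(frames, packet_repetition)
```

Here `silenced[1]` is all zeros, `noisy[1]` holds small random values, and `repeated[1]` is a copy of the first frame.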
Interpolation-based prior art PLC methods, such as G.711 PLC, provide higher concealment performance but do so at the expense of increased computational requirements. (G.711 is a speech coding standard promulgated by the International Telecommunication Union Telecommunication Standardization Sector, or ITU-T.) Another prior art interpolation-based method is the time scale modification technique, which “stretches” the good speech frame across the time gap to hide the lost packets. Finally, regeneration-based PLC methods, which are the most sophisticated PLC techniques, produce the highest quality speech in the presence of lost packets. The embedded PLC algorithms in CELP (Code-Excited Linear Predictive) based speech codecs (i.e., coder/decoder systems), such as those of the G.723.1, G.728 and G.729 standards (each also promulgated by the ITU-T), belong to this category.
Note that each of the prior art PLC algorithms described above runs at the receiving end (i.e., at the decoder). When the decoder determines that its packet receiving buffer is empty, implying that the packets that should follow those already received have been either lost or delayed, it begins PLC processing. In the case of packet repetition, the most commonly used prior art PLC technique, this processing simply reuses the previously received packet. This choice is based on the assumption that speech is quasi-stationary, that is, that the missing packet will most likely have characteristics similar to those of the previously received packet. However, this assumption is not always valid.
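The receiver-side trigger described above, an empty receive buffer at playout time, can be sketched as follows. The class `RepetitionDecoder` and its methods are hypothetical names used only for illustration, and the four-sample frames are kept unrealistically short for readability.

```python
from collections import deque


class RepetitionDecoder:
    """Toy decoder that conceals loss by repeating the previous frame."""

    def __init__(self, frame_size=4):
        self.buffer = deque()               # packet receiving buffer
        self.previous = [0] * frame_size    # silence until a frame arrives
        self.concealed = 0                  # number of frames concealed so far

    def receive(self, frame):
        """Called when a packet arrives from the network."""
        self.buffer.append(frame)

    def next_frame(self):
        """Called once per playout interval; always returns a frame."""
        if self.buffer:
            self.previous = self.buffer.popleft()
        else:
            # Buffer empty: the expected packet is lost or late, so reuse
            # the previous frame under the quasi-stationarity assumption.
            self.concealed += 1
        return list(self.previous)


decoder = RepetitionDecoder()
decoder.receive([1, 2, 3, 4])
first = decoder.next_frame()     # normal playback
second = decoder.next_frame()    # buffer empty: previous frame repeated
decoder.receive([5, 6, 7, 8])
third = decoder.next_frame()     # playback resumes with new data
```

The repeated frame is only a good substitute when the speech has not changed much between frames. If the lost packet spanned a transition in the speech signal, `second` would not resemble what was actually sent.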