1. Field of the Invention
This invention relates to digital voice communications in general and, more specifically, to conveying voice information digitally over a non-ideal packet network, for example to provide long-distance telephone service over the Internet using Voice-over-Internet-Protocol (VOIP).
2. Description of the Related Art
A typical VOIP system is shown in FIG. 1. Person A's voice is digitized, compressed, and divided by Gateway B into small packets of encoded binary data (numbered in temporal sequence in the figure, for convenience). The packets are sent over the unregulated network C, so they arrive at the far-end Gateway D with varying amounts of delay. Gateway D puts the packets back in the correct order (1, 2, 3, 4), then uncompresses (or, synonymously, decodes) the encoded binary data, providing a continuous audio signal to person E that sounds like a slightly delayed copy of what person A said. The same process typically happens simultaneously in the reverse direction, supplying a full-duplex conversation.
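As an illustration only, the numbering and reordering steps of FIG. 1 can be sketched as follows. The names `Packet`, `packetize` and `reorder` are hypothetical, the encoded stream is simply treated as opaque bytes split into fixed-size frames (an assumption; real codecs define their own frame sizes), and out-of-order delivery by network C is simulated with a random shuffle.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int        # temporal sequence number assigned by the sending gateway (B)
    payload: bytes  # one frame of encoded (compressed) audio


def packetize(encoded: bytes, frame_bytes: int) -> list[Packet]:
    """Split an encoded stream into sequentially numbered packets (hypothetical helper)."""
    return [
        Packet(seq=i, payload=encoded[off:off + frame_bytes])
        for i, off in enumerate(range(0, len(encoded), frame_bytes))
    ]


def reorder(arrived: list[Packet]) -> bytes:
    """Restore temporal order by sequence number, as gateway D does, then join payloads."""
    return b"".join(p.payload for p in sorted(arrived, key=lambda p: p.seq))


stream = bytes(range(40))                 # stand-in for the encoder's output
packets = packetize(stream, frame_bytes=10)
random.shuffle(packets)                   # simulate out-of-order delivery by network C
assert reorder(packets) == stream         # original order is recovered
```

Note that this sketch waits for every packet before reordering; a real receiver must decode continuously, which is what the jitter buffer discussed below addresses.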
In general, at least three factors determine the perceived quality of the resulting phone conversation: (1) distortions introduced by compression and decompression (coder losses); (2) total delay from the speech event to aural reception; and (3) drop-outs and other artifacts caused by packets arriving too early or too late to be correctly included in the audio stream, or by outright packet loss.
Appropriate audio compression/decompression methods are available, so issue (1) does not contribute significantly to the overall perceived quality of the conversation. Examples of such coders include ITU standards G.728, G.729, G.729a, G.723.1 and G.722, as well as GSM and many others, which provide a Mean Opinion Score (MOS) of 3.6 to 3.9, compared with a MOS of 4.1 for a perfect toll-quality telephone connection. Simply put, if all the packets arrive quickly and none are lost, these coders can provide call quality that is very hard to distinguish from a normal phone call over high-quality circuit-switched connections (e.g., the traditional PSTN phone system).
Issues (2) and (3) remain troublesome in packet networks, even with efficient codecs. FIG. 2 shows a prior-art system that uses a static jitter buffer 20 to compensate for the variable network delays encountered by packets. Exemplary packets 1, 5, 6 and 10 arrive asynchronously, and each is routed to the appropriate relative position in the jitter buffer queue 20 according to its temporal address (commonly tagged as part of the packet). The jitter buffer 20 is then shifted, much in the manner of a shift register, to read the packets out serially, in the correct sequence, to a decoder 22. Slight jitter in arrival time is thus eliminated, because the contents of the buffer 20 are clocked out by a synchronous local clock at the receiver.
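A minimal sketch of the static jitter buffer of FIG. 2 follows, assuming one slot per packet and one `clock_out` call per frame period of the local clock. The class and method names are hypothetical and not taken from the source. Slot k holds the packet whose sequence number is `head_seq + k`; a packet whose slot has already been shifted out (late) or lies beyond the end of the queue (early) is discarded, and an empty slot reaching the head is emitted as `None`, representing a drop-out.

```python
from typing import Optional


class StaticJitterBuffer:
    """Fixed-length jitter buffer (hypothetical sketch of buffer 20 in FIG. 2)."""

    def __init__(self, slots: int, first_seq: int = 0):
        self.slots: list[Optional[bytes]] = [None] * slots
        self.head_seq = first_seq  # sequence number of the next packet to read out

    def insert(self, seq: int, payload: bytes) -> bool:
        """Route an arriving packet to its slot by sequence number.

        Returns False if the packet cannot be placed: a negative offset means
        it arrived too late, an offset past the end means it arrived too early.
        """
        offset = seq - self.head_seq
        if offset < 0 or offset >= len(self.slots):
            return False
        self.slots[offset] = payload
        return True

    def clock_out(self) -> Optional[bytes]:
        """Shift the queue by one slot, like a shift register, once per frame period."""
        out = self.slots.pop(0)   # None means the packet never arrived (a drop-out)
        self.slots.append(None)   # fresh empty slot at the tail
        self.head_seq += 1
        return out


buf = StaticJitterBuffer(slots=4)
for seq in (1, 0, 3):                     # packets arrive out of order; 2 is missing
    buf.insert(seq, f"frame{seq}".encode())
played = [buf.clock_out() for _ in range(4)]
# played == [b"frame0", b"frame1", None, b"frame3"]
```

If packet 2 arrives after four clock periods, `insert` rejects it as late. With a frame period of T, a buffer of this kind adds roughly (slots x T) of delay, which is the size-versus-loss trade-off described next.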
A large static jitter buffer can be designed into the receiving gateway to tolerate large amounts of network delay jitter, at the cost of long delays that users will notice. Conversely, a small jitter buffer introduces minimal delay, but at the cost of significant packet loss: call quality degrades whenever the network jitter exceeds the size of the jitter buffer.
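The trade-off can be made concrete with a simplified model, given here as an assumption rather than as the source's method: packet i is sent at time i x T and scheduled for playout at i x T + D, where D is the fixed buffer delay, so the packet is usable only if its one-way network delay does not exceed D. The function name and the delay values below are hypothetical, chosen only for illustration.

```python
def late_fraction(delays_ms: list[float], playout_delay_ms: float) -> float:
    """Fraction of packets that miss their playout deadline under a fixed buffer delay.

    Simplifying assumption: sender and receiver clocks are aligned, so a packet is
    late exactly when its network delay exceeds playout_delay_ms.
    """
    if not delays_ms:
        return 0.0
    late = sum(1 for d in delays_ms if d > playout_delay_ms)
    return late / len(delays_ms)


# Hypothetical one-way network delays (ms) for ten consecutive packets.
delays = [40, 45, 50, 60, 80, 120, 42, 55, 95, 48]

for buffer_delay in (50, 100, 150):
    print(f"buffer delay {buffer_delay} ms: "
          f"{late_fraction(delays, buffer_delay):.0%} of packets late")
```

For these sample delays, a 50 ms buffer loses 50% of the packets, a 100 ms buffer loses 10%, and a 150 ms buffer loses none, but every packet then carries the full 150 ms of added delay.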
Conventionally, a compromise is adopted: a fixed jitter buffer of medium size is used, which introduces noticeable but only mildly annoying delays. One such system is described, for example, in U.S. Pat. No. 5,526,353 to Henley et al. (1996). That system uses a jitter buffer of predetermined length to reassemble packets, thus introducing a fixed delay. (The amount of data available to the buffer varies, but the buffer length does not.) Such a jitter buffer accommodates most network delays, with only periodic drops in quality when the network is unusually slow or fast. Users may notice the fixed, moderate delay on every call (typically 50-100 ms for Internet telephony, according to Henley, col. 6, line 66), and many calls will have compromised audio quality because packets fail to fit in the jitter buffer (early or late arrival).