Streamed data delivery technology is useful in delivering sound or video data over a packet-switched data network such as the Internet because the sound or the video can be played almost immediately during a realtime information exchange session. The audio or video data is delivered continuously as sequential packets. Such a system is used to implement Internet telephony, a term used to describe the transmission of telephone calls over the Internet.
One problem with achieving acceptable quality for telephone calls over the Internet is the varying delay of a packet network such as the Internet. Specifically, such Internet telephone calls are typically implemented between gateways that communicate over the Internet. Each gateway is then connected to an end user telephone over a conventional telephone network or through other means. An example of such a system is shown in FIG. 1.
Using the arrangement of FIG. 1, a telephone call may be completed between telephones 101 and 107. The audio from telephone 101 to telephone 107 travels over a conventional public switched telephone network (PSTN) 102 and is received by gateway 103. The audio is then packetized and transmitted using the Internet Protocol and other well known packet switching techniques to a gateway 105, which may be located in a remote country. Typically, the packetized voice is also encoded using one or more standards such as G.729, G.723, etc.
At gateway 105, the received packets are converted back to a conventional audio signal for transmission over a PSTN 106 to telephone 107. Communication in the opposite direction, from telephone 107 to telephone 101, is typically accomplished in an identical fashion. Additionally, one or both telephones may instead be a computer connected directly to the gateway, as indicated at 120 and 122.
Considering, for explanation purposes, audio traveling from telephone 101 to telephone 107, one problem is the variable delay that the packets exchanged between gateway 103 and gateway 105 experience. Specifically, although the packets leave gateway 103 in a specified order, they often do not arrive at gateway 105 in the same order. The packets are switched through the network 104 using different paths, which may change dynamically during any one call. Additionally, the router switches that convey the packets through network 104 may be busier at certain times than at others, thereby introducing varying delays. Because the packets often represent human voice, they cannot be played out of order. Rather, the packets must be put back into their original sequence at the receiving gateway 105 and then turned back into analog voice.
A buffer may be provided at the receiving gateway to hold packets. The buffer introduces an additional delay at the receiving gateway, but permits packets arriving out of order to be rearranged in sequence. Thus, the packets that leave the receiving gateway to be transmitted to the receiving telephone 107 are in the proper order. If the gateway 105 converts the packets to analog voice, then the analog signal is properly constructed based upon packets in the right order.
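The reordering role of such a buffer can be illustrated with a minimal sketch. It assumes each packet carries a sequence number and, as a simplification, measures buffer length in packets held rather than in time. The names `ReorderBuffer`, `depth`, and `reorder` are hypothetical and are not taken from the system of FIG. 1.

```python
import heapq


class ReorderBuffer:
    """Holds arriving packets briefly and releases them in sequence order."""

    def __init__(self, depth):
        # Hypothetical parameter: number of packets held back before release.
        self.depth = depth
        self._heap = []  # min-heap keyed on sequence number

    def push(self, seq, payload):
        """Store an arriving packet and return any packets ready for playout."""
        heapq.heappush(self._heap, (seq, payload))
        released = []
        # Release the lowest-numbered packets once more than `depth` are held.
        while len(self._heap) > self.depth:
            released.append(heapq.heappop(self._heap))
        return released

    def flush(self):
        """Release everything still held, lowest sequence number first."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap))
        return released


def reorder(arrivals, depth):
    """Return the sequence numbers in the order they would leave the buffer."""
    buf = ReorderBuffer(depth)
    out = []
    for seq, payload in arrivals:
        out.extend(buf.push(seq, payload))
    out.extend(buf.flush())
    return [seq for seq, _ in out]


# Packets 1..5 arriving out of order at the receiving gateway.
arrivals = [(2, "b"), (1, "a"), (3, "c"), (5, "e"), (4, "d")]
in_order = reorder(arrivals, depth=2)
unbuffered = reorder(arrivals, depth=0)
```

With a depth of two packets the arrivals above leave the buffer in their original sequence, whereas a depth of zero passes them through in arrival order. The held packets are the source of the additional delay described above.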
If a packet experiences a delay through the network that is unusually long, it could arrive too late to be used and must therefore be discarded. For example, consider three sequentially transmitted packets P1, P2, and P3. If the first packet P1 arrives at receiving gateway 105 after P2 and P3 have already been transmitted from gateway 105 to telephone 107, then P1 must be discarded. It would make no sense to send earlier occurring voice to the listener after later occurring voice has already been heard by that listener.
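The discard rule in the P1, P2, P3 example can be sketched as follows. This is an illustration only: it assumes packets are identified by integer sequence numbers and that buffer length is counted in packets held, and the function name `playout` and the parameter `depth` are hypothetical.

```python
import heapq


def playout(arrivals, depth):
    """Simulate playout of packets given as sequence numbers in arrival order.

    Returns (played, discarded). A packet is discarded when a later packet
    has already been sent on to the listener.
    """
    held = []            # min-heap of sequence numbers waiting in the buffer
    played = []
    discarded = []
    last_played = None   # highest sequence number already sent to the listener
    for seq in arrivals:
        if last_played is not None and seq < last_played:
            # Later voice has already been heard, so this packet is useless.
            discarded.append(seq)
            continue
        heapq.heappush(held, seq)
        while len(held) > depth:
            last_played = heapq.heappop(held)
            played.append(last_played)
    # End of the call: play whatever is still waiting, in order.
    while held:
        played.append(heapq.heappop(held))
    return played, discarded


# P1 is delayed and arrives after P2 and P3.
short_buffer = playout([2, 3, 1], depth=1)
long_buffer = playout([2, 3, 1], depth=2)
```

In this sketch a buffer holding one packet has already released P2 by the time P1 arrives, so P1 is discarded. A buffer holding two packets still has both P2 and P3 waiting, so P1 can be played first.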
In order to ensure that only a small number of packets are lost, it is desirable to make the buffer at gateway 105 very long in time. This means that packets that experience a relatively large delay (i.e., much longer than average) through the network can still be placed into sequence at the receiving gateway 105 before the earlier arriving packets are sent to the listener. On the other hand, a long buffer latency at receiving gateway 105 means there will be a relatively long delay between a speaker at telephone 101 speaking and the speech arriving at telephone 107. This relatively long delay is undesirable, and often results in the parties interrupting each other.
In order to optimize the buffer latency in such systems, a statistical estimate of packet delays is typically calculated or arrived at empirically. An acceptable probability of lost packets is then specified, and the buffer latency is set to the minimum value that keeps packet loss at or below that acceptable level for a given set of statistics regarding packet delay variation. This trades off delay (i.e., latency) against packet loss. The longer the delay, the lower the chance of packet loss.
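One way to picture this calculation is as a quantile of observed delays. The sketch below is an assumption about how such a static setting might be derived, not a description of any particular gateway. It treats a packet as lost when its delay exceeds the buffer latency, and the names `min_latency_for_loss`, `loss_fraction`, `delays_ms`, and `max_loss` are hypothetical.

```python
import math


def loss_fraction(delays_ms, latency_ms):
    """Fraction of packets whose delay exceeds the buffer latency."""
    late = sum(1 for d in delays_ms if d > latency_ms)
    return late / len(delays_ms)


def min_latency_for_loss(delays_ms, max_loss):
    """Smallest observed delay usable as a latency with loss <= max_loss."""
    if not delays_ms:
        raise ValueError("need at least one delay sample")
    if not 0 <= max_loss < 1:
        raise ValueError("max_loss must be in [0, 1)")
    ordered = sorted(delays_ms)
    n = len(ordered)
    # Number of the slowest packets that may be lost within the loss budget.
    allowed_late = math.floor(max_loss * n)
    # The latency must cover every packet except those allowed to be late.
    return ordered[n - allowed_late - 1]


# Illustrative delay samples in milliseconds (made-up values).
samples = [40, 42, 45, 50, 55, 60, 70, 90, 120, 200]
latency_10pct = min_latency_for_loss(samples, max_loss=0.1)
latency_0pct = min_latency_for_loss(samples, max_loss=0.0)
```

For these made-up samples, tolerating a loss of one packet in ten allows a latency of 120 ms, while tolerating no loss requires 200 ms. This is the trade between delay and packet loss described above.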
The foregoing solution is less than optimal because it can result in an incorrect buffer adjustment. For example, the delays over the network are not constant. During times when the delays are less than calculated, the buffer is too long and introduces extra delay. During times when the network is more congested and the packet delay increases, the latency will probably not be long enough and too many packets will be lost. Therefore, it is desirable to have an optimal buffer latency that avoids incorrect buffer adjustment, so as to ensure good audio quality while minimizing the buffer latency.
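The weakness of a fixed latency can be shown with a small numerical sketch. The delay values, the 100 ms setting, and the helper names `loss_fraction` and `excess_latency` are all hypothetical and chosen only for illustration. A packet is assumed lost when its delay exceeds the buffer latency.

```python
def loss_fraction(delays_ms, latency_ms):
    """Fraction of packets whose delay exceeds the buffer latency."""
    late = sum(1 for d in delays_ms if d > latency_ms)
    return late / len(delays_ms)


def excess_latency(delays_ms, latency_ms):
    """Latency beyond what the slowest packet in this window needed."""
    return max(0, latency_ms - max(delays_ms))


FIXED_LATENCY_MS = 100  # hypothetical static setting

# Made-up delay samples for two periods of the same call.
quiet = [20, 22, 25, 21, 24]
congested = [80, 95, 120, 150, 90]

quiet_loss = loss_fraction(quiet, FIXED_LATENCY_MS)
quiet_excess = excess_latency(quiet, FIXED_LATENCY_MS)
congested_loss = loss_fraction(congested, FIXED_LATENCY_MS)
congested_excess = excess_latency(congested, FIXED_LATENCY_MS)
```

In the quiet period no packets are lost, but the listener waits 75 ms longer than any packet required. In the congested period the same setting loses two of five packets.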