Packet-switched networks are increasingly used for real-time multimedia communications. Thus, there is a requirement that endpoints be able to recover from network impairments.
One of these impairments is jitter, which can be understood in a wide sense. Herein, jitter is the variation in the duration between the time a frame is captured by the transmitter's audio card and the time it is received by the receiver. It therefore includes not only network jitter, i.e. variations in transmission delays, but also variations in processing delays.
Jitter is a severe audio stream impairment. To remain intelligible, an audio stream must not be interrupted, or at least must be interrupted as little as possible. If frames were played out as they arrived at the receiver, jitter would constantly interrupt playback. Hence, arriving frames are not played out immediately but are held in a so-called jitter buffer. A playout algorithm must then be implemented in the receiver to determine the playout time of each received frame.
In its simplest form, the algorithm buffers the first received frame for a predetermined time before playing it. Therefore, instead of interrupting the audio stream, an initial delay is applied to the stream.
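The fixed-delay scheme above can be sketched as follows; the frame duration and buffering delay values are illustrative assumptions, not taken from the source:

```python
FRAME_MS = 20          # assumed frame duration of the codec (illustrative)
BUFFER_DELAY_MS = 80   # predetermined, fixed buffering delay (illustrative)

def playout_time(first_arrival_ms: float, frame_index: int) -> float:
    """Playout time of frame i under a fixed buffering delay.

    The first received frame is held for BUFFER_DELAY_MS; every subsequent
    frame is scheduled FRAME_MS after the previous one, so the stream plays
    continuously as long as each frame arrives before its scheduled slot.
    """
    return first_arrival_ms + BUFFER_DELAY_MS + frame_index * FRAME_MS

def is_late(arrival_ms: float, first_arrival_ms: float, frame_index: int) -> bool:
    """A frame that arrives after its playout slot is dropped."""
    return arrival_ms > playout_time(first_arrival_ms, frame_index)
```

Any frame whose delay, relative to the first frame, exceeds the fixed buffering delay misses its slot and is dropped, which is precisely the trade-off discussed next.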
The problem with such a method, however, is deciding how long this buffering delay should be. A large delay minimizes the probability of an interruption but degrades interactivity between the end-users. Moreover, the packet delay distribution may be complex and vary over time. Applying a fixed delay is therefore satisfactory only in a limited number of cases, e.g. in communications over a Local Area Network (LAN) with bounded delay, but does not scale to more complex networks, particularly the Internet.
In order to overcome the above-mentioned problem, adaptive algorithms have been introduced. Jitter adaptation is based on silence compression/expansion, silence being a conversational device: in a conversation, silence indicates a speaker's expectation that the interlocutor will start talking. Silence periods can therefore be expanded or compressed without impairing intelligibility. An adaptation algorithm estimates the jitter from packet arrival times and then modifies the lengths of silence periods according to the latest estimate. Jitter adaptation algorithms based on this idea can be found, for example, in Sue B. Moon, Jim Kurose, Don Towsley, “Packet audio playout delay adjustment: performance bounds and algorithms”, Multimedia Systems, Springer Verlag, 1998, pp. 17–28, and in Ramachandran Ramjee, Jim Kurose, Don Towsley, Henning Schulzrinne, “Adaptive Playout Mechanism for Packetized Audio Applications in Wide-Area Networks”, in Proceedings of the Conference on Computer Communications (IEEE Infocom, Toronto, Canada), pp. 680–688, IEEE Computer Society Press, Los Alamitos, Calif., June 1994.
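A minimal sketch of the kind of delay estimate used in the cited Ramjee et al. paper: the exponential smoothing weight and the "mean delay plus four variations" playout rule follow that paper, while the class and method names here are illustrative assumptions:

```python
ALPHA = 0.998002  # smoothing weight used in the cited Ramjee et al. paper

class DelayEstimator:
    """Running estimate of one-way delay and its variation.

    d: smoothed delay estimate; v: smoothed variation estimate.
    The playout delay applied at the start of a new talkspurt is
    d + 4*v, and silence periods between talkspurts are expanded or
    compressed to absorb changes in this estimate.
    """
    def __init__(self, first_delay: float):
        self.d = first_delay
        self.v = 0.0

    def update(self, delay: float) -> None:
        # Exponentially weighted moving averages over observed delays.
        self.d = ALPHA * self.d + (1 - ALPHA) * delay
        self.v = ALPHA * self.v + (1 - ALPHA) * abs(self.d - delay)

    def playout_delay(self) -> float:
        # A margin of four variations above the mean keeps the
        # residual drop rate low.
        return self.d + 4.0 * self.v
```

Because the playout delay is only re-evaluated at talkspurt boundaries, adaptation never interrupts speech; only the silence periods change length.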
In the above-mentioned adaptive algorithms, the playout times of received frames are computed so as to achieve a good trade-off between the buffering delay and the residual drop rate, which will be described later.
However, this adaptation scheme is not sufficient, because it trades off a drop percentage against an added buffering delay. What should be traded off is the drop against the response time, defined as the time elapsed between the capture of a given speech frame at one endpoint and its playout at another endpoint, plus the same quantity in the other direction. In the conventional adaptation scheme mentioned above, the added delay reflects the response time only partially.
It is therefore an object of the present invention to overcome the aforementioned limitations of adaptation algorithms and to allow a terminal to trade off the response time against the drop, instead of the added buffering delay against the drop.