The unreliable and stateless nature of today's Internet Protocol (IP) results in a best-effort service: packets may be delivered with arbitrary delay or may even be lost. This quality of service (QoS) limitation is a major challenge for real-time voice communication over IP networks (VoIP). Because excessive end-to-end delay impairs the interactivity of human conversation, active error control techniques such as retransmission generally cannot be applied. Therefore, any packet loss directly degrades the quality of the reconstructed speech. Furthermore, delay variation (also known as jitter) hinders the reconstruction of the voice stream in its original sequential and periodic pattern.
Considerable effort has been made at different layers of current communication systems to reduce delay, smooth jitter, and recover from packet loss. At the application layer, passive receiver-based methods have the advantage of requiring no cooperation from the sender. Furthermore, these methods can operate independently of the network infrastructure.
The common way to control the playout of packets is to employ a playout buffer at the receiver that absorbs the delay jitter before the audio is output. With this jitter absorption technique, packets are not played out immediately after reception but are held in the buffer until their scheduled playout time (playout deadline). Although this adds delay for packets that arrive early, it allows packets that arrive with a larger delay to still be played. Note that there is a trade-off between the average time packets spend in the buffer (buffering delay) and the number of packets that must be dropped because they arrive too late (late loss). Scheduling a later deadline allows more packets to be played out and yields a lower loss rate, but at the cost of a higher buffering delay. Conversely, it is difficult to decrease the buffering delay without significantly increasing the loss rate. Consequently, packet loss in delay-sensitive applications such as VoIP results not only from packets dropped in the network but also from delay jitter, and it greatly impairs communication quality.
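The delay/loss trade-off of a playout buffer can be made concrete with a small simulation. The sketch below is illustrative only: it assumes packets are sent every 20 ms, that the receiver knows each packet's send time on a common time base, and that one fixed playout delay is applied to every packet. The names `evaluate_fixed_playout`, `PlayoutStats`, and `FRAME_MS`, as well as the delay trace, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

FRAME_MS = 20  # packetization interval, taken from the 20 ms example in the text


@dataclass
class PlayoutStats:
    late_loss_rate: float     # fraction of packets that missed their deadline
    mean_buffering_ms: float  # average buffer wait of the packets that were played


def evaluate_fixed_playout(delays_ms: Sequence[float], playout_delay_ms: float) -> PlayoutStats:
    """Apply one fixed playout delay to packets sent every FRAME_MS.

    Packet i leaves the sender at i * FRAME_MS and is scheduled for playout at
    i * FRAME_MS + playout_delay_ms, so played packets keep their original
    spacing. A packet arriving after that deadline is dropped (late loss).
    """
    late = 0
    waits: List[float] = []
    for i, delay in enumerate(delays_ms):
        send = i * FRAME_MS
        arrival = send + delay
        deadline = send + playout_delay_ms
        if arrival > deadline:
            late += 1                         # too late: counts as a late loss
        else:
            waits.append(deadline - arrival)  # time spent waiting in the buffer
    n = len(delays_ms)
    return PlayoutStats(
        late_loss_rate=late / n if n else 0.0,
        mean_buffering_ms=sum(waits) / len(waits) if waits else 0.0,
    )


if __name__ == "__main__":
    # Made-up one-way network delays (ms) for ten consecutive packets.
    delays = [40, 55, 42, 90, 47, 60, 41, 120, 44, 50]
    for d in (50, 70, 130):
        s = evaluate_fixed_playout(delays, d)
        print(f"playout delay {d} ms: late loss {s.late_loss_rate:.0%}, "
              f"mean buffering {s.mean_buffering_ms:.1f} ms")
```

On this toy trace, raising the playout delay from 50 ms to 130 ms removes all late loss (from 40% to 0%) but raises the mean buffering delay from 6.0 ms to 71.1 ms, which is the trade-off described above.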
Prior-art approaches to this problem have mainly focused on improving the trade-off between delay and loss while compensating for the jitter completely, or almost completely, within talkspurts. By applying the same fixed playout delay to all packets in a talkspurt, these approaches play out the packets in their original, continuous, and periodic pattern, e.g., every 20 ms. Therefore, even though there may be delay jitter in the network, the audio is reconstructed without any playout jitter. Other prior-art solutions apply adaptive scheduling to audio and other types of multimedia and accept a certain amount of playout jitter. However, in these methods the playout times are adjusted without regard to the audio signal, and the question of how continuous playout of the audio stream can actually be achieved is not addressed. As a result, the tolerable playout jitter must be kept small to preserve reasonable audio quality.
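The per-talkspurt scheduling described above, where one playout delay is fixed for all packets of a talkspurt, can be sketched as follows. The delay estimator is an assumption and is not stated in this document: it uses the exponentially weighted moving averages of network delay and delay variation found in common networking textbooks, with the offset set to the delay estimate plus a safety margin of `k` times the variation estimate. The names `Packet`, `TalkspurtScheduler`, `u`, and `k` are hypothetical, and sender timestamps and arrival times are assumed to share a time base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    timestamp_ms: float    # sender timestamp
    arrival_ms: float      # receiver arrival time (same time base assumed)
    talkspurt_start: bool  # True for the first packet of a talkspurt


class TalkspurtScheduler:
    """Fixed playout offset per talkspurt, re-estimated only at talkspurt starts."""

    def __init__(self, u: float = 0.01, k: float = 4.0) -> None:
        self.u = u  # hypothetical EWMA weight given to each new delay sample
        self.k = k  # hypothetical safety margin, in units of delay variation
        self.d_hat: Optional[float] = None   # smoothed network delay estimate
        self.v_hat = 0.0                     # smoothed delay variation estimate
        self.offset: Optional[float] = None  # playout offset for the current talkspurt

    def _update_estimates(self, delay: float) -> None:
        # Estimates are updated with every packet, even inside a talkspurt.
        if self.d_hat is None:
            self.d_hat = delay
            return
        self.d_hat = (1 - self.u) * self.d_hat + self.u * delay
        self.v_hat = (1 - self.u) * self.v_hat + self.u * abs(delay - self.d_hat)

    def schedule(self, pkt: Packet) -> Optional[float]:
        """Return the playout time of pkt, or None if it arrives too late."""
        self._update_estimates(pkt.arrival_ms - pkt.timestamp_ms)
        if pkt.talkspurt_start or self.offset is None:
            # Only the first packet of a talkspurt may change the offset, so
            # every packet in the talkspurt keeps its original spacing.
            self.offset = self.d_hat + self.k * self.v_hat
        playout = pkt.timestamp_ms + self.offset
        return playout if pkt.arrival_ms <= playout else None


if __name__ == "__main__":
    sched = TalkspurtScheduler()
    trace = [Packet(0, 50, True), Packet(20, 68, False), Packet(40, 85, False)]
    print([sched.schedule(p) for p in trace])
```

Because the offset is frozen for the whole talkspurt, playout jitter within a talkspurt is zero by construction. The price is that a packet delayed beyond the frozen offset is simply lost, since the schedule cannot stretch to wait for it.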