Voice communication over Internet Protocol (VoIP) has experienced rapid growth in recent years. However, the quality of VoIP is usually not as good as that provided by the traditional Public Switched Telephone Network (PSTN). VoIP is affected by transmission impairments that do not appear in PSTN systems, namely packet delay, packet loss, and packet delay variation (jitter). It will be appreciated that, in order to achieve PSTN-like quality, the impact of these transmission impairments should be minimized.
The quality of VoIP, as perceived by an end-user, is a combined effect of a conversation's interactivity and its listening speech quality. Large packet delays introduced in IP networks reduce a conversation's interactivity. Packet loss degrades the listening speech quality. Finally, packet delay variation affects both interactivity and speech quality: jitter is absorbed by de-jitter buffering, which transforms it into either additional play-out delay or further late packet loss [1].
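The way a de-jitter buffer converts jitter into either play-out delay or late packet loss can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the text: every packet is scheduled at a fixed end-to-end play-out delay equal to the first packet's network delay plus the buffer size, and a packet whose network delay exceeds that value counts as late. The function name `play_out` and the sample delays are hypothetical.

```python
def play_out(network_delays_ms, buffer_ms):
    """Apply a fixed de-jitter buffer to a trace of one-way network delays.

    Simplified model (an assumption for illustration): the play-out delay is
    the first packet's network delay plus buffer_ms, and any packet whose
    network delay exceeds it arrives too late to be played.
    Returns (playout_delay_ms, late_fraction).
    """
    playout_delay = network_delays_ms[0] + buffer_ms
    late = sum(1 for d in network_delays_ms if d > playout_delay)
    return playout_delay, late / len(network_delays_ms)


# Hypothetical delay trace in milliseconds.
delays = [40, 45, 60, 42, 95, 41, 70, 43]
small = play_out(delays, 20)   # short buffer: low delay, some late loss
large = play_out(delays, 60)   # long buffer: no late loss, more delay
```

With this trace, the 20 ms buffer plays packets out at 60 ms and discards the two packets that took 70 ms and 95 ms, while the 60 ms buffer plays every packet but raises the play-out delay to 100 ms.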
Real-time VoIP transmission imposes stringent requirements on one-way mouth-to-ear delay and packet loss. ITU-T defines these requirements by introducing so-called “contours of user satisfaction” that determine speech transmission quality for all possible combinations of packet loss and mouth-to-ear delay [2]. The responsibility of meeting these requirements is shared between end-points and the underlying network. As long as transmission impairments remain below a certain level, actions at end terminals can be employed to mitigate their effects. For example: optimal voice encoding schemes can be used to reduce bandwidth utilization; packet loss concealment (PLC) can be implemented to mitigate the effect of packet loss on speech quality; de-jitter buffering can be implemented to compensate for packet delay variation; and echo cancellation techniques can improve speech intelligibility.
On the network side there is considerable development activity in designing new architectures and protocols. Integrated Services (Int-Serv) mechanisms can provide QoS guarantees by adding circuit-like functionality (with the use of the Resource Reservation Protocol, RSVP). Differentiated Services (Diff-Serv) mechanisms enable service differentiation and prioritization of various traffic classes (e.g., prioritizing VoIP traffic over other traffic types).
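As a hedged illustration of how an end-point might request Diff-Serv treatment, the sketch below marks outgoing UDP packets with a DSCP value through the standard `IP_TOS` socket option. The choice of the Expedited Forwarding code point (46) for voice is a common convention and an assumption here, not something the text prescribes; the helper names are hypothetical. Whether routers honor the marking depends entirely on how the network is configured, and some operating systems ignore or restrict this option.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding code point; assumed here as the voice class


def dscp_to_tos(dscp):
    """Return the IPv4 TOS/DS byte for a DSCP value.

    The DSCP occupies the upper six bits of the byte, so it is shifted
    left by two bits.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp=EF_DSCP):
    """Create a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```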
Application-layer and network-layer mechanisms can greatly mitigate the effect of transmission impairments on VoIP quality. VoIP devices may make use of both of these mechanisms in order to achieve PSTN-like conversational quality. However, these quality enhancement mechanisms are often complex and difficult to configure. Moreover, tuning one parameter can lead to a local performance improvement while having a disastrous effect on the overall end-to-end VoIP quality. If a part of the VoIP transmission path is being tuned, the impact of local tuning actions on end-to-end VoIP quality (i.e., both interactivity and speech quality) has to be taken into account. An example of such a tuning process is the tuning of the de-jitter buffer size at VoIP terminals.
To compensate for jitter, a typical VoIP terminal buffers incoming packets before playing them out. This lets slower packets arrive on time and play out at their sender-generated rate. In theory, the optimal play-out delay for this de-jitter buffer should be equal to the total variable delay along the connection. Unfortunately, it is impossible to find an optimal, fixed de-jitter buffer size when network conditions vary in time. Fluctuating end-to-end network delays may cause play-out delays to increase to a level that is irritating to end users (when the de-jitter buffer is too large) or may cause packets to be lost because they arrive late (when the de-jitter buffer is too small). A good playout algorithm should keep buffering delays as short as possible while minimizing the number of packets that arrive too late to be played out. These two conflicting goals have led to various de-jitter buffers with dynamic size allocation, so-called adaptive playout buffers [3], [4], [5], [6], [7], [8].
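One widely described style of adaptive playout keeps running estimates of the mean network delay and its variation, and sets the play-out delay at the start of each talkspurt to the mean plus a multiple of the variation. The sketch below follows that idea under stated assumptions; it is not claimed to be the algorithm of any of the cited references. The function name, the smoothing weight `alpha`, the safety factor of 4, and the bootstrap from the first packet are all illustrative choices.

```python
def adaptive_playout(talkspurts, alpha=0.998, safety=4.0):
    """Choose a play-out delay per talkspurt from running delay estimates.

    talkspurts: list of lists of one-way network delays in ms, one inner
                list per talkspurt (hypothetical input format).
    alpha:      smoothing weight of the moving averages (assumed value).
    safety:     multiple of the variation estimate added as margin (assumed).
    Returns a list of (playout_delay_ms, late_packet_count) per talkspurt.
    """
    mean_est = None   # running estimate of the network delay
    var_est = 0.0     # running estimate of the delay variation
    results = []
    for spurt in talkspurts:
        if mean_est is None:
            mean_est = spurt[0]          # bootstrap from the first packet
        # The play-out delay is fixed for the whole talkspurt, so any
        # adjustment falls into the silence between talkspurts.
        playout = mean_est + safety * var_est
        late = 0
        for delay in spurt:
            if delay > playout:
                late += 1                # packet missed its play-out time
            # Update the estimates with every received packet.
            mean_est = alpha * mean_est + (1 - alpha) * delay
            var_est = alpha * var_est + (1 - alpha) * abs(mean_est - delay)
        results.append((playout, late))
    return results
```

An estimator of this kind only follows the network delay; it does not by itself weigh the cost of extra delay against the cost of late loss.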
An adaptive playout buffer makes it possible to balance its buffering delay, a major addition to end-to-end delay, against the possibility of late packet loss. A fundamental trade-off exists between late packet loss and buffering delay, as both increased packet loss and increased buffering delay impair conversational VoIP quality. This loss/delay trade-off leads to an operating point where maximum conversational VoIP quality may be achieved. Typical adaptive playout algorithms are not designed to search for this operating point. Instead, they simply follow network delays closely while attempting to keep both delay and loss low. Given that the purpose of de-jitter buffering is to improve conversational VoIP quality (i.e., both interactivity and speech quality), a more informed choice of the playout mechanism may be made by considering its effect on user satisfaction [9], [10].
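The search for such an operating point can be sketched as follows: for each candidate buffer size, compute the resulting mouth-to-ear delay and late-loss fraction, score the pair with a quality function, and keep the best candidate. The quality function below is an assumption made for illustration. Its shape and coefficients follow the style of simplified E-model approximations found in the literature and are not taken from this text or from [2]. All names and the sample values are hypothetical.

```python
import math


def quality_score(mouth_to_ear_ms, loss_fraction):
    """Illustrative quality score: higher is better.

    Assumed model: a delay impairment that grows faster beyond roughly
    177 ms, plus a loss impairment that grows logarithmically with loss.
    """
    d = mouth_to_ear_ms
    delay_impairment = 0.024 * d + 0.11 * max(0.0, d - 177.3)
    loss_impairment = 30.0 * math.log(1.0 + 15.0 * loss_fraction)
    return 94.2 - delay_impairment - loss_impairment


def best_buffer(network_delays_ms, other_delay_ms, candidates_ms):
    """Return (buffer_ms, score) for the candidate with the highest score.

    other_delay_ms stands for all delay outside the network and the buffer
    (for example encoding and packetization); it is an assumed constant.
    """
    base = min(network_delays_ms)
    best = None
    for buffer_ms in candidates_ms:
        playout = base + buffer_ms
        late = sum(1 for d in network_delays_ms if d > playout)
        loss = late / len(network_delays_ms)
        score = quality_score(other_delay_ms + playout, loss)
        if best is None or score > best[1]:
            best = (buffer_ms, score)
    return best


# Hypothetical trace and candidate buffer sizes, in milliseconds.
delays = [40, 45, 60, 42, 95, 41, 70, 43]
choice = best_buffer(delays, 100, [0, 10, 20, 30, 40, 55, 80, 120])
```

In this design the buffer size is selected by its estimated effect on perceived quality rather than by how closely it tracks the network delay.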