In a video-over-IP system, each image, or frame, may be encoded into one or several data packets that are sent with minimal delay (“back-to-back”) to the IP network. The frames are usually produced at a constant frame rate, so the packet clusters are sent at the same constant rate. On the receiver side, the packets arrive with a variable delay. This delay variation is mainly due to delays introduced by the IP network and is often referred to as jitter. The severity of the jitter can vary significantly depending on network type and current network conditions. For example, the variance of the packet delay can change by several orders of magnitude from one network type to another, or even from one time to another on the same network path.
In order to reproduce a video stream that is true to the original transmitted from the source(s), the decoder (or receiver) must be provided with data packet clusters at the same constant rate at which they were sent. A device often referred to as a jitter buffer may be introduced in the receiver. The jitter buffer may be capable of de-jittering the incoming stream of packets and providing a constant flow of data to the decoder. This is done by holding the packets in a buffer, thus introducing delay, so that even packets that were subject to larger delays will have arrived before their respective time-of-use.
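The buffering principle described above can be illustrated with a minimal sketch. It assumes a fixed buffer delay, one playout deadline per frame, and integer millisecond timestamps. The class `JitterBuffer`, its methods, and its parameters are hypothetical names chosen for illustration; they do not describe any particular implementation.

```python
import heapq


class JitterBuffer:
    """Illustrative fixed-delay playout buffer (all names are hypothetical)."""

    def __init__(self, frame_interval_ms, buffer_delay_ms):
        self.frame_interval_ms = frame_interval_ms  # sender's constant frame period
        self.buffer_delay_ms = buffer_delay_ms      # hold time added by the receiver
        self.base_time_ms = None                    # playout clock origin
        self.heap = []                              # (frame_index, payload), ordered by index

    def playout_time(self, index):
        # Time-of-use of a frame: origin + buffer delay + its slot on the constant-rate grid.
        return self.base_time_ms + self.buffer_delay_ms + index * self.frame_interval_ms

    def push(self, index, payload, arrival_ms):
        # Assumption: the first arrival anchors the playout clock.
        if self.base_time_ms is None:
            self.base_time_ms = arrival_ms - index * self.frame_interval_ms
        if arrival_ms > self.playout_time(index):
            return False  # arrived after its time-of-use: treated as too late
        heapq.heappush(self.heap, (index, payload))
        return True

    def pop_due(self, now_ms):
        # Release, in frame order, every buffered frame whose time-of-use has been reached.
        due = []
        while self.heap and self.playout_time(self.heap[0][0]) <= now_ms:
            due.append(heapq.heappop(self.heap))
        return due


buf = JitterBuffer(frame_interval_ms=40, buffer_delay_ms=100)
accepted = [
    buf.push(0, b"f0", 1000),  # anchors the clock; playout at 1100
    buf.push(2, b"f2", 1090),  # out of order but early; playout at 1180
    buf.push(1, b"f1", 1120),  # delayed, still before its playout at 1140
    buf.push(3, b"f3", 1300),  # after its playout at 1220: rejected
]
first_batch = buf.pop_due(1150)
second_batch = buf.pop_due(1200)
```

In this sketch, frames that arrive out of order or with a larger delay are still delivered to the decoder in order and on the constant-rate grid, provided they arrive before their time-of-use.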
There is an inevitable trade-off in jitter buffers between buffer delay on the one hand and distortions due to late arrivals on the other. A lower buffer level, and thus a shorter delay, generally results in a larger portion of packets arriving late or even being discarded because they are considered too late. A higher buffer level, and thus a longer delay, is in itself generally detrimental to two-way communication, e.g., between humans.
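The trade-off can be made concrete with a small numerical sketch. It assumes that the playout clock is anchored to the smallest observed network delay, so that a frame counts as late when its delay exceeds that minimum by more than the buffer delay. The function `late_fraction` and the delay values are hypothetical and chosen only for illustration.

```python
def late_fraction(network_delays_ms, buffer_delay_ms):
    """Fraction of frames that miss their playout deadline.

    Assumption: a frame is late when its network delay exceeds the minimum
    observed delay by more than buffer_delay_ms.
    """
    base = min(network_delays_ms)
    late = sum(1 for d in network_delays_ms if d - base > buffer_delay_ms)
    return late / len(network_delays_ms)


# Made-up per-frame network delays in milliseconds, with two delay spikes.
delays = [20, 22, 25, 21, 60, 23, 24, 110, 22, 20]

# A larger buffer delay lowers the share of late frames but adds end-to-end delay.
for buffer_delay_ms in (0, 10, 50, 100):
    print(buffer_delay_ms, late_fraction(delays, buffer_delay_ms))
```

With these made-up delays, the share of late frames falls as the buffer delay grows, while every additional millisecond of buffer delay is added to the end-to-end delay of all frames.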