In packet-switched networks, such as the Internet, data packets are subject to varying delays that depend on the network load at the time of transfer, the network path taken by each packet, and other network conditions. Thus, data packets that are produced by a transmitter at a constant rate arrive at a receiver with variable delays. This variation in packet delay, which is mainly inflicted by the packet network, is often referred to as jitter. The severity of the jitter can vary significantly depending on network type and current conditions; the variance of the packet delay can differ by several orders of magnitude from one network type to another.
In order to reproduce an audio stream that is true to the original, a decoder must be provided with data packets at the same constant rate at which they were sent. Therefore, a device called a jitter buffer is commonly introduced in the receiver. The jitter buffer must de-jitter the incoming stream of packets and provide a constant flow of data to the decoder. This is done by holding the packets in a buffer, thus introducing a delay at the receiver, so that future packets that are subject to larger delays will have arrived before their respective time-of-use. In other words, packets are needed in the jitter buffer to prevent the buffer from underflowing, or at least to minimize the time during which the buffer is in a state of underflow. A long packet delay may result not only in the buffer becoming empty, but also in the buffer remaining empty for an unacceptably long time. If the buffer becomes empty, continued playback of the received signal is no longer possible and the delayed packet will be treated as a lost packet. However, a high buffer level introduces a long delay at the receiver, which is in itself detrimental to two-way human communication.
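The buffering behaviour described above can be illustrated with a minimal sketch. It assumes a fixed playout delay, a 20 ms frame interval, and a playout clock that starts at the arrival of the first packet; the function name `simulate_playout` and all numeric values are hypothetical and chosen only for illustration.

```python
def simulate_playout(network_delays_ms, buffer_delay_ms, frame_ms=20):
    """Count packets that miss their time-of-use for a fixed buffer delay.

    Assumptions (illustrative only): packet i is sent at i * frame_ms,
    arrives after its individual network delay, and is needed by the
    decoder at first_arrival + buffer_delay_ms + i * frame_ms.
    Returns (late_count, played_count).
    """
    arrivals = [i * frame_ms + d for i, d in enumerate(network_delays_ms)]
    playout_start = arrivals[0] + buffer_delay_ms
    late = 0
    for i, arrival in enumerate(arrivals):
        time_of_use = playout_start + i * frame_ms
        if arrival > time_of_use:
            late += 1  # arrived after its time-of-use: treated as lost
    return late, len(arrivals) - late


# Hypothetical per-packet network delays in milliseconds.
delays = [50, 60, 55, 120, 50, 70]
no_buffering = simulate_playout(delays, buffer_delay_ms=0)
some_buffering = simulate_playout(delays, buffer_delay_ms=40)
ample_buffering = simulate_playout(delays, buffer_delay_ms=70)
```

With these example delays, holding packets for 70 ms lets even the packet delayed by 120 ms arrive by its time-of-use, at the cost of 70 ms of added delay at the receiver.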
There is an inevitable trade-off in jitter buffers between buffer delay on the one hand and packet losses due to late arrivals on the other. Aiming for a low buffer level, and thus a short delay, results in a larger portion of packets being discarded because they arrive too late for continuous playback, while a high buffer level, and thus a long delay, is very annoying in two-way human communication.
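The trade-off can be made concrete by sweeping candidate buffer delays over one set of packet delays. This is a sketch under the same illustrative assumptions as any fixed-delay model (20 ms frames, playout clock started by the first packet); `late_fraction`, `sweep`, and the delay values are hypothetical names and numbers, not part of any described method.

```python
def late_fraction(network_delays_ms, buffer_delay_ms, frame_ms=20):
    """Fraction of packets arriving after their time-of-use."""
    first_arrival = network_delays_ms[0]  # packet 0 is sent at time 0
    late = 0
    for i, delay in enumerate(network_delays_ms):
        arrival = i * frame_ms + delay
        time_of_use = first_arrival + buffer_delay_ms + i * frame_ms
        if arrival > time_of_use:
            late += 1
    return late / len(network_delays_ms)


def sweep(network_delays_ms, candidate_buffer_delays_ms):
    """Pair each candidate buffer delay with the late-loss fraction it yields."""
    return [(b, late_fraction(network_delays_ms, b))
            for b in candidate_buffer_delays_ms]


# Hypothetical per-packet network delays in milliseconds.
delays = [50, 60, 55, 120, 50, 70, 90, 52]
table = sweep(delays, [0, 20, 40, 80])
for buffer_ms, loss in table:
    print(f"buffer delay {buffer_ms:3d} ms -> late fraction {loss:.3f}")
```

In this model the late fraction can only fall or stay the same as the buffer delay grows, so choosing a buffer level amounts to choosing how much late loss to accept for a given amount of added delay.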
In the prior art, attempts are often made to estimate and control an end-to-end delay, i.e. the total delay from the sound source, e.g. a microphone, to the destination, e.g. a loudspeaker. This total delay is hard to estimate and requires synchronized clocks at the transmitting and receiving ends. In addition to requiring synchronized clocks, this solution suffers from the problem of sample clock drift.
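The dependence on synchronized clocks can be seen in a small numeric sketch. It assumes that each end timestamps packets with its own clock and that the receiver clock differs from the transmitter clock by an unknown constant offset; the helper names, the 1000 ms offset, and the 100 ppm drift rate are hypothetical values used only for illustration.

```python
def apparent_transit_ms(send_ts_ms, recv_ts_ms):
    """Receive timestamp minus send timestamp, each read from its own clock."""
    return [r - s for s, r in zip(send_ts_ms, recv_ts_ms)]


def relative_to_first(values):
    """Express each value relative to the first one."""
    return [v - values[0] for v in values]


def drift_ms(elapsed_s, drift_ppm):
    """Offset accumulated between two clocks whose rates differ by drift_ppm."""
    return elapsed_s * drift_ppm / 1000


# Hypothetical values: true one-way delays and an offset unknown to the receiver.
true_delays_ms = [50, 60, 55]
send_ts_ms = [0, 20, 40]
clock_offset_ms = 1000
recv_ts_ms = [s + d + clock_offset_ms
              for s, d in zip(send_ts_ms, true_delays_ms)]

transit = apparent_transit_ms(send_ts_ms, recv_ts_ms)
relative = relative_to_first(transit)
drift_after_ten_minutes = drift_ms(elapsed_s=600, drift_ppm=100)
```

Here the apparent transit times are dominated by the clock offset rather than the true delays, so the absolute delay cannot be read from the timestamps alone, whereas the differences between packets are unaffected by a constant offset. A rate difference between the two clocks adds an error that grows with elapsed time.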
The present invention addresses the problem of how to determine a jitter buffer level which provides a suitable trade-off between buffer delay and packet losses.