When a media stream in a conventional media service, such as a telephony service, is received by a network node, such as a media gateway, from an interface where the transport delay tends to vary over a wide range, a jitter buffer is required at the input of the network node. The jitter buffer guarantees a continuous, constant-rate play-out from the network node towards another interface, which may tolerate only a very limited variation in the output timing.
The general principles of jitter buffering in a network node are described with reference to FIG. 1. It is to be understood that only the parts which are essential for the understanding of jitter buffering are shown in the figure, while other parts necessary for the speech processing functions, such as speech encoders and decoders, have been omitted for simplicity. For the same reason, the figure only describes how media transmission is executed in one direction, i.e. in the uplink, omitting the downlink transmission, which completes a two-way conversation.
In FIG. 1, a speech source 100, which is configured to deliver real-time data in a media stream to one or more users, generates packets at a constant time interval, Trepin 102. As the packets are routed through a packet switched network 101, a transport delay which is not constant will be introduced to the media stream. In the figure this phenomenon, referred to as jitter, is illustrated as packets leaving network 101 at irregular intervals 103. Since a number of packets may arrive at an intermediate network node 104 at very short time intervals, i.e. in bursts, followed by a time interval when no packets arrive at all, the pattern in which packets arrive at the network node may be difficult to predict and to handle.
A common way to keep the jitter under control is to implement a jitter buffer 105 at the intermediate network node 104. In addition to the transport delay caused by the network, the jitter buffer 105 will introduce another delay, which can be identified as a jitter protection time Tjit 106, since packets arriving at the network node are buffered 107 in the jitter buffer before they are played out 108 from the network node at a recovered constant interval, Trepout 109, which is equal to Trepin. The packets can then be forwarded to one or more terminating entities (not shown) via another transport network 110, typically a circuit switched network, which does not tolerate jitter.
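For illustration only, the relationship between Tjit, Trepout and late packets can be sketched in a few lines of Python. The function name `playout_schedule`, the millisecond units and the sample arrival times are assumptions made for this sketch and are not taken from FIG. 1; the sketch further assumes that the first packet is held for exactly Tjit and that a packet arriving after its play-out instant is counted as late.

```python
def playout_schedule(arrivals_ms, t_rep_ms, t_jit_ms):
    """Return (playout_times, late_flags) for packets listed in send order.

    arrivals_ms: arrival time of each packet at the network node (hypothetical data)
    t_rep_ms:    recovered constant play-out interval (Trepout, equal to Trepin)
    t_jit_ms:    jitter protection time (Tjit) applied to the first packet
    """
    # The first packet is held for Tjit; later packets follow on a fixed grid.
    start = arrivals_ms[0] + t_jit_ms
    playout = [start + n * t_rep_ms for n in range(len(arrivals_ms))]
    # A packet that arrives after its play-out instant cannot be played in time.
    late = [arrival > slot for arrival, slot in zip(arrivals_ms, playout)]
    return playout, late


# Packets sent every 20 ms with transport delays of 30, 45, 25, 60 and 35 ms.
arrivals = [30, 65, 65, 120, 115]

playout, late = playout_schedule(arrivals, t_rep_ms=20, t_jit_ms=40)
print(playout)  # [70, 90, 110, 130, 150]
print(late)     # [False, False, False, False, False]

_, late_short = playout_schedule(arrivals, t_rep_ms=20, t_jit_ms=10)
print(late_short)  # [False, True, False, True, False]
```

With these sample values, a Tjit of 40 ms absorbs the whole delay variation, whereas a Tjit of 10 ms leaves two packets arriving after their play-out instants.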
If Tjit 106 is a pre-set constant, all buffered packets will experience the same jitter buffer delay, and the jitter buffering is called static buffering. If, on the other hand, Tjit is allowed to change on the basis of some kind of analysis of the behaviour of the delay at the input of the network node, the buffering method is instead referred to as adaptive jitter buffering.
In order to avoid longer delays than what is absolutely necessary, adaptive jitter buffering is preferred over static buffering. In order to operate properly, a jitter buffer using static buffering has to be dimensioned for the worst-case variation of the delay, and, thus, the delay caused by static buffering will typically be much higher than what is required for adaptive buffering, especially when the worst case occurs relatively seldom.
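The difference between the two approaches can be illustrated with a minimal sketch. The estimator below is an assumption chosen for brevity and is not a method described in this document: it sets Tjit to the spread between the largest and smallest transport delay observed over a sliding window of recent packets, plus a safety margin. The class name `AdaptiveTjit` and the parameters `window` and `margin_ms` are hypothetical.

```python
from collections import deque


class AdaptiveTjit:
    """Illustrative Tjit estimator based on the recent delay spread (assumed method)."""

    def __init__(self, window=50, margin_ms=5.0):
        # Only the most recent `window` delay samples are kept.
        self.delays = deque(maxlen=window)
        self.margin_ms = margin_ms

    def update(self, send_ms, arrival_ms):
        """Record one packet and return the current Tjit estimate in ms."""
        # A constant clock offset between sender and node cancels in max - min.
        self.delays.append(arrival_ms - send_ms)
        return max(self.delays) - min(self.delays) + self.margin_ms


estimator = AdaptiveTjit(window=3, margin_ms=5.0)
samples = [(0, 30), (20, 65), (40, 65), (60, 120), (80, 115)]
for send_ms, arrival_ms in samples:
    print(estimator.update(send_ms, arrival_ms))
# prints 5.0, 20.0, 25.0, 40.0, 40.0 (one value per line)
```

A static buffer would have to be set to the worst-case spread from the start, whereas an estimate of this kind stays small while the observed delay variation is small and grows only when larger variations actually occur.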
Adaptive jitter buffering algorithms are usually developed for the receiving ends of terminals or clients, each of which is typically assigned to a single end user. In network nodes, however, one processing unit is typically shared by tens or even hundreds of concurrent users, or stream instances. In such a situation, simplicity of the buffering algorithm becomes a vital issue, in order for the operator to keep the processing cost per channel low.
When dimensioning network buffers, there is usually a trade-off between simplicity and perceptual quality which has to be taken into consideration. This means that the buffering algorithm implemented at a network node should be as simple as possible while still providing sufficient quality, without having to reach the quality level which is necessary at a typical end-user terminal. A scalable play-out requires a rather complex function at network nodes, compared to what is required at end-user terminals. In network nodes, speeding up, or catching up, is usually achieved by skipping packets, or frames, while slowing down is realised by inserting frames, i.e. error concealment packets.
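The skipping and inserting of frames described above can be sketched as a single play-out step. The sketch assumes that the buffer is a simple FIFO queue, that one frame is played out per Trepout tick, and that catching up is triggered when the queue holds more than `max_depth` frames; the function name `play_out_tick`, the `max_depth` threshold and the `CONCEALMENT` marker are hypothetical and used for illustration only.

```python
from collections import deque

# Hypothetical placeholder for an error concealment frame.
CONCEALMENT = "ECU"


def play_out_tick(buffer, max_depth):
    """Return the frame to play out at one Trepout tick.

    buffer:    deque of buffered frames, oldest first
    max_depth: number of buffered frames above which one frame is skipped
    """
    if not buffer:
        # Slowing down: nothing to play, so a concealment frame is inserted.
        return CONCEALMENT
    if len(buffer) > max_depth:
        # Catching up: the oldest frame is skipped to shorten the buffer delay.
        buffer.popleft()
    return buffer.popleft()


frames = deque(["f0", "f1", "f2", "f3"])
print([play_out_tick(frames, max_depth=2) for _ in range(4)])
# ['f1', 'f2', 'f3', 'ECU']
```

In this run the first tick skips frame f0 because four frames are buffered, and the last tick inserts a concealment frame because the buffer has run empty. Both operations are cheap per stream, which matches the emphasis on low processing cost per channel in a shared network node.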