In packet-switched systems, such as the General Packet Radio Service (GPRS) wireless system, the uncertainty due to variations in data packet arrival times can have a significant impact on system performance. Reasons for the variation in packet arrival times include congestion of network resources and route variations between successive packets. When the packets contain voice data, as in a VoIP system, obtaining a continuous voice output requires that the buffering depth, or buffering delay, at the data packet receiver be proportional to the variations in packet arrival times.
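For illustration, the relationship between arrival-time variation and the required buffering delay can be sketched as follows. This is a simplified model, not taken from any particular system: the delay is taken proportional to the standard deviation of the inter-arrival intervals, and the proportionality constant (`safety_factor`) is an illustrative assumption.

```python
import statistics

def required_buffering_delay(arrival_times_ms, safety_factor=2.0):
    """Estimate the buffering delay needed to absorb arrival-time jitter.

    arrival_times_ms: packet arrival timestamps in milliseconds.
    The delay is taken proportional to the spread (population standard
    deviation) of the inter-arrival intervals; safety_factor is a
    tuning assumption, not a value from any standard.
    """
    intervals = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    jitter = statistics.pstdev(intervals)
    return safety_factor * jitter

# Packets created every 20 ms but arriving with jitter:
arrivals = [0, 22, 38, 65, 78, 102, 118, 145]
delay = required_buffering_delay(arrivals)
```

Under this model, a stream with perfectly regular arrivals needs no buffering delay, while a stream with large variations in inter-arrival time needs a correspondingly larger one.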
A conventional fixed-initial-delay data buffer can remove the variations to some extent. It is, however, very likely that the network conditions will vary depending on congestion of network resources, the location of the receiving terminal, and the specific implementation of the network components. With conventional (fixed-delay) buffering it is impossible to react to changing network conditions. In addition, when the throughput is consistently low it is impossible to prevent receiver buffer underflows.
For these reasons some type of adaptive buffer management must be introduced if operation that is optimal in terms of buffering delay and minimal interruptions in the output voice is desired. The buffer management should be capable of changing the buffering delay as smoothly as possible. Stated another way, it is preferable for the buffering delay to change at a constant rate over a longer interval, rather than be decreased, then increased, then decreased again, and so on, over short intervals. To prevent such fluctuations the prevailing network conditions should be estimated as closely as possible, and to accomplish this estimation it is important to first define which network characteristics should be the subject of the estimation.
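The preference for gradual rather than abrupt delay changes can be sketched as follows. This is a minimal illustration, assuming an arbitrary maximum step size of 5 ms per update; the function names and constants are hypothetical, not from any cited scheme.

```python
def smooth_delay_update(current_ms, target_ms, max_step_ms=5):
    """Move the buffering delay toward the target by at most max_step_ms,
    so the change is spread evenly over several updates rather than
    applied as one abrupt jump (the step size is an illustrative
    assumption)."""
    step = max(-max_step_ms, min(max_step_ms, target_ms - current_ms))
    return current_ms + step

# Ramping from 40 ms to 60 ms is spread over four 5 ms steps:
delay = 40
history = []
for _ in range(5):
    delay = smooth_delay_update(delay, 60)
    history.append(delay)
```

Because each update is bounded, the playout point drifts smoothly toward the target instead of oscillating, which is the behavior the text identifies as desirable.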
At least two prior art buffer management techniques have required accurate knowledge of the network end-to-end delay: Ramjee, R. (1994), "Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks", in IEEE INFOCOM '94, The Conference on Computer Communications Proceedings, 12-13 June, Toronto, Canada, Vol. 2, pp. 680-688; and Liang, Y. J. (2001), "Adaptive Playout Scheduling Using Time-Scale Modification in Packet Voice Communications", in IEEE International Conference on Acoustics, Speech, and Signal Processing Proceedings, 7-11 May, Salt Lake City, USA, Vol. 3, pp. 1445-1448.
Because exact knowledge of the network end-to-end delay is currently not possible to obtain, another technique has been proposed that does not require this information: Telefonaktiebolaget L M Ericsson, "Adaptive Jitter Buffering", WO 00/42749. This approach attempts to estimate network conditions over a fixed sampling interval. While this approach may have some use when interruptions (delay spikes) occur at relatively short intervals, if the interval between successive interruptions is greater than the sampling interval there can be cases where no interruption occurs during a given sampling interval. The control mechanism will then decrease the buffering delay, as it would not have done had the interruption occurred. If an interruption then occurs during the next sampling interval, it causes an undesired interruption in the speech due to buffer underflow. After the speech interruption the buffering delay is increased once again during the following sampling interval. As can be appreciated, this type of operation can readily lead to a situation where the control mechanism decreases, increases, and again decreases the buffering delay, and so on, resulting in unnecessary fluctuations in the playout rate. In addition, some fixed number of packets must be accumulated before the buffering delay change is performed (the sampling interval). This results in a slower reaction time when packets arrive at a reduced rate, and potentially increases the possibility of interruptions, because the buffering delay is increased only after the sampling interval. In the approach of WO 00/42749 the change in the buffering delay is accomplished by discarding or delaying packets; more specifically, the change is made during a silent period by adding or removing speech frames containing silence.
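The oscillation described above can be caricatured in a few lines. The sketch below is a deliberately simplified model in the spirit of a fixed-sampling-interval scheme, not the actual WO 00/42749 algorithm; all constants and names are illustrative assumptions. After each sampling interval the delay is raised if a spike was observed in that interval and lowered otherwise.

```python
def fixed_interval_controller(spike_flags, delay_ms=60, step_ms=20,
                              min_delay_ms=20):
    """Simplified fixed-sampling-interval control: spike_flags holds one
    boolean per sampling interval (True if a delay spike occurred).
    The delay is adjusted only after each interval ends, which is the
    source of the slow reaction time noted in the text."""
    history = []
    for spike in spike_flags:
        if spike:
            delay_ms += step_ms
        else:
            delay_ms = max(min_delay_ms, delay_ms - step_ms)
        history.append(delay_ms)
    return history

# Spikes spaced farther apart than the sampling interval: the delay is
# repeatedly lowered during spike-free intervals, so each new spike
# finds the buffer too shallow, and the delay then swings back up.
trace = fixed_interval_controller([False, False, True, False, False, True])
```

The resulting decrease/increase/decrease pattern in `trace` is exactly the fluctuation in playout rate that the text identifies as objectionable.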
However, adding or removing only silence changes the time relation between silent periods and speech periods, which can result in unnaturally long or short silences between sentences or even between words. The duration of the silent periods then varies from sentence to sentence or from word to word, resulting in an unnatural rhythm to the speech.
In general, adaptive buffer management should be applied only when it is needed. The situation in a packet switched network may well be such that the packets arrive in bursts, with a long (perhaps several seconds) delay between bursts. This is not a problem if the long-term average arrival interval is the same as the rate at which the packets were created; it means only that the physical buffer at the receiver must be large enough to accommodate the variations. This should nevertheless be considered in the design of the adaptive management, since the voice playout rate should not fluctuate annoyingly when the buffering delay fluctuates.
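The burst scenario can be quantified with a simple occupancy model. The sketch below, a simplified fluid model with hypothetical names and parameters, counts how many packets must be held if playout consumes one packet per frame interval starting at the first arrival; the peak occupancy is the physical buffer size the receiver needs even though the long-term average rate matches the creation rate.

```python
def peak_buffer_occupancy(arrivals_ms, playout_interval_ms=20,
                          initial_delay_ms=0):
    """Largest number of packets held just after any arrival, assuming
    playout removes one packet every playout_interval_ms starting at the
    first arrival plus initial_delay_ms (a simplified model)."""
    t0 = arrivals_ms[0] + initial_delay_ms
    peak = 0
    for n, t in enumerate(arrivals_ms, start=1):
        played = max(0, int((t - t0) // playout_interval_ms))
        peak = max(peak, n - played)
    return peak

# Two bursts of five packets, 100 ms apart: the long-term average still
# matches a 20 ms creation interval, but the buffer must hold a burst.
bursts = [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
```

For this arrival pattern the buffer must hold an entire burst of five packets, whereas a smooth 20 ms arrival stream would never hold more than one; the bursts call for a larger physical buffer, not for adaptive changes to the buffering delay.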
It can thus be appreciated that the current approaches to dealing with the variability of arrival times of data packets containing voice or video signals are not satisfactory, and do not adequately address the problems inherent in providing natural sounding voice in VoIP and other types of data packet-based network systems.