This invention relates to a data processing device for playing out buffered media data packets to a media consumer.
Expectations for voice over internet protocol (VoIP) services are growing rapidly due to improvements in high-speed wireless internet technology and more powerful mobile devices. In packet-switched networks, however, the regularity of a VoIP stream is naturally impaired by routing, queuing, scheduling and serialization effects, which subject data packets to loss and jitter (including delays). The main factors affecting voice quality are delay and loss, which generally cannot be known in advance by the receiving device because they depend on the real-time behaviour of connections throughout the network.
Achieving high quality real-time voice transmission between VoIP devices requires mechanisms for smoothing out the jitter inherent in a received stream of network data packets. This is generally done by means of an Adaptive Jitter Buffer (AJB).
Most of the existing jitter buffer algorithms calculate play-out times of data packets to a media decoder using adaptive estimation of network jitter. The adaptive algorithm typically uses adaptive dual alpha or other relevant weighting factors, for example as is described in “Perceptual optimisation of playout buffer in voip applications”, Chun-Feng Wu and Wen-Whei Chang, First International Conference on Communications and Networking in China, ChinaCom 2006. Network statistics and a history of measurements may also be used for controlling the adaptation, for example as described in “Jitter Buffer Loss Estimate for Effective Equipment Impairment Factor”, Pavol Partila et al., International journal of mathematics and computers in simulation.
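By way of illustration only, adaptive estimation of the kind referred to above may be sketched as follows. The sketch assumes one common form of dual-alpha estimator, in which the running delay estimate is smoothed with one weighting factor when the measured delay rises and another when it falls. The class name, the weighting values and the safety multiplier are hypothetical and are not taken from the cited papers.

```python
from dataclasses import dataclass


@dataclass
class DualAlphaEstimator:
    """Illustrative adaptive delay/jitter estimator (assumed form, not a cited algorithm)."""

    alpha_up: float = 0.75     # hypothetical weight used when delay is rising (fast reaction)
    alpha_down: float = 0.998  # hypothetical weight used when delay is falling (slow decay)
    safety: float = 4.0        # hypothetical multiplier applied to the variation estimate
    delay: float = 0.0         # smoothed network delay, in seconds
    variation: float = 0.0     # smoothed absolute deviation from the delay estimate
    primed: bool = False       # becomes True once the first packet has been seen

    def update(self, network_delay: float) -> float:
        """Fold in one measured packet delay and return the suggested play-out delay."""
        if not self.primed:
            # The first packet simply seeds the delay estimate.
            self.delay = network_delay
            self.primed = True
        else:
            # Pick the weighting factor according to the direction of change.
            rising = network_delay > self.delay
            alpha = self.alpha_up if rising else self.alpha_down
            self.delay = alpha * self.delay + (1 - alpha) * network_delay
            deviation = abs(self.delay - network_delay)
            self.variation = alpha * self.variation + (1 - alpha) * deviation
        return self.delay + self.safety * self.variation
```

In this sketch a sudden rise in measured delay lengthens the play-out delay quickly, whereas a subsequent fall shortens it only slowly, so that a brief lull does not cause the buffer to shrink just before the next delay spike.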
Such conventional algorithms can sometimes work under slightly impaired network conditions, but the behaviour of bursty traffic, self-similar traffic and long range dependent traffic often differs from the ideal stochastic models of absolutely independent packets which these techniques use when trying to assess or describe traffic inter-arrival times (e.g. using standard distributions such as Markov models, Poisson distributions, exponential distributions, neural network modelling, etc.). These algorithms therefore suffer from suboptimal performance, as these models can give wrong or inaccurate predictions of the inter-frame dependency between consecutive packets.
Recently, EMOS (Equivalent Mean Opinion Score) based algorithms have become more popular because they perform better than adaptive estimation algorithms. EMOS algorithms for predicting the subjective quality of packetized voice have been standardised in ITU-T G.107. Examples of EMOS algorithms are described in “E-model MOS estimate precision improvement and modelling of jitter effects”, Information and Communication Technologies and Services, Vol. 10, 2012. However, EMOS algorithms are sensitive to network delay and can often discard a significant number of packets even under only slightly degraded network conditions, for example if a gateway or media server adds considerable fixed delay.
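For illustration, the final step of an E-model based quality estimate, namely the conversion of a transmission rating factor R into an estimated mean opinion score, may be sketched as below using the mapping generally given for ITU-T G.107. The function name is hypothetical, and the computation of R itself from delay, loss and equipment impairments is outside the scope of this sketch.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS (illustrative sketch)."""
    if r <= 0.0:
        return 1.0   # lower bound of the scale
    if r >= 100.0:
        return 4.5   # upper bound of the scale
    # Cubic mapping between the two bounds.
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


# A rating of 60 maps to 3.1, because the cubic term vanishes at R = 60.
print(round(r_to_mos(60.0), 2))
```

Because any added delay lowers R, a play-out decision driven by this estimate penalises buffering delay directly, which is consistent with the sensitivity to fixed delay noted above.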
Both adaptive estimation and EMOS algorithms suffer severely when streams of network packets experience significant jitter and bunching effects.