With the development of Internet Protocol (IP) networks, video communication over wired and wireless IP networks (e.g. IPTV services) has become very popular. Unlike traditional video transmission over cable networks, video delivery over IP networks is much less reliable, and the situation is even worse in wireless networks. Correspondingly, a recent requirement for video quality modelling (VQM) is to rate the quality degradation caused by IP transmission impairments such as packet loss, delay and jitter, in addition to the degradation caused by video compression. Current research addresses objective video quality assessment models at the media layer or at the packet layer for estimating audio/video quality of experience (QoE) [i]. Quality is usually expressed as a mean opinion score (MOS) value. Media-layer models use media signals, such as speech waveforms or video pixel data. Packet-layer models use only packet header information and may help in automatic network diagnosis to guarantee user experience. A packet-layer model is light-weight compared to a media-layer model, and is therefore suitable for real-time monitoring and for easy deployment in customer devices (e.g. set-top boxes, STBs).
[i] Akira Takahashi: Framework and Standardization of Quality of Experience (QoE) Design and Management for Audiovisual Communication Services, NTT Technical Review 4/2009, www.ntt-review.jp/archive/2009/200904.html
Currently known objective packet-layer models use packet information as input parameters, e.g. the packet loss rate, the timestamp in the RTP header and the packet size. These input parameters are independent of the video content. However, losses in different parts of the video content cause different perceptual degradation. For example, a loss in a reference frame may propagate to the frames predicted from it, whereas a loss in a non-reference frame affects only that frame. This is a shortcoming of the existing packet-layer VQMs that limits their estimation accuracy and robustness.
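To illustrate why such parameters are content-independent, the following minimal sketch derives a packet loss rate, a frame count and a mean packet size purely from RTP header fields. It assumes packets arrive in order without duplicates, and the `RtpPacket` record and `packet_layer_features` function are hypothetical names used only for illustration. Packets of one video frame are grouped by their shared RTP timestamp.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RtpPacket:
    """Hypothetical record of header fields visible to a packet-layer monitor."""
    seq: int        # 16-bit RTP sequence number
    timestamp: int  # RTP timestamp, shared by all packets of one video frame
    size: int       # payload size in bytes


def packet_layer_features(packets: List[RtpPacket]) -> Dict[str, float]:
    """Compute content-independent input parameters from received packets.

    Assumes packets are listed in arrival order, with no reordering or
    duplicates, so sequence-number gaps correspond to lost packets.
    """
    if not packets:
        raise ValueError("no packets received")
    span = 0
    prev = packets[0].seq
    for p in packets[1:]:
        span += (p.seq - prev) % 65536  # unwrap the 16-bit sequence number
        prev = p.seq
    expected = span + 1
    received = len(packets)
    return {
        "packet_loss_rate": (expected - received) / expected,
        "frames_seen": len({p.timestamp for p in packets}),
        "mean_packet_size": sum(p.size for p in packets) / received,
    }


if __name__ == "__main__":
    # Sequence number 0 is missing (the stream wraps from 65535 to 1).
    pkts = [RtpPacket(65534, 3000, 1000), RtpPacket(65535, 3000, 1000),
            RtpPacket(1, 6000, 500), RtpPacket(2, 6000, 500)]
    print(packet_layer_features(pkts))
```

The missing packet contributes the same loss rate of 0.2 whether it carried part of an I-frame or of a non-reference frame. None of these features can tell the two cases apart.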
A further problem of the existing models is that the effect of error concealment (EC) is not fully taken into account. Besides the encoder configuration, the impact of a lost packet on visual quality depends significantly on the error concealment method employed in the decoder. A known VQM model [ii] uses two model coefficients, b0 and b1, that depend on the packet-loss concealment scheme employed, but these coefficients are fixed for a given scheme. A similar model uses fixed EC weights for a given EC scheme employed in a decoder, and the values of the EC weights are set empirically. However, it has been observed that a fixed EC weight per decoder is far from an adequate approximation of the actual effect of EC.
[ii] A. Raake, M.-N. Garcia, S. Möller, J. Berger, F. Kling, P. List, J. Johann, C. Heidemann: T-V-Model: Parameter-Based Prediction of IPTV Quality, ICASSP 2008
Further, in packet-based networks the coded bits of a video frame may be encapsulated into several packets, such as RTP packets, depending on the maximum transmission unit (MTU) size of the underlying network. With an H.264 encoder, a video frame may be encoded into several slices. Depending on the MTU size, the data of one slice may be split over several RTP packets, several slices may be aggregated into one RTP packet, or exactly one slice may be carried per packet. The same packet loss statistics can therefore correspond to very different amounts of lost video data, so the relationship between the statistical features of packet loss and the MOS is not stable. Consequently, existing models that consider only the statistical features of packet loss and their relation to the MOS cannot provide stable performance.
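The following minimal sketch illustrates the effect. It compares the same 6000-byte frame coded either as one slice or as five slices, sent over a 1500-byte MTU. Several details are assumptions made only for illustration:

- a fixed IPv4/UDP/RTP header overhead of 40 bytes;
- one slice per packet, with fragmentation of oversized slices and no aggregation;
- equal-area slices;
- a lost fragment makes its whole slice undecodable.

The function names are hypothetical.

```python
import math
from typing import List

# Assumed per-packet overhead in bytes: IPv4 (20) + UDP (8) + RTP (12).
HEADER_BYTES = 20 + 8 + 12


def packetize(slice_sizes: List[int], mtu: int) -> List[int]:
    """Return, for each RTP packet of one frame, the index of the slice it carries.

    Assumes each slice starts a new packet and is fragmented over several
    packets when it exceeds the payload budget (slices are never aggregated).
    """
    payload = mtu - HEADER_BYTES
    owner: List[int] = []
    for idx, size in enumerate(slice_sizes):
        owner.extend([idx] * max(1, math.ceil(size / payload)))
    return owner


def frame_fraction_lost(slice_sizes: List[int], mtu: int, lost: List[int]) -> float:
    """Fraction of the frame's slices made undecodable by the lost packets.

    Assumes equal-area slices and that losing any fragment of a slice
    makes the whole slice undecodable.
    """
    owner = packetize(slice_sizes, mtu)
    lost_slices = {owner[p] for p in lost}
    return len(lost_slices) / len(slice_sizes)


if __name__ == "__main__":
    for name, slices in [("1 slice", [6000]), ("5 slices", [1200] * 5)]:
        n = len(packetize(slices, mtu=1500))
        frac = frame_fraction_lost(slices, mtu=1500, lost=[0])
        print(f"{name}: {n} packets, 1 lost -> {frac:.0%} of frame lost")
```

Both configurations produce 5 packets, so losing one packet gives the same 20% packet loss rate. With a single slice the whole frame is lost (`100%`), whereas with five slices only one fifth of it is (`20%`). Aggregating several slices into one packet, which this sketch does not model, would shift the mapping once more.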