In IPTV (Internet protocol television), a video program passes through different format stages during its life cycle. A video encoder compresses the video program into a bit stream, also referred to as an elementary stream (ES). The ES is further packetized into a transport stream (TS) and finally transmitted over an IP channel. Video quality can be measured using data obtained by accessing the transport stream, the elementary stream, or the decoded video. Among the three types of measurement, using the transport stream is generally the fastest but the least accurate, since it exposes the smallest amount of video data; using the decoded video is often accurate but the slowest, since decoding the video is computationally expensive; using the elementary stream achieves a tradeoff between accuracy and computational complexity. Currently, video quality measurement based on the elementary stream in particular is being investigated.
Video compression generally employs quantization techniques. Quantization is a lossy compression technique that limits the precision of signal values. It is well known that quantization is a significant factor in artifact visibility, and the quantization parameter (QP) is a powerful predictor of video quality. Various functions of video quality with respect to QP have been proposed in the literature, such as linear functions [1, 2] and exponential functions [3]. However, they are insufficiently accurate at relatively large and/or relatively small QP levels, and thus their results are not satisfactory for low-bandwidth or high-fidelity applications.
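The two model families mentioned above can be illustrated with a minimal sketch. The functions and coefficients below are hypothetical placeholders for demonstration, not the fitted models of [1]-[3]; they only show the qualitative shapes being compared.

```python
import math

# Hypothetical quality-vs-QP models (coefficients are illustrative only).

def quality_linear(qp, a=-1.5, b=100.0):
    """Linear model: predicted quality falls at a constant rate as QP grows."""
    return a * qp + b

def quality_exponential(qp, c=100.0, d=0.05):
    """Exponential model: predicted quality decays exponentially with QP."""
    return c * math.exp(-d * qp)

# Both families agree on the broad trend (quality drops as QP rises),
# but they diverge at very small and very large QP values -- the
# regime where, as noted above, such models tend to be inaccurate.
for qp in (10, 26, 45):
    print(qp, quality_linear(qp), quality_exponential(qp))
```

Note that the linear model can even predict negative quality at large QP, while the exponential model saturates; neither behavior is guaranteed to match perceived quality at the extremes.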
Content complexity is another critical factor for video quality measurement (VQM). Visual artifacts in complex videos are more likely to be tolerated by the human eye, so such videos tend to appear of better quality. Therefore, content complexity in combination with QP can improve the accuracy of quality measurement, compared to using QP alone.
Traditionally, as in [4], content complexity may be quantified as the variance, the gradient, or the edge filter response of pixel values, or combinations thereof. The traditional methods have at least the following disadvantages.
First, such features are not tightly correlated with human visual perception. A video with large content complexity may contain not only rich texture and irregular motion, but also many edges and/or regular motion. To the human eye, visual artifacts are more likely to be tolerated in textured and irregularly (i.e., stochastically) moving regions, but are ordinarily more noticeable in edges or regularly (i.e., constantly) moving regions. Second, such features can hardly be computed until the pixels are recovered by full decoding. Thus, the traditional complexity measurement is computationally expensive, since it requires full decoding of the video.
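The traditional pixel-domain features can be sketched as follows. This is a hypothetical helper, not the exact method of [4]: it combines the variance, the mean gradient magnitude, and a simple Laplacian edge response of one grayscale frame. Note that it operates on decoded pixel values, which is precisely the second disadvantage noted above.

```python
import numpy as np

def pixel_complexity(frame):
    """Compute three traditional pixel-domain complexity features for one
    grayscale frame (2-D array): variance, mean gradient magnitude, and
    mean absolute Laplacian (a crude edge filter response).
    Illustrative sketch only; requires fully decoded pixels."""
    frame = frame.astype(np.float64)
    variance = frame.var()
    gy, gx = np.gradient(frame)
    gradient = np.hypot(gx, gy).mean()
    # 4-neighbour Laplacian as the edge filter response
    lap = (np.roll(frame, 1, axis=0) + np.roll(frame, -1, axis=0)
           + np.roll(frame, 1, axis=1) + np.roll(frame, -1, axis=1)
           - 4.0 * frame)
    edge = np.abs(lap).mean()
    return variance, gradient, edge

# A flat frame scores zero on all three features; random noise scores high
# on all three, even though noise is perceptually quite tolerant of
# artifacts -- illustrating the weak link to human perception.
flat = np.full((64, 64), 128)
noisy = np.random.default_rng(0).integers(0, 256, size=(64, 64))
print(pixel_complexity(flat))
print(pixel_complexity(noisy))
```

A strong straight edge would also raise the gradient and Laplacian scores, yet artifacts on edges are highly visible, which is the first disadvantage: these features do not separate "complex but masking" content from "complex but revealing" content.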