As multimedia services become more pervasive, video communications will play an increasing role in entertainment as well as in important new classes of applications such as tele-collaboration, tele-health and distributed education. Consumer applications will continue to be entertainment-intensive, with the new foci of EoD (everything-on-demand) and mobility.
Video quality evaluation is an important problem in audiovisual communications. The need for perceptually meaningful objective metrics is broadly recognized, and such measures have the dual role of (a) assessing signal quality in completed algorithm designs and (b) providing an in-the-loop metric for real-time algorithm steering. For video, subjective testing is the ideal approach, since it involves real viewers evaluating the end output. In current subjective testing methodology, the discrete-point Mean Opinion Score (MOS) and Mean Impairment Score (MIS) are well understood and provide useful quality measurements, provided that the subjects are adequately trained and that the mean scores are appropriately qualified by a standard deviation reflecting inter-viewer differences. Established subjective testing methods have viewers watch different video clips and assign each clip a score, or give a continuous score through a user feedback device such as a slider or throttle. Desirable characteristics of a testing scheme include ease of use, intuitiveness, effectiveness, and real-time feedback to the user about the current score. Mean Time Between Failures (MTBF) is an intuitive video quality metric that is used in this work.
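The qualification of a mean score by its inter-viewer spread, as described above, can be sketched as follows; the function name and the example ratings are illustrative only and do not come from any particular test.

```python
# Minimal sketch: a Mean Opinion Score (MOS) qualified by the standard
# deviation of the individual ratings (names and data are hypothetical).
from statistics import mean, stdev

def mos_with_spread(ratings):
    """Return (MOS, inter-viewer standard deviation) for one clip."""
    return mean(ratings), stdev(ratings)

# Example: ratings from eight trained subjects on a 1-5 scale.
score, spread = mos_with_spread([4, 5, 4, 3, 4, 4, 5, 4])
```

A large spread relative to the mean signals that the MOS alone is not a reliable summary of viewer opinion for that clip.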
Subjective testing takes significant time and effort, however, and objective testing is therefore the more practical approach for video. Objective metrics can be broadly classified by the amount of information available about the original video. Full-reference metrics require the complete original signal for metric calculation. Reduced-reference metrics require some information extracted from the original video to be transmitted over the channel for comparison with the received video. No-reference metrics operate on the received video alone. No-reference and reduced-reference metrics are considered more practical than full-reference metrics because the original video is generally not available at an arbitrary point of quality evaluation, such as a network node or the ultimate receiver.
Block-transform-based compression schemes such as MPEG-2 and H.264 introduce a variety of artifacts into the video, blockiness and blurriness being two of the most common. Block artifacts occur when the DCT-block edges become visible in the picture frames, while blurriness arises when the edges in the image are subjected to excessive compression. Apart from these compression-related artifacts, packet losses in the video stream cause network artifacts as well, which manifest themselves as unnatural streaks in the frames or as stuck or reordered frames. There are a considerable number of blockiness metrics in the literature, as well as exhaustive surveys of those metrics. Most metrics compare inter-block and intra-block differences to obtain an estimate of the video quality. Some metrics compare the differences in correlation between and across block boundaries, and others measure blockiness from the histogram of edge angles in the video frames. These blockiness metrics generally focus on a single video frame and do not incorporate temporal masking. The metrics described above are no-reference in nature, meaning that the quality score can be evaluated from the received video alone. There are also some reduced-reference metrics that evaluate blockiness. For instance, one such metric evaluates video quality by measuring the degradation of certain features extracted over the frames; one of these features relates to the addition of new edges in the compressed video that are close to horizontal or vertical alignment.
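The inter-block versus intra-block comparison described above can be illustrated with a simple sketch; this is a minimal example of the general idea under an assumed 8x8 transform grid, not any specific published metric, and the function name and epsilon guard are illustrative.

```python
# Sketch of a simple no-reference blockiness measure for one luma frame
# (a list of pixel rows), assuming an 8x8 transform grid.
def blockiness(frame, block=8):
    """Ratio of the mean horizontal pixel difference across block
    boundaries (inter-block) to the mean difference inside blocks
    (intra-block). A ratio near 1 suggests no visible block structure;
    larger values suggest blocking artifacts."""
    h, w = len(frame), len(frame[0])
    inter = intra = 0.0
    n_inter = n_intra = 0
    for y in range(h):
        for x in range(w - 1):
            d = abs(frame[y][x + 1] - frame[y][x])
            if (x + 1) % block == 0:   # difference straddles a block boundary
                inter += d
                n_inter += 1
            else:
                intra += d
                n_intra += 1
    return (inter / n_inter) / (intra / n_intra + 1e-12)
```

On a smooth gradient the two means coincide and the ratio stays near 1; a frame with a step exactly at a block boundary drives the ratio up sharply, which is the signature such metrics look for.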
Some drawbacks of current metrics are that they can behave unexpectedly when the image contains intended edges (i.e., blurry edges that are naturally present). This problem is sometimes avoided by using different thresholds to exclude natural edges; the threshold calculation is difficult, however, and results in some false decisions. When the metrics are calculated over an original signal with no block artifacts, one would expect a metric signature that indicates an error-free signal. In general, this is not the case, and the signature in fact varies with time. This problem is particularly encountered when the video contains scene changes.
In addition to blockiness, prior research has also evaluated the effect of packet losses on video. The algorithms used to detect network errors can be bit-stream based, pixel based, or a combination of the two. One such algorithm estimates the mean squared error from the received bit-stream alone. A classifier algorithm is used to measure the visibility of a packet loss based on certain stream parameters; the temporal locations of the packet losses, the amount of motion, and the accuracy and consistency of motion prediction are some of the parameters considered. Some network-error detectors use blockiness metrics in a modified fashion: the blockiness is measured as a function of time, and any abrupt change in this signature is taken to indicate a network error. This simple pixel-based measure could face problems with video that varies considerably or has many scene changes.
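The abrupt-change detection on a temporal blockiness signature can be sketched as follows; the function name and the threshold value are hypothetical and chosen for illustration only, not taken from any published detector.

```python
# Hypothetical sketch: flag likely network errors as abrupt jumps in a
# per-frame blockiness signature (threshold is illustrative).
def abrupt_changes(signature, threshold=0.5):
    """Return the indices of frames whose blockiness score differs from
    the previous frame's score by more than `threshold`."""
    return [i for i in range(1, len(signature))
            if abs(signature[i] - signature[i - 1]) > threshold]
```

This also makes the stated weakness concrete: a scene change moves the blockiness score just as abruptly as a packet loss does, so a fixed threshold cannot distinguish the two.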
Blurriness has also been evaluated in prior research. Conventional blurriness metrics typically measure blurriness either directly or indirectly through a measure of sharpness. One such metric, for example, locates the edges in a given frame and evaluates blurriness as the average edge spread. Another obtains a measure of image sharpness by calculating the local edge kurtosis around edges. Some metrics compute blurriness as a function of the histogram of DCT coefficients in the compressed bit-stream. Disadvantages of these conventional blurriness metrics are that they typically require accurate edge detection and that blurry edges intended to be in the video are often incorrectly flagged as visually bad. Further, conventional techniques do not incorporate the temporal effects of blurriness.
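The edge-spread idea can be illustrated on a single luma row; this is a one-dimensional sketch of the general approach, with an illustrative gradient threshold, and is not the specific metric referenced above.

```python
# Sketch: average edge spread along one luma row. An edge is a strong
# gradient; its spread is the width of the monotonic ramp around it.
# Blurrier edges produce wider ramps (threshold is illustrative).
def average_edge_spread(row, grad_thresh=10):
    """Return the mean width, in pixels, of the monotonic ramps that
    contain gradients of at least `grad_thresh`."""
    widths = []
    i, n = 1, len(row)
    while i < n:
        if abs(row[i] - row[i - 1]) >= grad_thresh:
            s = 1 if row[i] > row[i - 1] else -1   # edge direction
            start = i - 1
            while start > 0 and s * (row[start] - row[start - 1]) > 0:
                start -= 1                         # extend ramp leftwards
            end = i
            while end + 1 < n and s * (row[end + 1] - row[end]) > 0:
                end += 1                           # extend ramp rightwards
            widths.append(end - start)
            i = end + 1                            # skip past this edge
        else:
            i += 1
    return sum(widths) / len(widths) if widths else 0.0
```

A step edge yields a spread of one pixel, while the same step smeared over several pixels yields a proportionally larger spread, which is the quantity such metrics average over all detected edges. The sketch also shows why accurate edge detection matters: an edge missed by the gradient threshold contributes nothing to the average.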
Based on the above, there presently exists a need in the art for an enhanced no-reference objective video quality metric that can evaluate these different artifacts with a unified approach and that correlates well with subjective video evaluations.