The present invention relates to video processing, and more particularly to an adaptive spatio-temporal filter for human vision system models used in determining the quality of video service.
Video is recorded and transmitted via methods that may create errors, such as lossy compression systems and the like. It is of interest to produce objective measures to predict the human perceptibility of these errors. The perceptibility of these errors at and above a threshold is a function of many aspects or parameters of the errors and of the video context in which they occur. Perceptual sensitivity to errors in video as a function of spatial frequency varies with local average luminance, duration of local stimulus (temporal extent), the associated temporal frequency content, area (spatial/angular extent) and local contrast of the original (error-free, or reference) video. Likewise, perceptual sensitivity to errors in video as a function of temporal frequency varies with local average luminance, temporal extent, area, associated spatial frequency content and local contrast of the original video.
Existing spatio-temporal filtering in human vision system (HVS) models is generally designed with a fixed response calibrated to one particular set of parameters, i.e., spatial and temporal responses are designed to adhere to the HVS sensitivities at one particular average luminance level. Also, most HVS models are based on methods of predicting the threshold of just noticeable differences (JND), such as contrast detection and discrimination thresholds. Since the HVS model components are based on mimicking behavior at threshold, behavior above threshold, i.e., at supra-threshold, is not guaranteed.
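The limitation of a fixed response may be illustrated by the following minimal sketch, which is offered only as a toy comparison and not as any published HVS model. The band-pass shape, the calibration peak of 4 cycles per degree, and the rule that moves the peak with the logarithm of luminance are all hypothetical assumptions chosen for illustration; none of the function names or constants come from the models discussed here.

```python
import math

def bandpass_response(freq_cpd, peak_cpd):
    """Toy band-pass sensitivity curve (assumed shape, not measured data).

    Returns 1.0 at freq_cpd == peak_cpd and rolls off on both sides.
    """
    x = freq_cpd / peak_cpd
    return x * math.exp(1.0 - x)

def fixed_response(freq_cpd, luminance_nits, calibrated_peak_cpd=4.0):
    """Fixed filter: calibrated at one luminance level, so luminance is ignored."""
    return bandpass_response(freq_cpd, calibrated_peak_cpd)

def adaptive_response(freq_cpd, luminance_nits):
    """Adaptive filter: the peak moves with local average luminance.

    The log-luminance rule and the clamp limits are hypothetical placeholders.
    """
    log_lum = math.log10(max(luminance_nits, 0.01))
    peak_cpd = min(8.0, max(1.0, 2.0 + 1.0 * log_lum))
    return bandpass_response(freq_cpd, peak_cpd)

def peak_frequency(response, luminance_nits, freqs):
    """Return the frequency in freqs at which the response is largest."""
    return max(freqs, key=lambda f: response(f, luminance_nits))

# Spatial frequencies from 0.25 to 32 cycles per degree in 0.25 steps.
FREQS = [0.25 * k for k in range(1, 129)]

if __name__ == "__main__":
    for lum in (1.0, 100.0):
        print(lum,
              peak_frequency(fixed_response, lum, FREQS),
              peak_frequency(adaptive_response, lum, FREQS))
```

In this sketch the fixed filter reports the same peak frequency at every luminance, whereas the adaptive filter's peak shifts as luminance changes, which is the kind of behavior a fixed calibration cannot reproduce.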
For example, J. Lubin in “A Visual Discrimination Model for Imaging System Design and Evaluation”, Vision Models for Target Detection and Recognition, ed. Eli Peli, World Scientific Publishing, River Edge, N.J. 1995, pp. 245-283, proposed a model that has a fixed spatial filter designed to match human vision system response at only one luminance level and one duration or temporal extent. The model has no mechanism to create temporal frequency roll-off at spatial frequencies of peak sensitivity, so it does not match data from human vision experiments. It also has no provision for moving peaks in the spatial or temporal frequency response as a function of luminance or other parameters. Further, Lubin's model is based on units of threshold, or “just noticeable difference (JND)”. The only mechanism that modifies the response beyond the fixed spatial and temporal filters is a masking mechanism, which is a modified version of J. Foley's model for predicting contrast discrimination (“Contrast Masking in Human Vision”, Journal of the Optical Society of America, Vol. 70, No. 12, pp. 1458-1471, December 1980). However, M. Cannon showed (“Perceived Contrast in the Fovea and Periphery”, Journal of the Optical Society of America A, Vol. 2, No. 10, pp. 1760-1768, 1985) that Foley's model is grossly in error when used to predict perceptual contrast at even moderate contrast levels above threshold. In addition, Lubin's proposed model does not account for non-linearities such as those causing the double spatial frequency and phantom pulse visual illusions. Many other human vision based models, such as that of S. Daly, “The Visible Differences Predictor: an Algorithm for the Assessment of Image Fidelity”, Digital Images and Human Vision, ed. Andrew B. Watson, MIT Press, Cambridge, Mass. 1993, pp. 162-206, do not account for temporal aspects at all.
What is desired is a filter that is designed to match human vision system response over a range of perceptual parameters.