The present invention relates to video quality of service, and more particularly to an improvement in predicting human vision perception and perceptual difference for video quality metrics using masking to take into consideration pupil size variation and other effects.
The problem addressed by the present invention is predicting subjective quality ratings of video that has been processed in such a way that visually detectable impairments may be present. Human vision perceptual models have proven instrumental in solving this problem by predicting the perceptibility of impairments. See U.S. Pat. No. 6,678,424 entitled “Realtime Human Vision System Behavioral Modeling” and U.S. Patent Application Publication No. 2003/0053698 A1 entitled “Temporal Processing for Realtime Human Vision System Behavior Modeling”, both by the present inventor. These documents describe a perceptual difference model made of two merged human vision system models, one for a reference video signal and the other for an impaired video signal. The model has filter pairs, each having a two-dimensional lowpass filter and an implicit highpass filter. The outputs of the two systems are differenced to produce an impaired image map, which is further processed to produce a measure of the picture quality of the impaired video signal relative to the reference video signal. The filters are primarily responsible for variations in human vision response over spatial and temporal frequencies, i.e., the spatiotemporal response. Such filters may be adaptive, as described in pending U.S. Patent Application Publication Nos. 2002/0186894 A1 entitled “Adaptive Spatio-Temporal Filter for Human Vision System Models” and 2003/0031281 A1 entitled “Variable Sample Rate Recursive Digital Filter”, both by the present inventor. The adaptive filters have two paths for processing an input video signal, a center path and a surround path, each path having a temporal and a spatial component. A controller adaptively determines the coefficients for the filter components from either the input video signal or one of the path outputs. The difference between the path outputs is the adaptive filter output for the input video signal.
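As a rough illustration of the perceptual difference structure described above, the following Python sketch passes a reference frame and an impaired frame through the same center-surround filter and differences the outputs to form an impairment map. It is a minimal sketch under stated assumptions, not the method of the cited patents: the box lowpass filters, the fixed (non-adaptive) radii, the spatial-only processing, the RMS pooling, and the function names `lowpass_2d`, `center_surround`, and `perceptual_difference` are all hypothetical choices made for illustration.

```python
import numpy as np


def lowpass_2d(image, radius):
    """Separable box lowpass filter (hypothetical stand-in for the model's
    two-dimensional lowpass filter). Reflection padding keeps the output the
    same size as the input; radius 0 returns the image unchanged."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(image, radius, mode="reflect")
    # Filter along rows, then along columns; 'valid' trims the padding off.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def center_surround(image, center_radius=1, surround_radius=4):
    """Difference of a narrow (center) and a wide (surround) lowpass path.
    Assumption: radii are fixed here, whereas the cited adaptive filters use a
    controller to set path coefficients, and each path also has a temporal
    component, which this spatial-only sketch omits."""
    image = np.asarray(image, dtype=float)
    return lowpass_2d(image, center_radius) - lowpass_2d(image, surround_radius)


def perceptual_difference(reference, impaired, center_radius=1, surround_radius=4):
    """Filter both frames identically, difference the outputs to get an
    impairment map, and pool the map to one number (RMS pooling is a
    hypothetical choice, not taken from the source)."""
    ref_out = center_surround(reference, center_radius, surround_radius)
    imp_out = center_surround(impaired, center_radius, surround_radius)
    impairment_map = imp_out - ref_out
    score = float(np.sqrt(np.mean(impairment_map ** 2)))
    return impairment_map, score


# Usage: a single-pixel impairment added to a random reference frame.
rng = np.random.default_rng(0)
reference = rng.random((16, 16))
impaired = reference.copy()
impaired[8, 8] += 1.0
impairment_map, score = perceptual_difference(reference, impaired)
```

Because both lowpass paths preserve a uniform offset, shifting the whole impaired frame by a constant yields a near-zero score, while a localized impulse does not. With `center_radius=0` the center path is the identity, so the filter reduces to an implicit highpass (input minus lowpass), which loosely mirrors the lowpass/implicit-highpass filter pairs mentioned above.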
The filters account for most of the effects of the lens- and pupil-related optical modulation transfer function (MTF), lateral inhibition, the aggregate temporal response of photoreceptors, neurons, etc., and adaptation of the pupil, neurons, etc., including dark adaptation. Although this model does change its spatial and temporal frequency response as the pupil changes, it does not take into consideration the changes in noise masking caused by pupil changes in response to changes in luminance. Nor does it consider other adaptation due to similarity (correlation) between the test and reference signals, luminance sensitivity (including the equivalent of the luminance portion of contrast gain control), or the spatiotemporal effects of the variance, or AC, portion of contrast gain control.
What is desired is an improved method of masking for predicting human vision perception and perceptual difference in order to produce a more accurate prediction of subjective quality ratings of video.