The present invention relates to picture quality measurements, and more particularly to an efficient predictor of subjective video quality rating measures.
Video is recorded and transmitted via methods that create errors, such as video compression/decompression methods. It is of interest to produce objective measures to predict a subjective rating of these errors. Subjective ratings have been developed that involve collecting data from experiments in which persons are asked to rate both the original and corresponding impaired video. Methods have been developed that indicate, to some degree, the subjective rating of the impairment in the impaired video. So far these methods have proven too complex and compute-intensive for accurate real-time applications.
Existing methods either include computationally intensive human vision system (HVS) models, such as those described by Lubin or Daly, or include ANSI/IRT measurements that are generally faster but, given a sufficiently varied set of video content, do not correlate as well with subjective ratings as do the HVS models. The HVS models generally include one or more stages to render each of the following:
Contrast Sensitivity Function (CSF) to model the sensitivity to different spatial frequencies at a given average luminance level.
Masking, which the literature commonly describes in terms of Contrast Discrimination (CD), that includes both “self-masking” and masking across frequency bands and orientations.
The CSF portion of the HVS model is accomplished at each pixel by calculating the local ratio of high frequency energy to low (DC) frequency energy, or the equivalent, coupled with a linear filter or pyramid decomposition with appropriate frequency weighting. This provides two different filters, a highpass filter and a lowpass filter, and a division for each pixel. Where pyramid decomposition and horizontal and vertical channels are used, these operations are multiplied by the number of orientations and percentage of pixels added from the pyramid.
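The per-pixel cost described above (one lowpass filter, one highpass filter and one division) may be illustrated by the following minimal sketch. It is not taken from the Lubin or Daly models: the box filter, the border clamping, the `eps` guard against division by zero and the names `box_lowpass` and `local_contrast` are all assumptions made for illustration, and no frequency weighting is applied.

```python
def box_lowpass(img, radius=1):
    """Assumed lowpass: mean over a (2*radius+1)^2 window, clamped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            count = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def local_contrast(img, radius=1, eps=1e-6):
    """Per-pixel ratio of highpass (img - lowpass) to lowpass (local DC)."""
    low = box_lowpass(img, radius)
    return [
        [(img[y][x] - low[y][x]) / (low[y][x] + eps) for x in range(len(img[0]))]
        for y in range(len(img))
    ]
```

A uniform image yields zero contrast everywhere, while an isolated bright pixel yields a large positive ratio at its location. With horizontal and vertical channels and a pyramid, this work would be repeated for each orientation and pyramid level.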
A large portion of the processing time required in HVS models is due to the image decomposition into a multiplicity of images, each corresponding to a set of parameter coordinates. For example, the popular Lubin and Daly image decomposition parameterization is based on filters of various orientations, sequency (spatial frequency) bands and polarities. Thus two orientations (horizontal, vertical), two spatial frequency bands and two polarities require 2*2*2=8 images per processing stage for the reference (original) image, and likewise for the impaired image.
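The image count in this example may be checked by enumerating the parameter combinations. In the sketch below, the band and polarity labels are hypothetical placeholders; only the number of values per parameter comes from the text.

```python
from itertools import product

orientations = ("horizontal", "vertical")
bands = ("band_0", "band_1")            # hypothetical labels for the two bands
polarities = ("positive", "negative")   # hypothetical labels for the two polarities

# One decomposed image per (orientation, band, polarity) combination.
channels = list(product(orientations, bands, polarities))
images_per_stage = len(channels)           # 2 * 2 * 2
images_both_videos = 2 * images_per_stage  # reference plus impaired
```

Under these assumptions each processing stage handles 8 images for the reference and 8 for the impaired image, 16 in total.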
Additionally, a great deal of processing is involved in masking. However, to date there is arguably insufficient experimental data to fully predict masking, as would be required to verify the models currently in use or proposed.
What is desired is a method and apparatus for more efficient (using less processing time and/or less expensive hardware) prediction of subjective video quality rating measures while maintaining desired accuracy.
Accordingly the present invention provides an efficient predictor of subjective video quality rating measures that processes a reference video and corresponding impaired video directly in parallel channels based on a human vision model. The outputs from the two channels that represent maps of the respective videos are subtracted from each other to obtain an absolute difference map. A global masking scalar may be obtained from the reference video map as a measure of the busyness or complexity of the image. The global masking scalar is used to modify the difference map, which modified difference map is converted to standard picture quality analysis measurement units.
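The data flow summarized above may be sketched as follows. This is an illustrative outline only, not the claimed implementation: the `channel` callable stands in for the human vision model, and the busyness measure (mean absolute value of the reference map), the attenuation formula, the mean pooling and the `mask_gain` and `unit_scale` parameters are all assumptions, since the summary does not specify them.

```python
def abs_difference_map(ref_map, imp_map):
    """Per-pixel absolute difference between the two channel outputs."""
    return [
        [abs(r - i) for r, i in zip(ref_row, imp_row)]
        for ref_row, imp_row in zip(ref_map, imp_map)
    ]


def global_masking_scalar(ref_map):
    """Assumed busyness measure: mean absolute value of the reference map."""
    values = [abs(v) for row in ref_map for v in row]
    return sum(values) / len(values)


def predict_quality(ref, imp, channel, mask_gain=1.0, unit_scale=1.0):
    # The same model is applied to both videos, as in parallel channels.
    ref_map = channel(ref)
    imp_map = channel(imp)
    diff = abs_difference_map(ref_map, imp_map)
    # Assumed modification: a busier reference attenuates visible differences.
    attenuation = 1.0 / (1.0 + mask_gain * global_masking_scalar(ref_map))
    modified = [[d * attenuation for d in row] for row in diff]
    # Assumed conversion: pool by the mean, then scale to the output units.
    n_pixels = sum(len(row) for row in modified)
    pooled = sum(v for row in modified for v in row) / n_pixels
    return unit_scale * pooled
```

For example, `predict_quality(ref, imp, channel=lambda frame: frame)` returns zero when the impaired frame equals the reference, and a larger value as the frames diverge.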
The objects, advantages and other novel features of the present invention are apparent from the following detailed description when read in conjunction with the appended claims and attached drawing.