The present invention relates to video picture quality assessment, and more particularly to realtime human vision system behavioral modeling for producing objective measures to predict a subjective rating of errors in a video image signal that is recorded and transmitted via methods that are lossy, such as video compression.
Existing methods for using human vision system models to predict observer subjective reactions to errors introduced into a video image signal by lossy processes, such as video compression, include computationally expensive human vision system (HVS) models, such as those described by J. Lubin, "A Visual Discrimination Model for Imaging System Design and Evaluation", Vision Models for Target Detection and Recognition, World Scientific Publishing, River Edge, N.J. 1995, pp. 245-283, or by S. Daly, "The Visible Differences Predictor: An Algorithm for the Assessment of Image Fidelity", Digital Images and Human Vision, MIT Press, Cambridge, Mass. 1993, pp. 162-206. Measures used to predict subjective impairment ratings that do not use human vision models include ANSI/IRT measurements (see "Digital Transport of One-Way Signals-Parameters for Objective Performance Assessment", ANSI T1.801.03-yyy). These are generally faster but, given a sufficiently varied set of video image content, do not correlate as well with subjective ratings as do the methods that include HVS models.
Most HVS models are based on methods of predicting the threshold of noticeable differences, commonly referred to as Just Noticeable Differences (JND), such as contrast detection and discrimination thresholds. Since the model components are based on mimicking behavior at threshold, accurate prediction of behavior above threshold, i.e., at suprathreshold levels, is not guaranteed. These HVS models generally include one or more stages to account for one or more of the experimentally determined behaviors near the incremental contrast detection and discrimination thresholds, as affected by the following parameters:
mean luminance
angular extent or size of target image on retina
orientation (rotational, both of target image pattern and masker)
spatial frequency (both of target image pattern and masker)
temporal frequency (both of target image pattern and masker)
surround (or lateral masking effects)
eccentricity (or angular distance from the center of vision/fovea)
What follows is a brief summary of how one or more of the effects of these seven parameters have been accounted for in HVS models.
First, it is worth noting the image processing flow structure used in the prior art. A large portion of the processing time required in HVS models is due to two common implementation stages:
filter bank (image decomposition such as Gaussian pyramids)
contrast gain control (contrast masking non-linearity)
Filter banks are popular for image decomposition into neural images or channels with maximum response at various orientations, spatial frequency bands, polarities, etc. For a practical implementation a minimal decomposition of two orientations (horizontal, vertical), four spatial frequency bands and two polarities requires 2*4*2=16 images per processing stage for the reference image signal, and likewise for the impaired video image signal.
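The channel count above is simply the product of the decomposition dimensions. The following minimal Python sketch enumerates that minimal decomposition; the channel labels are hypothetical and serve only to make the arithmetic concrete.

```python
from itertools import product

# Hypothetical labels for the minimal decomposition described in the text:
# 2 orientations x 4 spatial-frequency bands x 2 polarities.
ORIENTATIONS = ("horizontal", "vertical")
BANDS = (0, 1, 2, 3)
POLARITIES = ("positive", "negative")


def channel_list():
    """Return every (orientation, band, polarity) channel of the decomposition."""
    return list(product(ORIENTATIONS, BANDS, POLARITIES))


channels = channel_list()
per_signal = len(channels)         # images per processing stage for one signal
per_stage_total = 2 * per_signal   # reference signal plus impaired signal
print(per_signal, per_stage_total)  # 16 32
```

Every later stage (contrast, gain control) must then be repeated for each of these images, which is why the decomposition dominates processing time.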
In the typical HVS model, response sensitivity as a function of spatial frequency has been accounted for in what is called the contrast sensitivity function. The contrast sensitivity portion of the model has been implemented by:
Calculating the contrast at each pixel of each filter bank channel, corresponding to a unique combination of spatial frequency subband and rotational orientations, as the ratio of high frequency energy to low (DC) frequency energy, or the equivalent.
Scaling the contrast values depending on the sub-band and rotational orientations.
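The two contrast sensitivity steps above can be sketched for a single channel as follows. This is a minimal illustration under stated assumptions: a 3x3 box filter stands in for the channel's low-pass filter, the high-pass response is taken as the image minus that low-pass output, and `weight` is a hypothetical per-channel contrast sensitivity value. A full model would repeat this for every sub-band and orientation.

```python
def box_lowpass(img, radius=1):
    """Mean over a (2r+1)x(2r+1) window; windows are truncated at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def local_contrast(img, radius=1, eps=1e-6):
    """Per-pixel contrast: high-pass response divided by the low-pass (local mean)."""
    low = box_lowpass(img, radius)
    return [[(img[y][x] - low[y][x]) / max(low[y][x], eps)
             for x in range(len(img[0]))] for y in range(len(img))]


def scale_contrast(contrast, weight):
    """Scale a channel's contrast by its (hypothetical) sensitivity weight."""
    return [[weight * c for c in row] for row in contrast]


img = [[10.0, 10.0, 10.0],
       [10.0, 20.0, 10.0],
       [10.0, 10.0, 10.0]]
c = local_contrast(img)
scaled = scale_contrast(c, 2.0)
```

Even this reduced form needs two filters and one division per pixel per channel, which is the cost the following paragraph describes.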
Calculating contrast requires two different filters, high pass and low pass, and a division for each pixel of each channel. Even with this complex and expensive algorithm stage, the variation in spatial frequency sensitivity as a function of local average luminance and of the angular extent of segments or self-similar regions of the image is not taken into account. The "linear range" is not exhibited in these models. At frequencies where sensitivity is generally greatest, between one and four cycles per degree, contrast sensitivity increases roughly in proportion to the square root of the average luminance, and likewise for angular extent. Thus, while the prior art includes quite complex and computationally expensive methods, by ignoring the effects of average luminance and angular extent, threshold predictions may be in error by more than an order of magnitude. Though models for part of the HVS have been proposed to account for the effects of average luminance and angular extent, they apparently have not been adopted into subsequent full HVS models, ostensibly due to the further added complexity.
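To see how the square-root relationship can produce errors of an order of magnitude, the following sketch takes that relationship at face value. The function name is hypothetical and the square-root law is treated as exact, which the text says holds only roughly and only near the 1 to 4 cycles per degree peak.

```python
import math


def relative_sensitivity(lum_ratio, extent_ratio=1.0):
    """Relative contrast sensitivity, assuming it scales as
    sqrt(mean luminance) * sqrt(angular extent) near the sensitivity peak."""
    return math.sqrt(lum_ratio * extent_ratio)


# A model calibrated at one luminance and extent, applied to a region that is
# 10x brighter and subtends 10x the angular extent:
error_factor = relative_sensitivity(10.0, 10.0)
print(error_factor)  # 10.0
```

Under this assumption, a model that ignores both factors would mispredict the threshold by a factor of ten for such a region, and the error grows further for larger luminance or extent ratios.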
The contrast gain control portion of the model is generally based on the work of J. Foley, such as his "Human Luminance Pattern-Vision Mechanisms: Masking Experiments Require a New Model", Journal of the Optical Society of America, Vol. 11, No. 6, June 1994, pp. 1710-1719, which requires at a minimum:
Calculation of the sum of energy (square) of respective pixels of the scaled contrast images over all channels. Lower resolution channels are up-sampled in order to be summed with higher resolution channels. This channel-to-channel conversion increases the effective throughput at this stage and further complicates implementation.
One addition, two non-integer exponentiations and one division operation per pixel per channel.
M. Cannon, "A Multiple Spatial Filter Model for Suprathreshold Contrast Perception", Vision Models for Target Detection and Recognition, World Scientific Publishing, River Edge, N.J. 1995, pp. 88-117, proposed a model that extends to the suprathreshold region with a substantial increase in complexity. However, it too apparently has not been adopted into subsequent full HVS models, ostensibly due to the further added complexity.
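The Foley-style gain control stage listed above can be sketched as a divisive normalization. This is a generic illustration of that form, not Foley's fitted model: the exponents `p` and `q` and the constant `z` are hypothetical illustrative values, nearest-neighbour up-sampling is an assumed way of bringing coarse channels onto the finest grid, and the per-pixel arithmetic is arranged to match the operation count given in the text.

```python
def upsample2x(img):
    """Nearest-neighbour 2x up-sampling so a coarse channel matches a finer grid."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def gain_control(channels, p=2.4, q=2.2, z=0.01):
    """Divisive contrast gain control in the general form of Foley's model.

    channels: list of equally sized 2-D scaled contrast images (already
    up-sampled). Per pixel, the pooled energy is E = sum_k c_k**2 over all
    channels; each channel's response is r_k = |c_k|**p / (z + E**(q/2)),
    i.e. one addition, two non-integer exponentiations and one division per
    pixel per channel. p, q and z are hypothetical values.
    """
    h, w = len(channels[0]), len(channels[0][0])
    energy = [[sum(ch[y][x] ** 2 for ch in channels) for x in range(w)]
              for y in range(h)]
    return [[[abs(ch[y][x]) ** p / (z + energy[y][x] ** (q / 2))
              for x in range(w)] for y in range(h)]
            for ch in channels]


# Masking: the response of channel 0 falls when channel 1 carries contrast too.
alone = gain_control([[[0.5]], [[0.0]]])[0][0][0]
masked = gain_control([[[0.5]], [[0.5]]])[0][0][0]
```

Because the pooled energy sums over every channel at every pixel, all channels must share one resolution, which is why the up-sampling step adds throughput.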
Temporal effects on spatial frequency sensitivity in these models have mostly been absent, have tended to include only inhibitory effects, or have been relatively complex.
Finally, the effects of orientation and surround are represented only to the extent that the orthogonal filters and cross-pyramid-level masking allow, which generally does not match HVS experimental data well.
A current picture quality analyzer, the PQA-200 Analyzer manufactured by Tektronix, Inc. of Beaverton, Oreg., USA, is described in U.S. Pat. No. 5,818,520. This is a non-realtime system based on the JNDMetrix© algorithm of Sarnoff Corporation of Princeton, N.J., USA, in which a reference image signal is compared with a corresponding impaired video image signal to obtain differences that are processed according to an HVS model. To perform the assessment, the system under test is essentially taken out of service until the test is complete.
What is desired is a realtime HVS behavioral modeling system for video picture quality assessment that is simple enough to be performed in a realtime video environment.
Accordingly the present invention provides realtime human vision system behavioral modeling for performing picture quality analysis of video systems in a realtime video environment. A reference image signal and a test image signal derived from the reference image signal are processed in separate channels. The image signals are converted to luminance image signals and filtered by a two-dimensional low-pass filter to produce processed image signals. The processed image signals are segmented into regions having similar statistics, and the segment or region means are subtracted from the pixels of the processed image signals to produce segmented processed image signals that have been implicitly high pass filtered. Noise is injected into the segmented processed image signals, and variances are calculated for the reference segmented processed image signals and for the differences between the reference and test segmented processed image signals. The variance for the difference segmented processed image signal is normalized by the variance for the reference segmented processed image signal, and the Nth root of the result is determined as a measure of visible impairment of the test image signal. The measure of visible impairment may be converted into appropriate units, such as JND, MOS, etc.
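The summarized processing chain can be sketched end to end as follows. This is a minimal illustration, not the invention's implementation. Several choices are assumptions that the summary does not specify: Rec. 601 luma weights for the luminance conversion, a 3x3 box blur as the two-dimensional low-pass filter, fixed square blocks standing in for segmentation into regions having similar statistics, a shared seeded Gaussian noise field, and a hypothetical root order `n_root=4`.

```python
import random


def to_luminance(rgb):
    """Rec. 601 luma weights; an assumed stand-in for the luminance conversion."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def lowpass3(img):
    """3x3 box blur with edge clamping; a placeholder two-dimensional low-pass filter."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sum(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
             for x in range(w)] for y in range(h)]


def subtract_segment_means(img, block=4):
    """Fixed blocks stand in for similar-statistics regions; subtracting each
    region's mean acts as an implicit high-pass filter."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            cells = [(y, x) for y in range(y0, min(y0 + block, h))
                     for x in range(x0, min(x0 + block, w))]
            mean = sum(img[y][x] for y, x in cells) / len(cells)
            for y, x in cells:
                out[y][x] = img[y][x] - mean
    return out


def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def impairment(ref_rgb, test_rgb, noise_sigma=0.5, n_root=4, seed=0):
    """Sketch of the summarized pipeline; returns a unitless visibility measure."""
    ref = subtract_segment_means(lowpass3(to_luminance(ref_rgb)))
    tst = subtract_segment_means(lowpass3(to_luminance(test_rgb)))
    rng = random.Random(seed)
    # Assumption: the same noise field is injected into both channels, so it
    # raises the reference variance (a floor for flat regions) while cancelling
    # in the reference/test difference.
    noise = [[rng.gauss(0.0, noise_sigma) for _ in row] for row in ref]
    ref_n = [v + e for rr, nr in zip(ref, noise) for v, e in zip(rr, nr)]
    tst_n = [v + e for tr, nr in zip(tst, noise) for v, e in zip(tr, nr)]
    diff = [t - r for t, r in zip(tst_n, ref_n)]
    # Difference variance normalized by reference variance, then the Nth root.
    ratio = variance(diff) / variance(ref_n)
    return ratio ** (1.0 / n_root)
```

An unimpaired test signal yields zero, and stronger impairments yield larger values. Converting the result into JND or MOS units would require a calibration that the summary does not give.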