The present invention relates to video picture quality assessment, and more particularly to temporal processing for realtime human vision system behavior modeling.
In general the stimuli that have been used to explore the human vision system (HVS) may be described using the following parameters: temporal frequency; spatial frequency; average luminance or lateral masking effects; angular extent or size of target image on retina; eccentricity or angular distance from the center of vision (fovea); equivalent eye motion; rotational orientation; and surround. Furthermore many of the stimuli may be classified into one of the following categories: standing waves; traveling waves; temporal pulses and steps; combinations of the above (rotational, both of target image pattern and masker); and “natural” scene/sequences. The responses to these stimuli have been parameterized as: threshold of perception as in discrimination; suprathreshold temporal contrast perception; perceived spatial frequency including frequency aliasing/doubling; perceived temporal frequency including flickering, etc.; perceived velocity (speed and direction); perceived phantom signals (noise, residual images, extra/missing pulses, etc.); perceived image quality; and neural response (voltage waveforms, etc.).
The problem is to create a method for reproducing and predicting human responses given the corresponding set of stimuli. The ultimate goal is to predict image quality. It is assumed that to achieve this, at a minimum the threshold and suprathreshold responses should be mimicked as closely as possible. In addition, the prediction of visual illusions, such as spatial frequency doubling or those related to seeing additional (phantom) pulses, is desired, but this is considered to be of secondary importance. Finally, the predictions should be consistent with neural and other intermediate responses.
The HVS models in the literature suffer from one or more of the following shortcomings: they do not account for temporal response; they do not take into account fundamental aspects (such as the bimodal spatio-temporal threshold surface for standing waves, masking, spatial frequency doubling, etc.); or they are too computationally complex or inefficient for most practical applications.
U.S. Pat. No. 6,678,424, issued Jan. 13, 2004 to the present inventor and entitled “Real Time Human Vision System Behavioral Modeling”, provides an HVS behavioral modeling algorithm that is spatial in nature and is simple enough to be performed in a realtime video environment. Reference and test image signals are processed in separate channels. Each signal is spatially lowpass filtered and segmented into corresponding regions, and the region means are then subtracted from the filtered signals. After injection of noise, the two processed image signals are subtracted from each other and per-segment variances are determined, from which a video picture quality metric is derived. However, this modeling does not consider temporal effects.
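The spatial processing chain summarized above can be illustrated with a loose sketch. It follows only the order of steps given in the summary; the 3x3 box kernel, the fixed square segments, the additive Gaussian noise model, and all function and parameter names are assumptions made for illustration and are not taken from the cited patent. The final pooling of the variances into a quality metric is deliberately left out, since the summary does not specify it.

```python
import random

def box_lowpass(img):
    """3x3 box blur with edge replication (assumed kernel)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out

def blocks(h, w, size):
    """Yield pixel coordinates of fixed square segments (assumed segmentation)."""
    for y0 in range(0, h, size):
        for x0 in range(0, w, size):
            yield [(y, x)
                   for y in range(y0, min(y0 + size, h))
                   for x in range(x0, min(x0 + size, w))]

def process_channel(img, size, noise_sigma, rng):
    """Lowpass filter, subtract each segment mean, then inject noise."""
    filt = box_lowpass(img)
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for seg in blocks(h, w, size):
        mean = sum(filt[y][x] for y, x in seg) / len(seg)
        for y, x in seg:
            # Additive Gaussian noise is an assumed noise model.
            out[y][x] = filt[y][x] - mean + rng.gauss(0.0, noise_sigma)
    return out

def segment_variances(ref, test, size=4, noise_sigma=0.0, seed=0):
    """Per-segment variance of the difference between the two channels."""
    rng = random.Random(seed)
    a = process_channel(ref, size, noise_sigma, rng)
    b = process_channel(test, size, noise_sigma, rng)
    h, w = len(ref), len(ref[0])
    variances = []
    for seg in blocks(h, w, size):
        diff = [a[y][x] - b[y][x] for y, x in seg]
        m = sum(diff) / len(diff)
        variances.append(sum((d - m) ** 2 for d in diff) / len(diff))
    return variances

# Usage: an 8x8 ramp as reference, and a copy with one impaired pixel as test.
ref = [[float(x + y) for x in range(8)] for y in range(8)]
test = [row[:] for row in ref]
test[3][3] += 50.0
print(segment_variances(ref, test))
```

With the noise level set to zero, identical reference and test images give zero variance in every segment, while a localized impairment raises the variance only in the segments reached by the filtered difference.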
Neural responses generally have fast attack and slow decay, and there is evidence that some retinal ganglion cells respond to positive temporal pulses, some to negative, and some to both. In each case, if the attack is faster than the decay, temporal frequency dependent rectification occurs. Above a critical temporal frequency at which rectification becomes dominant, spatial frequency doubling takes place. This critical temporal frequency happens to correspond to a secondary peak in the spatio-temporal response, where the spatial frequency sensitivity curve (and the associated contrast sensitivity versus frequency, or CSF, curve) is roughly translated down an octave from that at 0 Hertz.
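The link between asymmetric attack and decay and frequency dependent rectification can be illustrated with a minimal sketch. It assumes a one-pole smoother whose coefficient switches between a fast attack value and a slow decay value; the coefficients, the sample rate, the test frequencies, and the function names are all illustrative assumptions, not values from this document. The sketch shows only the shift in mean output for a zero-mean input and does not model spatial frequency doubling itself.

```python
import math

def asymmetric_follower(samples, attack=0.5, decay=0.02):
    """One-pole smoother with direction-dependent coefficient (assumed model)."""
    y = 0.0
    out = []
    for x in samples:
        coeff = attack if x > y else decay  # rise quickly, fall slowly
        y += coeff * (x - y)
        out.append(y)
    return out

def mean_response(freq_hz, fs=1000, seconds=4.0):
    """Mean output for a zero-mean unit sine, averaged over the second half."""
    n = int(fs * seconds)
    sine = [math.sin(2.0 * math.pi * freq_hz * k / fs) for k in range(n)]
    out = asymmetric_follower(sine)
    tail = out[n // 2:]  # skip the start-up transient
    return sum(tail) / len(tail)

dc_low = mean_response(0.5)    # slow input: the output tracks it closely
dc_high = mean_response(30.0)  # fast input: the output rides near the peaks
print(f"mean output at 0.5 Hz: {dc_low:.3f}, at 30 Hz: {dc_high:.3f}")
```

At the low frequency both coefficients are fast enough to follow the input, so the mean output stays near zero. At the high frequency the slow decay cannot follow the falling half of each cycle, so the output acquires a positive mean, which is the rectifying behavior described above.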
What is desired is an algorithm for realtime HVS behavior modeling that improves the efficiency and accuracy for predicting the temporal response of the HVS.