Designers of imaging systems often assess the performance of their designs in terms of physical parameters such as contrast, resolution and bit-rate efficiency in compression/decompression (codec) processes. While these parameters can be easily measured, they may not be accurate gauges of performance. The reason is that end users of imaging systems are generally more concerned with subjective visual performance, such as the visibility of artifacts or distortions and, in some cases, the enhancement of image features that may reveal information such as the existence of a tumor in an image, e.g., an MRI (Magnetic Resonance Imaging) image or a CAT (Computer-Assisted Tomography) scan image.
Over the years, various human visual performance methods (perceptual metric generators or visual discrimination measures (VDMs)) have been used to improve imaging system design. These visual discrimination measures can be broadly classified as "spatial" or "spatiotemporal". Examples of spatial visual discrimination measures include the Carlson and Cohen generator and the square root integral (SQRI) generator. Examples of spatiotemporal visual discrimination measures are disclosed in U.S. patent application Ser. No. 08/668,015, filed Jun. 17, 1996 and "Method And Apparatus For Assessing The Visibility Of Differences Between Two Image Sequences" filed on Mar. 28, 1997 with Ser. No. 08/829,516, now U.S. Pat. No. 5,694,491.
The spatiotemporal VDMs disclosed in the above-referenced patent applications receive a pair of image sequences as input and produce an estimate of the discriminability between the sequences for each local region in space and time. In the Sarnoff VDM, this set of discriminability estimates (fidelity metric, perceptual metric or quality metric) is generated in units of Just Noticeable Differences (JNDs), as a sequence of maps, wherein each pixel value in each frame of the JND Map Sequence is a discriminability estimate for the corresponding spatiotemporal region of the two input sequences.
For some applications, such as quality metering of a digital video channel, this large volume of output data is more useful if it can be condensed into a single number or a small set of numbers for each pair of input image sequences. Current approaches to this condensation process involve the computation of simple image statistics such as the mean or maximum, in an attempt to correlate a single JND-based number with a number derived from subjective quality experiments in which human observers are asked to rate the quality of each sequence with a single number or adjectival rating. However, these statistics-based approaches fail to capture some effects. For example, high JNDs within specific objects in the center of the frame (e.g., a face) have a greater impact on subjective image quality ratings than high JNDs in less significant objects or in less significant locations.
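The limitation of such statistics can be illustrated with a minimal sketch. It assumes a JND Map Sequence is represented as a nested list (frames, rows, columns) of JND values; the function names `pool_jnd` and `make_frame`, the 5x5 frame size, and the JND values are hypothetical and chosen only for illustration. Two sequences that differ only in where the high-JND region sits yield identical mean and maximum values, so neither statistic can reflect the location of the distortion.

```python
def pool_jnd(jnd_maps):
    """Condense a JND map sequence (frames x rows x cols) into (mean, max)."""
    values = [v for frame in jnd_maps for row in frame for v in row]
    return sum(values) / len(values), max(values)


def make_frame(rows, cols, peak_row, peak_col, peak=4.0, base=0.5):
    """Build one hypothetical JND map containing a single high-JND pixel."""
    frame = [[base] * cols for _ in range(rows)]
    frame[peak_row][peak_col] = peak
    return frame


# Two one-frame sequences: the visible distortion is at the center of the
# frame in one sequence and in a corner in the other.
center_seq = [make_frame(5, 5, 2, 2)]
corner_seq = [make_frame(5, 5, 0, 0)]

center_stats = pool_jnd(center_seq)
corner_stats = pool_jnd(corner_seq)

# Both pooled results are the same pair (mean, max), although the
# distortion locations differ.
print(center_stats, corner_stats)
```

Because the pooled values are identical for both sequences, any difference in how observers would rate them cannot be recovered from the mean or maximum alone.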
Furthermore, since the content of the image sequences may change rapidly in some applications, it would be imprecise to use a rigid rule for evaluating subjective image quality ratings.
Therefore, a need exists in the art for training an apparatus to learn and use a fidelity metric as a control mechanism, and to quickly and accurately reduce the large quantity of fidelity metrics from a VDM to a manageable subjective image quality rating, e.g., a single-number subjective quality rating.