a. Field of the Invention
The present invention relates to an apparatus and method for normalisation of an image signal, particularly for use in perceptual video and still image quality measurement when the quality measurement is achieved by making a comparison between a reference signal and a decoded signal which may have been degraded during encoding, transmission and decoding processes. In the transmission of video, the different processes involved in encoding, transmission and decoding of a video signal usually introduce a gain, offset and gamma modification. Video coding may also introduce minor changes in the colour components of the source signal. Similar modifications can occur during the encoding, storage and decoding of still images. The main cause of these modifications comes from the different colour space representations used internally by different steps in an encoding, transmission/storage, decoding chain.
Video frames and still images are typically stored in one of two colour space formats: YUV and RGB. Both formats decompose the picture into three components such that each pixel is represented by three component values. In YUV format the three components are a single luminance value (Y) and two chrominance values (U and V). In RGB format the three components are Red (R), Green (G) and Blue (B). Conversion between the two formats is based on a simple first-order linear mapping. The description of the present invention will focus on its application to YUV format video frames and still images; however, a description of its application to RGB format video frames and still images is provided at the end of the detailed description.
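By way of illustration, the first-order linear mapping between the two colour space formats can be sketched as follows. This is a minimal example, not part of the invention; the BT.601 coefficients shown are one common choice, and the exact matrix depends on the colour space standard in use.

```python
import numpy as np

# Illustrative BT.601-style RGB -> YUV conversion matrix (assumed coefficients;
# other standards such as BT.709 use different values).
RGB_TO_YUV = np.array([
    [ 0.299,     0.587,     0.114   ],   # Y (luminance)
    [-0.14713,  -0.28886,   0.436   ],   # U (chrominance)
    [ 0.615,    -0.51499,  -0.10001 ],   # V (chrominance)
])

def rgb_to_yuv(rgb):
    """Convert an (..., 3) array of RGB pixels to YUV via a linear mapping."""
    return rgb @ RGB_TO_YUV.T

def yuv_to_rgb(yuv):
    """Invert the linear mapping to recover RGB pixels from YUV."""
    return yuv @ np.linalg.inv(RGB_TO_YUV).T
```

Because the mapping is first-order linear, converting to YUV and back recovers the original RGB values (up to floating-point precision).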
Modifications to the brightness of a video frame or still image arise if an offset is added to or subtracted from the luminance values of pixels. Modifications to the contrast occur if the luminance value of the pixels is scaled. Changes in colour space typically arise from the linear transformations between YUV and RGB representation.
Cathode ray tube (CRT) display devices are subject to a gamma or power-law relationship between the input (electrical) signal and the light (luminance) emitted at the surface of the display, such that the intensity of light i produced by the CRT display is proportional to the signal input voltage v raised to a power exponent called gamma:

i ∝ v^γ

where γ is the gamma value of the display. Digital video and still cameras produce signals in gamma-compensated form so that when the image is viewed on the display device the overall system will be linear. The gamma compensation process can be expressed as:
i ∝ v^(1/γ)

where γ is the gamma value of the expected display. Most displays have a gamma value of approximately 2.2; however, this is not guaranteed and values between 1.8 and 2.5 are not uncommon. Note that liquid crystal displays (LCDs) do not have an implicit power-law relationship between the electrical signal and pixel brightness, but a transfer function is generally built in to emulate the CRT gamma relationship.
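The complementary power laws above can be sketched as follows. This is a purely illustrative example assuming a nominal gamma of 2.2 and signals normalised to the range [0, 1]; the function names are not taken from the invention.

```python
import numpy as np

GAMMA = 2.2  # assumed nominal display gamma; real displays vary (roughly 1.8-2.5)

def display_transfer(v, gamma=GAMMA):
    """CRT-style power law: emitted intensity is proportional to v ** gamma."""
    return np.power(v, gamma)

def camera_compensation(i, gamma=GAMMA):
    """Camera-side gamma compensation: output signal proportional to i ** (1/gamma)."""
    return np.power(i, 1.0 / gamma)
```

When the camera compensation and display transfer use the same gamma value, their composition is the identity, so the end-to-end system is linear; a mismatch between the two gamma values is one source of the modifications the invention normalises.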
Many systems include intermediate gamma correction stages that attempt to make the end-to-end system linear. For example, some computer systems use a colour management engine to convert to a different gamma value before storing the pixel values in video memory. This means that reference and decoded signals may have been exposed to multiple, different gamma modification steps. Moreover, in systems that store images in RGB format, different gamma corrections may be applied for the different colour components of the image.
A video or still image quality measurement system using a comparison between an original (reference) signal and a received (decoded) signal (termed full-reference measurement) needs to allow for the fact that minor changes in brightness, contrast, colour space and gamma do not influence perceived video or image quality, and therefore needs to correct such changes in order to make an accurate perceptual quality prediction. In a case where the received signal can only be captured at a point where display settings such as brightness, contrast and gamma correction have modified the received signal, correction of the above-mentioned factors also allows measurement of the video or still image quality independently from the receiver's display settings.
The present invention addresses the problem of jointly normalising the effects of brightness, contrast, colour space and gamma correction errors between a matched pair of reference and decoded video frames or still images. This is a non-trivial task because, while brightness, contrast and colour space changes are essentially linear transformations, gamma correction is a power-law transformation. The present invention solves this problem by using a third-order polynomial mapping to approximate the combined effects of brightness, contrast, colour space and gamma correction errors. The coefficients of the mapping are optimised to normalise the Y (and optionally U and V) components of the decoded video frame or still image relative to the corresponding reference video frame or still image. This optimisation step is performed by finding the set of polynomial coefficients that minimise a measure of the error between the two video frames or still images.
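The core idea of the normalisation step can be sketched as follows, here using a least-squares fit of a third-order polynomial between the decoded and reference luminance samples. This is a simplified illustration of the principle only, with hypothetical function names; the invention's actual optimisation procedure and error measure are described in the detailed description.

```python
import numpy as np

def fit_normalisation(decoded_y, reference_y, degree=3):
    """Fit a third-order polynomial mapping decoded luminance to reference
    luminance, minimising the squared error (illustrative sketch only)."""
    return np.polyfit(decoded_y.ravel(), reference_y.ravel(), degree)

def apply_normalisation(decoded_y, coeffs):
    """Apply the fitted polynomial to normalise the decoded component."""
    return np.polyval(coeffs, decoded_y)
```

A third-order polynomial is flexible enough to approximate the composition of linear gain/offset changes with a power-law gamma modification over the limited dynamic range of pixel values, which is why a single mapping can jointly normalise all of the effects listed above.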
In the following description the invention is described in the context of its application to the measurement of video quality. However, as has already been mentioned, the invention has equal application in still image quality measurement systems, and the term frame shall be understood to include both video frames and still images.
b. Related Art
Computation of gain and offset errors between two video signals is proposed in M. H. Pinson and S. Wolf, “A New Standardized Method for Objectively Measuring Video Quality”, IEEE Transactions on Broadcasting, vol. 50(3), pp. 312-322, September 2004. In this method, the original and processed frames are divided into small, square sub-regions, or blocks. The mean over space of the Y, U and V samples for each corresponding reference and processed sub-region is computed to form spatially sub-sampled images. A first-order linear fit is used to compute the relative gain and offset between the sub-sampled original and processed frames. This linear fit is applied independently to each of the three channels: Y, U and V. Non-linear corrections between the signals are not handled, and the method makes the assumption that the different colour components (Y, U and V) each have an independent gain and level offset. A reduced-reference approach based on the same method was also proposed by M. H. Pinson and S. Wolf, “Reduced Reference Video Calibration Algorithms”, www.its.bldrdoc.gov/pub/ntia-rpt/08-433b/. In the reduced-reference version, a pre-filtering step is applied to eliminate those blocks that contain a wide spread of pixel values, but the principles of the method remain identical to those proposed in the original article. The method referred to above for correcting gain and offset is then used in patent application no. US2007088516A1 as part of a reduced-reference video quality assessment method.
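The prior-art gain and offset estimation described above can be sketched as follows: block means are computed for each frame, and a first-order linear fit between the sub-sampled images recovers the gain and offset for one channel. This is a hedged illustration of the published principle, not a reproduction of the referenced implementation; block size and function names are assumptions.

```python
import numpy as np

def block_means(channel, block=8):
    """Spatially sub-sample one channel by averaging over square blocks
    (an assumed 8x8 block size; the published method's size may differ)."""
    h, w = channel.shape
    h, w = h - h % block, w - w % block  # crop to a whole number of blocks
    c = channel[:h, :w].reshape(h // block, block, w // block, block)
    return c.mean(axis=(1, 3))

def estimate_gain_offset(reference, processed, block=8):
    """First-order linear fit: processed ~ gain * reference + offset,
    applied independently per channel as in the prior-art method."""
    r = block_means(reference, block).ravel()
    p = block_means(processed, block).ravel()
    gain, offset = np.polyfit(r, p, 1)
    return gain, offset
```

As the text notes, such a first-order fit corrects only linear gain and offset errors; it cannot correct the power-law gamma modifications that the present invention addresses.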
None of the methods proposed in the prior art corrects problems due to gamma modification or errors due to colour space conversion when different colour space representations are used by different elements of the transmission chain.