The present invention relates to an apparatus and a concomitant method for optimal coding of images or sequences of images. More particularly, this invention relates to a method and apparatus that selects coding parameters for an image encoder to optimize the overall image fidelity, in accordance with a perceptual metric, while maintaining a specified coding rate.
To achieve interoperability for digital video equipment of different manufacturers, the Moving Picture Experts Group (MPEG) created the ISO/IEC International Standards 11172 (1994) (generally referred to as MPEG-1) and 13818 (Jan. 20, 1995 draft) (generally referred to as MPEG-2), which are incorporated herein in their entirety by reference. One goal of these standards is to establish a standard decoding strategy with sufficient flexibility to accommodate a plurality of different applications and services such as desktop video publishing, video conferencing, digital storage media and television broadcast.
Although the MPEG standards specify the coding syntax for generating an MPEG-compliant bitstream, MPEG does not define a specific algorithm necessary to produce a valid bitstream. As such, wide variation is permitted in the values assigned to many of the encoding parameters, thereby supporting a broad range of applications and interoperability. Under the MPEG standards, encoder designers are accorded great flexibility in developing and implementing their own MPEG-specific algorithms in areas such as image pre-processing, motion estimation, coding mode decision, scalability and rate control. This flexibility fosters the development and implementation of different MPEG-compliant encoding algorithms, thereby resulting in product differentiation in the marketplace. Nevertheless, a common goal of MPEG encoders is to minimize distortion in the decoded video for a prescribed bit rate.
In the area of coding rate control, the MPEG standards do not define a specific algorithm for controlling the bit rate of an encoder. It is the task of the encoder designer to devise a rate control process that controls the bit rate such that the decoder input buffer neither overflows nor underflows, and that controls the quantization scale to produce high fidelity video at the output of the decoder. To improve the "look" of the decoded image, it is desirable for the more important regions in the decompressed video to have better fidelity than the less important regions.
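The buffer constraint described above can be illustrated with a simplified decoder-buffer model. This sketch is an assumption and not the MPEG buffer-verifier definition: the channel is assumed to deliver a constant number of bits per frame interval, and the decoder is assumed to remove each coded frame from its input buffer instantaneously at decode time. The function `check_decoder_buffer` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def check_decoder_buffer(
    frame_bits: List[int],
    bits_per_frame_interval: int,
    buffer_size: int,
    initial_fullness: int,
) -> Optional[Tuple[int, str]]:
    """Return (frame_index, "underflow" | "overflow") for the first violation, else None.

    Simplified, hypothetical model: constant-rate channel, instantaneous frame removal.
    """
    fullness = initial_fullness
    for i, bits in enumerate(frame_bits):
        # At decode time the whole coded frame must already be in the buffer.
        if bits > fullness:
            return (i, "underflow")
        fullness -= bits
        # Until the next decode time, the channel keeps filling the buffer.
        fullness += bits_per_frame_interval
        if fullness > buffer_size:
            return (i, "overflow")
    return None


print(check_decoder_buffer([600], 400, 1000, 500))  # → (0, 'underflow')
```

A rate controller under this model would choose quantization scales so that the coded size of each frame keeps the function returning `None`; here the first frame needs 600 bits while only 500 have arrived.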
For example, in a video scene composed of a person talking in front of a background, the talking person is likely to be of more interest, and thus more important, to the viewer of the decoded video than the background information. Consequently, it would be useful to have the ability "to steer" the encoder such that disproportionately more encoded bits are spent to represent the important regions of the scene and disproportionately fewer encoded bits are spent to represent the less important background information. In other words, during compression, one would like to control the fidelity of the resulting decompressed image by varying the necessary encoder parameters over time and spatial location. This is generally referred to as user-steerable image compression.
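One way to make "disproportionately more bits" concrete is a weighted allocation in which each region's share of a bit budget is proportional to its area multiplied by a user-assigned importance weight, so that bits per pixel scale with importance. This is an illustrative assumption, not a rate-control method prescribed by MPEG or by this description; `allocate_bits` and the region figures below are hypothetical.

```python
from typing import Dict, Tuple


def allocate_bits(regions: Dict[str, Tuple[int, float]], total_bits: int) -> Dict[str, int]:
    """Split total_bits across regions in proportion to area * importance.

    regions maps a region name to (area_in_pixels, importance_weight), so each
    region's bits-per-pixel ends up proportional to its importance weight.
    """
    scores = {name: area * weight for name, (area, weight) in regions.items()}
    total_score = sum(scores.values())
    if total_score <= 0:
        raise ValueError("area * importance must sum to a positive value")
    exact = {name: total_bits * s / total_score for name, s in scores.items()}
    alloc = {name: int(v) for name, v in exact.items()}  # truncate (values are non-negative)
    # Give leftover bits to the largest fractional remainders so the budget is met exactly.
    leftover = total_bits - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical scene: the speaker covers 20% of the frame but is weighted 4x.
print(allocate_bits({"speaker": (20_000, 4.0), "background": (80_000, 1.0)}, 100_000))
# → {'speaker': 50000, 'background': 50000}
```

Although both regions receive the same number of bits in this example, the speaker gets 0.025 bits per pixel against 0.00625 for the background, which is the disproportionate spending the paragraph describes.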
The current approach to user-steered image compression is an iterative process. The "compression engineer", i.e., the user of the encoder, specifies a set of values for the relevant encoder parameters, compresses the image, observes the resulting decompressed image, and then decides where, spatio-temporally, the image looks better or worse than desired. In response, the compression engineer adjusts the encoder parameters to effect the desired change in the visual fidelity of the different spatio-temporal regions of the decompressed image.
One problem with this approach is that, since no objective image fidelity metric is used in the process, the only way to measure image fidelity and determine whether the desired spatio-temporal distribution of image fidelity has been achieved is for the compression engineer to examine the entire decompressed image. Furthermore, if the information being compressed is a sequence of images (e.g., video) rather than a single image, the compression engineer must review the entire video sequence. This manual approach to user-steered compression is highly subjective and therefore inconsistent, as well as time consuming and fatiguing for the compression engineer. In addition, the process must be repeated "from scratch" for every image or image sequence that is to be compressed. Moreover, this approach requires the compression engineer to have significant technical expertise and knowledge of the compression algorithm in order to know which encoder parameters should be adjusted to effect the desired change in the decoded image fidelity. Such adjustments must often be made by trial and error.
Therefore, a need exists in the art for an apparatus and a method that dynamically adjusts the image encoding parameters in accordance with a perceptual metric and automatically performs steerable image compression such that an image is optimally encoded with regard to how the human visual system perceives the image, i.e., with regard to perceptual image fidelity.
The present invention is a method and apparatus for selecting image encoding parameters in accordance with a perceptual metric derived from analyzing the contents of the image being coded. Namely, one or more encoding parameters, e.g., the encoder quantization scale, are selected by comparing an original image to a reconstructed image and processing the comparison results using a quantitative perceptual difference metric. This metric represents the "fidelity" of the reconstructed image and is used to update the encoding parameters to optimize the coding of the image.
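The compare-and-select idea can be sketched for a single block of samples. Everything here is a stand-in: `quantize` is a toy uniform quantizer in place of a real encode/decode path, `difference_metric` is mean absolute error in place of a quantitative perceptual difference metric, and the rule of picking the coarsest scale that keeps the difference under a target is one assumed policy for trading fidelity against bits. All three function names are hypothetical.

```python
from typing import List, Sequence


def quantize(block: Sequence[float], q: int) -> List[float]:
    """Toy uniform quantize/dequantize step standing in for a real encoder and decoder."""
    return [round(x / q) * q for x in block]


def difference_metric(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Placeholder metric (mean absolute error); a perceptual model would replace this."""
    return sum(abs(a - b) for a, b in zip(original, reconstructed)) / len(original)


def select_quant_scale(
    block: Sequence[float], candidate_scales: Sequence[int], max_difference: float
) -> int:
    """Return the coarsest scale whose reconstruction stays within max_difference.

    Falls back to the finest candidate scale if none meets the target.
    """
    best = min(candidate_scales)
    for q in sorted(candidate_scales):
        reconstructed = quantize(block, q)  # encode, then decode
        if difference_metric(block, reconstructed) <= max_difference:
            best = q  # a coarser scale generally costs fewer bits
    return best


print(select_quant_scale([10, 20, 30, 40], [1, 2, 4, 8, 16], max_difference=3.0))  # → 8
```

Scale 16 is rejected because its mean error is 5.0, while scale 8 stays at 2.0. Swapping in a perceptual metric changes only `difference_metric`; the selection loop is unaffected.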
To facilitate steerable image compression, the invention uses a fidelity-metric-based encoder to generate a fidelity map while encoding an input image. The fidelity map is compared to a user-defined importance map. If the fidelity and importance maps do not substantially match, the system adjusts the encoding parameters and re-encodes the input image. This encoding pass generates another fidelity map, which is again compared to the importance map to determine the degree to which the two maps match. The iterative process adjusts the encoder parameters until the fidelity and importance maps match to a substantial degree. At that point, the encoded image has certain regions encoded at high fidelity and other regions encoded at lower fidelity, as specified by the importance map.
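The iterative loop can likewise be sketched under stated assumptions. Here the fidelity map and the importance map are both taken to be per-region scores on the same scale, the encoder is any callable that maps per-region quantizer scales to a fidelity map, and each pass nudges a region's scale by one step toward agreement. The names `steer_encoding` and `toy_encode`, the one-step update, the linear fidelity model, and the 1 to 31 scale range are hypothetical choices, not details of the invention.

```python
from typing import Callable, Dict

FidelityMap = Dict[str, float]


def steer_encoding(
    encode: Callable[[Dict[str, int]], FidelityMap],
    importance: FidelityMap,
    initial_scales: Dict[str, int],
    tolerance: float = 1.0,
    max_iters: int = 50,
    min_scale: int = 1,
    max_scale: int = 31,
) -> Dict[str, int]:
    """Adjust per-region quantizer scales until the fidelity map matches the importance map."""
    scales = dict(initial_scales)
    for _ in range(max_iters):
        fidelity = encode(scales)  # re-encode and obtain a new fidelity map
        mismatch = {r: fidelity[r] - importance[r] for r in importance}
        if all(abs(d) <= tolerance for d in mismatch.values()):
            break  # maps match to within tolerance
        for r, d in mismatch.items():
            if d < -tolerance:  # too little fidelity: use a finer quantizer
                scales[r] = max(min_scale, scales[r] - 1)
            elif d > tolerance:  # more fidelity than required: use a coarser quantizer
                scales[r] = min(max_scale, scales[r] + 1)
    return scales


def toy_encode(scales: Dict[str, int]) -> FidelityMap:
    """Hypothetical encoder model: fidelity drops by 2 points per quantizer step."""
    return {region: 100.0 - 2.0 * q for region, q in scales.items()}


print(steer_encoding(toy_encode, {"speaker": 90.0, "background": 70.0},
                     {"speaker": 10, "background": 10}))
# → {'speaker': 5, 'background': 15}
```

With the toy model, the speaker region settles at scale 5 and the background at scale 15 after five adjustment passes. A one-step update keeps the loop simple, but it can oscillate when `tolerance` is smaller than the fidelity change produced by a single step, which is why `max_iters` bounds the search.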