JPEG2000 is a sophisticated image compression technique that offers a rich set of features to meet the needs of different applications. These features can be achieved because of the properties of the various components that form a JPEG2000 compression system. A brief description of these components is provided in the following, and a more complete description can be found in the paper by M. Rabbani and R. Joshi entitled “An overview of the JPEG 2000 still image compression standard,” Signal Processing: Image Communication, Vol. 17, pp. 3-48, 2002.
FIG. 1 illustrates the basic components in a JPEG2000 compression system. The original image data is sent to a pre-processor 10, where several modifications to the original image data may take place. One possible modification is a color transformation known as the irreversible color transform (ICT). This transform is designed to convert red-green-blue (RGB) image data into a luminance (Y) and two chrominance (Cb, Cr) components for improved compression performance. The pre-processed data is then transformed using a discrete wavelet transform 12, which converts the spatial image data into wavelet coefficients that correspond to a plurality of spatial frequency subbands. The wavelet coefficients are quantized with a uniform quantizer 14 with a deadzone. Quantization is a many-to-one mapping that reduces the amount of data that is used to represent the wavelet coefficients, at the expense of errors in the image that is reconstructed from the quantized wavelet coefficients. The degree of quantization for each frequency subband b is determined by a quantizer step size (denoted as Q(b) in FIG. 1) that is associated with the subband. It is common in JPEG2000 systems to specify a single base step size for quantization, and then derive a set of quantizer step sizes that correspond to the various frequency subbands by using properties of the wavelet transform (i.e., from the band gain factors for each of the frequency subbands). Alternatively, it is possible to specify an explicit set of quantizer step sizes for the frequency subbands for more precise control of the quantization.
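The uniform quantizer with a deadzone described above can be illustrated with a minimal sketch. The function names `deadzone_quantize` and `dequantize` are hypothetical, and the midpoint reconstruction offset `r = 0.5` is an assumption chosen for illustration; a decoder is free to use a different reconstruction point.

```python
import math


def deadzone_quantize(coeff: float, step: float) -> int:
    """Map a wavelet coefficient to a signed integer index.

    The magnitude is divided by the step size and rounded toward zero,
    so the zero bin spans (-step, +step), twice the width of the other
    bins. That wider zero bin is the "deadzone".
    """
    sign = -1 if coeff < 0 else 1
    return sign * math.floor(abs(coeff) / step)


def dequantize(index: int, step: float, r: float = 0.5) -> float:
    """Reconstruct a coefficient from its index.

    r = 0.5 places the reconstruction at the bin midpoint
    (an assumption made for this sketch).
    """
    if index == 0:
        return 0.0
    sign = -1 if index < 0 else 1
    return sign * (abs(index) + r) * step


if __name__ == "__main__":
    step = 2.0  # plays the role of Q(b) for one subband b
    for c in (-7.9, -1.9, 0.3, 1.9, 7.9):
        q = deadzone_quantize(c, step)
        print(c, "->", q, "->", dequantize(q, step))
```

Because many input values map to the same index, the reconstruction generally differs from the input; this is the many-to-one mapping, and the source of the reconstruction error, noted above.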
The quantized wavelet coefficients are then partitioned into small codeblocks, and the quantized coefficients from each codeblock are encoded as bitplanes using an adaptive binary arithmetic encoder 16. This step is referred to as Tier-1 coding. Finally, the compressed codestream is produced by arranging the encoded codeblock data in various ways using a bitstream organizer 18. This step is referred to as Tier-2 coding. In Tier-2 coding, it is possible to discard encoded data corresponding to one or more bitplanes of the quantized wavelet coefficients, which will reduce the amount of compressed data at the cost of increased error in the decompressed image. The discarding of a bitplane from a given wavelet coefficient is equivalent to coarser quantization of that coefficient. If the quantizer step size that is initially applied to a coefficient is denoted as Δ, the effective quantizer step size after discarding k of the least significant bit planes is 2^k·Δ. This relationship between the discarding of coefficient bit planes and the resulting degree of quantization is described by Joshi et al. in U.S. Pat. No. 6,650,782.
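The equivalence between discarding bitplanes and coarser quantization can be checked numerically. The following is a minimal sketch, assuming a sign-magnitude representation of the quantizer indices and a deadzone quantizer that rounds magnitudes toward zero; the helper names are hypothetical.

```python
import math


def quantize(coeff: float, step: float) -> int:
    """Deadzone quantizer: sign(c) * floor(|c| / step)."""
    sign = -1 if coeff < 0 else 1
    return sign * math.floor(abs(coeff) / step)


def discard_bitplanes(index: int, k: int) -> int:
    """Drop the k least significant magnitude bits of a signed index."""
    sign = -1 if index < 0 else 1
    return sign * (abs(index) >> k)


def equivalent(coeff: float, step: float, k: int) -> bool:
    """True if discarding k bitplanes matches quantizing with 2^k * step."""
    fine_then_truncated = discard_bitplanes(quantize(coeff, step), k)
    coarse = quantize(coeff, (2 ** k) * step)
    return fine_then_truncated == coarse


if __name__ == "__main__":
    delta, k = 0.5, 2  # effective step size becomes 2^2 * 0.5 = 2.0
    for c in (-13.7, -3.2, 0.4, 5.0, 9.99, 27.3):
        print(c, equivalent(c, delta, k))
```

The check relies on the identity floor(floor(x) / n) = floor(x / n) for non-negative x and a positive integer n, which is why truncating the index and re-quantizing the coefficient agree.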
The flexibility that is achieved in JPEG2000 comes at the price of high computational complexity, and it is desirable to reduce this complexity while still providing a desired level of image quality. This high computational complexity is particularly a problem in motion imaging applications, where there can be a large number of frames in a sequence that must be encoded with high throughput for the encoding system so as to be economical in time and computing resources. A significant part of the computational load in a JPEG2000 encoder occurs during the Tier-1 coding, in which quantized wavelet coefficients are encoded using an arithmetic coder to form compressed data. After Tier-1 coding, it is common to discard a significant amount of the compressed data during Tier-2 coding to achieve a pre-specified size for the compressed codestream, or to achieve a pre-specified level of distortion in the image that is reconstructed during decompression. This process of varying the amount of compressed data is known as rate control.
One common rate control method for JPEG2000 is post-compression rate-distortion optimization (PCRD-opt). Some variation of PCRD-opt is found in most JPEG2000 software and hardware encoders, and a complete description can be found in the book entitled JPEG2000 Image Compression Fundamentals, Standards and Practice, D. S. Taubman and M. W. Marcellin, pp. 339-348, Kluwer Academic Publishers, Boston, Mass., 2002. A typical goal in applying PCRD-opt is to achieve a fixed size for the compressed codestream regardless of the image content. Compression systems that produce a fixed or constant compressed size are known as constant bit rate (CBR). In a typical JPEG2000 implementation using PCRD-opt, the wavelet coefficients are finely quantized, which produces a large amount of compressed data during Tier-1 coding. A PCRD-opt algorithm determines which data should be discarded to meet the pre-specified size by analyzing the impact of discarding the data on the compressed codestream size and the resulting distortion that is incurred in the decompressed image. The goal is to minimize the overall distortion subject to achieving the desired final size of the compressed data. The advantage of using finely quantized wavelet coefficients with a PCRD-opt algorithm is that it is possible to precisely control the amount of compressed data so that the codestream size is as close as possible to the pre-specified limit without exceeding the limit. However, this approach is also time-consuming because the formation of the compressed data from the finely quantized coefficients in Tier-1 coding requires significant computations, yet much of the compressed data is discarded in the subsequent rate control process. This is the problem of overcoding. Note that the evaluation of the excess compressed data also increases the processing time during Tier-2 coding.
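The idea of minimizing distortion subject to a size limit can be sketched with a simplified Lagrangian selection. This is not the algorithm of any particular encoder: it assumes each codeblock offers a short list of candidate truncation points given as (rate in bytes, distortion) pairs, that every list includes a zero-rate point, and that a bisection on the multiplier λ is an acceptable way to meet the budget. The function names and the sample data are hypothetical.

```python
def pick_truncation(points, lam):
    """Choose the (rate, distortion) point minimizing distortion + lam * rate."""
    return min(points, key=lambda p: p[1] + lam * p[0])


def pcrd_select(blocks, budget, iters=60):
    """Pick one truncation point per codeblock so total rate <= budget.

    blocks: list of lists of (rate_bytes, distortion); each list is
            assumed to contain a zero-rate point.
    budget: maximum total rate in bytes (assumed >= 0).
    """
    min_rate = min(p[0] for b in blocks for p in b if p[0] > 0)
    max_dist = max(p[1] for b in blocks for p in b)
    lo, hi = 0.0, max_dist / min_rate + 1.0  # at hi, every block keeps nothing
    best = [pick_truncation(b, hi) for b in blocks]
    for _ in range(iters):
        lam = (lo + hi) / 2.0
        choice = [pick_truncation(b, lam) for b in blocks]
        if sum(p[0] for p in choice) > budget:
            lo = lam  # too much data kept: penalize rate more heavily
        else:
            hi = lam  # feasible: remember it and try keeping more data
            best = choice
    return best


if __name__ == "__main__":
    blocks = [
        [(0, 100), (10, 40), (20, 10)],  # block where extra bytes help a lot
        [(0, 50), (10, 30), (20, 25)],   # block where extra bytes help little
    ]
    print(pcrd_select(blocks, budget=30))
```

Note that every candidate point in `blocks` must already have been produced by Tier-1 coding before the selection can run, including the points that end up discarded; this is the overcoding cost described above.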
Although a PCRD-opt algorithm is typically used to minimize the overall distortion subject to a constraint on the maximum compressed codestream size, it is also possible to use variations of a PCRD-opt algorithm to minimize the compressed codestream size subject to a constraint on the overall distortion. Such a system would produce an approximately constant distortion and the compressed codestream size would fluctuate with the image content. Compression systems that attempt to produce constant quality by allowing the compressed size to vary with the image content are known as variable bit rate (VBR). VBR systems are generally preferred over CBR systems because VBR systems typically produce a smaller average compressed size than CBR for comparable image quality over an ensemble of images that vary in content. However, from a computational point of view, VBR systems still suffer from the problem of overcoding if the transform coefficients are finely quantized prior to the arithmetic coder in Tier-1 coding.
One method for minimizing the amount of overcoding in motion image sequences is taught by Becker et al. (US Application 2005/0100229). However, this method requires accurate estimation of certain parameters and requires additional logic as a replacement for, or as an extension of, PCRD-optimization. Most JPEG2000 encoders would need modifications to practice the method taught by Becker et al., and the method is also not appropriate for compressing individual image frames. More importantly, this method may fail when image content changes rapidly, such as occurs at scene boundaries in motion sequences.
Another limitation in current systems, whether the system is CBR or VBR, is that the distortion metric commonly used to quantify image quality is mean-squared error (MSE). MSE is a measure of the numerical difference between pixel values of the original image and the pixel values of the decompressed image. A metric related to MSE is peak signal-to-noise ratio (PSNR), which is specified using decibel (dB) units. These metrics are mathematically convenient, but it is well known that they do not always correlate with perceived quality. As a result, even if a specific MSE or PSNR is obtained, there is no assurance of a given level of perceived image quality. For many applications, the goal may be visually lossless quality, i.e., the decompressed image appears to be identical to the original image when both are viewed by a human observer under specified viewing conditions. Knowledge of the MSE or PSNR for a compressed image does not provide any information about the viewing conditions under which visually lossless quality will be obtained.
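For reference, MSE and PSNR can be computed as in the following minimal sketch. It assumes 8-bit samples with a peak value of 255 and images flattened to equal-length sequences; the function names and the sample values are illustrative only.

```python
import math


def mse(original, decoded):
    """Mean-squared error between two equal-length sample sequences."""
    if len(original) != len(decoded):
        raise ValueError("images must have the same number of samples")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(original, decoded)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


if __name__ == "__main__":
    original = [10, 20, 30, 40]
    decoded = [12, 18, 30, 44]
    print(mse(original, decoded), psnr(original, decoded))
```

Both functions depend only on per-sample differences. Nothing in them accounts for where the errors fall in the image or for the viewing distance, which is one way to see why a given PSNR does not determine perceived quality.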
For example, given a compressed image with a PSNR of 40 dB, it is impossible to know a priori what viewing conditions will yield visually lossless quality for the image. It may be that the perceived quality is not visually lossless under specified viewing conditions, or it may be that visually lossless quality is obtained but more compressed data was used than was necessary to achieve that quality. The first scenario results in lower perceived image quality than desired, while the second scenario results in a larger compressed codestream than is necessary for visually lossless quality. Thus, it is not sufficient to rely strictly upon MSE or PSNR as a quality metric. This is particularly the case for VBR systems where one wants to produce a known level of perceived image quality while also compressing the image to the minimum amount of data that is needed for the desired quality.
Another issue that arises in some JPEG2000 applications is the need to satisfy a maximum compressed codestream size for each image. One such application is in Digital Cinema (DCinema) compression. Digital Cinema Initiatives (DCI), a consortium of the major US studios, has recently developed a set of specifications to ensure a certain level of interoperability among DCinema equipment manufacturers. One specification calls for a constraint on the maximum compressed codestream size for each image, with additional restrictions on the maximum amount of compressed data that can be used to represent each color component in an image. The constraint on the maximum compressed codestream size is 1,302,083 bytes per frame for each image frame in an image sequence having 24 frames per second. This constraint on the maximum codestream size can be equivalently specified as an instantaneous maximum compressed data rate of 250 Megabits/sec (Mbps). The amount of compressed data for each color component in an image frame can be a maximum of 1,041,666 bytes, which is equivalent to an instantaneous compressed data rate of 200 Mbps for an image sequence having 24 frames per second. These specifications are driven mainly by the capabilities of current JPEG2000 decompression systems. While CBR systems could always meet these data constraints through a rate control algorithm such as PCRD-opt, it is not a preferred solution because of the typically larger average codestream sizes for CBR as compared to VBR systems. Thus, it is desirable to have a VBR system that provides constant perceived quality to the greatest extent possible while still meeting any constraints on the maximum amount of compressed data for each image.
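The relationship between an instantaneous data rate and a per-frame byte limit is simple arithmetic, sketched below. The sketch assumes that one Megabit is 10^6 bits and that the byte limit is rounded down to a whole number of bytes; the function name is hypothetical.

```python
def max_bytes_per_frame(rate_bits_per_sec: int, frames_per_sec: int) -> int:
    """Largest whole number of bytes per frame that stays within the rate."""
    return rate_bits_per_sec // (8 * frames_per_sec)


if __name__ == "__main__":
    fps = 24
    # Whole-frame limit at 250 Mbps and per-component limit at 200 Mbps.
    print(max_bytes_per_frame(250_000_000, fps))
    print(max_bytes_per_frame(200_000_000, fps))
```

Under these assumptions, 250 Mbps at 24 frames per second gives 1,302,083 bytes per frame, and 200 Mbps gives 1,041,666 bytes per color component.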
Beyond these instantaneous rate constraints for each image frame or frame component in an image sequence, it may also be desirable to constrain the total size of the compressed codestream for the entire sequence. In this way, one can be assured that a movie of a given duration will fit onto a storage medium with a certain capacity, e.g., a two-hour movie compressed onto a 160 gigabyte (Gbyte) hard drive. The total size of the compressed codestream is also important if compressed movie content is to be transmitted over a communications network instead of being stored on a physical medium for transport from a content provider to movie theaters. A smaller total size for the compressed codestream means that the data can be sent more quickly and more cost efficiently to the theaters. Regardless of whether a physical storage medium or a communications network is used, the preferred compression solution is VBR encoding because it will produce higher overall image quality than CBR encoding for the same total codestream size.
A constraint on the total size of the compressed codestream is equivalent to a constraint on the average data rate over the entire sequence, and the following discussion will use average data rate, instead of total filesize, for convenience. For example, if one has a movie that is 2 hours in length and wishes to fit the compressed movie onto a 160 Gbyte hard drive, the average compressed data rate must be less than 182 Mbps. Because of the variable bit rate encoding, some frames will use a higher instantaneous compressed data rate (up to the maximum of 250 Mbps in the case of DCI-compliant codestreams), while other frames will use an instantaneous data rate that is substantially less than 182 Mbps. Despite the variation in instantaneous compressed data rate, the goal is to have constant image quality for each frame.
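The conversion from a storage capacity to an average data rate can be sketched as follows. The result depends on the unit convention: the 182 Mbps figure is reproduced when binary prefixes are assumed (1 Gbyte = 2^30 bytes and 1 Megabit = 2^20 bits), whereas decimal prefixes give roughly 178 Mbps. Which convention was intended is an assumption here, and the function name is hypothetical.

```python
def average_rate_bits_per_sec(capacity_bytes: float, duration_sec: float) -> float:
    """Average data rate that exactly fills the capacity over the duration."""
    return capacity_bytes * 8 / duration_sec


if __name__ == "__main__":
    two_hours = 2 * 60 * 60  # seconds

    # Binary prefixes: 1 Gbyte = 2**30 bytes, 1 Megabit = 2**20 bits.
    binary_mbps = average_rate_bits_per_sec(160 * 2**30, two_hours) / 2**20

    # Decimal prefixes: 1 Gbyte = 10**9 bytes, 1 Megabit = 10**6 bits.
    decimal_mbps = average_rate_bits_per_sec(160 * 10**9, two_hours) / 10**6

    print(round(binary_mbps, 2), round(decimal_mbps, 2))
```

Either way, the average is well below the 250 Mbps instantaneous maximum, which leaves room for individual frames to exceed the average as long as others fall below it.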
To achieve a desired average data rate with a VBR system while maintaining constant image quality, it is necessary to perform some type of multi-pass encoding. This is because the complexity of the image content is not known a priori, and hence the compressed data rate for a given set of compression parameters is also not known a priori. Workflow efficiency would be greatly reduced if it were necessary to compress the entire content repeatedly with different compression parameters until the desired average data rate was achieved, so a more efficient approach is required. One could make use of a PCRD-opt algorithm to trim the compressed frames, but this is also time-consuming in that one must first gather rate-distortion statistics for all frames in a sequence, and then determine which data to discard to meet the average rate requirement while also maintaining constant image quality as quantified by MSE or PSNR.
One approach to improving efficiency is to process only a subset of the data in the image sequence using an initial set of compression parameters and then use the resulting average rate as an estimate of the average rate for all frames. If the estimated average rate is more than the desired rate, the compression parameters are modified and the subset of image data is compressed again. Once the constraint on the desired average data rate is met with the subset of the image data, all of the image frames can be compressed with the determined compression parameters. U.S. Pat. No. 6,356,668 by Honsinger et al. (assigned to Kodak) describes a method for efficient rate control with JPEG compression, using sparsely sampled image regions within a single image to estimate the rate for the entire image. In addition, the patent to Honsinger et al. describes the use of threshold viewing distance as a quality metric in determining the JPEG compression parameters, and its use in constructing a rate-distortion (R-D) curve that can be used in adjusting the compression parameters if additional iterations are required to meet the target data rate.
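The iterate-on-a-subset idea can be sketched generically. The sketch below is not the method of Honsinger et al.: it assumes a single compression parameter (a quantizer step size), a hypothetical callable `estimate_rate` that compresses the subset and returns its average rate, and a fixed multiplicative update of the step size, all of which are illustrative choices.

```python
def tune_step_size(estimate_rate, target_rate, step=1.0, growth=1.25, max_iters=20):
    """Coarsen the step size until the subset's estimated rate meets the target.

    estimate_rate: hypothetical callable; compresses only the subset with the
                   given step size and returns the resulting average rate.
    Returns (step, estimated_rate, number_of_subset_compressions).
    """
    for iteration in range(1, max_iters + 1):
        rate = estimate_rate(step)
        if rate <= target_rate:
            return step, rate, iteration
        step *= growth  # assumed update rule: coarser quantization, lower rate
    raise RuntimeError("target rate not reached within max_iters")


if __name__ == "__main__":
    # Toy stand-in for a real encoder: rate falls inversely with step size.
    def toy_rate_model(step):
        return 300.0 / step

    print(tune_step_size(toy_rate_model, target_rate=182.0))
```

Each pass through the loop is one compression of the subset, so the number of iterations translates directly into computation. A poorly chosen starting point or update rule therefore makes the procedure expensive.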
However, this prior art method by Honsinger et al. has limitations when applied to JPEG2000 VBR compression for image sequence compression in Digital Cinema applications. First, the sparse spatial sampling of images is not efficient with JPEG2000 encoding because of the nature of the wavelet transform (as compared to the DCT that is used in JPEG compression). Second, the method by Honsinger et al. may require a significant number of compression iterations to construct an R-D curve and to achieve the desired average data rate. Each compression iteration requires additional computations, which can lead to an inefficient overall compression process.
Thus, there is a need for a JPEG2000 VBR compression system for image sequences that: (1) minimizes the amount of overcoding so that computational complexity is reduced; (2) produces, in a computationally efficient manner, compressed codestreams that meet constraints on the maximum amount of compressed data both for individual frames and frame components, and for the entire compressed codestream; and (3) produces constant perceived image quality, while minimizing the amount of compressed data needed to achieve that level of image quality.