JPEG2000 is a sophisticated image compression technique that offers a rich set of features to meet the needs of different applications. These features are made possible by the properties of the various components that form a JPEG2000 compression system. A brief description of these components is provided below; a more complete description can be found in the paper by M. Rabbani and R. Joshi entitled “An overview of the JPEG 2000 still image compression standard,” Signal Processing: Image Communication, Vol. 17, pp. 3-48, 2002.
FIG. 1 illustrates the basic components in a JPEG2000 compression system. The original image data is initially sent to a pre-processor 10, where several modifications to the original image data may take place. One possible modification is a color transformation known as the irreversible color transform (ICT). This transform is designed to convert red-green-blue (RGB) image data into a single luminance (Y) and two chrominance (Cb, Cr) components for improved compression performance. The pre-processed data is then transformed using a discrete wavelet transform 12, which converts the spatial image data into wavelet coefficients that correspond to a plurality of spatial frequency subbands. The wavelet coefficients are quantized with a uniform quantizer 14 that includes a deadzone. Quantization is a many-to-one mapping that reduces the amount of data that is used to represent the wavelet coefficients, at the expense of errors in the image that is reconstructed from the quantized wavelet coefficients. The degree of quantization for each frequency subband is determined by a quantizer step size (denoted as Q(f) in FIG. 1) that is associated with the subband. It is common in JPEG2000 systems to specify a single base step size for quantization, and then derive a set of quantizer step sizes that correspond to the various frequency subbands by using properties of the wavelet transform (i.e., from the band gain factors for each of the frequency subbands). Alternatively, it is possible to specify an explicit set of quantizer step sizes for the frequency subbands for more precise control of the quantization.
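The behavior of a uniform quantizer with a deadzone can be illustrated with a minimal Python sketch. The function names and the reconstruction offset of 0.5 are assumptions made for illustration; the offset is a decoder choice and is not specified in the description above.

```python
import math

def deadzone_quantize(coeff: float, step: float) -> int:
    """Map a wavelet coefficient to a signed integer index.

    The magnitude is divided by the step size and rounded toward zero, so
    every coefficient in the open interval (-step, step) maps to index 0.
    That zero bin is twice as wide as the other bins: the deadzone.
    """
    if step <= 0:
        raise ValueError("step size must be positive")
    magnitude = math.floor(abs(coeff) / step)
    return -magnitude if coeff < 0 else magnitude

def dequantize(index: int, step: float, offset: float = 0.5) -> float:
    """Reconstruct a coefficient from its index.

    offset is an assumed decoder parameter: 0.5 places the reconstructed
    value at the middle of the quantization bin.
    """
    if index == 0:
        return 0.0
    return math.copysign((abs(index) + offset) * step, index)

if __name__ == "__main__":
    for c in (0.9, -0.9, 2.7, -2.7):
        q = deadzone_quantize(c, 1.0)
        print(c, "->", q, "->", dequantize(q, 1.0))
```

The many-to-one nature of the mapping is visible in the loop: 0.9 and -0.9 both collapse to index 0, and the reconstructed values differ from the inputs by the quantization error.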
The quantized wavelet coefficients are then partitioned into small codeblocks, and the quantized coefficients from each codeblock are encoded as bitplanes using an adaptive binary arithmetic encoder 16. This step is referred to as Tier-1 coding. Finally, the compressed codestream is produced by arranging the encoded codeblock data in various ways using a bitstream organizer 18. This step is referred to as Tier-2 coding. In Tier-2 coding, it is possible to discard encoded data corresponding to one or more bitplanes of the quantized wavelet coefficients, which will reduce the amount of compressed data at the cost of increased error in the decompressed image. The discarding of a bitplane from a given wavelet coefficient is equivalent to coarser quantization of that coefficient. If the quantizer step size that is initially applied to a coefficient is denoted as Δ, the effective quantizer step size after discarding k of the least significant bitplanes is 2^k·Δ. This relationship between the discarding of coefficient bitplanes and the resulting degree of quantization is described by Joshi et al. in U.S. Pat. No. 6,650,782.
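The equivalence between discarding bitplanes and coarser quantization can be checked numerically. The following Python sketch assumes a sign-magnitude representation of the quantizer index, so that only the magnitude is shifted; the function names are hypothetical.

```python
import math

def quantize_magnitude(coeff: float, step: float) -> int:
    """Deadzone quantizer applied to the coefficient magnitude."""
    return math.floor(abs(coeff) / step)

def discard_bitplanes(magnitude: int, k: int) -> int:
    """Drop the k least significant bitplanes of a quantized magnitude."""
    return magnitude >> k

def is_equivalent(coeff: float, step: float, k: int) -> bool:
    """Compare fine quantization followed by discarding k bitplanes
    against direct quantization with step size (2**k) * step."""
    truncated = discard_bitplanes(quantize_magnitude(coeff, step), k)
    coarse = quantize_magnitude(coeff, (2 ** k) * step)
    return truncated == coarse

if __name__ == "__main__":
    samples = [0.0, 0.3, 1.7, -5.2, 13.9, -100.25]
    print(all(is_equivalent(c, 0.25, k) for c in samples for k in range(5)))
```

The check relies on the identity floor(floor(y) / 2^k) = floor(y / 2^k) for non-negative y, which is why the two routes give the same index.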
The flexibility that is achieved in JPEG2000 comes at the price of high computational complexity, and it is desirable to reduce this complexity while still providing a desired level of image quality. This high computational complexity is particularly a problem in motion imaging applications, where a sequence can contain a large number of frames that must be encoded with high throughput if the encoding system is to be economical in time and computing resources. A significant part of the computational load in a JPEG2000 encoder occurs during Tier-1 coding, in which quantized wavelet coefficients are encoded using an arithmetic coder to form compressed data. After Tier-1 coding, it is common to discard a significant amount of the compressed data during Tier-2 coding to achieve a pre-specified size for the compressed codestream, or alternatively to achieve a pre-specified level of distortion in the image that is reconstructed during decompression. This process of varying the amount of compressed data is known as rate control.
One common rate control method for JPEG2000 is post-compression rate-distortion optimization (PCRD-opt). Some variation of PCRD-opt is found in most JPEG2000 software and hardware encoders, and a complete description can be found in the book entitled JPEG2000 Image Compression Fundamentals, Standards and Practice, D. S. Taubman and M. W. Marcellin, pp. 339-348, Kluwer Academic Publishers, Boston, Mass., 2002. A typical goal in applying PCRD-opt is to achieve a fixed size for the compressed codestream, regardless of the image content. Compression systems that produce a constant compressed size are known as constant bit rate (CBR). In a typical JPEG2000 implementation using PCRD-opt, the wavelet coefficients are finely quantized, which produces a large amount of compressed data during Tier-1 coding. A PCRD-opt algorithm determines which data should be discarded to meet the pre-specified size by analyzing the impact of discarding the data on the compressed codestream size and the resulting distortion that is incurred in the decompressed image. The goal is to minimize the overall distortion subject to achieving the desired final size of the compressed data. The advantage of using finely quantized wavelet coefficients with a PCRD-opt algorithm is that it is possible to precisely control the amount of compressed data, so that the codestream size is as close as possible to the pre-specified limit without exceeding the limit. However, this approach is also time-consuming, because the formation of the compressed data from the finely quantized coefficients in Tier-1 coding requires significant computation, yet much of that data is discarded in the subsequent rate control process. This is the problem of overcoding. Note that the evaluation of the excess compressed data also increases the processing time during Tier-2 coding.
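The idea of minimizing distortion subject to a size limit can be illustrated with a simplified Python sketch. This is not the PCRD-opt procedure of Taubman and Marcellin; it is a generic Lagrangian selection with a bisection search, and it assumes a hypothetical data layout in which each codeblock offers a short list of (rate in bytes, distortion) truncation points with distortion decreasing as rate increases.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: one list of (rate_in_bytes, distortion) candidate
# truncation points per codeblock, ordered by increasing rate.
Block = Sequence[Tuple[int, float]]

def best_truncation(block: Block, lam: float) -> int:
    """Index of the point that minimizes distortion + lam * rate."""
    return min(range(len(block)), key=lambda i: block[i][1] + lam * block[i][0])

def select(blocks: Sequence[Block], lam: float) -> List[int]:
    return [best_truncation(b, lam) for b in blocks]

def total_rate(blocks: Sequence[Block], choice: Sequence[int]) -> int:
    return sum(b[i][0] for b, i in zip(blocks, choice))

def pcrd_select(blocks: Sequence[Block], budget_bytes: int,
                iterations: int = 50) -> List[int]:
    """Choose one truncation point per codeblock so that the total size
    stays within budget_bytes while distortion is kept low."""
    if sum(min(r for r, _ in b) for b in blocks) > budget_bytes:
        raise ValueError("budget is below the smallest achievable size")
    choice = select(blocks, 0.0)      # lam = 0 keeps the most data
    if total_rate(blocks, choice) <= budget_bytes:
        return choice
    lo, hi = 0.0, 1.0
    while total_rate(blocks, select(blocks, hi)) > budget_bytes:
        hi *= 2.0                     # grow lam until the budget is met
    choice = select(blocks, hi)
    for _ in range(iterations):       # bisect toward the smallest feasible lam
        mid = 0.5 * (lo + hi)
        candidate = select(blocks, mid)
        if total_rate(blocks, candidate) <= budget_bytes:
            hi, choice = mid, candidate
        else:
            lo = mid
    return choice

if __name__ == "__main__":
    blocks = [
        [(0, 100.0), (10, 40.0), (20, 10.0), (30, 5.0)],
        [(0, 50.0), (10, 30.0), (20, 25.0)],
    ]
    picked = pcrd_select(blocks, budget_bytes=30)
    print(picked, total_rate(blocks, picked))
```

In this sketch every truncation point has already been produced by the encoder before the selection runs, so any data beyond the chosen points represents Tier-1 work that is thrown away, which is the overcoding cost described above.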
Although a PCRD-opt algorithm is typically used to minimize the overall distortion subject to a constraint on the maximum compressed codestream size, it is also possible to use variations of a PCRD-opt algorithm to minimize the compressed codestream size subject to a constraint on the overall distortion. Such a system produces an approximately constant distortion, and the compressed codestream size fluctuates with the image content. Compression systems that attempt to produce constant quality by allowing the compressed size to vary with the image content are known as variable bit rate (VBR). VBR systems are generally preferred over CBR systems, because VBR systems typically produce a smaller average compressed size than CBR for comparable image quality over an ensemble of images that vary in content. However, from a computational point of view, VBR systems still suffer from the problem of overcoding if the transform coefficients are finely quantized prior to the arithmetic coder in Tier-1 coding.
One method for minimizing the amount of overcoding is taught by Becker et al. (US Application 2005/0100229) for motion image sequences. However, this method requires accurate estimation of certain parameters and requires additional logic as a replacement for, or as an extension of, PCRD-optimization. Most JPEG2000 encoders would need modifications to practice the method taught by Becker et al., and the method is also not appropriate for compressing individual image frames. More importantly, this method may fail when image content changes rapidly, such as occurs at scene boundaries in motion sequences.
Another limitation in current systems, whether CBR or VBR, is the distortion metric that is commonly used to quantify image quality, i.e., mean-squared error (MSE). MSE is a measure of the numerical difference between the pixel values of the original image and the pixel values of the decompressed image. A metric related to MSE is peak signal-to-noise ratio (PSNR), which is specified in decibel (dB) units. These metrics are mathematically convenient, but it is well known that they do not always correlate with perceived quality. As a result, even if a specific MSE or PSNR is obtained, there is no assurance of a given level of perceived image quality. For many applications, the goal may be visually lossless quality, i.e., the decompressed image appears to be identical to the original image when both are viewed by a human observer under specified viewing conditions. Knowledge of the MSE or PSNR for a compressed image does not provide any information about the viewing conditions under which visually lossless quality will be obtained.
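For reference, the two metrics can be computed as in the following Python sketch. The images are represented as flat sequences of pixel values, and the default peak value of 255 is an assumption that corresponds to 8-bit data; other bit depths would use a different peak.

```python
import math
from typing import Sequence

def mse(original: Sequence[float], decoded: Sequence[float]) -> float:
    """Mean of the squared pixel differences between two images."""
    if len(original) != len(decoded) or not original:
        raise ValueError("images must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)

def psnr_db(original: Sequence[float], decoded: Sequence[float],
            peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    err = mse(original, decoded)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / err)

if __name__ == "__main__":
    original = [10, 20, 30, 40]
    decoded = [11, 19, 31, 39]
    print(mse(original, decoded), psnr_db(original, decoded))
```

Both functions depend only on numerical pixel differences. Nothing in the computation refers to display characteristics or viewing distance, which is why these values carry no information about viewing conditions.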
For example, given a compressed image with a PSNR of 40 dB, it is impossible to know a priori what viewing conditions will yield visually lossless quality for the image. It may be that the perceived quality is not visually lossless under specified viewing conditions, or it may be that visually lossless quality is obtained, but more compressed data was used than was necessary to achieve that quality. The first scenario results in lower perceived image quality than desired, while the second scenario results in a larger compressed codestream than is necessary for visually lossless quality. Thus, it is not sufficient to rely strictly upon MSE or PSNR as a quality metric. This is particularly the case for VBR systems, where one wants to produce a known level of perceived image quality, while also compressing the image to the minimum amount of data that is needed for the desired quality.
Another issue that arises in some JPEG2000 applications is the need to satisfy a maximum compressed codestream size for each image. One such application is in Digital Cinema (DCinema) compression. The Digital Cinema Initiative (DCI), a consortium of the major US studios, has recently developed a set of specifications to ensure a certain level of interoperability among DCinema equipment manufacturers. One specification calls for a constraint on the maximum compressed codestream size for each image, with additional restrictions on the maximum amount of compressed data that can be used to represent each color component in an image. The DCI constraint on the maximum compressed codestream size is 1,302,083 bytes per frame for each image frame in an image sequence having 24 frames per second. This constraint on the maximum codestream size can be equivalently specified as a maximum compressed data rate of 250 Megabits/sec (Mbps). The amount of compressed data for each color component in an image frame can be a maximum of 1,041,666 bytes, which is equivalent to a compressed data rate of 200 Mbps for an image sequence having 24 frames per second. These specifications are driven mainly by the capabilities of current JPEG2000 decompression systems. While CBR systems could always meet these data constraints through a rate control algorithm such as PCRD-opt, this is not a preferred solution, because of the typically larger average codestream sizes for CBR as compared to VBR systems. Thus, it is desirable to have a VBR system that provides constant perceived quality to the greatest extent possible, while still meeting any constraints on the maximum amount of compressed data for each image.
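The per-frame byte limits follow from the 250 Mbps and 200 Mbps data rates at 24 frames per second by simple arithmetic, which the following Python sketch reproduces. It assumes that one Megabit is 1,000,000 bits and that fractional bytes are rounded down; the helper names are hypothetical, and the check ignores any codestream header overhead.

```python
from typing import Sequence

def max_bytes_per_frame(bit_rate_bps: int, frames_per_second: int) -> int:
    """Largest whole number of bytes per frame that fits the data rate."""
    return bit_rate_bps // (8 * frames_per_second)

# Limits derived from the rates quoted above (assumed: 1 Mbps = 10**6 bits/s).
FRAME_LIMIT = max_bytes_per_frame(250_000_000, 24)
COMPONENT_LIMIT = max_bytes_per_frame(200_000_000, 24)

def within_limits(component_sizes: Sequence[int]) -> bool:
    """True if every color component and the frame total fit the limits."""
    return (all(size <= COMPONENT_LIMIT for size in component_sizes)
            and sum(component_sizes) <= FRAME_LIMIT)

if __name__ == "__main__":
    print(FRAME_LIMIT, COMPONENT_LIMIT)
    print(within_limits([700_000, 300_000, 300_000]))
```

A VBR encoder could apply a check of this kind to each encoded frame and invoke additional rate control only for the frames that exceed a limit.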
Thus, there is a need for a JPEG2000 compression system that: (1) minimizes the amount of overcoding so that computational complexity is reduced for both CBR and VBR systems; (2) produces compressed codestreams that meet constraints on the maximum amount of compressed data for both CBR and VBR systems; and (3) produces constant perceived image quality for VBR systems, while minimizing the amount of compressed data needed to achieve that level of image quality.