1. Field of the Invention
The invention relates to a method of rate control for scalably coded images, and more specifically to rate control that satisfies specified requirements on subsets of image data. Rate control may achieve high reconstructed image quality within the subsets, subject to size requirements on the subsets, or small compressed sizes of the subsets, subject to quality requirements on the subsets.
2. Description of the Related Art
Over the past few decades, subband or wavelet coding has proven to be an efficient method for compression of images. Of particular importance is the new image compression standard JPEG2000, as described in ITU-T Rec. T.800/ISO/IEC 15444-1:2004 JPEG 2000 Image Coding System, which is hereby incorporated by reference. Similar to other compression standards, the JPEG2000 standard defines the decoder and the associated codestream syntax. The standard does not dictate the operations of the encoder as long as the generated codestream complies with the defined codestream syntax and can be decoded by a compliant decoder. This allows flexible encoder design. See “JPEG2000 Image Coding System,” 2004, and D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Practice and Standards, Kluwer Academic Publishers, Boston, 2002, which is hereby incorporated by reference.
FIG. 1 illustrates a representative JPEG2000 encoder 10 used to encode an image 11. Each image is (optionally) divided into non-overlapping rectangular tiles 12. Tiles allow spatial random access and limit the implementation memory requirements. Next, an optional component transform 14 can be used to improve compression efficiency. For example, if an image consists of Red, Green, and Blue color components, applying a color transform can improve compression performance. Each (transformed) color component of a tile is then referred to as a tile-component. Application of a wavelet transform 16 to each tile-component produces a number of transform coefficients, organized into subbands for each tile-component. The transform coefficients for each subband are then partitioned into rectangular blocks referred to as codeblocks 18. Each codeblock is then encoded independently by a codeblock encoder 19.
For a given codeblock, its encoding begins by quantizing its coefficients to obtain quantization indices. These quantization indices can be regarded as an array of signed integers. When reversible wavelet transforms are employed, quantization is not strictly required, as the wavelet coefficients are already integers. This array of signed integers can be represented using a sign array and a magnitude array. The sign array can be considered as a binary array where the value of the array at each point indicates whether the quantization index is positive or negative. The magnitude array can be divided into a series of binary arrays with one bit from each quantization index. The first of these arrays corresponds to the Most Significant Bits (MSBs) of the quantization indices, and the last one corresponds to the Least Significant Bits (LSBs). Each such array is referred to as a bitplane. Each bitplane of a codeblock is then entropy coded using a bitplane coder. The bitplane coder used in JPEG2000 is a context-dependent, binary, arithmetic coder. The bitplane coder makes three passes over each bitplane of a codeblock. These passes are referred to as coding passes. Each bit in the bitplane is encoded in one of these coding passes. The resulting compressed data are referred to as compressed coding passes.
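The sign and magnitude decomposition described above can be illustrated with a minimal Python sketch. It is offered only as an illustration: the function names `to_bitplanes` and `from_bitplanes` are hypothetical, the codeblock is assumed to be flattened into a one-dimensional list, and the entropy coding of each bitplane is not modeled.

```python
def to_bitplanes(indices, num_planes=None):
    """Split signed quantization indices into a sign array and
    magnitude bitplanes ordered from MSB to LSB (illustrative only)."""
    # Sign array: 1 marks a negative index, 0 a non-negative one.
    signs = [1 if v < 0 else 0 for v in indices]
    mags = [abs(v) for v in indices]
    if num_planes is None:
        # Use just enough bitplanes to represent the largest magnitude.
        num_planes = max(max(mags, default=0).bit_length(), 1)
    # One binary array per bit position, most significant first.
    planes = [[(m >> p) & 1 for m in mags]
              for p in range(num_planes - 1, -1, -1)]
    return signs, planes


def from_bitplanes(signs, planes):
    """Rebuild the signed indices from the sign array and bitplanes."""
    mags = [0] * len(signs)
    for plane in planes:  # MSB first
        mags = [(m << 1) | b for m, b in zip(mags, plane)]
    return [-m if s else m for s, m in zip(signs, mags)]


signs, planes = to_bitplanes([5, -3, 0, 6])
# signs  -> [0, 1, 0, 0]
# planes -> [[1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 0, 0]]
```

Dropping trailing bitplanes from `planes` before reconstruction corresponds to discarding the least significant bits of every index, which is what makes the representation scalable.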
The codeblock encoder also computes the amount of distortion (mean squared error) reduction provided by each compressed coding pass, together with the length of the compressed coding pass. With this information, the distortion-rate slope of a compressed coding pass can be defined as the ratio of its distortion reduction to its length, i.e., the amount of distortion reduction per byte provided by the compressed coding pass. Thus, a compressed coding pass with a larger distortion-rate slope can be considered to be more important than one with a smaller distortion-rate slope. The codeblock encoder 19 provides the compressed coding passes 20, their lengths 22 and distortion-rate slopes 21 to a codestream generation unit 23 that decides which compressed coding passes 20 from each codeblock 18 will be included in the codestream. The codestream generation unit includes the compressed coding passes with the largest distortion-rate slopes in the codestream until the byte budget is exhausted.
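By way of illustration only, the slope-based selection described above might be sketched in Python as follows. The `CodingPass` class and the `select_passes` function are hypothetical names introduced for this sketch. It assumes that every pass has a positive length and that slopes decrease within each codeblock, so that sorting by slope does not select a later pass of a codeblock ahead of an earlier one; a practical encoder would need to enforce that ordering explicitly.

```python
from dataclasses import dataclass


@dataclass
class CodingPass:
    codeblock: int                # index of the codeblock the pass belongs to
    index: int                    # position of the pass within its codeblock
    length: int                   # compressed length in bytes (assumed > 0)
    distortion_reduction: float   # reduction in mean squared error

    @property
    def slope(self):
        # Distortion reduction per byte.
        return self.distortion_reduction / self.length


def select_passes(passes, byte_budget):
    """Include passes in order of decreasing slope until the next
    pass would exceed the byte budget."""
    selected, used = [], 0
    for p in sorted(passes, key=lambda p: p.slope, reverse=True):
        if used + p.length > byte_budget:
            break  # budget exhausted
        selected.append(p)
        used += p.length
    return selected, used


passes = [
    CodingPass(0, 0, 10, 100.0),  # slope 10
    CodingPass(0, 1, 20, 80.0),   # slope 4
    CodingPass(1, 0, 5, 60.0),    # slope 12
    CodingPass(1, 1, 10, 10.0),   # slope 1
]
selected, used = select_passes(passes, byte_budget=20)
# Selects pass 0 of codeblock 1, then pass 0 of codeblock 0, using 15 bytes.
```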
JPEG2000 allows great flexibility in the formation of codestreams. For example, the standard allows grouping of the compressed data into layers. Layers are formed by grouping compressed coding passes from a tile. Thus, it is possible to create a codestream with several layers such that the truncation of the later layers in the codestream results in reconstruction of the image (or a tile) at reduced quality. Typical practice involves the creation of each layer to a given fixed byte budget.
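As a rough sketch of layer formation to fixed byte budgets, the following Python fragment assigns compressed coding passes to layers. The function name `form_layers` and the representation of a pass as an `(identifier, length)` pair are assumptions made for illustration; the passes are assumed to be supplied already ordered from most to least important, and the budgets are assumed to be cumulative and increasing.

```python
def form_layers(passes, cumulative_budgets):
    """Group passes into layers so that layers 0..k together fit
    within cumulative_budgets[k] bytes (illustrative only).

    passes: list of (pass_id, length_in_bytes), most important first.
    """
    layers, used, i = [], 0, 0
    for budget in cumulative_budgets:
        layer = []
        # Add passes to this layer while the cumulative budget allows.
        while i < len(passes) and used + passes[i][1] <= budget:
            layer.append(passes[i][0])
            used += passes[i][1]
            i += 1
        layers.append(layer)
    return layers


layers = form_layers([("a", 5), ("b", 10), ("c", 20), ("d", 10)], [15, 40])
# layers -> [["a", "b"], ["c"]]
```

Truncating the codestream after the first layer in this example retains only passes "a" and "b", which corresponds to reconstruction at reduced quality.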
The JPEG2000 standard was designed as a still image coding standard. The encoder operation described above defines how a single image can be encoded using JPEG2000. However, JPEG2000 can be used to encode the individual images that make up an image sequence, e.g., video or motion pictures. This can be done with or without Part 3 of the standard, which describes a file format for image sequences. Part 3 of the standard is sometimes referred to as Motion JPEG2000.
When JPEG2000 is used to compress a sequence of images, there are only a few methods currently known for determining what rate to use for each image in the sequence. One possibility is to select a fixed rate (i.e., a fixed number of bytes) to encode each image in the sequence. While this method is simple and allows easy implementation, it does not yield adequate performance in some applications. In many image sequences, the characteristics of the images in the sequence vary widely. Since this method assigns a fixed number of bytes to each image, the resulting decompressed image sequence exhibits large variations in quality among images.
This shortcoming has been identified by Tzannes et al. in US Patent Application US 2004/0047511 A1. Tzannes et al. enable adaptive selection of compression parameters to achieve some performance improvement when the images are encoded in succession. The adaptation is performed for the current image using information gathered from only the previous images in the sequence; subsequent images are not considered when allocating rate for the current image. Furthermore, if two consecutive images in the sequence are not highly correlated (as is the case during a scene change), the adaptation falters. Another alternative to fixed rate coding was presented by Dagher et al. in “Resource-Constrained Rate Control for Motion JPEG2000,” IEEE Transactions on Image Processing, December 2003. In this method, compressed images are placed in a buffer. Compressed data are pulled out of the buffer at a constant rate. New compressed images are added to the buffer when they become available. If the buffer is full when a new compressed image is to be added, the new compressed image, as well as the other images already in the buffer, are truncated so that all compressed data fit into the buffer. The resulting images have relatively low quality variation within a “sliding time window” corresponding to the length of the buffer employed. However, quality can vary widely over time-frames larger than the length of the sliding window.
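The truncate-to-fit step of such a buffer-based scheme can be illustrated with the following Python sketch. It is not the published algorithm: the function name `truncate_to_fit` is hypothetical, the constant-rate draining of the buffer is not modeled, and the sketch assumes that truncation proceeds by repeatedly discarding the trailing pass with the smallest distortion-rate slope among all buffered images, which the description above does not specify.

```python
def truncate_to_fit(images, capacity):
    """Truncate buffered images until their total size fits the buffer.

    images: one list per image of (length_in_bytes, slope) pairs,
            each list ordered by decreasing slope.
    Returns truncated copies; the inputs are not modified.
    """
    images = [list(img) for img in images]
    total = sum(length for img in images for length, _ in img)
    while total > capacity:
        # Images that still have passes left to discard.
        candidates = [i for i, img in enumerate(images) if img]
        if not candidates:
            break
        # Discard the trailing pass with the smallest slope overall.
        victim = min(candidates, key=lambda i: images[i][-1][1])
        length, _ = images[victim].pop()
        total -= length
    return images


buffered = [
    [(10, 9.0), (10, 2.0)],             # older image
    [(10, 8.0), (10, 5.0), (10, 1.0)],  # newly added image
]
fitted = truncate_to_fit(buffered, capacity=30)
# fitted -> [[(10, 9.0)], [(10, 8.0), (10, 5.0)]]
```

Because passes are discarded only from images currently held in the buffer, the equalization of quality in this sketch extends no further than the buffer contents, consistent with the sliding-window limitation noted above.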
Additionally, none of the methods above provides a capability to place size or quality requirements on subsets of image data, such as individual images, individual components, etc.