Video surveillance, cellular telephones, digital cameras, printers, scanners, facsimile, copiers, medical imaging, satellite imaging, the Internet, and compound documents have increased the demand for image and video applications. However, because resources such as bandwidth, storage, and processing power are limited, high quality images are often not feasible. The quality of an image depends on the number of pixels in the image and the number of bits allocated to each pixel. For example, a 1024×1024 pixel image with 24 bits per pixel is a high quality color image of about 25 Mb (25,165,824 bits), while a 10×10 pixel image with 1 bit per pixel is a low quality, 100-bit black and white ‘thumbnail’ image.
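The two sizes in the example follow directly from width × height × bits per pixel. A minimal sketch of that arithmetic, for uncompressed images only; the function name `raw_image_bits` is hypothetical:

```python
def raw_image_bits(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed image size in bits: every pixel stores a fixed number of bits."""
    return width * height * bits_per_pixel

color = raw_image_bits(1024, 1024, 24)  # high quality color image
thumb = raw_image_bits(10, 10, 1)       # black and white thumbnail

print(color, color / 8 / 2**20)  # 25165824 3.0  (bits, then MiB)
print(thumb)                     # 100
```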
One solution distinguishes a region-of-interest (ROI) in an image from background regions and uses more bits to encode the ROI than the background regions. Because fewer bits are spent on the background, the total number of bits used to encode the image can be reduced without decreasing the perceived resolution and quality of the ROI in the encoded image. Fewer bits, in turn, reduce the required resources.
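To make the saving concrete, the following sketch compares a uniform allocation with an ROI-weighted one. The image size, ROI size, and bit rates (2.0 and 0.5 bits per pixel) are hypothetical numbers chosen only for illustration, as is the helper `encoded_bits`:

```python
def encoded_bits(total_pixels: int, roi_pixels: int,
                 roi_bpp: float, bg_bpp: float) -> float:
    """Total bits when ROI pixels get roi_bpp bits each and background pixels get bg_bpp."""
    bg_pixels = total_pixels - roi_pixels
    return roi_pixels * roi_bpp + bg_pixels * bg_bpp

total = 1024 * 1024  # hypothetical image
roi = 256 * 256      # hypothetical ROI covering 1/16 of the image

uniform = encoded_bits(total, roi, 2.0, 2.0)   # same rate everywhere
weighted = encoded_bits(total, roi, 2.0, 0.5)  # ROI keeps 2.0 bpp, background drops to 0.5 bpp

print(uniform, weighted, weighted / uniform)  # 2097152.0 622592.0 0.296875
```

The ROI is coded at the same rate in both cases, so its quality is unchanged while the total drops to under a third.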
One ROI encoding method selectively scales up the wavelet-transformed coefficients of ROIs, Atsumi et al., “Lossy/lossless region-of-interest image coding based on set partitioning in hierarchical trees,” IEEE Proceeding of ICIP, October 1998. The ROIs are also transferred at a higher priority. However, depending on the scaling value, the scaled ROI coefficients can overlap in magnitude with the background coefficients, so that the ROIs blend into the background. Therefore, the decoder also needs shape information to distinguish the ROIs from the background.
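The need for shape information can be illustrated with a toy list of integer coefficients. This is a generic scaling sketch, not the Atsumi et al. implementation; the coefficient values, the mask, and the helper names are hypothetical. With a small shift the ROI and background magnitude ranges overlap, so magnitude alone cannot tell them apart:

```python
def scale_roi(coeffs, roi_mask, shift):
    """Multiply ROI coefficients by 2**shift; background coefficients are unchanged."""
    return [c * (1 << shift) if in_roi else c for c, in_roi in zip(coeffs, roi_mask)]

def magnitudes_overlap(scaled, roi_mask):
    """True if some nonzero ROI magnitude is <= the largest background magnitude."""
    roi = [abs(c) for c, m in zip(scaled, roi_mask) if m and c != 0]
    bg = [abs(c) for c, m in zip(scaled, roi_mask) if not m]
    return bool(roi) and bool(bg) and min(roi) <= max(bg)

coeffs = [40, -3, 12, 5, -25, 1]                      # hypothetical quantized coefficients
roi_mask = [False, True, True, False, False, True]    # hypothetical ROI shape

small = scale_roi(coeffs, roi_mask, 2)  # [40, -12, 48, 5, -25, 4]
large = scale_roi(coeffs, roi_mask, 6)  # [40, -192, 768, 5, -25, 64]
print(magnitudes_overlap(small, roi_mask), magnitudes_overlap(large, roi_mask))  # True False
```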
The JPEG 2000 standard defines a max-shift method for ROI encoding, ISO/IEC 15444-1, “Information technology-JPEG 2000 image coding system-Part 1: Core coding system,” 1st Edition, 2000. The JPEG 2000 standard uses color conversion, wavelet transform, quantization, progressive bit-plane coding, and entropy coding. The encoded images are transferred as a layered stream of packets. With JPEG 2000, the size and quality of the output image are selected during encoding. The max-shift method separates the ROI from the background by scaling the ROI coefficients into bit planes that do not overlap those of the background, Skodras et al., “The JPEG 2000 still image compression standard,” IEEE Signal Processing Magazine, September 2001, incorporated herein by reference. The scaling value is chosen large enough that the smallest nonzero scaled ROI coefficient magnitude is larger than the largest background coefficient magnitude. When the decoder receives the scaling value, it identifies the ROI coefficients by their magnitudes. The max-shift method thus enables the encoding of ROIs with arbitrary shapes without explicitly transmitting the shape information of the ROI to the decoder. However, max-shift encoding increases overhead due to the extra code blocks that are required to define the boundaries of the ROI.
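The max-shift idea can be sketched on a flat list of integer coefficients: choose the smallest shift s with 2**s greater than every background magnitude, shift the ROI up by s, and let the decoder classify by the threshold 2**s alone. This is a simplified sketch, assuming quantized integer coefficients and ignoring code blocks, bit-plane coding, and how s is signaled; the function names are hypothetical:

```python
def maxshift_scale(background):
    """Smallest s such that 2**s > max |background coefficient|."""
    return max((abs(c) for c in background), default=0).bit_length()

def maxshift_encode(coeffs, roi_mask):
    """Shift ROI coefficients above every background bit plane; return (scaled, s)."""
    s = maxshift_scale([c for c, m in zip(coeffs, roi_mask) if not m])
    scaled = [c * (1 << s) if m else c for c, m in zip(coeffs, roi_mask)]
    return scaled, s

def maxshift_decode(scaled, s):
    """Recover coefficients and the ROI mask from magnitudes only (no shape info)."""
    threshold = 1 << s
    values, mask = [], []
    for c in scaled:
        is_roi = abs(c) >= threshold       # only scaled ROI values can reach 2**s
        mag = abs(c) >> s if is_roi else abs(c)
        values.append(-mag if c < 0 else mag)
        mask.append(is_roi)
    return values, mask

coeffs = [40, -3, 12, 5, -25, 1]
roi_mask = [False, True, True, False, False, True]
scaled, s = maxshift_encode(coeffs, roi_mask)
print(s, scaled)                   # 6 [40, -192, 768, 5, -25, 64]
print(maxshift_decode(scaled, s))  # ([40, -3, 12, 5, -25, 1], [False, True, True, False, False, True])
```

A zero ROI coefficient stays zero after scaling and is classified as background, which does not matter because its decoded value is still zero.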
Another method shifts bits on a plane-by-plane basis to adjust for the relative importance of the ROI, Wang et al., “Bitplane-by-bitplane shift (BbBShift)—A suggestion for JPEG 2000 Region of Interest image coding,” IEEE Signal Processing Letters, Vol. 9, No. 5, May 2002. However, the BbBShift method is not compatible with the JPEG 2000 standard.
Another method is named “partial significant bit-planes shift” (PSBShift), Liu et al., “A new JPEG 2000 region-of-interest image coding method: partial significant bitplanes shift,” IEEE Signal Processing Letters, Vol. 10, No. 2, February 2003. The PSBShift method aims to maintain high quality for ROIs. However, like the BbBShift method, the PSBShift method is incompatible with the JPEG 2000 standard.
All of the above ROI encoding methods use static coding; that is, the ROI is defined during encoding. This is a problem when the ROI information is available only during decoding, for example, when a viewer wants to specify the ROI. It is also a problem when the ROI information is supplied dynamically by an external source, for example, when an external process such as object tracking analyzes the images before decoding and determines the ROIs.
A dynamic ROI coding method is described by Rosenbaum et al., “Flexible, dynamic and compliant region of interest coding in JPEG 2000,” IEEE Proceeding of ICIP, Rochester, N.Y., September 2002. That method handles dynamic ROI information in an interactive environment. It uses the precinct/layer mechanism defined by the JPEG 2000 standard to arrange the precinct priority in each layer, and it inserts layers dynamically: ROI packets remain in the same layer, while other packets are shifted up one layer. However, dynamic layer insertion requires the packet headers to be re-encoded, which in turn requires rate-distortion recalculation, an undesirable feature for real-time image transmission applications.
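The layer re-indexing described above can be sketched as follows. This models only the layer numbers and ignores the actual packet header re-encoding that a real codestream would need; the `Packet` type and `insert_layer` function are hypothetical illustrations, not the Rosenbaum et al. implementation:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    precinct: int
    layer: int
    is_roi: bool

def insert_layer(packets):
    """ROI packets keep their layer; every other packet moves up by one layer."""
    return [p if p.is_roi else replace(p, layer=p.layer + 1) for p in packets]

packets = [Packet(0, 0, True), Packet(1, 0, False),
           Packet(0, 1, True), Packet(1, 1, False)]

# Layer-progressive order after insertion: ROI data now precedes background data.
for p in sorted(insert_layer(packets), key=lambda p: (p.layer, p.precinct)):
    print(p.layer, p.precinct, p.is_roi)
# 0 0 True
# 1 0 True
# 1 1 False
# 2 1 False
```

Every shifted packet changes position in the layered stream, which is why its header, and the rate-distortion bookkeeping behind the layers, has to be recomputed.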
Because of these problems with prior art encoding methods, it is desirable to provide a new encoding mechanism that avoids re-encoding the packet headers, makes ROI coding flexible and dynamic, and has low computational complexity. Such a method has been described by Kong et al. in U.S. patent application Ser. No. 11/002,817, “Image Transcoding,” filed on Dec. 2, 2004.
The method described by Kong et al. encodes an image sequence, stores it as a JPEG 2000 bitstream, and then transcodes the stored images in the compressed domain using a low-complexity adaptation technique that replaces the data packets corresponding to higher quality layers with empty packets. In one particular streaming mode, the ROIs are transcoded with higher quality than the background to satisfy network constraints.
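The packet-replacement idea can be sketched by computing the output size when ROI packets keep more layers than background packets. This is a hypothetical illustration, not Kong et al.'s implementation: packets are modeled as `(layer, is_roi, nbytes)` tuples with made-up sizes, and an empty packet is assumed to cost one byte (`EMPTY_PACKET_BYTES` is an assumption):

```python
EMPTY_PACKET_BYTES = 1  # assumption: an empty packet costs a single header byte

def transcoded_size(packets, roi_layers, bg_layers):
    """Bytes after keeping the first roi_layers (ROI) or bg_layers (background)
    quality layers and replacing every other packet with an empty packet."""
    total = 0
    for layer, is_roi, nbytes in packets:
        keep = layer < (roi_layers if is_roi else bg_layers)
        total += nbytes if keep else EMPTY_PACKET_BYTES
    return total

# Hypothetical frame: three quality layers for one ROI and one background precinct.
frame = [(0, True, 500), (1, True, 800), (2, True, 1200),
         (0, False, 400), (1, False, 700), (2, False, 1000)]

print(transcoded_size(frame, 3, 3))  # 4600: all layers kept
print(transcoded_size(frame, 3, 1))  # 2902: ROI at full quality, background at one layer
```

Because no packet content is decoded or re-encoded, the operation is a simple selection, which is where the low complexity comes from.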
One critical issue that is not addressed by Kong et al. is rate allocation to each frame. One straightforward method, referred to as static rate control, allocates an equal number of bits to each frame based on the available channel bandwidth. The obvious drawback of that method is that it does not adapt to the scene contents. Also, because the transcoder can achieve only a fixed set of rate points, determined by the rate allocated to each quality layer and by other transcoding parameters such as the output resolution level and the ROI, the available bandwidth is very likely not fully utilized.
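A sketch of static rate control shows where bandwidth is lost: every frame gets the same budget, and the transcoder must fall back to the largest achievable rate point below it. The bandwidth, frame rate, and rate points here are hypothetical values, and `static_rate_control` is an illustrative name:

```python
def static_rate_control(bandwidth_bps, frame_rate, rate_points):
    """Equal per-frame budget; choose the largest achievable rate point that fits.
    Returns (budget_bits, chosen_bits, unused_bits)."""
    budget = bandwidth_bps / frame_rate
    chosen = max((r for r in rate_points if r <= budget), default=0)
    return budget, chosen, budget - chosen

# Hypothetical: 1 Mb/s channel, 10 frames/s, four achievable rate points per frame.
print(static_rate_control(1_000_000, 10, [40_000, 70_000, 95_000, 130_000]))
# (100000.0, 95000, 5000.0)
```

The 5,000 unused bits per frame are lost on every frame, regardless of how complex the scene is.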
Therefore, there is a need for a method that improves overall quality over the static rate control method by utilizing more of the available bandwidth, while satisfying buffer constraints and maintaining consistent quality over time.
It is important to distinguish rate control for predictive video coding and transcoding, such as coding and transcoding of typical MPEG bitstreams, from rate control for image transcoding, e.g., based on JPEG 2000. One key difference is that predictive coding requires a method to estimate the actual output rate for a given set of output parameters, such as the quantization step size. This estimation is difficult because of the prediction dependency between frames and the use of variable length coding for data symbols. The uncertainty is compounded when frame skipping is employed. More specifically, when a video coder skips a frame due to high buffer occupancy, it is difficult to estimate ahead of time the number of bits that will be used to code the next frame, because the correlation between frames varies and complex processes such as motion estimation must be performed to determine the symbols that are actually coded. This dependency is not present when transcoding an encoded image sequence, and embedded coding makes it possible to determine the bit rate precisely for different quality settings and when a frame is skipped.
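With an embedded, independently coded frame, the size at every quality setting can be read off the stored codestream rather than estimated. The sketch below assumes each quality layer's byte count (including its packet headers) is known, and that a skipped frame costs nothing. The layer sizes and the functions `rate_table` and `pick_quality` are hypothetical, and the greedy choice is only an illustration, not the rate control method of this document:

```python
from itertools import accumulate

def rate_table(layer_bytes):
    """Exact size for each quality q = number of layers kept; q = 0 means skip."""
    return [0, *accumulate(layer_bytes)]

def pick_quality(layer_bytes, budget):
    """Highest quality whose exact size fits the budget (0 = skip the frame)."""
    limit = max(budget, 0)
    return max(q for q, size in enumerate(rate_table(layer_bytes)) if size <= limit)

layers = [3000, 2000, 4000]          # hypothetical per-layer sizes of one frame
print(rate_table(layers))            # [0, 3000, 5000, 9000]
print(pick_quality(layers, 6000))    # 2
print(pick_quality(layers, 2500))    # 0
```

No motion estimation or trial encoding is involved: the cost of every option, including skipping, is known exactly before the frame is sent.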