The present invention relates to a method and a system for coding of Regions of Interest (ROI) in still image coding schemes. The method and the system are particularly well suited for use in the JPEG 2000 standard and other wavelet based coders (as in MPEG 4) for still image compression.
In the JPEG 2000 standard there is support for the encoding of various parts of the image at various bitrates. A region encoded at a higher bit rate than the other parts of the image is considered a Region of Interest (ROI). Encoding of images with Regions of Interest has been a key issue in recent years. The JPEG 2000 standard under development has addressed the issue of efficient encoding of ROIs, see Charilaos Christopoulos (editor), ISO/IEC JTC1/SC29/WG1 N988 JPEG 2000 Verification Model Version 2.0/2.1, Oct. 5, 1998. One of the modes for ROI coding in the JPEG 2000 verification model (VM) is called the "scaling based method". In this method, the ROI coefficients are scaled up (basically shifted up), so that they are coded first during the encoding process. This gives the ability to see the important parts of the image at earlier stages of the transmission. The method slightly increases the bitrate for lossless coding of the image compared to not shifting the coefficients at all, but gives the ability of fast viewing of the important elements of the image, i.e. the ROIs.
In JPEG 2000 the transformed images are encoded bitplane wise. This means that the information about high transform coefficients will be placed earlier in the bit stream than the rest of the information. The current "scaling based coding method" for ROI coding is based on this fact. The coefficients corresponding to the ROI are upshifted prior to arithmetically encoding them. This means that information for these coefficients will be transmitted earlier in the bitstream than it would have been without the shifting. At the early stages of the transmission, the ROI will be reconstructed with better quality than the background (BG). The whole operation is progressive by resolution or by quality.
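The effect of upshifting on bitplane ordering can be illustrated with a minimal sketch. It is not the verification model's coder: arithmetic coding and the wavelet transform are omitted, and the function names `scale_roi` and `bitplane_order`, the flat list of coefficient magnitudes, and the shift value are all assumptions made for illustration only.

```python
def scale_roi(mags, roi_mask, shift):
    """Upshift the magnitudes flagged as ROI by `shift` bits (hypothetical helper)."""
    return [m << shift if in_roi else m for m, in_roi in zip(mags, roi_mask)]


def bitplane_order(mags):
    """List (bitplane, index) pairs in the order coefficients first become significant.

    Bitplanes are visited from most to least significant, which mimics
    the order in which a bitplane-wise coder emits information.
    Zero-valued coefficients never become significant and are skipped.
    """
    top = max(mags, default=0).bit_length()
    order = []
    for plane in range(top - 1, -1, -1):
        for i, m in enumerate(mags):
            if m.bit_length() == plane + 1:
                order.append((plane, i))
    return order


# Illustrative magnitudes; indices 1 and 2 are assumed to belong to the ROI.
mags = [9, 3, 2, 6]
roi_mask = [False, True, True, False]

before = bitplane_order(mags)
after = bitplane_order(scale_roi(mags, roi_mask, shift=2))
```

In this example the two ROI coefficients become significant only in the lowest occupied bitplane before scaling, and in the highest bitplane after scaling, so their information would reach the receiver earlier.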
Furthermore, E. Atsumi and N. Farvardin, "Lossy/lossless region-of-interest coding based on set partitioning in hierarchical trees", Proceedings of IEEE International Conference on Image Processing (ICIP-98), Chicago, Ill., USA, Oct. 4-7, 1998 describes the general idea of the scaling based coding method. In addition, encoding of ROIs is disclosed in U.S. Pat. No. 5,563,960, Oct. 8, 1996, although the ROI coding method described only performs scaling of the image data and not of the coefficients.
When an image is encoded at various bitrates using the methods described above, information about which parts of the image should be encoded at which bit rate needs to be available to the encoder. Whereas the ROI might easily be described in the spatial domain, it is more complicated to describe in the transform domain. So far the information about the ROI shape must be available to both the encoder and the decoder, and thus requires extra bits in addition to the bits representing the texture information. Moreover, a shape encoder is required at the transmitter and a shape decoder at the receiver, making the whole system more complex and expensive to implement. The decoder also has to produce the ROI mask, i.e. it has to determine which coefficients are needed for the reconstruction of the ROI, see Charilaos Christopoulos (editor), ISO/IEC JTC1/SC29/WG1 N988 JPEG 2000 Verification Model Version 2.0/2.1, Oct. 5, 1998. This adds to the computational complexity and memory requirements of the receiver, which should be as simple as possible.
The currently used method to solve these problems is to include the description of the ROI in the spatial domain in the bitstream. The necessary mask of ROI coefficients (ROI mask) for the transform domain is then created in both the encoder and the decoder, see for example Charilaos Christopoulos (editor), ISO/IEC JTC1/SC29/WG1 N988 JPEG 2000 Verification Model Version 2.0/2.1, Oct. 5, 1998. The encoder encodes the shape information, and the encoded shape information is added to the total bitstream and transmitted to the receiver. The receiver decodes the shape from the shape information, makes the ROI mask, and then decodes the texture information of the image.
In the case where the ROI shape is simple (for example a rectangle or a circle), the shape information does not require many bits. However, even in these simple cases, the receiver has to produce the ROI mask, which means that the receiver requires memory as large as the whole image (but of 1 bit/pixel) and incurs a certain computational complexity (since the creation of the mask is similar to doing a wavelet transform). For a complex ROI, a lot of information needs to be transmitted between encoder and decoder, and computational complexity becomes an issue. The additional overhead for shape information is significant, particularly for low bitrates.
Also, the co-pending Swedish Patent Applications 9703690-9 and 9800088-8, corresponding to co-pending U.S. application Ser. No. 09/532,768, filed on Mar. 22, 2000, describe a method in which both encoder and decoder need to use and to define the ROI mask, i.e. to find which coefficients belong to the ROI or are needed for the ROI.
It is an object of the present invention to provide a method and a system whereby no shape information needs to be transmitted in an ROI coding scheme.
This object is obtained by a method and a system wherein the ROI coefficients are encoded so that they are transmitted first and can be decoded by a receiver without transmission of the boundary of the ROI.
In a preferred embodiment the coefficients belonging to the ROI are shifted so that the minimum ROI coefficient is larger than the largest background coefficient. A receiver can then perform an opposite procedure and thereby obtain the ROI.
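The shifting described above can be sketched as follows, under stated assumptions. The sketch assumes integer (quantized) coefficients held in a flat list, chooses the shift `s` as the bit length of the largest background magnitude, and assumes that `s` is made known to the receiver. The function names `encode_roi_shift` and `decode_roi_shift` are hypothetical, and entropy coding is omitted.

```python
def encode_roi_shift(coeffs, roi_mask):
    """Shift ROI magnitudes above every background magnitude.

    Returns the shifted coefficients and the shift s, where s is the
    smallest value with 2**s greater than the largest background magnitude.
    """
    bg_mags = [abs(c) for c, in_roi in zip(coeffs, roi_mask) if not in_roi]
    s = max(bg_mags, default=0).bit_length()
    shifted = []
    for c, in_roi in zip(coeffs, roi_mask):
        if in_roi:
            mag = abs(c) << s
            shifted.append(-mag if c < 0 else mag)
        else:
            shifted.append(c)
    return shifted, s


def decode_roi_shift(shifted, s):
    """Undo the shift using only s; no ROI mask or shape is needed.

    Any magnitude of at least 2**s can only come from a shifted ROI
    coefficient, so it is shifted back down. Everything else is left as is.
    """
    threshold = 1 << s
    restored, is_roi = [], []
    for c in shifted:
        mag = abs(c)
        if mag >= threshold:
            mag >>= s
            restored.append(-mag if c < 0 else mag)
            is_roi.append(True)
        else:
            restored.append(c)
            is_roi.append(False)
    return restored, is_roi


# Illustrative values; the last entry is a small nonzero ROI coefficient.
coeffs = [5, -3, 12, -7, 0, 1]
roi_mask = [False, False, True, True, False, True]

shifted, s = encode_roi_shift(coeffs, roi_mask)
restored, is_roi = decode_roi_shift(shifted, s)
```

In this sketch a zero-valued ROI coefficient stays zero after shifting and is therefore not flagged by the decoder, but it is still restored correctly because zero is unchanged by the shift.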
By specifying how much the coefficients need to be shifted, the sending of shape information is avoided and several advantages are achieved. Thus, there is no need for shape encoding at the encoder side. Furthermore, there is no need for a shape decoder at the receiver side, and there is no need for the receiver to produce the ROI mask.
In another preferred embodiment, the shifting (or scaling) operations required at the encoder and decoder are also avoided.