1. Field of the Invention
This invention relates to a method of automatically determining the region of interest (ROI) of an image during an encoding process and, more particularly, to a method of generating an effective ROI mask that improves the quality of the decoded image in image compression applications.
2. Description of the Prior Art
With the development and advancement of information technology, speech, audio, image and video information is transmitted over wireless or wired channels, such as personal mobile phones or the Internet, to achieve multimedia transmission. In addition, it is difficult to store a large amount of multimedia information. Consequently, the multimedia information must be compressed for the purposes of transmission and storage.
The Joint Photographic Experts Group (JPEG) was organized under ISO and ITU-T in 1986 to develop and promote still image compression standards, such as JPEG and JPEG 2000. At present, the discrete cosine transform (DCT) or the discrete wavelet transform (DWT) is utilized to reduce spatial redundancy in most image and video compression standards, including JPEG, JPEG 2000, MPEG-1, MPEG-2, MPEG-4, H.261, H.263+ and so on. In JPEG compression, an image frame is partitioned into many 8×8-pixel blocks, each of which is a coding unit. For each block, the DCT, quantization and entropy coding are performed to compress the image. JPEG 2000, on the other hand, utilizes the DWT to remove spatial redundancy. The transformed coefficients then go through bit-plane coding, arithmetic coding and bit-stream arrangement to complete the compression process.
FIG. 1 shows the coding process of JPEG, in which an image frame is partitioned into many 8×8-pixel blocks, each of which is transformed by the DCT to obtain the DCT coefficients. The DCT coefficients are then quantized and go through run-length coding and entropy coding to create the encoded bit stream. The main property of the DCT is energy compaction: the transformed coefficients of an 8×8-pixel block are arranged from the low-frequency components in the upper-left region to the high-frequency components in the lower-right region, and for typical image content most of the energy is concentrated in the low-frequency components. Owing to this property, the statistical characteristics of the low-frequency or high-frequency components can be investigated to determine and generate the ROI during the compression process according to the bit-rate requirement.
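The energy compaction property described above can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the coding process of FIG. 1: it uses NumPy, an orthonormal 8×8 DCT-II, and a synthetic smooth block. The function names dct_matrix, dct2 and low_frequency_energy_fraction are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    m = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # DC row has its own scale factor
    return c

def dct2(block):
    """Separable 2-D DCT of a square block (rows, then columns)."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def low_frequency_energy_fraction(coeffs, size=2):
    """Share of total energy held by the upper-left size x size coefficients."""
    return np.sum(coeffs[:size, :size] ** 2) / np.sum(coeffs ** 2)

# Synthetic smooth block (an assumption for illustration): a gentle ramp,
# level-shifted by 128 as is common for 8-bit samples.
x = np.arange(8, dtype=float)
block = 16.0 * x[None, :] + 8.0 * x[:, None]
coeffs = dct2(block - 128.0)

print(low_frequency_energy_fraction(coeffs))
```

For a smooth block such as this one, nearly all of the energy falls into the few upper-left coefficients. A block with strong texture or edges spreads more energy toward the lower-right region, and a difference of this kind is the sort of statistic the paragraph above refers to.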
Compared with JPEG, JPEG 2000 has many features, such as a high compression rate, an embedded bit stream, multiple-resolution representation, lossy and lossless compression, ROI coding and error resilience. In particular, ROI coding enhances the picture quality of the region of interest during lossy compression for transmission at a limited bandwidth. The JPEG 2000 compression standard has six parts, where Part 1 defines the basic compression standard and Parts 2 to 6 extend Part 1. In JPEG 2000, an image frame goes through the discrete wavelet transform (DWT), and its transformed coefficients are then quantized. The quantized coefficients are partitioned into N×N codeblocks, each of which is processed by bit-plane coding. With a bit-plane as the coding unit, a codeblock is processed bit-plane by bit-plane by the embedded block coding with optimized truncation (EBCOT), which includes pass coding and arithmetic coding, to yield a highly efficient embedded bit stream. Referring to FIG. 2, the coding process of JPEG 2000 comprises the following three steps:
    1. Pre-processing the image frame, including tile division and color transform, where the tile size is determined by the system requirement and each tile divided from the image frame undergoes the color transform;
    2. Applying the DWT to each tile after the color transform to remove the spatial redundancy, and quantizing the transformed coefficients; and
    3. Coding the quantized coefficients bit-plane by bit-plane with the EBCOT to eliminate the bit redundancy and generate an output bit stream organized in packets.
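The transform in step 2 can be illustrated with a small one-dimensional sketch. The text above does not state which wavelet filter is used, so this example assumes the reversible 5/3 integer lifting filter commonly associated with JPEG 2000 Part 1, with symmetric extension at the signal boundaries and an even-length input. The function names dwt53_forward and dwt53_inverse are hypothetical. A two-dimensional transform would apply the same steps to the rows and then to the columns of a tile.

```python
import numpy as np

def dwt53_forward(x):
    """One level of a reversible 5/3 lifting transform on an even-length signal."""
    x = np.asarray(x, dtype=np.int64)
    assert x.size >= 2 and x.size % 2 == 0, "sketch assumes even length"
    even, odd = x[0::2], x[1::2]
    # Predict step: high-pass = odd sample minus the floored mean of its
    # even neighbours (right edge mirrored by repeating the last even sample).
    even_right = np.concatenate((even[1:], even[-1:]))
    high = odd - ((even + even_right) >> 1)
    # Update step: low-pass = even sample plus a rounded quarter of the
    # neighbouring high-pass values (left edge mirrored).
    high_left = np.concatenate((high[:1], high[:-1]))
    low = even + ((high_left + high + 2) >> 2)
    return low, high

def dwt53_inverse(low, high):
    """Undo dwt53_forward exactly by reversing the lifting steps."""
    low = np.asarray(low, dtype=np.int64)
    high = np.asarray(high, dtype=np.int64)
    high_left = np.concatenate((high[:1], high[:-1]))
    even = low - ((high_left + high + 2) >> 2)
    even_right = np.concatenate((even[1:], even[-1:]))
    odd = high + ((even + even_right) >> 1)
    x = np.empty(even.size + odd.size, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x

signal = np.array([10, 12, 14, 13, 11, 40, 42, 41])
low, high = dwt53_forward(signal)
restored = dwt53_inverse(low, high)
```

Because every lifting step is undone exactly by its inverse, the round trip is lossless in integer arithmetic. The low-pass band carries a coarse version of the signal, and the high-pass band stays near zero wherever the signal is smooth, which is how the spatial redundancy is removed.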
Part 1 of JPEG 2000 provides the option of ROI coding, which sacrifices the image quality of the uninterested region to improve the image quality of the interested region. In the JPEG 2000 coding process, the ROI is coded first in the bit stream, so it retains good visual quality at a limited bandwidth; ROI coding is therefore very important in Internet and wireless communication applications. In ROI applications, the picture content of an image frame is partitioned into interested and uninterested regions. In general, the position of the ROI needs to be embedded in the coded bit stream so that the decoder can extract the ROI exactly with good visual quality. However, the Maxshift coding scheme of JPEG 2000 Part 1 embeds the ROI information without requiring additional bits to store the position of the ROI, and its decoder can still decode the bit stream effectively to obtain good visual quality in the ROI.
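The reason the Maxshift scheme needs no stored ROI position can be illustrated with a simplified sketch. It assumes integer quantized coefficients and a Boolean ROI mask, and it picks the shift s as the smallest value with 2^s greater than the largest background magnitude. The function names maxshift_encode and maxshift_decode are hypothetical, and a real codec operates on bit-planes of wavelet coefficients rather than on a plain array.

```python
import numpy as np

def maxshift_encode(coeffs, roi_mask):
    """Scale ROI coefficients up so they sit above every background coefficient."""
    coeffs = np.asarray(coeffs, dtype=np.int64)
    roi_mask = np.asarray(roi_mask, dtype=bool)
    background = np.abs(coeffs[~roi_mask])
    max_bg = int(background.max()) if background.size else 0
    s = max_bg.bit_length()            # smallest s with 2**s > max_bg
    shifted = coeffs.copy()
    shifted[roi_mask] = coeffs[roi_mask] * (1 << s)
    return shifted, s

def maxshift_decode(shifted, s):
    """Recover coefficients using only the shift s, with no mask."""
    shifted = np.asarray(shifted, dtype=np.int64)
    # Any magnitude of at least 2**s can only come from a scaled ROI value.
    is_roi = np.abs(shifted) >= (1 << s)
    restored = shifted.copy()
    restored[is_roi] = shifted[is_roi] // (1 << s)  # exact: values are multiples of 2**s
    return restored, is_roi

coeffs = np.array([[5, -3, 0],
                   [12, -7, 2],
                   [1, 0, -9]])
roi_mask = np.array([[False, False, False],
                     [True, True, False],
                     [False, True, False]])
shifted, s = maxshift_encode(coeffs, roi_mask)
restored, detected = maxshift_decode(shifted, s)
```

In this sketch the decoder is given only the shift value, not the mask. ROI coefficients that are zero cannot be told apart from the background, but they decode to zero either way, so the reconstruction is unaffected. Because the scaled ROI values occupy the higher bit-planes, a bit-plane coder emits them before any background information.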
Image coding provides an ROI function for enhancing the quality of the decoded image at particular objects, which are processed by enhancement techniques. These particular objects of an image are treated as the region of interest: they are allocated more bits to represent them, or they are quantized with smaller quantization steps, in order to achieve good visual quality. The above-mentioned techniques can enhance the perceptual quality of the decoded ROI, but the ROI is determined either by object segmentation and recognition or by manual selection by the user. These two techniques for determining the ROI have the following disadvantages:
    (1) Using object segmentation and recognition to determine the ROI takes a lot of computation time, and the sizes of the recognized objects in the ROI cannot be adjusted to meet the bit-rate requirement; and
    (2) Using manual selection to choose a fixed region of an image as the ROI also does not allow the size of the ROI to be adjusted to meet the bit-rate requirement.
Therefore, the present invention discloses a method for analyzing the coefficients after transformation and automatically determining the ROI from the transformed coefficients. To obtain good visual perception and quality of the decoded image, this invention examines the picture content and the bit-rate requirement to generate an ROI of adequate size and location during the encoding process.