1. Field of the Invention
The present invention relates to an image enhancement device that enhances portions of an image by assigning larger amounts of code to regions of interest extracted from the image data.
2. Description of the Background Art
Images, particularly multi-valued images, contain huge amounts of information, so storing or transmitting them means storing or transmitting huge amounts of data. High-efficiency coding is therefore used to reduce the amount of data, e.g., by removing redundancy from images or by altering image content only to the extent that the resulting deterioration of image quality is visually unrecognizable.
On the other hand, applying high-efficiency coding uniformly to the entire image undesirably lowers the image quality of highly significant regions in the image. It is therefore desirable to apply high-efficiency coding only to less significant regions, reducing the total amount of data while avoiding deterioration of image quality in highly significant regions.
As a next-generation high-efficiency image data coding scheme, the ISO (International Organization for Standardization) and the ITU-T (International Telecommunication Union-Telecommunication Standardization Sector) have standardized the JPEG 2000 (Joint Photographic Experts Group 2000) system. Compared with the currently dominant JPEG (Joint Photographic Experts Group) system, the JPEG 2000 system offers superior functions and is characterized by its adoption of the DWT (Discrete Wavelet Transform) as the transform and of EBCOT (Embedded Block Coding with Optimized Truncation), which implements bit-plane coding, as the entropy coding.
FIG. 9 is a functional block diagram schematically showing the configuration of a compression coding device 100 that performs image compression coding based on JPEG 2000. The compression coding procedure of the JPEG 2000 system will now be briefly described with reference to FIG. 9.
An image signal inputted to the compression coding device 100 undergoes DC level shifting, as needed, in a DC level shifting block 102, and is then inputted to a color space transform block 103. The color space transform block 103 applies a color space transform to the image signal provided from the DC level shifting block 102. For example, an RGB signal inputted to the color space transform block 103 is transformed into a YCbCr signal (a signal composed of a luminance signal Y and color difference signals Cb and Cr).
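For lossless coding, JPEG 2000 Part 1 defines an integer reversible component transform (RCT) alongside the irreversible YCbCr transform (ICT). The following is a minimal sketch, assuming 8-bit unsigned samples, of DC level shifting followed by the RCT and its exact inverse; the function names are hypothetical, and the RCT's two chrominance outputs are named `cb` and `cr` here only because they play the role of Cb and Cr.

```python
def dc_level_shift(samples, bit_depth=8):
    """Subtract 2**(bit_depth - 1) so unsigned samples are centred on zero."""
    offset = 1 << (bit_depth - 1)
    return [s - offset for s in samples]


def rct_forward(r, g, b):
    """Reversible component transform: integer-only, hence lossless."""
    y = (r + 2 * g + b) >> 2   # floor((R + 2G + B) / 4); >> floors negatives too
    cb = b - g
    cr = r - g
    return y, cb, cr


def rct_inverse(y, cb, cr):
    """Exact inverse of rct_forward."""
    g = y - ((cb + cr) >> 2)   # floor((Cb + Cr) / 4)
    r = cr + g
    b = cb + g
    return r, g, b


r, g, b = dc_level_shift([200, 100, 50])
print(rct_forward(r, g, b))
```

Because Cb + Cr = R + B - 2G, the floor in the forward luminance term can be undone exactly, which is why the inverse recovers the original integers bit for bit.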
A tiling block 104 divides the image signal provided from the color space transform block 103 into a plurality of rectangular regions called “tiles”. A DWT block 105 applies an integer-type or real-number-type DWT, tile by tile, to the image signal inputted from the tiling block 104 and outputs the resulting transform coefficients. The DWT uses one-dimensional filters to divide a two-dimensional image signal into a high-band component (high-frequency component) and a low-band component (low-frequency component) in each of the vertical and horizontal directions. The basic JPEG 2000 system adopts an octave division scheme that recursively band-divides only the band component that is low-band in both the vertical and horizontal directions. The number of recursive band divisions is called the decomposition level.
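As an illustration of an integer-type DWT, the sketch below implements one decomposition level of the reversible 5/3 lifting filter (the integer wavelet used for lossless JPEG 2000 coding) with whole-sample symmetric extension at the signal edges, plus a 2-D level that filters rows (horizontal direction) and then columns (vertical direction). It assumes plain Python lists of integers with at least two samples per dimension; the function names are hypothetical.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension: map any index i onto 0..n-1."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def dwt53_forward(x):
    """One 5/3 lifting level on a 1-D integer signal -> (low, high)."""
    n = len(x)
    if n < 2:
        raise ValueError("signal must have at least 2 samples")

    def xe(i):
        return x[_mirror(i, n)]

    # Predict step: high-band samples at odd positions.
    high = [x[2 * k + 1] - ((xe(2 * k) + xe(2 * k + 2)) >> 1) for k in range(n // 2)]

    def he(k):
        # Odd position 2k+1 (mirrored) maps back to high index (pos - 1) // 2.
        return high[(_mirror(2 * k + 1, n) - 1) // 2]

    # Update step: low-band samples at even positions.
    low = [x[2 * k] + ((he(k - 1) + he(k) + 2) >> 2) for k in range((n + 1) // 2)]
    return low, high


def dwt53_inverse(low, high):
    """Exact integer inverse of dwt53_forward."""
    n = len(low) + len(high)
    x = [0] * n

    def he(k):
        return high[(_mirror(2 * k + 1, n) - 1) // 2]

    for k in range(len(low)):          # undo the update step (even positions)
        x[2 * k] = low[k] - ((he(k - 1) + he(k) + 2) >> 2)

    def xe(i):
        return x[_mirror(i, n)]

    for k in range(len(high)):         # then undo the predict step (odd positions)
        x[2 * k + 1] = high[k] + ((xe(2 * k) + xe(2 * k + 2)) >> 1)
    return x


def dwt53_2d(img):
    """One 2-D level: filter rows (horizontal), then columns (vertical)."""
    def transpose(m):
        return [list(col) for col in zip(*m)]

    def split_columns(m):
        lo, hi = zip(*(dwt53_forward(col) for col in transpose(m)))
        return transpose(lo), transpose(hi)

    row_lo, row_hi = zip(*(dwt53_forward(row) for row in img))
    ll, lh = split_columns(row_lo)     # horizontal low  -> vertical low / high
    hl, hh = split_columns(row_hi)     # horizontal high -> vertical low / high
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}
```

Band names follow the “XY” convention of the text (X horizontal, Y vertical). An octave decomposition would simply call `dwt53_2d` again on the returned `"LL"` band.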
FIG. 10 is a schematic diagram illustrating a two-dimensional image 120 that has undergone a DWT with three decomposition levels according to the octave division scheme. At decomposition level 1, the two-dimensional image 120 is divided into four band components, HH1, HL1, LH1, and LL1 (not shown), by sequentially applying the above-mentioned one-dimensional filters in the vertical and horizontal directions. Here, “H” indicates a high-band component and “L” indicates a low-band component. For example, HL1 is the band component composed of the high-band component H in the horizontal direction and the low-band component L in the vertical direction at decomposition level 1. In general, the notation “XYn” (where X and Y are each H or L, and n is an integer of 1 or more) indicates the band component composed of band component X in the horizontal direction and band component Y in the vertical direction at decomposition level n. At decomposition level 2, the low-band component LL1 is band-divided into HH2, HL2, LH2, and LL2 (not shown). At decomposition level 3, the low-band component LL2 is further band-divided into HH3, HL3, LH3, and LL3. While FIG. 10 shows an example with three decomposition levels, the JPEG 2000 system typically uses three to eight decomposition levels.
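The octave division and the “XYn” naming can be made concrete by listing every band and its size for a given tile and number of decomposition levels. This is a sketch under the assumption that the tile starts at an even coordinate, so the low band of an odd-length dimension receives the extra sample; `subband_layout` is a hypothetical helper.

```python
def subband_layout(width, height, levels):
    """List (name, width, height) of every band after an octave DWT."""
    bands = []
    w, h = width, height
    for n in range(1, levels + 1):
        low_w, high_w = (w + 1) // 2, w // 2
        low_h, high_h = (h + 1) // 2, h // 2
        bands += [
            (f"HL{n}", high_w, low_h),   # high horizontally, low vertically
            (f"LH{n}", low_w, high_h),   # low horizontally, high vertically
            (f"HH{n}", high_w, high_h),
        ]
        w, h = low_w, low_h              # only LLn is divided again
    bands.append((f"LL{levels}", w, h))
    return bands
```

For a 512×512 tile and three levels, `subband_layout(512, 512, 3)` gives HL1, LH1, and HH1 of 256×256, the level-2 bands at 128×128, and HL3, LH3, HH3, and LL3 at 64×64; only LL3 survives as a low band, mirroring FIG. 10.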
Referring to FIG. 9, a quantization block 106 applies, as needed, scalar quantization to the transform coefficients inputted from the DWT block 105. On the basis of a signal input from an ROI block 107, the quantization block 106 can also perform bit shifting to give priority to the image quality of regions of interest (ROIs) of greater significance in the image. The scalar quantization in the quantization block 106 is not performed when a reversible (lossless) transform is used. The JPEG 2000 system thus provides two kinds of quantization: the scalar quantization by the quantization block 106 and a post-quantization (truncation) described later.
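The scalar quantization of JPEG 2000 Part 1 is a dead-zone quantizer: q = sign(y)·floor(|y|/Δ), with one step size Δ per band. The sketch below shows it together with a dequantizer; the reconstruction offset `r = 0.5` (midpoint of the interval) is an assumed decoder choice, not something the text specifies, and the function names are hypothetical.

```python
def quantize(y, step):
    """Dead-zone scalar quantizer: q = sign(y) * floor(|y| / step)."""
    q = int(abs(y) // step)
    return q if y >= 0 else -q


def dequantize(q, step, r=0.5):
    """Reconstruct inside the quantization interval (r = 0.5 -> midpoint)."""
    if q == 0:
        return 0.0
    mag = (abs(q) + r) * step
    return mag if q > 0 else -mag
```

Every input in the open interval (-Δ, Δ) maps to zero, so the zero bin is twice as wide as the others; that dead zone is what discards most small high-band coefficients.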
As for methods of extracting regions of interest, Japanese Patent Application Laid-Open Nos. 2004-200739 and 2001-119696 disclose techniques that compare a plurality of images and extract objects with larger motion vectors as regions of interest. There is also a simple method that extracts the tiles in the central portion of an image as a region of interest, on the assumption that the central portion of the image is important. Furthermore, in another method, templates are prepared for various objects, a captured image is matched against the templates, and when the captured image includes an object matching a template, that object is extracted as a region of interest.
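The central-tile method can be sketched in a few lines. Here the tile grid is assumed to be `cols × rows`, an odd dimension contributes its single middle tile, and an even dimension contributes the two tiles straddling the centre line; both the rule and the helper names are illustrative assumptions, not taken from any cited document.

```python
def central_tiles(cols, rows):
    """(col, row) indices of the tile(s) at the centre of a tile grid."""
    def middle(n):
        return [n // 2] if n % 2 else [n // 2 - 1, n // 2]

    return [(c, r) for r in middle(rows) for c in middle(cols)]


def roi_tile_mask(cols, rows):
    """Boolean mask over the tile grid, True for tiles treated as the ROI."""
    roi = set(central_tiles(cols, rows))
    return [[(c, r) in roi for c in range(cols)] for r in range(rows)]
```

The mask depends only on the grid size, never on image content, which is exactly why an important object lying outside the central tiles is missed.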
A typical use of ROIs is the Max-shift scheme, an optional function of JPEG 2000.
The Max-shift scheme allows ROI portions of arbitrary shape to be specified and compresses those portions with higher image quality while compressing the non-ROI portion with lower image quality. Specifically, the wavelet transform is first applied to an original image to obtain a distribution of wavelet coefficients, and the value Vm of the largest wavelet coefficient (in magnitude) among the coefficients of the non-ROI portion is obtained. Then, the smallest number of bits S satisfying 2^S > Vm is determined, and only the wavelet coefficients of the ROI portions are shifted upward (toward the more significant bits) by S bits. For example, when Vm is 255 in decimal (11111111 in binary), S = 8, and when Vm is 128 in decimal (10000000 in binary), S = 8 as well; in both cases the wavelet coefficients of the ROI portions are shifted upward by S = 8 bits. Every nonzero shifted ROI coefficient then exceeds every non-ROI coefficient, so the ROI bits occupy the upper bit planes and are coded first. The compression ratio is thus effectively lower for the ROI portions than for the non-ROI portion, and compressed data of higher image quality is obtained for the ROI portions.
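A minimal sketch of Max-shift, assuming integer (already quantized) coefficients in a flat list and a boolean ROI mask of the same length; the function names are hypothetical. It also shows why no ROI shape has to be transmitted: only S is signalled (in JPEG 2000, in the RGN marker segment), and the decoder classifies any coefficient with magnitude of at least 2^S as ROI.

```python
def maxshift_s(non_roi):
    """Smallest S with 2**S > Vm, where Vm is the largest non-ROI magnitude."""
    vm = max((abs(c) for c in non_roi), default=0)
    return vm.bit_length()


def maxshift_encode(coeffs, roi_mask):
    """Scale ROI coefficients up by S bits; non-ROI coefficients are unchanged."""
    s = maxshift_s(c for c, in_roi in zip(coeffs, roi_mask) if not in_roi)
    shifted = [c * (1 << s) if in_roi else c for c, in_roi in zip(coeffs, roi_mask)]
    return shifted, s


def maxshift_decode(shifted, s):
    """Magnitudes >= 2**S can only be ROI coefficients, so no mask is needed."""
    out = []
    for c in shifted:
        if abs(c) >= (1 << s):
            mag = abs(c) >> s
            out.append(mag if c > 0 else -mag)
        else:
            out.append(c)
    return out


coeffs, roi = [3, -200, 255, 10], [False, True, False, True]
shifted, s = maxshift_encode(coeffs, roi)
print(s, shifted)   # 8 [3, -51200, 255, 2560]
```

`int.bit_length()` gives exactly the S of the text: 255 and 128 both need 8 bits, so S = 8 in both of the worked examples.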
Transform coefficients outputted from the quantization block 106 undergo block-based entropy coding according to EBCOT in a coefficient bit modeling block 108 and an arithmetic coding block 109, followed by rate control in a code rate control block 110. Specifically, the coefficient bit modeling block 108 divides each band component of the input transform coefficients into a plurality of regions called “code blocks”, e.g., 16×16, 32×32, or 64×64 in size, and further decomposes each code block into a plurality of bit planes, each a two-dimensional array of bits.
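Partitioning a band into code blocks is a simple grid walk. The sketch assumes the band's code-block grid starts at the band origin, so only blocks on the right and bottom edges can be smaller than the nominal size; `code_blocks` is a hypothetical helper returning `(x, y, w, h)` tuples in raster order.

```python
def code_blocks(band_w, band_h, cb_w=64, cb_h=64):
    """Partition a band into code blocks; edge blocks may be smaller."""
    return [
        (x, y, min(cb_w, band_w - x), min(cb_h, band_h - y))
        for y in range(0, band_h, cb_h)
        for x in range(0, band_w, cb_w)
    ]
```

For example, a 100×40 band with 32×32 code blocks yields eight blocks, the last being a 4×8 remnant at (96, 32).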
FIG. 11 is a schematic diagram illustrating the two-dimensional image 120 divided into a plurality of code blocks 121. FIG. 12 is a schematic diagram illustrating n bit planes 122_0 to 122_(n−1) (n: a natural number) that form a code block 121. As shown in FIG. 12, when the binary value 123 of the transform coefficient at a point in the code block 121 is “011 . . . 0”, the individual bits of the binary value 123 are decomposed so that they belong to the bit planes 122_(n−1), 122_(n−2), 122_(n−3), . . . , 122_0, respectively. The bit plane 122_(n−1) is the most significant bit plane, composed only of the most significant bits (MSBs) of the transform coefficients, and the bit plane 122_0 is the least significant bit plane, composed only of the least significant bits (LSBs) of the transform coefficients.
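Bit-plane decomposition of a code block can be sketched as follows, assuming the block is a list of rows of integer coefficients whose magnitudes fit in n bits. Magnitudes are split into planes ordered from MSB to LSB; signs are returned separately because EBCOT codes a coefficient's sign only when that coefficient first becomes significant. The helper name is hypothetical.

```python
def bit_planes(block, n):
    """Split coefficient magnitudes of a code block into n bit planes.

    planes[0] is the MSB plane 122_(n-1); planes[-1] is the LSB plane 122_0.
    """
    if any(abs(c) >= (1 << n) for row in block for c in row):
        raise ValueError("a magnitude needs more than n bits")
    planes = [[[(abs(c) >> k) & 1 for c in row] for row in block]
              for k in range(n - 1, -1, -1)]
    signs = [[c < 0 for c in row] for row in block]
    return planes, signs


planes, signs = bit_planes([[3, -5]], 3)
print(planes)   # [[[0, 1]], [[1, 0]], [[1, 1]]]
```

The coefficient 3 (binary 011) contributes 0, 1, 1 to the three planes from top to bottom, matching the “011 . . .” example of FIG. 12.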
The coefficient bit modeling block 108 performs a context judgment on each bit in each bit plane 122_k (k = 0 to n−1) and, as shown in FIG. 13, decomposes the bit plane 122_k into three kinds of coding passes according to the significance of each bit (the result of the judgment): the SIG pass (SIGnificance propagation pass), the MR pass (Magnitude Refinement pass), and the CL pass (CLeanup pass). The context judgment algorithm for each coding pass is defined by EBCOT. By that definition, “significant” means that a preceding coding step has already established that the target coefficient is nonzero, and “insignificant” means that the coefficient value is zero or may be zero.
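The pass membership rule can be sketched as below, given the significance state before the current bit plane: already significant coefficients go to MR, insignificant ones with at least one significant 8-neighbour go to SIG, and the rest to CL. This is a deliberate simplification and an assumption of the sketch: real EBCOT also lets a coefficient that becomes significant earlier in the same SIG pass scan pull its neighbours into that pass, which this function ignores.

```python
def classify_passes(sig):
    """Label each coefficient of a bit plane as "SIG", "MR" or "CL".

    sig[r][c] is True if the coefficient was significant before this plane.
    """
    rows, cols = len(sig), len(sig[0])

    def has_significant_neighbour(r, c):
        return any(
            sig[rr][cc]
            for rr in range(max(r - 1, 0), min(r + 2, rows))
            for cc in range(max(c - 1, 0), min(c + 2, cols))
            if (rr, cc) != (r, c)
        )

    return [
        ["MR" if sig[r][c]
         else "SIG" if has_significant_neighbour(r, c)
         else "CL"
         for c in range(cols)]
        for r in range(rows)
    ]
```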
The coefficient bit modeling block 108 performs bit-plane coding using the three kinds of coding passes: the SIG pass (a coding pass for insignificant coefficients that have one or more significant coefficients around them), the MR pass (a coding pass for significant coefficients), and the CL pass (a coding pass for the remaining coefficient information not covered by the SIG and MR passes). Bit-plane coding proceeds from the most significant bit plane to the least significant bit plane, scanning the bits of each bit plane in units of four bits (stripes four rows high) and judging whether significant coefficients are present. The number of bit planes composed only of insignificant coefficients (all-zero bits) is recorded in a packet header, and actual coding starts at the bit plane where a significant coefficient first appears. That starting bit plane is coded with the CL pass only, and each subsequent lower bit plane is coded with all three kinds of coding passes in turn.
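The scan order and the pass schedule just described can be sketched as two small helpers (hypothetical names). `stripe_scan` walks a code block in stripes four rows high, column by column within each stripe; `pass_schedule` counts the leading all-zero bit planes that would be signalled in the packet header and lists the coding passes, with the CL pass alone on the first nonzero plane.

```python
def stripe_scan(height, width):
    """Scan order within a bit plane: stripes 4 rows high, column by column."""
    order = []
    for top in range(0, height, 4):
        for col in range(width):
            for row in range(top, min(top + 4, height)):
                order.append((row, col))
    return order


def pass_schedule(block, n):
    """Return (number of all-zero MSB planes, [(bit_plane, pass_name), ...])."""
    top = max(abs(c) for row in block for c in row).bit_length()
    passes = []
    for k in range(top - 1, -1, -1):
        if k == top - 1:
            passes.append((k, "CL"))          # first nonzero plane: CL pass only
        else:
            passes += [(k, "SIG"), (k, "MR"), (k, "CL")]
    return n - top, passes


zero_planes, passes = pass_schedule([[3, -5]], 8)
print(zero_planes, passes[:4])   # 5 [(2, 'CL'), (1, 'SIG'), (1, 'MR'), (1, 'CL')]
```

With p nonzero bit planes the schedule always contains 3p − 2 passes, since only the first plane lacks the SIG and MR passes.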
FIG. 14 is a diagram showing an R-D curve that represents the relation between rate (amount of code; R) and distortion (D). In FIG. 14, R1 and D1 denote the rate and distortion before bit-plane coding, and R2 and D2 denote the rate and distortion after bit-plane coding. A, B, and C are labels representing the coding passes described above. For the route from the starting point P1 (R1, D1) to the end point P2 (R2, D2), the concave curve A-B-C is preferable to the convex curve C-B-A for more efficient coding. Such a concave curve is known to be obtained by coding from the most significant bit plane to the least significant bit plane.
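Why the MSB-first order gives the more favourable curve can be checked numerically. The sketch below reconstructs coefficient magnitudes from whichever bit planes have been “sent” so far and reports the squared error after each plane, for MSB-first and LSB-first orders. It assumes signs are known and that every plane costs the same raw rate (one bit per coefficient, ignoring entropy coding); the coefficient values are arbitrary illustrative numbers.

```python
def truncation_curve(coeffs, order):
    """Squared error after adding each bit plane, in the given plane order."""
    mags = [abs(c) for c in coeffs]
    kept = set()
    curve = [sum(m * m for m in mags)]          # nothing coded yet
    for k in order:
        kept.add(k)
        mask = sum(1 << b for b in kept)        # bits available to the decoder
        curve.append(sum((m - (m & mask)) ** 2 for m in mags))
    return curve


coeffs = [3, -5, 12, 0, 7]
msb_first = truncation_curve(coeffs, range(3, -1, -1))
lsb_first = truncation_curve(coeffs, range(4))
print(msb_first)   # [227, 99, 19, 3, 0]
print(lsb_first)   # [227, 200, 176, 64, 0]
```

Both orders end at zero distortion for the same total rate, but MSB-first removes most of the distortion with the first plane, because an error in bit b costs on the order of 4^b in squared error.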
Referring to FIG. 9, according to the results of the context judgment, the arithmetic coding block 109 uses an MQ coder to apply arithmetic coding, pass by pass, to the coefficient sequences inputted from the coefficient bit modeling block 108. There is also a bypass mode in which part of the coefficient sequences inputted from the coefficient bit modeling block 108 are not arithmetic-coded.
The code rate control block 110 performs post-quantization, truncating lower-order bit planes of the code sequences inputted from the arithmetic coding block 109 to control the final amount of code. The bit stream generating block 111 multiplexes the code sequences inputted from the code rate control block 110 with additional information (header information, layer structure, scalability information, quantization tables, etc.) to generate a bit stream, which is output as a compressed image.
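Post-quantization by truncation is an encoder-side choice; one common approach in the spirit of EBCOT's optimized truncation is to keep, across all code blocks, the coding passes with the steepest distortion reduction per unit of rate until a byte budget is exhausted. The sketch below is a simplified greedy version under stated assumptions: each block's passes are given in coding order as hypothetical `(rate_increment, distortion_reduction)` pairs with positive rates, and their slopes already decrease along each block (convex hull), so a block is simply truncated at the first pass that no longer fits.

```python
def select_passes(blocks, budget):
    """Greedy slope-ordered truncation across code blocks.

    Returns ({block_name: number_of_passes_kept}, total_rate_used).
    """
    candidates = sorted(
        ((dd / dr, i, name, dr)
         for name, passes in blocks.items()
         for i, (dr, dd) in enumerate(passes)),
        key=lambda t: (-t[0], t[1]),     # steepest slope first; ties by pass order
    )
    kept = {name: 0 for name in blocks}
    closed = set()
    used = 0
    for _, i, name, dr in candidates:
        if name in closed or i != kept[name]:
            continue                     # block already truncated, or out of order
        if used + dr > budget:
            closed.add(name)             # truncate this block at pass i
            continue
        used += dr
        kept[name] += 1
    return kept, used


blocks = {"A": [(10, 100), (10, 30), (10, 5)], "B": [(10, 80), (10, 10)]}
print(select_passes(blocks, 30))   # ({'A': 2, 'B': 1}, 30)
```

Because each block keeps a prefix of its passes, the result corresponds to discarding the lower-order passes (and hence lower-order bit planes) of every code block, which is the truncation described above.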
As mentioned above, methods of extracting regions of interest include a method of extracting objects with larger motion vectors, a method of extracting a central portion of an image as a region of interest, and a method of performing object template matching.
However, according to the method of extracting objects with larger motion vectors as regions of interest, it is difficult to extract still objects and objects with smaller motion vectors (e.g., plants and scenery) as the regions of interest. Moreover, obtaining motion vectors requires comparing at least two successive images, which leads to increased circuit scale and more complex processing.
According to the method of extracting a central portion of an image as a region of interest, a significant object cannot be extracted as a region of interest when the significant object is not contained in central tiles of the image.
Also, the object template matching method increases circuit scale, because a huge amount of template data for various objects must be stored in memory in advance.