This application claims the benefit of Japanese Patent Applications No. 2001-175081 filed Jun. 11, 2001 and No. 2001-178310 filed Jun. 13, 2001, in the Japanese Patent Office, the disclosures of which are hereby incorporated by reference.
1. Field of the Invention
The present invention generally relates to image compression methods and apparatuses, image expansion methods and apparatuses, and storage media, and more particularly to an image compression method and apparatus, an image expansion method and apparatus, and a computer-readable storage medium which stores a program for causing a computer to carry out an image compression and/or an image expansion so as to suppress a quantization rate in a vicinity of a tile boundary.
2. Description of the Related Art
Due to improvements in image input and image output techniques, there are increasing demands for higher definition in color still images. One example of an image input apparatus is the digital camera (DC). Helped by the reduced cost of high-performance charge coupled devices (CCDs) having 3,000,000 or more pixels, such CCDs are widely used in digital cameras in the popular price range, and products using CCDs having 5,000,000 or more pixels are expected to reach the market in the near future. Such high-performance input devices, typified by the CCDs, have been realized mainly through improvements in silicon processes and device technologies, which have resolved the tradeoff between miniaturizing the input device and maintaining the signal-to-noise (S/N) ratio. This trend of increasing the number of pixels of the input device is expected to continue.
On the other hand, considerable improvements have also been made in image output devices and image display devices, such as hard-copy devices including laser printers, ink jet printers and thermal printers, and soft-copy devices including CRTs and flat panel displays such as liquid crystal displays (LCDs) and plasma display panels (PDPs). The definition of such image output and display devices has improved considerably, and their cost has been greatly reduced.
Because these high-performance, inexpensive image input and output devices are on the market, high-definition still images have come into popular use, and the demand for them is expected to grow in various fields. Developments in personal computers (PCs) and network-related technologies, including the Internet, have accelerated this trend. Especially in recent years, mobile equipment such as portable telephones and laptop computers has become extremely popular, and there are more and more opportunities to transmit or receive high-definition images via a communication means. Consequently, demands to further improve the performance and functions of image compression and/or expansion techniques are expected to increase so as to facilitate the processing of high-definition still images.
The JPEG (Joint Photographic Experts Group) system is popularly used as an image compression and expansion algorithm for facilitating the processing of such high-definition still images. In addition, the JPEG2000, which became an international standard in 2001, uses an image compression and expansion algorithm whose performance is further improved compared to the JPEG. The JPEG2000 is also extremely flexible and extensible with respect to various functions and applications. Accordingly, the JPEG2000 is widely expected to succeed the JPEG as the next-generation high-definition still image compression and expansion format.
FIG. 1 is a system block diagram for explaining the operating principle of the JPEG algorithm. The JPEG algorithm is realized by a color space transform and inverse transform section 40, a discrete cosine transform and inverse transform section 41, a quantization and inverse quantization section 42, and an entropy coding and decoding section 43. Normally, an irreversible (lossy) scheme is used in order to obtain a high compression rate, and so-called lossless (or no-loss) compression and expansion is not carried out. Although the original image data is not preserved in its entirety, no problems occur from a practical point of view. For this reason, the JPEG system can reduce the memory capacity required to carry out the compression and expansion processes and to store the compressed image data. In addition, the JPEG system greatly contributes to reducing the time required for data transmission and reception. Because of these advantages, the JPEG system is presently the most popularly used still image compression and expansion algorithm.
FIG. 2 is a system block diagram for explaining the operating principle of the JPEG2000 algorithm. The JPEG2000 algorithm is realized by a color space transform and inverse transform section 50, a two-dimensional wavelet transform and inverse transform section 51, a quantization and inverse quantization section 52, an entropy coding and decoding section 53, and a tag processing section 54.
As described above, the JPEG system is presently the most popularly used still image compression and expansion system. However, the demands to further improve the definition of still images continue to increase, and the technical limits of the JPEG system are beginning to surface. For example, block noise and mosquito noise, which were not conspicuous in the past, are gradually becoming more conspicuous as the definition of the original image improves. In other words, image deterioration in JPEG files, which did not cause problems in the past, is now notable and no longer negligible from a practical point of view. As a result, improving image quality at low bit rates, that is, in the high compression rate region, is recognized as the most important problem to be solved by the algorithm. The JPEG2000 was developed as an algorithm capable of eliminating this problem, and the JPEG2000 system is expected to be used concurrently with the existing JPEG system.
When FIGS. 1 and 2 are compared, it may be seen that the transformation method is one of the largest differences between the JPEG and the JPEG2000. The JPEG system employs the discrete cosine transform (DCT), while the JPEG2000 system employs the discrete wavelet transform (DWT). Compared to the DCT, the DWT has an advantage in that the image quality is good in the high compression region, which is the main reason for employing the DWT in the JPEG2000 system.
Another large difference between the JPEG and the JPEG2000 is that the JPEG2000 adds a functional block called the tag processing section 54 at the last stage for forming codes. The tag processing section 54 generates the compressed data as a code stream at the time of the compression operation, and interprets the code stream required for the expansion at the time of the expansion operation. The code stream enables the JPEG2000 to realize various convenient functions. For example, FIG. 3 is a diagram showing an example of the subbands at each decomposition level for a case where the decomposition level is 3. The still image compression and expansion operation can be stopped freely at an arbitrary level corresponding to the octave division of the block-based DWT shown in FIG. 3.
In most cases, the color space transform and inverse transform sections 40 and 50 are connected to the original image input and output sections shown in FIGS. 1 and 2. For example, the color space transform and inverse transform sections 40 and 50 carry out a transformation from the RGB colorimetric system, made up of the red (R), green (G) and blue (B) components of the primary color system, or from the YMC colorimetric system, made up of the yellow (Y), magenta (M) and cyan (C) components of the complementary color system, to the YUV or YCrCb colorimetric system, or the corresponding inverse transformation.
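The text does not fix a particular color transform. As one concrete, hedged illustration, the reversible component transform (RCT) that JPEG2000 Part 1 specifies for lossless coding maps integer RGB values to a luminance-like component and two color differences using only integer arithmetic, so it can be inverted exactly. The function names `rct_forward` and `rct_inverse` are hypothetical, and the sketch is not presented as the transform used by the present invention.

```python
def rct_forward(r: int, g: int, b: int) -> tuple:
    """Reversible component transform: integer (R, G, B) -> (Y, U, V)."""
    y = (r + 2 * g + b) // 4  # floor division keeps the result an integer
    u = b - g                 # blue color difference
    v = r - g                 # red color difference
    return y, u, v


def rct_inverse(y: int, u: int, v: int) -> tuple:
    """Exact inverse of rct_forward: (Y, U, V) -> (R, G, B)."""
    g = y - (u + v) // 4      # Python's // floors negative values, as required
    r = v + g
    b = u + g
    return r, g, b
```

For example, `rct_forward(255, 0, 0)` gives `(63, 0, 255)`, and `rct_inverse(63, 0, 255)` returns `(255, 0, 0)`. Exact invertibility follows because R + 2G + B equals (U + V) + 4G, so the floor in the forward step can be undone.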
Next, a description will be given of the JPEG2000 algorithm. The technical terms related to the JPEG2000 are in conformance with the JPEG2000 Final Draft International Standard (FDIS). Typical technical terms are defined as follows.
1. “Bit-Plane”: A two-dimensional array of bits. In this Recommendation International Standard a bit-plane refers to all the bits of the same magnitude in all coefficients or samples. This could refer to a bit-plane in a component, tile-component, code-block, region of interest, or other.
2. “Code-Block”: A rectangular grouping of coefficients from the same subband of a tile-component.
3. “Decomposition Level”: A collection of wavelet subbands where each coefficient has the same spatial impact or span with respect to the source component samples. These include the HL, LH, and HH subbands of the same two-dimensional subband decomposition. For the last decomposition level the LL subband is also included.
4. “Layer”: A collection of compressed image data from coding passes of one, or more, code-blocks of a tile-component. Layers have an order for encoding and decoding that must be preserved.
5. “Precinct”: A one rectangular region of a transformed tile-component, within each resolution level, used for limiting the size of packets.
FIG. 4 is a diagram showing an example of each component of a color image divided into tiles. Generally, as shown in FIG. 4, each of the components 70, 71 and 72 (the RGB primary color system in this case) of the original image is divided into rectangular regions (tiles) 70t, 71t and 72t. Each of the tiles, such as R00, R01, . . . , R15, G00, G01, . . . , G15, B00, B01, . . . , B15, becomes a basic unit for executing the compression and expansion process. Accordingly, the compression and expansion operation is carried out independently for every component and for every tile. At the time of the coding, the data of each tile of each component is input to the color space transform and inverse transform section 50 and subjected to a color space transform, and is thereafter subjected to a two-dimensional wavelet transform (forward transform) in the two-dimensional wavelet transform and inverse transform section 51 and spatially divided into frequency bands.
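The tile division of FIG. 4 can be sketched as follows. This is a minimal illustration assuming each component is held as a list of rows; `split_into_tiles` and `tile_labels` are hypothetical helpers, and tiles are numbered in raster order to mirror the labels R00 to R15 in FIG. 4.

```python
def split_into_tiles(component, tile_h, tile_w):
    """Split one 2-D component (a list of rows) into rectangular tiles.

    Tiles are returned in raster order (left to right, top to bottom).
    Tiles on the right or bottom edge are smaller when the component size
    is not a multiple of the tile size.
    """
    height, width = len(component), len(component[0])
    tiles = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            tile = [row[left:left + tile_w] for row in component[top:top + tile_h]]
            tiles.append(tile)
    return tiles


def tile_labels(prefix, count):
    """Labels in the style of FIG. 4, e.g. R00, R01, ..., R15."""
    return [f"{prefix}{i:02d}" for i in range(count)]
```

A 32×32 component split into 8×8 tiles yields 16 tiles, labeled R00 to R15 for the R component; each tile can then be passed independently to the later stages, as the text describes.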
FIG. 3 described above shows the subbands at each decomposition level for the case where the decomposition level is 3. In other words, the two-dimensional wavelet transform is carried out with respect to the tile original image (0LL) (decomposition level 0 (60)) obtained by dividing the original image into tiles, so as to separate the subbands (1LL, 1HL, 1LH, 1HH) indicated by the decomposition level 1 (61). The two-dimensional wavelet transform is then carried out with respect to the low-frequency component 1LL at this level, so as to separate the subbands (2LL, 2HL, 2LH, 2HH) indicated by the decomposition level 2 (62). Similarly, the two-dimensional wavelet transform is then carried out with respect to the low-frequency component 2LL, so as to separate the subbands (3LL, 3HL, 3LH, 3HH) indicated by the decomposition level 3 (63).
Furthermore, in FIG. 3, the subbands that are the coding targets at each decomposition level are shown in gray. For example, when the decomposition level is 3, the subbands shown in gray (3HL, 3LH, 3HH, 2HL, 2LH, 2HH, 1HL, 1LH, 1HH) are the coding targets, and the 3LL subband is not coded.
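The octave decomposition of FIG. 3 can be tabulated with a short sketch that computes the size of every subband. It assumes the tile origin lies on an even coordinate, so a length n splits into ceil(n/2) lowpass and floor(n/2) highpass samples; the general JPEG2000 rule depends on the actual tile coordinates. In the names used here, the first letter refers to horizontal filtering and the second to vertical filtering (for example, HL is horizontally highpass and vertically lowpass). `subband_sizes` is a hypothetical helper.

```python
def subband_sizes(width, height, levels):
    """Return {name: (w, h)} for a dyadic 2-D DWT of a tile with origin (0, 0)."""
    sizes = {}
    w, h = width, height
    for level in range(1, levels + 1):
        low_w, high_w = (w + 1) // 2, w // 2   # ceil / floor split, horizontal
        low_h, high_h = (h + 1) // 2, h // 2   # ceil / floor split, vertical
        sizes[f"{level}HL"] = (high_w, low_h)
        sizes[f"{level}LH"] = (low_w, high_h)
        sizes[f"{level}HH"] = (high_w, high_h)
        w, h = low_w, low_h                    # the next level splits the LL band
    sizes[f"{levels}LL"] = (w, h)              # only the last LL band remains
    return sizes
```

For a 16×16 tile and decomposition level 3, the subbands shrink from 8×8 at level 1 to 2×2 at level 3, and 3LL is also 2×2.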
Next, the bits to be coded are determined in a specified coding order, and the quantization and inverse quantization section 52 generates a context from the bits surrounding each target bit.
After quantization, the wavelet coefficients of each subband are divided into non-overlapping rectangular regions called precincts. Precincts are introduced to use memory efficiently in implementations.
FIG. 5 is a diagram for explaining one example of the relationship of the precinct and the code block. An original image 80 is divided into 4 tiles 80t0, 80t1, 80t2 and 80t3 at the decomposition level 1. As shown in FIG. 5, a precinct 80p4, for example, is made up of 3 spatially matching rectangular regions, and the same holds true for a precinct 80p6. Furthermore, each precinct is divided into non-overlapping rectangular blocks called code blocks. In this particular example, each precinct is divided into 12 code blocks 0 to 11, and for example, a code block 80b1 indicates a code block number 1. The code block becomes a basic unit when carrying out the entropy coding.
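The raster numbering of non-overlapping blocks used for precincts and code blocks in FIGS. 5 and 6 can be sketched as follows. This is a simplified illustration: it partitions a single rectangle starting at (0, 0) and ignores the fact that JPEG2000 anchors precinct and code-block partitions to a reference grid and defines precinct sizes per resolution level. Both function names are hypothetical.

```python
def block_index(x, y, region_w, block_w, block_h):
    """Raster-order index of the block that contains sample (x, y)."""
    blocks_per_row = -(-region_w // block_w)  # ceiling division
    return (y // block_h) * blocks_per_row + (x // block_w)


def partition(region_w, region_h, block_w, block_h):
    """List (index, x0, y0, w, h) for non-overlapping blocks covering a region."""
    blocks = []
    for y0 in range(0, region_h, block_h):
        for x0 in range(0, region_w, block_w):
            w = min(block_w, region_w - x0)  # edge blocks may be clipped
            h = min(block_h, region_h - y0)
            blocks.append((len(blocks), x0, y0, w, h))
    return blocks
```

With an 8×8 region and 4×4 blocks, four blocks numbered 0 to 3 result, which matches the number of code blocks per precinct in the FIG. 6 example.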
The coefficients after the wavelet transform may be quantized and coded as they are. However, in order to improve the coding efficiency, the JPEG2000 decomposes the coefficient values into bit-plane units, and the bit-planes may be ordered for every pixel or code block.
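Decomposing coefficients into bit-planes can be sketched as follows, assuming a sign-magnitude representation in which the sign is kept separately and each magnitude bit-plane holds one bit per coefficient (JPEG2000 codes magnitudes bit-plane by bit-plane in this way). The helper names are hypothetical.

```python
def to_bit_planes(coeffs, num_planes):
    """Split signed integers into a sign list and magnitude bit-planes.

    planes[0] is the most significant plane, planes[-1] the least significant.
    """
    signs = [1 if c < 0 else 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    if max(mags, default=0) >= 1 << num_planes:
        raise ValueError("num_planes is too small for the largest magnitude")
    planes = [[(m >> p) & 1 for m in mags] for p in range(num_planes - 1, -1, -1)]
    return signs, planes


def from_bit_planes(signs, planes):
    """Rebuild the signed coefficients from the sign list and the bit-planes."""
    coeffs = []
    for i, s in enumerate(signs):
        m = 0
        for plane in planes:  # MSB first, so shift left before adding each bit
            m = (m << 1) | plane[i]
        coeffs.append(-m if s else m)
    return coeffs
```

For example, the coefficients 5, -3 and 0 with three planes give the MSB plane [1, 0, 0], the middle plane [0, 1, 0] and the LSB plane [1, 1, 0].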
FIG. 6 is a diagram for explaining the procedure for ordering the bit-planes. In the particular example shown in FIG. 6, an original image 90 (32×32 pixels) is divided into 4 tiles 90t0, 90t1, 90t2 and 90t3, each having 16×16 pixels. The sizes of the code block and the precinct at the decomposition level 1 are 4×4 pixels and 8×8 pixels, respectively. The precincts and the code blocks are numbered in raster order; in this particular example, numbers 0 to 3 are assigned to the precincts, and numbers 0 to 3 are assigned to the code blocks. The mirroring method is used to extend the pixels beyond the tile boundary, and the wavelet transform is carried out by the reversible (5, 3) integer transform filter to obtain the wavelet coefficients of the decomposition level 1.
In addition, FIG. 6 also generally shows the typical layer structure for the tile 90t0 (tile 0), the precinct 90p3 (precinct 3) and the code block 90b3 (code block 3). A transformed code block 90w3 is obtained by subjecting the code block 90b3 to the wavelet transform by the reversible (5, 3) integer transform filter, yielding the wavelet coefficient values of the decomposition level 1. The transformed code block 90w3 is divided into the subbands (1LL, 1HL, 1LH, 1HH), and the wavelet coefficient values are allocated to each of the subbands.
The layer structure is easier to understand when the wavelet coefficient values are viewed from the horizontal direction (bit-plane direction). One layer is made up of an arbitrary number of bit-planes. In this example, each of the layers 0, 1, 2 and 3 is made up of 3 bit-planes 1, 3 and 1. A layer that includes bit-planes closer to the LSB becomes a quantization target earlier, while a layer that includes bit-planes closer to the MSB becomes a quantization target later and remains unquantized until the end. The method of discarding the layers closer to the LSB is called truncation, and the quantization rate can be finely controlled by this truncation.
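The effect of truncation on individual coefficients can be sketched as follows, assuming sign-magnitude coefficients and ignoring layer boundaries and entropy coding. Discarding the d least significant bit-planes leaves floor(|c| / 2^d), which behaves like quantization with step size 2^d. The helper names are hypothetical, and the reconstruction simply shifts the surviving bits back without any midpoint offset.

```python
def truncate_lsb_planes(coeffs, dropped):
    """Discard the `dropped` least significant magnitude bit-planes.

    The remaining magnitude equals floor(|c| / 2**dropped); the sign is kept.
    """
    return [(-1 if c < 0 else 1) * (abs(c) >> dropped) for c in coeffs]


def restore_scale(qcoeffs, dropped):
    """Shift the surviving planes back to their original weight."""
    return [(-1 if q < 0 else 1) * (abs(q) << dropped) for q in qcoeffs]
```

Dropping two planes from 13, -13 and 3 leaves 3, -3 and 0, which are restored as 12, -12 and 0; each reconstruction error stays below the step size of 4.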
In the entropy coding and decoding section 53 shown in FIG. 2, the tiles of each component are coded by probability estimation from the context and the target bits. In this way, the coding process is carried out in units of tiles for all of the components of the original image.
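The idea of coding by probability estimation from the context can be illustrated by measuring the ideal code length that an adaptive, context-conditioned estimator would assign to a bit sequence. This is a teaching stand-in, not the MQ arithmetic coder specified by JPEG2000: it uses simple per-context counts (a Laplace estimator), and the function name is hypothetical.

```python
import math
from collections import defaultdict


def adaptive_code_length(bits, contexts):
    """Ideal code length in bits when each bit is coded with an adaptive
    probability estimate kept separately for its context label."""
    counts = defaultdict(lambda: [1, 1])  # context -> [zeros + 1, ones + 1]
    total = 0.0
    for bit, ctx in zip(bits, contexts):
        zeros, ones = counts[ctx]
        p = (ones if bit else zeros) / (zeros + ones)  # current estimate
        total += -math.log2(p)                         # cost of coding this bit
        counts[ctx][bit] += 1                          # adapt after coding
    return total
```

When the context predicts the bits well, the estimated probabilities approach 1 and the code length falls far below one bit per symbol; with an uninformative context, an alternating sequence costs close to one bit per symbol.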
Finally, the tag processing section 54 connects all of the coded data from the entropy coding and decoding section 53 into one code stream and adds tags to this code stream. FIG. 7 is a simplified diagram showing an example of the code stream structure. As shown in FIG. 7, tag information called a header is added to the head of the code stream and to the head of each tile-part forming each tile. A main header 100 is added to the head of the code stream, and a tile-part header 101 is added to the head of each tile-part. The coded data (bit stream 102) of each tile follows the tile-part header 101. An end tag 103 is added to the end of the code stream.
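The code stream layout of FIG. 7 can be sketched with the marker codes of JPEG2000 Part 1: SOC (0xFF4F) at the head, an SOT marker segment (0xFF90) and SOD (0xFF93) for each tile-part, and EOC (0xFFD9) at the end. This is a simplified sketch: the required main-header segments (such as SIZ, COD and QCD) are replaced by an opaque placeholder, so the result is not a decodable JPEG2000 file. `build_code_stream` and `split_tile_parts` are hypothetical helpers; the second one mimics the decoder-side step in which the code stream is decomposed into the code stream of each tile.

```python
import struct

SOC, SOT, SOD, EOC = 0xFF4F, 0xFF90, 0xFF93, 0xFFD9


def build_code_stream(tile_payloads, main_header=b""):
    """Concatenate per-tile coded data into one simplified code stream."""
    out = bytearray(struct.pack(">H", SOC))
    out += main_header  # placeholder for the main header segments
    for index, payload in enumerate(tile_payloads):
        # SOT segment: marker, Lsot=10, Isot (tile index), Psot (tile-part
        # length in bytes counted from the SOT marker), TPsot=0, TNsot=1.
        psot = 12 + 2 + len(payload)
        out += struct.pack(">HHHIBB", SOT, 10, index, psot, 0, 1)
        out += struct.pack(">H", SOD)  # start of the tile's coded data
        out += payload
    out += struct.pack(">H", EOC)
    return bytes(out)


def split_tile_parts(stream, main_header_len=0):
    """Inverse of build_code_stream: map tile index -> coded data."""
    pos = 2 + main_header_len  # skip SOC and the placeholder main header
    tiles = {}
    while struct.unpack_from(">H", stream, pos)[0] == SOT:
        _, _, index, psot, _, _ = struct.unpack_from(">HHHIBB", stream, pos)
        tiles[index] = stream[pos + 14:pos + psot]  # data follows SOT + SOD
        pos += psot
    return tiles
```

Because each SOT segment records the length of its tile-part, a decoder can locate every tile without parsing the coded data itself.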
On the other hand, at the time of the decoding, the image data is generated from the code stream of each tile of each component by reversing the coding procedure, as will now be described briefly in conjunction with FIG. 2. In this case, the tag processing section 54 interprets the tag information added to the code stream input from the outside, decomposes the code stream into the code stream of each tile of each component, and carries out the decoding process for every code stream of each tile of each component. The position of the bit to be decoded is determined by the order based on the tag information within the code stream, and the quantization and inverse quantization section 52 generates the context from the arrangement of the already decoded bits surrounding the target bit position. The entropy coding and decoding section 53 carries out decoding by probability estimation from the context and the code stream to generate the target bit, and the target bit is written at the target bit position.
The data decoded in this manner has been spatially divided for every frequency band. Hence, the decoded data is subjected to a two-dimensional wavelet inverse transform in the two-dimensional wavelet transform and inverse transform section 51, so as to restore each tile of each component of the image data. The restored data is transformed into the original colorimetric system data by the color space transform and inverse transform section 50.
In the conventional JPEG compression and expansion system, the counterpart of the tile used in the JPEG2000 is the square block of 8×8 pixels used for the two-dimensional discrete cosine transform.
The description given heretofore relates to still images in general. However, the technique described above may be extended to moving (or dynamic) images. In other words, each frame of a moving image may be formed by one still image, and the still images may be displayed at a frame rate suited to the application so as to obtain the moving image. Video data is obtained by continuously coding original still images or decoding compressed still image data, so the compression and expansion operation is basically the same as that for a still image. Such a compression and expansion operation is sometimes referred to as a motion still image compression and expansion process. This function of carrying out the motion still image compression and expansion process does not exist in the MPEG system video files which are presently popular for moving images. Because this function has the advantage that high-quality still images can be edited in units of frames, it has attracted much attention for business use in broadcasting stations or the like, and it may eventually be used by general consumers as well.
The requirement in which the motion still image compression and expansion algorithm differs most from the general still image compression and expansion algorithm is the processing speed (or rate). This is because the processing speed determines the frame rate, which greatly affects the quality of the moving image. Because the process must be carried out in real time, the motion still image compression and expansion algorithm can only be realized by methods that depend heavily on hardware such as ASICs and DSPs. Although it may eventually become possible to realize a sufficiently high-speed process by software, this seems to require further progress in fields such as semiconductor process and device techniques and software parallel compiler techniques.
However, according to the conventional techniques, there is a problem in that the tile boundary becomes conspicuous when the compression and expansion process is carried out under a high compression rate condition. The amount of image data becomes extremely large when the original image that is the target of the compression and expansion process is spatially large or has a large number of gradation levels for each of the color components. The concept of using tiles was developed to cope simultaneously with the demand for higher-definition still images described above and the technical problem of the increasing amount of image data.
If the original image having an extremely large amount of data is processed as it is, an extremely large memory region is required to provide a working area for processing the image data and to provide an area for holding the processed result. In addition, the processing time required for the compression or expansion becomes extremely long. In order to avoid such problems, the original image is divided into units called tiles (blocks in the case of the JPEG) which are rectangular regions, and the compression and expansion process is normally carried out for each of such regions. By employing this concept of spatially dividing the original image into tiles, it has become possible to suppress the increase in the required memory capacity and processing time to a practical level.
However, the division of the original image into tiles has introduced a new problem, namely, the conspicuous tile boundary described above. This phenomenon occurs when compressed image data, generated under a high compression rate condition by irreversibly compressing (lossy encoding) the original image, is expanded (decoded) back to the original image. Particularly when displaying a high-definition still image having a large area, or a moving image for which a high compression rate is frequently used, the conspicuous tile boundary has a considerable subjective effect even if the image quality within each tile is satisfactory. Hence, the conspicuous tile boundary may in the future seriously undermine one advantage of the JPEG2000, namely, the reduced image quality deterioration at high compression rates.
FIG. 8 is a diagram showing an example of an image obtained by compressing the original image to 1/75 by lossy compression and thereafter expanding the compressed image. FIG. 9 is a diagram showing an example of an error image between the original image and the image after expansion. In FIGS. 8 and 9, the portions indicated by arrows 110a and 111a correspond to boundaries between mutually adjacent tiles. It may be seen that a conspicuous discontinuous line exists at these portions 110a and 111a.
When an image is compressed at a high compression rate and then expanded, the conspicuous tile boundary may be attributed to the two-dimensional wavelet transform process. In other words, when the horizontal lowpass and highpass filters and the vertical lowpass and highpass filters carry out their respective filtering operations, the region covered by the operations extends outside the tile, where no image data exists. The proportion of the operation region that extends outside the tile increases as the decomposition level increases.
According to the JPEG2000 format, various filters such as the irreversible (9, 7) floating-point transform filter and the reversible (5, 3) integer transform filter are recommended for use as the wavelet filter. For the sake of convenience, the detailed operation of the wavelet transform and the reason why the tile boundary appears will now be described for a case where the reversible (5, 3) integer transform filter is used as the wavelet filter.
FIG. 10 is a diagram showing a pixel expansion using the mirroring method. As shown in FIG. 10, a case will be considered where characters “RICOH” are arranged in one row of a target tile 112. It is assumed that each character corresponds to the value of 1 pixel, and the first character “R” is the kth pixel and the last character “H” is the mth pixel. When carrying out the wavelet transform with respect to this tile 112, several pixels before the kth pixel and several pixels after the mth pixel become necessary. Hence, it is necessary to extend the pixels outside a tile boundary 112a according to the mirroring method, as shown in FIG. 10. Expanded pixels are denoted by a reference numeral 113.
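The mirroring extension of FIG. 10 can be sketched as index arithmetic. This sketch assumes whole-sample symmetric extension, in which the edge pixel itself is not repeated (the extension JPEG2000 uses for its wavelet filters); `mirror_index` and `extend_by_mirroring` are hypothetical helpers.

```python
def mirror_index(i, n):
    """Map an index outside range(n) back inside by symmetric mirroring."""
    if n == 1:
        return 0  # a single pixel mirrors onto itself
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def extend_by_mirroring(row, left, right):
    """Return `row` padded with `left` and `right` mirrored pixels."""
    n = len(row)
    return [row[mirror_index(i, n)] for i in range(-left, n + right)]
```

Extending the row "RICOH" by two pixels on each side gives "CI" + "RICOH" + "OC", that is, the letters reflected about the first and last pixels.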
In the (5, 3) reversible wavelet filter, the wavelet coefficient values of the odd-numbered pixels and of the even-numbered pixels are calculated according to formulas (1) and (2), respectively, where C(2i+1), C(2i), . . . are wavelet coefficient values, P(2i+1), P(2i), . . . are pixel values, and ⌊·⌋ denotes the floor function.

$$C(2i+1) = P(2i+1) - \left\lfloor \frac{P(2i) + P(2i+2)}{2} \right\rfloor \quad \text{for } k-1 \le 2i+1 < m+1 \qquad (1)$$

$$C(2i) = P(2i) + \left\lfloor \frac{C(2i-1) + C(2i+1) + 2}{4} \right\rfloor \quad \text{for } k \le 2i < m+1 \qquad (2)$$
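Formulas (1) and (2) can be sketched as a one-level lifting transform on a single row, assuming the row starts at an even index (k = 0) and is extended by the mirroring method of FIG. 10. Because whole-sample mirroring makes the highpass values symmetric as well, C(k−1) equals C(k+1), so the sketch mirrors the indices of already computed highpass values instead of recomputing them. The inverse undoes formula (2) first and then formula (1); since each step only adds an integer function of the other half of the samples, the pixels are recovered exactly. The function names are hypothetical.

```python
def _mirror(i, n):
    """Map index i into range(n) by whole-sample symmetric extension (mirroring)."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def forward_53(p):
    """Formulas (1) and (2) for a row that starts at an even index (k = 0).

    Returns interleaved coefficients: odd indices hold highpass values,
    even indices hold lowpass values.
    """
    n = len(p)
    c = list(p)
    if n == 1:
        return c
    # Formula (1): odd coefficients from mirrored neighbouring pixels.
    for j in range(1, n, 2):
        c[j] = p[j] - (p[_mirror(j - 1, n)] + p[_mirror(j + 1, n)]) // 2
    # Formula (2): even coefficients; C(-1) and C(n) are mirrored odd values.
    for j in range(0, n, 2):
        c[j] = p[j] + (c[_mirror(j - 1, n)] + c[_mirror(j + 1, n)] + 2) // 4
    return c


def inverse_53(c):
    """Undo formula (2) first, then formula (1), recovering the pixels exactly."""
    n = len(c)
    p = list(c)
    if n == 1:
        return p
    for j in range(0, n, 2):
        p[j] = c[j] - (c[_mirror(j - 1, n)] + c[_mirror(j + 1, n)] + 2) // 4
    for j in range(1, n, 2):
        p[j] = c[j] + (p[_mirror(j - 1, n)] + p[_mirror(j + 1, n)]) // 2
    return p
```

Python's `//` floors toward negative infinity, which matches the floor brackets in formulas (1) and (2) for negative intermediate values.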
FIGS. 11A through 11G are diagrams showing pixel values and wavelet coefficient values when the decomposition level is 1, for the case where a lossless (no-loss) (5, 3) reversible wavelet transform is carried out with respect to a square tile made up of 16×16 pixels. In FIG. 11A, the numerals arranged outside the tile indicate the pixel values extended by the mirroring method.
A vertical direction highpass filter operation shown in FIG. 11B and a vertical direction lowpass filter operation shown in FIG. 11C are carried out with respect to the tile having the pixel values shown in FIG. 11A. Next, a horizontal direction lowpass filter operation and a horizontal direction highpass filter operation are carried out with respect to the result of the vertical direction lowpass filter operation shown in FIG. 11C, so as to obtain the LL component shown in FIG. 11D and the HL component shown in FIG. 11E of the wavelet coefficients at the decomposition level 1. On the other hand, a horizontal direction lowpass filter operation and a horizontal direction highpass filter operation are carried out with respect to the result of the vertical direction highpass filter operation shown in FIG. 11B, so as to obtain the LH component shown in FIG. 11F and the HH component shown in FIG. 11G of the wavelet coefficients at the decomposition level 1.
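The order of operations in FIGS. 11B through 11G (vertical filtering first, then horizontal filtering, then separation into four subbands) can be sketched as follows. The sketch repeats a compact 1-D (5, 3) lifting step so that it is self-contained, assumes mirroring at the tile boundary and a tile origin at even coordinates, and uses hypothetical function names. In each subband name, the first letter is the horizontal filter and the second letter the vertical filter.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension of index i into range(n)."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def forward_53(p):
    """One level of the reversible (5, 3) lifting transform on a 1-D list.

    Odd positions receive highpass values, even positions lowpass values.
    """
    n = len(p)
    c = list(p)
    if n == 1:
        return c
    for j in range(1, n, 2):
        c[j] = p[j] - (p[_mirror(j - 1, n)] + p[_mirror(j + 1, n)]) // 2
    for j in range(0, n, 2):
        c[j] = p[j] + (c[_mirror(j - 1, n)] + c[_mirror(j + 1, n)] + 2) // 4
    return c


def dwt2_level1(tile):
    """Filter every column, then every row, then split into (LL, HL, LH, HH)."""
    h, w = len(tile), len(tile[0])
    cols = [forward_53([tile[y][x] for y in range(h)]) for x in range(w)]
    vert = [[cols[x][y] for x in range(w)] for y in range(h)]  # back to rows
    rows = [forward_53(row) for row in vert]
    ll = [r[0::2] for r in rows[0::2]]  # horizontal low,  vertical low
    hl = [r[1::2] for r in rows[0::2]]  # horizontal high, vertical low
    lh = [r[0::2] for r in rows[1::2]]  # horizontal low,  vertical high
    hh = [r[1::2] for r in rows[1::2]]  # horizontal high, vertical high
    return ll, hl, lh, hh
```

For a 16×16 tile whose values increase by one per column from left to right, the only nonzero HL coefficient in each row (value 1) lies in the rightmost column: mirroring folds the ramp back on itself at the tile boundary, so the highpass filter sees an artificial corner there. This illustrates how the extension outside the tile alters coefficients next to the tile boundary.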
FIGS. 12A through 12C are diagrams showing examples of pixel values of a square tile made up of 16×16 pixels which is obtained by carrying out an inverse transform from the wavelet coefficients derived in FIGS. 11A through 11G.
FIG. 12A shows the coefficient values of each of the subbands of the decomposition level 1, which are obtained by the forward wavelet transform described above in conjunction with FIGS. 11A through 11G, and are rearranged by interleaving.
FIG. 12B shows the result obtained by carrying out the horizontal direction inverse transform filter operation on the odd-numbered pixels, followed by the horizontal direction inverse transform filter operation on the even-numbered pixels, with respect to the coefficient values shown in FIG. 12A, and FIG. 12C shows the result obtained by carrying out the vertical direction inverse transform filter operation on the even-numbered pixels, followed by the vertical direction inverse transform filter operation on the odd-numbered pixels.
FIG. 13 is a diagram showing an example of the comparison result obtained by comparing the pixel values of the original image shown in FIG. 11A with the pixel values shown in FIG. 12C, which are obtained by carrying out the lossless transform and inverse transform with the pixel extension according to the mirroring method. In the case shown in FIG. 13, the error is indicated by the difference between individual pixels. It may be seen from FIG. 13 that the pixel values after the compression and expansion process perfectly match the pixel values of the original image for all of the pixels in the tile.
FIGS. 14A through 14G are diagrams showing pixel values and wavelet coefficient values when the decomposition level is 1, for the case where a lossy transform, that is, the (5, 3) reversible wavelet transform combined with quantization, is carried out with respect to a square tile made up of 16×16 pixels. In FIG. 14A, the numerals arranged outside the tile indicate the pixel values extended by the mirroring method. However, in order to facilitate comparison with FIG. 11A, FIG. 14A shows the result of the quantization and inverse quantization.
A vertical direction highpass filter operation shown in FIG. 14B and a vertical direction lowpass filter operation shown in FIG. 14C are carried out with respect to the tile having the pixel values shown in FIG. 14A. Next, a horizontal direction lowpass filter operation and a horizontal direction highpass filter operation are carried out with respect to the result of the vertical direction lowpass filter operation shown in FIG. 14C, so as to obtain the LL component shown in FIG. 14D and the HL component shown in FIG. 14E of the wavelet coefficients at the decomposition level 1. On the other hand, a horizontal direction lowpass filter operation and a horizontal direction highpass filter operation are carried out with respect to the result of the vertical direction highpass filter operation shown in FIG. 14B, so as to obtain the LH component shown in FIG. 14F and the HH component shown in FIG. 14G of the wavelet coefficients at the decomposition level 1. In this particular example, the quantization step size is 4 for the LL component, 32 for the HL and LH components, and 64 for the HH component. The quantized wavelet coefficient values are obtained by applying the original positive or negative sign to the floor of the absolute value of each subband coefficient divided by the quantization step size.
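The quantization rule described above, q = sign(c) · ⌊|c| / Δ⌋ with step size Δ, can be sketched as follows. The step sizes are taken from the example in the text, reading the middle value as applying to both the HL and LH components. The reconstruction `q * step` is an assumption for illustration; JPEG2000 decoders may add a reconstruction offset within the quantization interval. All names are hypothetical.

```python
import math

# Step sizes from the example in the text (LL: 4, HL and LH: 32, HH: 64).
STEP_SIZES = {"LL": 4, "HL": 32, "LH": 32, "HH": 64}


def quantize(c, step):
    """Dead-zone scalar quantization: sign(c) * floor(|c| / step)."""
    q = math.floor(abs(c) / step)
    return -q if c < 0 else q


def dequantize(q, step):
    """Inverse quantization without a reconstruction offset (assumed)."""
    return q * step
```

For example, an HL coefficient of -70 quantizes to -2 and is reconstructed as -64, while any HL coefficient with magnitude below 32 becomes 0. Large step sizes in the high-frequency subbands are what produce the reconstruction errors compared in FIG. 16.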
FIGS. 15A through 15C are diagrams showing examples of pixel values of a square tile made up of 16×16 pixels, obtained by carrying out an inverse transform from the wavelet coefficients that were derived in FIGS. 14A through 14G and then quantized and inversely quantized. The forward and inverse wavelet transforms are similar to those for the lossless case using the reversible wavelet transform, and a detailed description thereof will be omitted.
FIG. 16 is a diagram showing an example of the comparison result obtained by comparing the pixel values of the original image shown in FIG. 14A with the pixel values shown in FIG. 15C, which are obtained by carrying out the lossy transform and inverse transform with the pixel extension according to the mirroring method. In this example, unlike the case where the lossless transform and inverse transform (quantization step size 1) are carried out, errors are generated. Large errors are seen particularly near the tile boundary. This is why the tile boundary becomes visually conspicuous at low bit rates.
Conventionally, in order to eliminate the problems described above, it has been proposed to use the image data of adjacent tiles, that is, to make the boundaries of adjacent tiles mutually overlap (although, according to the baseline system of the JPEG2000, adjacent tiles should not overlap). In addition, a tile boundary which nevertheless remains visually conspicuous is subjected to a so-called post-filtering process or the like, using image processing algorithms entirely separate from the compression and expansion algorithm, to make the tile boundary less conspicuous.