1. Field of the Invention
The present invention relates to an image encoding apparatus that compresses image data to reduce the data quantity.
2. Description of the Related Art
A larger number of gradations and a higher resolution are required to obtain a higher-quality digital image. The data quantity of an image is given by “number of pixels × number of gradation bits” and is therefore very large. Image compression has been employed to reduce the image storage cost and the image transmission time.
Heretofore, image compression methods that utilize the human visual characteristic have been employed. An example of the visual characteristic is given in reference 1 (Peter G. J. Barten, “Evaluation of subjective image quality with the square-root integral method”, J. Opt. Soc. Am. A, Vol. 7, No. 10, October 1990, pp. 2024-2031).
This reference gives an approximate expression for the visual frequency characteristic. FIG. 27 is a graph of the approximate expression described in the reference. The abscissa of the graph represents the spatial frequency in cycles per degree of visual angle, and the ordinate represents the contrast sensitivity. According to the graph, the contrast sensitivity decreases in the high-frequency region. In other words, a reduction in the number of gradations is hard to perceive in the high-frequency region. Compression methods that utilize this property have been employed heretofore.
Examples of compression methods that utilize the visual characteristic are described hereunder in two categories. In the first category, compression is realized by directly reducing the number of pixels of the image or the number of gradations (conventional example 1-1 and conventional example 1-2). In the second category, the input image is subjected to an orthogonal transform, and the quantization of the transform coefficients is matched to the visual characteristic (conventional example 2).
First, the first category will be described. Because the data quantity of an image is given by (number of pixels × number of gradation bits), image compression is realized by reducing the number of pixels or the number of gradation bits. In a region of characters or line drawings, a large number of gradations is not necessary because high-frequency components are predominant in the region. The image can therefore be compressed by reducing the number of gradation bits in such a region.
Where the frequency components of the input image are low, the number of gradations cannot be reduced. However, because the bandwidth of the image is narrow, the number of pixels can be reduced without aliasing. Specifically, a natural image portion such as a photograph contains regions in which high-frequency components are not predominant, and the image can be compressed by reducing the number of pixels in such regions.
In summary, the low frequency region can be compressed by reducing the number of pixels. On the other hand, the high frequency region can be compressed by reducing the number of gradations.
Japanese Published Examined Patent Application No. Hei 6-40661, referred to as conventional example 1-1 hereinafter, is an example of the compression method described hereinabove. In conventional example 1-1, an input image having 6 bits per pixel is divided into 3×3 pixel blocks. It is then determined whether each pixel block is to be binary-encoded or multi-level encoded.
As shown in FIG. 25, in the case of multi-level encoding, the average value of the 9 pixels in the 3×3 block is calculated and used as 1-pixel information having 6 bits. In the case of binary encoding, on the other hand, each of the 9 pixels in the 3×3 block is binarized to form 9-pixel information having 1 bit per pixel. The 3×3 block information is thus converted to either 1-pixel information having 6 bits or 9-pixel information having 1 bit each. In encoding, 1 bit indicating whether the block is a multi-level block or a binary block is added to every block. In summary, one 3×3 pixel block, originally 54 bits (9 pixels × 6 bits), is encoded into 10 bits at worst.
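The bit accounting of conventional example 1-1 can be illustrated with a minimal sketch. The function name, the polarity of the flag bit, and the mid-scale binarization threshold of 32 are assumptions made for illustration. The rule that decides between binary and multi-level encoding is not modelled here, so the mode is simply passed in by the caller.

```python
def encode_block_1_1(block, binary):
    """Encode one 3x3 block of 6-bit pixels (values 0..63) as a bit string.

    block  : list of 9 integers in 0..63
    binary : True for binary encoding, False for multi-level encoding
             (the decision rule itself is not modelled in this sketch)
    """
    if len(block) != 9 or any(not 0 <= p <= 63 for p in block):
        raise ValueError("expected nine 6-bit pixel values")
    if binary:
        # 1 flag bit + nine 1-bit pixels = 10 bits (assumed threshold: 32)
        return "1" + "".join("1" if p >= 32 else "0" for p in block)
    # 1 flag bit + one 6-bit block average = 7 bits
    average = round(sum(block) / 9)
    return "0" + format(average, "06b")


flat_block = [20] * 9                          # picture-like block
edge_block = [0, 0, 63, 0, 0, 63, 0, 0, 63]    # character-like block

multi_code = encode_block_1_1(flat_block, binary=False)
binary_code = encode_block_1_1(edge_block, binary=True)
print(len(multi_code), len(binary_code))
```

Under these assumptions a multi-level block occupies 7 bits and a binary block 10 bits, against 54 bits for the uncompressed block.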
Japanese Published Unexamined Patent Application No. Hei 8-317220, referred to as conventional example 1-2 hereinafter, is another example. In conventional example 1-2, an input image is divided into, for example, 2×2 pixel blocks. It is then determined whether each pixel block is to be binary-encoded or multi-level encoded. In the case of multi-level encoding, the average value of the 4 pixels in the 2×2 block is encoded. In the case of binary encoding, on the other hand, the number of black pixels in the 2×2 pixel block is encoded.
As an example, in the case of multi-level encoding, the average value is quantized into 12 levels ranging from 0 to 11. In the case of binary encoding, on the other hand, the number of black pixels is one of 0, 1, 2, 3, and 4. A black-pixel count of 0 represents the same state as multi-level 0, and a black-pixel count of 4 represents the same state as multi-level 11. Accordingly, as shown in FIG. 26, multi-level encoding and binary encoding together can be encoded with 15 levels in total (12 + 5 − 2).
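The count of 15 levels can be checked with a short sketch of a shared code space. The ordering of the code points (multi-level states first, then the intermediate black-pixel counts) and all names are assumptions for illustration; the actual assignment in FIG. 26 may differ.

```python
MULTI_LEVELS = 12   # quantized block averages 0..11
BLOCK_PIXELS = 4    # a 2x2 block, so black-pixel counts run 0..4


def build_code_space():
    """Return the distinct block states as (kind, value) tuples."""
    states = [("multi", level) for level in range(MULTI_LEVELS)]
    # Counts 0 and 4 coincide with multi-levels 0 and 11, so only the
    # intermediate counts 1..3 need code points of their own.
    states += [("binary", count) for count in range(1, BLOCK_PIXELS)]
    return states


CODE_SPACE = build_code_space()


def encode_state(kind, value):
    """Map a block state to its code, folding the two shared states."""
    if kind == "binary" and value == 0:
        kind, value = "multi", 0
    elif kind == "binary" and value == BLOCK_PIXELS:
        kind, value = "multi", MULTI_LEVELS - 1
    return CODE_SPACE.index((kind, value))


print(len(CODE_SPACE), encode_state("binary", 4), encode_state("multi", 11))
```

Because the 15 states fit in 4 bits, no separate multi-level/binary flag is needed in this arrangement.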
Conventional example 1-2 differs from conventional example 1-1 in that multi-level encoding and binary encoding share the same code space, so that the multi-level/binary determination bit is unnecessary. However, conventional example 1-2 is similar to conventional example 1-1 in that it must be determined for every block whether multi-level encoding or binary encoding is to be employed.
As described hereinabove, in conventional example 1-1 and conventional example 1-2, it is determined whether an input image block is low-frequency information that requires a large number of gradations or high-frequency information that does not. For low-frequency information that requires a large number of gradations, the average of the pixel values in the block is encoded. For high-frequency information that does not require a large number of gradations, on the other hand, the number of gradations of the pixel values in the block is reduced (to binary in the abovementioned examples) to increase the compression rate.
Next, the second example of a compression method that utilizes the visual characteristic (conventional example 2) will be described. Currently, the JPEG system (ITU-T Recommendation T.81), an international standard for still image encoding, is the most widely used still image encoding system. An outline of the JPEG system is given in “International Standard of Multi-Media Coding” (edited by H. Yasuda, Maruzen Co., Ltd., 1991, pp. 16-47). The JPEG system is referred to as conventional example 2 hereinafter.
In the JPEG system, an input image is divided into 8×8 pixel blocks, and each block is subjected to the DCT (discrete cosine transform) to obtain 8×8 DCT coefficients. The 8×8 DCT coefficients are then quantized for compression. The 8×8 quantization step widths (quantization table) are not fixed by the standard and can be set freely. Therefore, the quantization table may be designed in consideration of the visual characteristic to implement encoding with a higher compression rate.
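The transform and quantization steps can be sketched as follows. This is a simplified illustration in plain Python, not a JPEG implementation: the level shift, zig-zag scan, and entropy coding are omitted, and the quantization table built by `example_table` is a hypothetical one whose step width merely grows with frequency. It is taken neither from the standard nor from any cited reference.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an 8x8 block given as a list of 8 rows."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def example_table():
    """Hypothetical table: coarser steps for higher-frequency coefficients."""
    return [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]


def quantize(coeffs, table):
    """Divide each coefficient by its step width and round to an integer."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)]
            for u in range(N)]


flat = [[100] * N for _ in range(N)]          # uniform gray block
quantized = quantize(dct_2d(flat), example_table())
print(quantized[0])
```

For a uniform block all of the energy lands in the DC coefficient, so only one quantized value is nonzero; this is why coarse steps at high frequencies cost little for smooth image regions.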
An exemplary design of the quantization table is given in a reference (Saito et al., “Optimal Quantization for DCT Coefficients in Variable Length Coding Considering the Human Perception”, Technical Report of IEICE, IE 90-101, 1990, pp. 39-46). This reference describes a method for designing a quantization table in consideration of the visual characteristic for an encoding system that uses the DCT.
Because the DCT is used in the JPEG system, the method described in this reference can be used to design a quantization table for the JPEG system. Each quantization step width of the 8×8 DCT coefficients (quantization table) is designed, in consideration of the visual characteristic, so that the visual distortion is minimized for a given code quantity. The visual characteristic addressed is a visual frequency characteristic similar to that shown in reference 1.
Herein, the visual frequency characteristic is calculated under the assumption H(f) = 2.46 (0.1 + 0.25 f) exp(−0.25 f), where f denotes the spatial frequency in cycles per degree of visual angle (cycle/degree).
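The shape of this characteristic can be examined numerically. The sketch below simply evaluates the expression given above; the function name is hypothetical. Setting the derivative of H(f) to zero gives 0.1 + 0.25 f = 1, that is, a peak at f = 3.6 cycle/degree, where H is approximately 1.

```python
import math


def visual_frequency_response(f):
    """H(f) = 2.46 (0.1 + 0.25 f) exp(-0.25 f), with f in cycle/degree."""
    return 2.46 * (0.1 + 0.25 * f) * math.exp(-0.25 * f)


peak = visual_frequency_response(3.6)
for f in (0.5, 3.6, 10.0, 30.0):
    # The response rises to the peak near 3.6 and falls off at high f.
    print(f, round(visual_frequency_response(f), 4))
```

The rapid fall-off at high spatial frequencies is the property that the conventional methods exploit when they reduce gradations in high-frequency regions.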
However, the abovementioned conventional examples involve the problems described hereunder.
(Problem 1) A block in which a portion that requires resolution and a portion that requires a large number of gradations are mixed cannot be handled.
(Problem 2) Where the gradation level represents the edge position, image distortion results.
Problem 1 and problem 2 will be described in detail hereinafter. First, problem 1 will be described. Consider a block in which a character or line drawing is overwritten on a picture background, as shown in FIG. 28. The picture is data that requires a large number of gradations, and the character or line drawing is data that requires resolution. If such a block is determined to be multi-level information and the average value of the block is encoded, the resolution information of the character or line drawing is lost.
On the other hand, if the block is determined to be a binary (or small-number-of-levels) block and the pixel values in the block are quantized, the gradations of the picture are lost. To avoid this loss, the block size used as the determination unit must be made small. At worst, a block of 1 pixel is necessary. A small block size, however, reduces the compression rate. In other words, conventional example 1-1 and conventional example 1-2 cannot deal with the case in which a portion that requires resolution and a portion that requires a large number of gradations are mixed in a block.
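Problem 1 can be reproduced on a single 3×3 block. In the following sketch the pixel values, the threshold of 32, and the function names are hypothetical; the block represents a gently shaded picture background (values 10 to 18) crossed by a vertical character stroke (value 63).

```python
def average_block(block):
    """Multi-level path: every pixel is replaced by the block average."""
    mean = round(sum(block) / len(block))
    return [mean] * len(block)


def binarize_block(block, threshold=32, low=0, high=63):
    """Binary path: every pixel is reduced to one of two levels."""
    return [high if p >= threshold else low for p in block]


mixed = [10, 63, 14,
         12, 63, 16,
         14, 63, 18]

averaged = average_block(mixed)     # the stroke disappears
binarized = binarize_block(mixed)   # the background shading disappears
print(averaged)
print(binarized)
```

Averaging yields a uniform block in which the stroke can no longer be located, while binarizing keeps the stroke but flattens the background to a single level; neither path preserves both kinds of information.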
Next, problem 2 will be described. A gray font is an example in which the gradation level represents the edge position. Usually, an imager, which is software for generating a printing image, generates characters and line drawings as a binary image. In this case, a character cannot be represented at a resolution finer than the pixel resolution.
A binary-generated image is shown in FIG. 29. Each rectangle in FIG. 29 corresponds to one pixel. As shown in FIG. 29, jaggies appear on a slanted line. In addition, the line width and the edge position are limited to integral multiples of the pixel size. This limitation is the problem. To solve it, a gray font generates the character image with multi-level pixel values, which improves the character image quality without increasing the pixel resolution.
As shown in FIG. 30, each pixel is expressed with multiple levels. The multi-level pixels are complemented by the visual characteristic (or by controlling the marking position according to the pixel value), so that a slanted line, or a line whose width and position are not integral multiples of the pixel size, can be expressed as shown in the lower diagram of FIG. 30.
As described hereinabove, in a gray font the pixel value provides information for controlling the edge position. The gray font is an example in which a multi-gradation image is generated artificially so that the gradation level represents the edge position. Such images also exist among natural images. A character image scanned in with multiple levels is likely to have the same effect as an artificially generated gray font. Likewise, wherever an input image has an edge, the gradation level is likely to represent the edge position in the same way, although the magnitude of the effect on visual perception differs from case to case.
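How a gradation level can carry the edge position is illustrated by the one-dimensional sketch below. It assumes a simple linear area-coverage model, in which the value of the pixel containing the edge is proportional to the fraction of the pixel that is covered. This model and the function names are assumptions for illustration and are not taken from the figures.

```python
def render_edge(position, width, levels=64):
    """Render a row covered from the left up to `position` (in pixels)."""
    top = levels - 1
    row = []
    for i in range(width):
        coverage = min(1.0, max(0.0, position - i))  # covered fraction
        row.append(round(coverage * top))
    return row


def recover_edge(row, levels=64):
    """Estimate the edge position as the total covered area of the row."""
    top = levels - 1
    return sum(value / top for value in row)


gray_row = render_edge(2.25, 4)               # 64 gradations
binary_row = render_edge(2.25, 4, levels=2)   # 2 gradations
print(gray_row, recover_edge(gray_row))
print(binary_row, recover_edge(binary_row, levels=2))
```

With 64 gradations the edge at 2.25 pixels is recovered to within a small fraction of a pixel, whereas with 2 gradations it snaps to the pixel boundary at 2.0. Reducing the number of gradations therefore shifts the represented edge position.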
In an image in which the gradation level represents the edge position, changing the gradation level shifts the edge position, and the shift is perceived as image distortion. A reference (edited by Oyama et al., “New Edition, Sensation/Perception Psychology Handbook”, Seishin Shobo, 1994, pp. 557-558) describes the perception threshold of the edge position, which is referred to as vernier acuity. According to this reference, the minimum perceivable deviation between two lines as shown in FIG. 31, namely the threshold of the vernier acuity, corresponds to a visual angle of 2″. The deviation of the edge position should therefore be kept within a visual angle of 2″.
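The 2″ threshold can be converted to a length on the page once a viewing distance is chosen. The following sketch assumes a viewing distance of 300 mm and an image resolution of 600 dpi. Both values, like the function names, are hypothetical and serve only to give a sense of scale.

```python
import math

ARCSEC_PER_DEGREE = 3600
MM_PER_INCH = 25.4


def arcsec_to_length_mm(arcsec, viewing_distance_mm):
    """Length subtended by a visual angle at the given viewing distance."""
    angle_rad = math.radians(arcsec / ARCSEC_PER_DEGREE)
    return viewing_distance_mm * math.tan(angle_rad)


def tolerance_in_pixels(arcsec, viewing_distance_mm, dpi):
    """The same length expressed as a fraction of the pixel pitch."""
    pixel_pitch_mm = MM_PER_INCH / dpi
    return arcsec_to_length_mm(arcsec, viewing_distance_mm) / pixel_pitch_mm


shift_mm = arcsec_to_length_mm(2, 300)       # assumed distance: 300 mm
shift_px = tolerance_in_pixels(2, 300, 600)  # assumed resolution: 600 dpi
print(round(shift_mm * 1000, 2), "micrometres")
print(round(shift_px, 3), "pixel")
```

Under these assumptions the threshold corresponds to roughly 3 micrometres, or about 7% of a pixel, which is far smaller than the one-pixel steps available to a binary image.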
In conventional example 1-1 and conventional example 1-2, if the block is determined to be multi-level, the resolution is reduced below the pixel resolution. The edge position information finer than the pixel resolution described hereinabove is then lost. If the block is determined to be binary (reduced-level), the pixels are quantized, which causes an edge position error. Image distortion is therefore caused in either case.
In conventional example 2, because the quantization is performed in the orthogonal transform domain, it is difficult to guarantee the error magnitude of individual pixel values. Although methods for designing a quantization table in consideration of the visual frequency characteristic are known, no method is known for designing a quantization table that guarantees the edge position in consideration of the vernier acuity.
Therefore, conventional example 2 cannot guarantee the edge position. Conventional example 2 could guarantee the edge position only by making the quantization step widths sufficiently small, in which case the compression rate remains low.
The present invention has been accomplished to solve the abovementioned problems. It provides an image encoding apparatus that guarantees the edge position to the perceptible resolution, namely the vernier acuity, guarantees the perceptible number of gradations, and is capable of high-efficiency encoding, and it provides an image decoding apparatus capable of decoding the resulting code.