(1) Field of the Invention
The present invention relates to digital image processing, and is preferably directed to an image coding/decoding apparatus. The image coding apparatus encodes image data at high efficiency, and the image decoding apparatus preferably decodes the data encoded by the image coding apparatus.
(2) Disclosure of the Prior Art
In recent years, sub-band coding schemes have been proposed as schemes for high-efficiency coding and decoding of images. Other than the sub-band coding schemes, block transform coding, typically represented by the discrete cosine transform, has been used for high-efficiency coding and decoding of images. In block transform coding, an image is divided into blocks and each block is processed independently. This method has a problem in that block distortion occurs at the boundaries between neighboring blocks. In contrast to this, sub-band coding is an overlapping block transform, that is, the neighboring blocks overlap each other so that no block distortion occurs. This is known to be especially effective in image coding at low bit rates.
In a typical sub-band coding scheme, which is generally known as a coding scheme of high efficiency, the input image is analyzed through a sub-band filter bank to perform sub-band decomposition as shown in FIG. 1. This is described in, for example, `Topics on Wavelet Transform` by Fujii and Nomura, IEICE Technical Report IE92-11 (1992).
FIG. 1 shows a result which was obtained by subjecting the input image signal to two levels of two-dimensional sub-band decomposition: HL1 represents a sub-band of high frequencies in the horizontal direction and low frequencies in the vertical direction; LH1 represents a sub-band of low frequencies in the horizontal direction and high frequencies in the vertical direction; and HH1 represents a sub-band of high frequencies in the horizontal direction and high frequencies in the vertical direction. The region of low frequencies in the horizontal direction and low frequencies in the vertical direction is further two-dimensionally divided into sub-bands to obtain bands HL2, LH2 and HH2, in a similar manner as above. In this case, LL2 represents a sub-band of low frequencies in the horizontal direction and low frequencies in the vertical direction. For the filter bank for sub-band decomposition, a filter bank for wavelet transform, a filter bank for sub-band decomposition/composition, or the like, can be used.
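The two-level decomposition of FIG. 1 can be illustrated with a minimal sketch. The two-tap averaging/differencing (Haar) filter pair used here is an assumption chosen only for brevity; the text leaves the filter bank open, and longer overlapping filters would normally be used. All function names are hypothetical.

```python
def haar_split_1d(seq):
    """Split an even-length sequence into (low, high) half-length bands.
    Assumption: a two-tap Haar pair stands in for the unspecified filter bank."""
    low = [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    high = [(seq[i] - seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def split_vertical(matrix):
    """Filter every column, returning (low, high) bands with half the rows."""
    lows, highs = [], []
    for col in transpose(matrix):
        lo, hi = haar_split_1d(col)
        lows.append(lo)
        highs.append(hi)
    return transpose(lows), transpose(highs)


def decompose_once(image):
    """One level of 2-D decomposition, returning (LL, HL, LH, HH).
    HL = high horizontal / low vertical, LH = low horizontal / high vertical."""
    row_low, row_high = [], []
    for row in image:                      # horizontal filtering, row by row
        lo, hi = haar_split_1d(row)
        row_low.append(lo)
        row_high.append(hi)
    ll, lh = split_vertical(row_low)       # low horizontal frequencies
    hl, hh = split_vertical(row_high)      # high horizontal frequencies
    return ll, hl, lh, hh


def decompose_two_levels(image):
    """Decompose the image, then decompose its LL band once more (as in FIG. 1)."""
    ll1, hl1, lh1, hh1 = decompose_once(image)
    ll2, hl2, lh2, hh2 = decompose_once(ll1)
    return {"LL2": ll2, "HL2": hl2, "LH2": lh2, "HH2": hh2,
            "HL1": hl1, "LH1": lh1, "HH1": hh1}


# A 4x4 image with a vertical edge between the first and second columns.
image = [[0, 8, 8, 8]] * 4
bands = decompose_two_levels(image)
```

In this example the vertical edge produces non-zero coefficients only in the bands with high horizontal frequencies (HL1 and HL2), while LH1 and HH1 remain zero.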
The image signal divided as in FIG. 1 has a hierarchical structure, and it is possible to change the bit allocation from pixel to pixel for specified regions in accordance with the shape information which will be described later. High coding efficiency is thus a characteristic of the sub-band coding scheme.
In recent years, many researchers have studied methods of preferentially coding information which is of greater visual importance, by making use of the fact that the positions and orientations of the coefficients for sub-bands are preserved and by using the hierarchical relations between sub-bands. An example of such a method is described in `Embedded Image Coding Using Zerotrees of Wavelet Coefficients`, IEEE Transactions on Signal Processing, Vol. 41, No. 12 (1993).
The above method uses the characteristic that, in almost all cases, the coefficients of different sub-bands at the same position decrease in magnitude and approach zero as the band moves to higher frequencies. Code words represent the relations between the hierarchical layers of these coefficients. In order to improve the efficiency of coding, it is necessary to realize a distribution of coefficients which satisfies the above characteristic. Further, since the important information is coded first, it is possible to capture the overall image at a relatively low bit rate.
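The hierarchical relation between coefficients at the same position can be sketched as follows. This is a simplified illustration in the spirit of the cited zerotree method, not the full algorithm: it only tests whether a coefficient and all of its descendants in the finer sub-bands of the same orientation fall below a threshold, so that a single code word could stand for the whole tree. The function names and the sample values are hypothetical.

```python
def children(r, c):
    """Positions in the next finer sub-band covering the same image area."""
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def is_zerotree_root(levels, level, r, c, threshold):
    """levels: sub-bands of one orientation, coarsest first (e.g. [HL2, HL1]).
    True if the coefficient and all its descendants are below the threshold."""
    if abs(levels[level][r][c]) >= threshold:
        return False
    if level + 1 == len(levels):           # finest level: no descendants
        return True
    return all(is_zerotree_root(levels, level + 1, cr, cc, threshold)
               for cr, cc in children(r, c))


# Hypothetical coefficients: HL2 is 1x2, HL1 is 2x4.
hl2 = [[3, 2]]
hl1 = [[1, 0, 5, 0],
       [0, 1, 0, 0]]

left_is_root = is_zerotree_root([hl2, hl1], 0, 0, 0, threshold=4)
right_is_root = is_zerotree_root([hl2, hl1], 0, 0, 1, threshold=4)
```

The left coefficient heads a tree of insignificant values, whereas the right one has a significant descendant (the value 5), which violates the expected decay toward higher frequencies and therefore costs more to code.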
FIG. 2 shows an image coding apparatus using sub-band decomposition and FIG. 3 shows an image decoding apparatus using sub-band decomposition. Here, 1501 designates a region extracting section for extracting specific regions in the input image. When, for instance, the region of the face in the image of a visual telephone, etc., is extracted, a method as described in `Real-Time Auto Face-Tracking System` (A study report 93-04-04 pp.13-16 (1994) of The Institute of Image Electronics Engineers of Japan) can be used to extract the face region. Further, for an arbitrary object in a motion picture, it is possible to extract the region based on the outline of the object using the method disclosed in Japanese Patent Application Laid-Open Hei 6 No.251,148. Reference numeral 1502 designates a sub-band decomposing section for performing sub-band decomposition through a 2-dimensional decomposing filter; 1503 designates a coding/quantizing section for coding or quantizing the data which underwent the sub-band decomposition; 1504 designates an entropy coding section which subjects the output from the coding/quantizing section to entropy coding; and 1505 designates a shape information coding section for coding the positions and shapes of the regions obtained by region extracting section 1501. The position and shape of a region will be referred to hereinbelow as shape information. An example of a method for coding shape information is chain coding, in which the outline of the region is converted into codes. Designated at 1506 is a coded-data multiplexing section for multiplexing the coded data of the image and the coded data of the shape information. Here, coding/quantizing section 1503 may perform quantization only, or may perform DPCM as a coding scheme and then quantize the result. Alternatively, it may use vector quantization or various other schemes.
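Chain coding of an outline, mentioned above as one way of coding shape information, can be sketched as follows. The sketch assumes that the outline is already available as an ordered list of (row, column) pixels in which consecutive pixels are 8-connected neighbors; how the outline is traced is not covered. The direction numbering in STEPS is one common convention and is an assumption, as are the function names.

```python
# (d_row, d_col) for codes 0..7. Assumed convention: code 0 points right,
# and the codes proceed counter-clockwise on screen (rows grow downward).
STEPS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
         (0, -1), (1, -1), (1, 0), (1, 1)]


def chain_encode(outline):
    """Return the start pixel and one direction code per step.
    Raises ValueError if two consecutive pixels are not 8-connected."""
    codes = []
    for (r0, c0), (r1, c1) in zip(outline, outline[1:]):
        codes.append(STEPS.index((r1 - r0, c1 - c0)))
    return outline[0], codes


def chain_decode(start, codes):
    """Rebuild the list of outline pixels from the start pixel and the codes."""
    r, c = start
    points = [start]
    for code in codes:
        d_row, d_col = STEPS[code]
        r, c = r + d_row, c + d_col
        points.append((r, c))
    return points


# A closed outline around a 2x2 pixel square, traversed clockwise on screen.
outline = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
start, codes = chain_encode(outline)
restored = chain_decode(start, codes)
```

Each step costs only three bits before entropy coding, compared with a full coordinate pair per pixel, which is why an outline is compactly represented in this form.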
Next, in the image decoding apparatus using sub-band decomposition shown in FIG. 3, 1601 designates a coded-data demultiplexing section for demultiplexing the coded data into the coded data of the image and the coded data of the shape information; 1602 designates an entropy decoding section for subjecting the coded data of the image to entropy decoding; 1603 designates a shape information decoding section for decoding the coded data of the shape information; 1604 designates an inverse-quantizing/decoding section for subjecting the output from entropy decoding section 1602 to inverse quantization or decoding in the scheme corresponding to coding/quantizing section 1503; and 1605 designates a sub-band composing section for composing the sub-bands of the signals which have been inverse quantized and/or decoded, into a predetermined band. Here, coding/quantizing section 1503 and the corresponding inverse-quantizing/decoding section 1604 can selectively perform a scheme that is adapted to the characteristic of each of the sub-bands LL, LH, HL and HH, or can perform the same scheme for all of them.
Japanese Patent Application Laid-Open Hei 6 No.350,989 introduces a technique which allocates a greater quantity of data to a region in which an object exists (a region to be determined as important), for images which were previously decomposed into sub-bands. FIG. 4 is an example of the output from region extracting section 1501, in which the background region and the object region are extracted from the input image. FIG. 5 shows the result of the operations in which the input image is decomposed into sub-band images corresponding to FIG. 1 by sub-band decomposing section 1502, and then each sub-band is quantized in accordance with the shape information in FIG. 4. All the regions are quantized for the sub-bands of low frequencies (LL2, HL2, LH2, HH2), on which visually important information concentrates, though in some regions the quantization parameter may be varied. For example, the object region may be subjected to a finer quantizing process than the background region, so as to achieve a more visually efficient bit allocation. For the sub-bands of high frequencies (HL1, LH1, HH1), data is allotted only to the object region, again so as to perform visually efficient coding.
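The region-dependent quantization of FIG. 5 can be sketched as follows. The sketch assumes uniform quantization with a step size per region, and assumes that the shape information is reduced to the resolution of each sub-band by marking a sub-band position as object when any pixel of the corresponding 2x2 block belongs to the object. Neither detail is specified in the text, and the function names and step sizes are hypothetical.

```python
def downsample_mask(mask):
    """Halve the mask resolution. Assumption: a position counts as object
    if any pixel in its 2x2 block is object."""
    return [[any(mask[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1))
             for c in range(0, len(mask[0]), 2)]
            for r in range(0, len(mask), 2)]


def quantize_band(band, mask, object_step, background_step):
    """Quantize one sub-band with a step size chosen per position.
    A background_step of None means no data is allotted to the background.
    Note: Python's round() rounds exact ties to the nearest even integer."""
    quantized = []
    for band_row, mask_row in zip(band, mask):
        row = []
        for coeff, in_object in zip(band_row, mask_row):
            step = object_step if in_object else background_step
            row.append(0 if step is None else round(coeff / step))
        quantized.append(row)
    return quantized


# Full-resolution shape information: the object occupies the top-left corner.
shape = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
level1_mask = downsample_mask(shape)

band = [[10, 10],
        [10, 10]]
# Low-frequency style: every region is coded, the object more finely.
coded_everywhere = quantize_band(band, level1_mask, 2, 5)
# High-frequency style: only the object region receives data.
coded_object_only = quantize_band(band, level1_mask, 2, None)
```

The same routine covers both cases described above: two finite step sizes for the low-frequency sub-bands, and no background data for the high-frequency sub-bands.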
In image coding techniques aiming at a very low bit rate, which have been actively studied in recent years, it is difficult to obtain a visually high-quality image merely by allocating different quantities of information to each of the sub-band images using different coding schemes. This is because, in the conventional sub-band decomposing scheme, if an edge exists in the original image, high-frequency components having large amplitudes arise in the sub-band images at the locations corresponding to the edge in the original image. Therefore the information for the edge portion increases. In addition, since the allocation of information for the higher frequency components is in general less than that for the lower frequency components, distortion occurs around the edge in the reproduced image.
The sub-band decomposition is an overlap transform wherein the neighboring pixels in the reproduced image overlap each other. In the method in which a greater quantity of information is allotted to the sub-band images corresponding to the object region while a smaller quantity of information is allotted to the sub-band images corresponding to the background region, distortion again occurs in the portion where the object adjoins the background region. For example, in the case where an original image having the shape information shown in FIG. 6(a) is coded, if fewer bits are allotted to the background region and more bits are allotted to the object region, distortion occurs as shown in FIG. 6(b) in the boundary portion where the object adjoins the background, thus giving a bad visual impression.
In order to solve this problem, a method by which the background and the object are totally separated can be considered. Up to now, however, the sub-band decomposition/composition could be performed only for a rectangular region, and it was impossible to perform sub-band decomposition only for a specific region using shape information of an arbitrary shape.
When an image of this kind is divided into sub-bands, all the obtained coefficients can be coded in some way. All these coefficients are essentially information needed in order to reproduce the original image, but not all of the coefficients need to be coded in order to obtain the required image quality, especially at low bit rates, since the importance of the original information itself differs from part to part.
When a motion picture is coded, in order to reduce redundancy across time, a typical technique is used in which a predictive image is formed using motion vectors and a previously obtained reference image, and its difference from the image to be coded is divided into sub-bands and then coded. Also in this case, it is not necessary to code all the obtained coefficients, for a reason similar to the above.
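The formation of a predictive image and of its difference from the image to be coded can be sketched as follows. As a deliberate simplification, the sketch assumes one motion vector for the whole image and repeats edge pixels where the displaced position falls outside the reference image; practical coders typically use one vector per block. The function names are hypothetical.

```python
def predict(reference, motion_vector):
    """Form a predictive image by displacing the reference image by
    (d_row, d_col). Assumption: a single vector for the whole image,
    with positions outside the image clamped to the nearest edge pixel."""
    d_row, d_col = motion_vector
    height, width = len(reference), len(reference[0])
    return [[reference[min(max(r - d_row, 0), height - 1)]
                      [min(max(c - d_col, 0), width - 1)]
             for c in range(width)]
            for r in range(height)]


def difference(current, predicted):
    """Pixel-wise difference that would be decomposed into sub-bands and coded."""
    return [[cur - pred for cur, pred in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, predicted)]


# A bright vertical line moves one pixel to the right between frames.
reference = [[0, 9, 0, 0],
             [0, 9, 0, 0]]
current = [[0, 0, 9, 0],
           [0, 0, 9, 0]]

residual_with_motion = difference(current, predict(reference, (0, 1)))
residual_without_motion = difference(current, reference)
```

With the correct motion vector the difference vanishes, whereas a plain frame difference leaves large values at both the old and the new position of the line, so far fewer coefficients need to be coded after motion compensation.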