In order to store or transmit digital image information efficiently, it is necessary to perform compressive coding on the digital image information. At present, methods for compressively coding digital image information include the Discrete Cosine Transformation (DCT), which is representative of JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group), and waveform coding methods such as subband, wavelet and fractal coding.
As a method for removing redundant image information between adjacent pictures such as frames, there is a method of performing inter-picture prediction using motion compensation, that is, representing the pixel values of pixels of the present picture by the difference between these pixel values and the pixel values of pixels of the previous picture, and performing waveform coding on this difference signal.
Recently, in order both to improve compression efficiency and to regenerate an image signal for each of the regions constituting one picture and corresponding to individual objects (hereinafter referred to as image spaces), a method of compressively coding an image signal object by object and transmitting the resulting signal has been put into practice. In this method, at the regeneration side, the coded image signals corresponding to the individual objects are decoded, and the images of the individual objects regenerated by decoding are composed to display an image corresponding to one picture. In this way, object-by-object coding of an image signal enables the images of the objects to be displayed to be freely combined and composed, whereby moving pictures can be easily reedited. Further, in this method, according to the congestion conditions of channels, the performance of the regenerative apparatus, and the viewer's taste, moving pictures can be displayed without regenerating the images of relatively unimportant objects.
More specifically, as methods for coding an image signal for forming an image space including an image of an object having an arbitrary shape (hereinafter referred to as an object image), there are conventionally a coding method using a transformation method adapted to the shape (for example, shape-adaptive discrete cosine transformation), and a coding method of padding the pixel values of pixels constituting an invalid region of the image space (that is, the region outside the object image) by a specified method, and then performing cosine transformation on an image signal comprising plural pixel values corresponding to the image space, for each of the unit regions into which the image space is divided (blocks comprising 8×8 pixels).
As a specific method for removing a redundant signal between pictures such as frames, there is a method of using macroblocks comprising 16×16 pixels as unit regions and obtaining a difference between an image signal corresponding to a target macroblock, which is the target of coding processing, and its prediction signal. Herein, the prediction signal is an image signal corresponding to a prediction region obtained by motion compensation. Motion compensation is processing of detecting, as the prediction region, a region comprising 16×16 pixels in a picture which has already been subjected to coding processing or decoding processing, the region providing the image signal having the smallest difference from the image signal of the target macroblock.
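The detection of a prediction region described above can be sketched as an exhaustive block-matching search. The following is only an illustrative sketch under stated assumptions: the sum of absolute differences is assumed as the measure of "smallest difference", the search covers the whole reference picture, and the function names (`sad`, `extract_block`, `find_prediction_region`, `prediction_error`) are hypothetical. The block size is a parameter, so the example uses small blocks instead of 16×16 macroblocks.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def extract_block(picture, top, left, size):
    """Copy a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def find_prediction_region(target_block, reference, size):
    """Search every position of the reference picture (assumed search range)
    and return (cost, top, left) of the block with the smallest SAD."""
    best = None
    height, width = len(reference), len(reference[0])
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            cost = sad(target_block, extract_block(reference, top, left, size))
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best


def prediction_error(target_block, prediction_block):
    """Difference signal: target minus prediction, pixel by pixel."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target_block, prediction_block)]


# Small demonstration: an 8x8 reference picture and a 4x4 target block
# that is an exact copy of the reference region at row 2, column 3.
reference = [[y * 16 + x for x in range(8)] for y in range(8)]
target = extract_block(reference, 2, 3, 4)
cost, top, left = find_prediction_region(target, reference, 4)
residual = prediction_error(target, extract_block(reference, top, left, 4))
```

In this demonstration the search recovers the position from which the target was copied, so the difference signal is zero everywhere; in practice only this difference signal and the position of the prediction region would need to be coded.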
However, when this prediction region is located at the boundary of the object image in the image space, the prediction region includes pixels with insignificant (undefined) sample values (pixel values). Therefore, for this prediction region, the corresponding image signal is subjected to padding processing in which the insignificant sample values are replaced with significant pseudo sample values, a difference between the prediction signal which has been subjected to the padding processing and the image signal of the target macroblock is obtained as a prediction error signal (difference signal), and conversion processing for coding is performed on the difference signal. Herein, the padding processing is applied to the prediction region for the purpose of suppressing the difference signal, i.e., reducing the code quantity when the difference signal is coded.
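The effect of padding on the difference signal can be illustrated with a small sketch. The text does not specify the padding method, so the example assumes, purely for illustration, that every insignificant sample is replaced with the integer mean of the significant samples in the same block. The function names and the `mask` representation (1 for a significant sample, 0 for an insignificant one) are hypothetical.

```python
def pad_with_mean(block, mask):
    """Replace insignificant samples (mask == 0) with the integer mean of
    the significant samples. This padding rule is an assumption."""
    significant = [v for row, m_row in zip(block, mask)
                   for v, m in zip(row, m_row) if m]
    if not significant:
        # Nothing to derive a pseudo sample value from: return a copy.
        return [row[:] for row in block]
    fill = sum(significant) // len(significant)
    return [[v if m else fill for v, m in zip(row, m_row)]
            for row, m_row in zip(block, mask)]


def difference(target, prediction):
    """Prediction error signal: target minus prediction."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, prediction)]


def magnitude(diff):
    """Sum of absolute values, used here as a rough stand-in for code quantity."""
    return sum(abs(d) for row in diff for d in row)


# Prediction region on an object boundary: the right half is insignificant
# and holds arbitrary values (0 here).
prediction = [[100, 102, 0, 0],
              [101, 103, 0, 0]]
mask = [[1, 1, 0, 0],
        [1, 1, 0, 0]]
target = [[100, 102, 104, 105],
          [101, 103, 104, 106]]

padded = pad_with_mean(prediction, mask)
unpadded_cost = magnitude(difference(target, prediction))
padded_cost = magnitude(difference(target, padded))
```

With these sample values the magnitude of the difference signal falls from 419 without padding to 15 with padding, which is the suppression of the difference signal that the padding processing aims at.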
In addition, there is a hierarchical processing method, called scalability, in which image signals corresponding to plural hierarchies having different resolutions are used as the image signal corresponding to each object, i.e., the image signal for forming an image space including an object image, and the image signals of the respective hierarchies are coded and decoded.
In such a hierarchical processing method, part of a bit stream that is extracted from transmitted data (coded bit stream) is decoded to regenerate an object image having low resolution, and all data is decoded to regenerate an object image having high resolution.
In the hierarchical coding (scalability coding) processing, an image signal corresponding to a high-resolution image (high-resolution image signal) is coded on the basis of an image signal corresponding to a low-resolution image (low-resolution image signal). That is, a high-resolution image signal corresponding to a target block as a target of coding processing is predicted using a corresponding low-resolution image signal to generate a predicted image signal, and a difference signal obtained by subtracting the predicted image signal from the high-resolution image signal of the target block, is coded.
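The hierarchical difference coding described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: the low-resolution image is assumed to be obtained by averaging 2×2 pixel groups, the predicted image signal is assumed to be obtained by pixel replication, and the resolution ratio is assumed to be 2. All function names are hypothetical.

```python
def downsample2(image):
    """Low-resolution image: integer average of each 2x2 pixel group (assumed)."""
    return [[(image[y][x] + image[y][x + 1]
              + image[y + 1][x] + image[y + 1][x + 1]) // 4
             for x in range(0, len(image[0]), 2)]
            for y in range(0, len(image), 2)]


def upsample2(image):
    """Predicted image: each low-resolution pixel is repeated 2x2 (assumed)."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(wide[:])
    return out


def subtract(a, b):
    """Pixel-wise a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Pixel-wise a + b."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


high = [[10, 12, 20, 22],
        [14, 16, 24, 26],
        [30, 32, 40, 42],
        [34, 36, 44, 46]]

low = downsample2(high)                # low-resolution hierarchy
predicted = upsample2(low)             # prediction of the high-resolution signal
residual = subtract(high, predicted)   # difference signal to be coded
reconstructed = add(predicted, residual)  # what the decoding side regenerates
```

Decoding only `low` yields the low-resolution image, while decoding `low` together with `residual` regenerates the high-resolution image. The residual values here are small compared with the original pixel values, which is why the difference signal is coded in place of the high-resolution signal itself.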
When an image signal is coded object by object, a shape signal indicating the arbitrary shape of an object is coded as part of the image signal, together with a texture signal including a luminance signal and a color-difference signal for gradation color display of the object image. Therefore, in performing scalability coding on the image signal corresponding to each object, it is necessary that not only the texture signal but also the shape signal be separated into a high-resolution signal and a low-resolution signal to be hierarchically coded.
In this object-by-object scalability coding, it is required to predict a high-resolution texture signal from a low-resolution texture signal efficiently. In particular, a low-resolution texture signal corresponding to a macroblock located at the boundary of an object includes insignificant (undefined) sample values (pixel values). If this low-resolution texture signal is used as it is to generate a prediction signal, and the prediction signal is subtracted from the high-resolution texture signal of the target macroblock as the target of coding processing, the difference pixel values in the difference signal corresponding to the pixels located at the boundary of the object become large, so that the high-resolution texture signal cannot be coded efficiently.
Further, since the shape signal is separated into plural hierarchies having different resolutions, more specifically, a high-resolution hierarchy and a low-resolution hierarchy, the boundary between the inside and the outside of the object (the outline of the object) disagrees between the object shape obtained from the low-resolution shape signal and the object shape obtained from the high-resolution shape signal. This is because the down-sampling processing that generates the low-resolution shape signal from the high-resolution shape signal deforms the shape of the object image of the low-resolution shape signal relative to that of the high-resolution shape signal, and because the object shapes of both shape signals are also deformed by the compression processing applied to the high-resolution shape signal and the low-resolution shape signal.
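The disagreement of outlines caused by down-sampling can be seen in a small sketch. The down-sampling rule is not given in the text, so the example assumes that a low-resolution shape pixel is inside the object when at least half of the corresponding 2×2 high-resolution pixels are inside. The function names are hypothetical, and deformation caused by compression of the shape signals is not modeled.

```python
def downsample_shape(shape):
    """Binary shape at half resolution: a pixel is inside (1) when at least
    two of the four corresponding pixels are inside (assumed rule)."""
    return [[1 if (shape[y][x] + shape[y][x + 1]
                   + shape[y + 1][x] + shape[y + 1][x + 1]) >= 2 else 0
             for x in range(0, len(shape[0]), 2)]
            for y in range(0, len(shape), 2)]


def upsample_shape(shape):
    """Bring the low-resolution shape back to the high-resolution grid
    by repeating each pixel 2x2, so the two outlines can be compared."""
    out = []
    for row in shape:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(wide[:])
    return out


def outline_mismatches(shape_a, shape_b):
    """Positions (row, column) where the two shapes disagree."""
    return [(y, x)
            for y, (ra, rb) in enumerate(zip(shape_a, shape_b))
            for x, (a, b) in enumerate(zip(ra, rb))
            if a != b]


# High-resolution shape with a diagonal outline (1 = inside the object).
high_shape = [[0, 0, 0, 1],
              [0, 0, 1, 1],
              [0, 1, 1, 1],
              [1, 1, 1, 1]]

low_shape = downsample_shape(high_shape)
mismatches = outline_mismatches(high_shape, upsample_shape(low_shape))
```

In this example two pixels along the diagonal outline are outside the object at high resolution but inside the object according to the low-resolution shape, so the inside and outside of the object are not the same in the two hierarchies.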
In this case, a specified macroblock region may be included in the object image in the image space formed by the high-resolution texture signal while being wholly located outside the object image in the image space formed by the low-resolution texture signal. In such a condition, even if a prediction signal for the high-resolution texture signal is generated on the basis of the low-resolution texture signal, the difference signal between the high-resolution texture signal and its prediction signal cannot be suppressed efficiently.
The present invention is directed to solving the above-mentioned problems, and has an object to provide a digital image coding method and a digital image coding apparatus in which, on the basis of an image signal for forming an image space including an image of an object having an arbitrary shape, image signals corresponding to plural hierarchies having different resolutions are generated and, in hierarchical coding processing that performs difference coding of a high-resolution image signal using a low-resolution image signal for each unit region, the image signal of a unit region located at the boundary of the object can be compressed with good coding efficiency.
Another object of the present invention is to provide a digital image decoding method and a digital image decoding apparatus in which a coded image signal obtained by hierarchical coding processing that can compress an image signal for forming image space including an image of an object with good coding efficiency, can be accurately regenerated by corresponding hierarchical decoding processing.
Still another object of the present invention is to provide data recording media containing programs for realizing, by means of computers, the hierarchical coding processing of the digital image coding method and the hierarchical decoding processing of the digital image decoding method.