In order to store or transmit digital image data with high efficiency, the digital image data must be compressively coded. Existing methods for compressively coding digital image data include DCT (Discrete Cosine Transform) coding, as typified by JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group), as well as waveform coding methods such as sub-band coding, wavelet coding, and fractal coding.
Further, in order to eliminate redundant image data between images such as adjacent frames, inter-frame prediction using motion compensation is carried out. That is, the pixel values of the pixels composing the present frame are represented by the differences between those pixel values and the pixel values of the corresponding pixels composing the previous frame, and the difference image signal comprising these difference values is subjected to waveform coding.
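As a minimal sketch of this difference representation, the following Python code computes a difference image between two small frames and reconstructs the present frame from it. It assumes zero motion (each pixel is predicted from the co-located pixel of the previous frame) and omits the waveform coding step; the function names are hypothetical.

```python
from typing import List

Frame = List[List[int]]

def frame_difference(current: Frame, previous: Frame) -> Frame:
    """Difference image: each present-frame pixel minus the co-located previous-frame pixel."""
    return [[c - p for c, p in zip(crow, prow)] for crow, prow in zip(current, previous)]

def reconstruct(difference: Frame, previous: Frame) -> Frame:
    """Reproduction end: add the decoded difference image back onto the previous frame."""
    return [[d + p for d, p in zip(drow, prow)] for drow, prow in zip(difference, previous)]

previous = [[100, 100, 100], [100, 120, 100], [100, 100, 100]]
current = [[100, 100, 100], [100, 124, 100], [100, 100, 101]]
diff = frame_difference(current, previous)
print(diff)  # [[0, 0, 0], [0, 4, 0], [0, 0, 1]]
```

Most entries of the difference image are zero or small, which is what makes the subsequent waveform coding efficient.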
In recent years, there has been proposed a coding method in which the image signals corresponding to the respective objects composing one frame are compressively coded object by object for transmission, with the aim not only of improving the compression efficiency of the image signal but also of enabling reproduction of the image signal in units of objects. At the reproduction end, the coded image signal obtained by this coding method is subjected to a decoding process adapted to the coding method. That is, in the decoding process, the coded image signals corresponding to the respective objects are decoded, and the resulting reproduced image signals of the respective objects are composited to generate a reproduced composite image signal. Then, based on the reproduced composite image signal, an image corresponding to one frame and comprising the respective objects is displayed.
Coding an image signal object by object in this way enables the user to combine the objects arbitrarily to generate a composite image at the reproduction end, so that the user can easily edit a moving picture. Furthermore, depending on the congestion of the transmission line, the performance of the reproduction apparatus, and the preference of the viewer, it is possible to display a moving picture comprising only objects of relatively high importance without reproducing objects of relatively low importance.
When coding an image signal corresponding to an object (i.e., an image having an arbitrary shape), either a waveform transformation whose signal processing is adapted to the shape of the object (e.g., shape-adaptive DCT) is used, or a waveform transformation is carried out after a padding process is performed on the image signal.
Specifically, in the coding method using the padding process, the image signal forming the image space corresponding to each object (object region) is padded: the pixel values of the pixels in the ineffective region of the object region are replaced with padding values obtained by a predetermined method, and the padded image signal is then subjected to the conventional 8×8 cosine transformation. The ineffective region is the part of the object region lying outside the object; it comprises pixels having no pixel values for displaying the object. That is, the image signal corresponding to the ineffective region comprises only insignificant sample values. The 8×8 cosine transformation is a waveform transformation process in which the image signal corresponding to the object region is subjected to cosine transformation in units of image spaces each comprising 8×8 pixels.
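The padding-then-transform path can be sketched as follows. Filling the ineffective region with the mean of the object's pixels is only an illustrative stand-in for the unspecified "predetermined method", and `mean_pad` and `dct_8x8` are hypothetical names; the transform is a direct (slow) orthonormal 2-D DCT-II.

```python
import math
from typing import List

N = 8  # block size of the 8x8 cosine transformation
Block = List[List[float]]

def mean_pad(block: Block, mask: List[List[int]]) -> Block:
    """Replace pixels of the ineffective region (mask == 0) with the mean of the object pixels.
    Mean padding is an illustrative assumption, not a prescribed padding method."""
    inside = [block[y][x] for y in range(N) for x in range(N) if mask[y][x]]
    fill = sum(inside) / len(inside) if inside else 0.0
    return [[block[y][x] if mask[y][x] else fill for x in range(N)] for y in range(N)]

def dct_8x8(block: Block) -> Block:
    """Direct orthonormal 2-D DCT-II; out[u][v] pairs vertical frequency u with horizontal frequency v."""
    def scale(k: int) -> float:
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out

# The object occupies the left half of the block; the right half holds insignificant zeros.
mask = [[1 if x < 4 else 0 for x in range(N)] for _ in range(N)]
block = [[50.0 if x < 4 else 0.0 for x in range(N)] for _ in range(N)]
padded_coeffs = dct_8x8(mean_pad(block, mask))
raw_coeffs = dct_8x8(block)
```

Because the object half is constant here, the padded block is uniform and all of its energy lands in the DC coefficient (400.0 up to rounding), whereas transforming the unpadded block spreads energy into AC coefficients because of the artificial step at the object edge.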
Furthermore, as a specific method for eliminating redundant signals between images such as adjacent frames, there has been proposed a method in which, with an image space of 16×16 pixels as the unit region, the difference data between the image signal to be coded (the image data of a target block) and the corresponding prediction signal (the image data of a prediction block corresponding to the target block) is obtained as a prediction error signal (the image data of a difference block). The prediction signal is the image signal corresponding to a prediction region (prediction block) obtained by motion compensation. Motion compensation is the process of detecting, in a frame on which coding or decoding has already been performed, the 16×16-pixel region whose image data differs least from the image data of the target block, and taking that region as the prediction region (prediction block).
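A full-search block-matching sketch of this motion compensation is shown below. The sum of absolute differences (SAD) is assumed as the measure of "difference", and the ±4-pixel search range and the function names are hypothetical; the block size defaults to 16 to match the 16×16 unit region.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]

def sad(ref: Frame, cur: Frame, rx: int, ry: int, cx: int, cy: int, size: int) -> int:
    """Sum of absolute differences between a size x size region of ref and the target block of cur."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(size) for i in range(size))

def motion_search(ref: Frame, cur: Frame, cx: int, cy: int,
                  size: int = 16, search: int = 4) -> Optional[Tuple[int, int, int]]:
    """Return (dx, dy, cost) of the reference region that best matches the target block at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best: Optional[Tuple[int, int, int]] = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate region would extend past the reference frame
            cost = sad(ref, cur, rx, ry, cx, cy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best

# Reference frame with a non-repeating ramp; the present frame is that content shifted.
W = H = 32
ref = [[(7 * x + 13 * y) % 256 for x in range(W)] for y in range(H)]
cur = [[ref[y + 2][x - 3] if x - 3 >= 0 and y + 2 < H else 0 for x in range(W)] for y in range(H)]
print(motion_search(ref, cur, 8, 8))  # (-3, 2, 0)
```

Real encoders typically use faster, non-exhaustive searches; the exhaustive loop is kept here only because it states the minimization directly.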
However, the prediction region (prediction block) may include pixels having insignificant sample values (insignificant pixels). In this case, the difference between the image data of the prediction block and the image data of the target block to be coded often becomes very large, because the sample values of the insignificant pixels are not necessarily suitable prediction values for reducing the difference.
To solve this problem, there has been proposed a method comprising the steps of: subjecting the image data of the prediction block to a padding process that replaces the insignificant sample values with predetermined padding values; obtaining the difference data between the padded image data of the prediction block and the image data of the target block as the image data of a difference block (prediction error signal); and subjecting the image data of the difference block to transformation for coding. Padding the image data of the prediction block in this way keeps the magnitude of the image data of the difference block small.
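The effect of padding the prediction block can be illustrated with a toy 4×4 example. Replacing the insignificant samples with the rounded mean of the significant ones is a hypothetical choice of "predetermined padding values", and the helper names are illustrative; the sum of absolute values serves only as a rough measure of how much residual remains to be coded.

```python
from typing import List

Block = List[List[int]]

def pad_with_mean(pred: Block, mask: Block) -> Block:
    """Replace insignificant samples (mask == 0) with the rounded mean of the significant samples."""
    inside = [v for row, mrow in zip(pred, mask) for v, m in zip(row, mrow) if m]
    fill = round(sum(inside) / len(inside))
    return [[v if m else fill for v, m in zip(row, mrow)] for row, mrow in zip(pred, mask)]

def difference_block(target: Block, pred: Block) -> Block:
    """Prediction error: target sample minus prediction sample."""
    return [[t - p for t, p in zip(trow, prow)] for trow, prow in zip(target, pred)]

def abs_sum(block: Block) -> int:
    """Rough size of a residual: sum of absolute sample values."""
    return sum(abs(v) for row in block for v in row)

target = [[100, 101, 101, 102] for _ in range(4)]
pred = [[99, 101, 0, 0] for _ in range(4)]  # right half: insignificant samples
mask = [[1, 1, 0, 0] for _ in range(4)]

print(abs_sum(difference_block(target, pred)))                      # 816
print(abs_sum(difference_block(target, pad_with_mean(pred, mask))))  # 16
```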
Further, as an alternative method of coding and decoding, there is a scalability process in which the data for image display is divided into a plurality of hierarchical layers according to image resolution and then coded and decoded.
When the scalability process (hierarchical coding and hierarchical decoding) is used, the coded image signal (coded data) transmitted as a bit stream includes coded data corresponding to a low resolution image and coded data corresponding to a high resolution image. Therefore, the low resolution image (object) can be reproduced by reading and decoding only part of the transmitted coded data, while the high resolution image (object) can be reproduced by reading and decoding all of the transmitted coded data.
To be specific, the hierarchical coding process comprises the steps of: generating prediction data (data of a prediction block) corresponding to data of the high resolution image (data of a target block) by using data of the low resolution image; subtracting the prediction data based on the low resolution image data from the high resolution image data to generate difference data (data of a difference block); and coding only the difference data.
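A two-layer sketch of this process follows. It assumes 2×2 averaging to form the low resolution image and pixel replication to build the prediction block, both hypothetical choices, and it skips actual coding of the low resolution layer, so the high resolution image is recovered exactly from the low resolution data plus the difference data.

```python
from typing import List

Image = List[List[int]]

def downsample(img: Image) -> Image:
    """Low resolution layer: integer mean of each 2x2 group of high resolution pixels."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]

def upsample(img: Image) -> Image:
    """Prediction for the high resolution layer: replicate each low resolution pixel 2x2."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out

def enhancement_layer(high: Image, low: Image) -> Image:
    """Difference data: high resolution image minus its prediction from the low resolution image."""
    pred = upsample(low)
    return [[h - p for h, p in zip(hrow, prow)] for hrow, prow in zip(high, pred)]

def reconstruct_high(low: Image, diff: Image) -> Image:
    """Decoder side: prediction from the low resolution layer plus the decoded difference data."""
    pred = upsample(low)
    return [[p + d for p, d in zip(prow, drow)] for prow, drow in zip(pred, diff)]

high = [[10, 12, 20, 22],
        [14, 16, 24, 26],
        [30, 30, 40, 44],
        [30, 30, 40, 44]]
low = downsample(high)               # [[13, 23], [30, 42]]
diff = enhancement_layer(high, low)  # only this is coded for the high resolution layer
```

A decoder that needs only the low resolution image stops after `low`; reproducing the high resolution image additionally requires `diff`.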
When hierarchical coding is carried out object by object, i.e., when the image data corresponding to an image (object) having an arbitrary shape is divided into a plurality of hierarchical layers according to the resolution of the object to be coded, hierarchical coding must be performed not only on the texture signal (luminance signal and chrominance signal) used for hierarchical color display of the object but also on a signal including shape information that indicates the arbitrary shape of the object. In other words, in object-by-object scalability coding, not only the texture signal of the object but also the signal including the shape information (shape signal or transparency signal) must be separated into a high resolution signal and a low resolution signal and coded. The shape signal is a binary signal whose pixel value is "0" for a pixel positioned outside the object and "1" for a pixel positioned inside the object. The transparency signal is a multi-valued signal whose pixel value is "0" for pixels positioned outside the object and a value other than "0" (non-zero) for pixels positioned inside the object. For pixels inside the object, these non-zero values indicate the transparencies of the respective pixels constituting the object.
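The relationship between the two signals can be stated compactly: the binary shape signal is 1 wherever the transparency signal is non-zero. A minimal sketch, with a hypothetical function name and arbitrary 8-bit transparency values:

```python
from typing import List

Plane = List[List[int]]

def shape_from_transparency(alpha: Plane) -> Plane:
    """Binary shape signal: 1 where the transparency value is non-zero (inside the object), else 0."""
    return [[1 if a != 0 else 0 for a in row] for row in alpha]

alpha = [[0, 0, 128, 255],
         [0, 64, 255, 255],
         [0, 0, 0, 32]]
shape = shape_from_transparency(alpha)
print(shape)  # [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]
```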
Also in the prediction coding of the scalability process described above, a prediction block including sample values that are not significant (insignificant sample values) is subjected to a padding process that replaces the insignificant sample values with padding values chosen to minimize the difference values (values of the difference data), after which the differences in sample values between the target block and the prediction block are obtained. A prediction error signal for the target block (image data of the difference block) is thereby generated and coded. In this way, padding the prediction block keeps the prediction error signal small.
Moreover, in conventional prediction coding, padding is also performed on the reference image data corresponding to a reference image (reference frame) that was processed before the image currently being processed (present frame). In this padding process, among the blocks constituting the reference frame, the boundary blocks that contain the boundary of the object are also padded.
Since such a boundary block includes both pixels having significant sample values and pixels having insignificant sample values, padding the boundary block replaces the insignificant sample values with significant sample values taken from the pixels positioned at the boundary of the object, so that the boundary block no longer includes any insignificant pixels.
In this padding process, when there are two padding values for a specific sample point (a specific pixel), i.e., a padding value which has been repeatedly used for padding of pixels in the horizontal pixel line including the specific pixel and a padding value which has been repeatedly used for padding of pixels in the vertical pixel line including the specific pixel, the average of the two padding values is used as a padding value for the specific pixel.
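One way to read the padding rule above is sketched below. Assumptions: the horizontal (vertical) padding value of an insignificant pixel is the sample of the nearest significant pixel on the same row (column), with ties resolved toward the left (top); two available values are averaged with rounding half up; and a pixel whose row and column contain no significant pixel receives the integer mean of the block's significant samples. That last fallback is a hypothetical addition so that no insignificant pixel remains, and all function names are illustrative.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]

def nearest_in_line(values: List[int], mask: List[int], i: int) -> Optional[int]:
    """Sample of the significant pixel nearest to position i on a 1-D line (ties go to the
    lower index), or None when the line has no significant pixel."""
    best: Optional[Tuple[int, int]] = None
    for j, (v, m) in enumerate(zip(values, mask)):
        if m and (best is None or abs(j - i) < best[0]):
            best = (abs(j - i), v)
    return None if best is None else best[1]

def pad_boundary_block(block: Block, mask: Block) -> Block:
    """Pad every insignificant pixel (mask == 0) of a boundary block."""
    h, w = len(block), len(block[0])
    cols = [[block[y][x] for y in range(h)] for x in range(w)]
    col_masks = [[mask[y][x] for y in range(h)] for x in range(w)]
    significant = [block[y][x] for y in range(h) for x in range(w) if mask[y][x]]
    # Hypothetical fallback for pixels with no significant pixel in their row or column.
    fallback = sum(significant) // len(significant)
    out = [row[:] for row in block]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                continue  # significant samples are kept unchanged
            hv = nearest_in_line(block[y], mask[y], x)      # horizontal padding value
            vv = nearest_in_line(cols[x], col_masks[x], y)  # vertical padding value
            if hv is not None and vv is not None:
                out[y][x] = (hv + vv + 1) // 2  # two candidates: use their rounded average
            elif hv is not None or vv is not None:
                out[y][x] = hv if hv is not None else vv
            else:
                out[y][x] = fallback
    return out

block = [[10, 20, 0],
         [30, 0, 0],
         [0, 0, 0]]
mask = [[1, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]
print(pad_boundary_block(block, mask))
# [[10, 20, 20], [30, 25, 30], [30, 20, 20]]
```

Pixel (1, 1) is the only one with both a horizontal candidate (30) and a vertical candidate (20), so it alone receives an averaged value.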
As described above, when the image data of the prediction block are generated after the boundary blocks in the reference frame have been padded, the prediction error can be kept small even if the boundary is slightly displaced.
Hence, also in the object-by-object scalability coding, in order to efficiently predict a high resolution image from a low resolution image, it is necessary to pad boundary blocks in a low resolution frame serving as a reference frame.
In scalability coding in particular, since the shape signal is divided into a plurality of hierarchical layers based on resolution, the boundary between the inside and the outside of the object is not necessarily identical in the low resolution shape image and the high resolution shape image. Specifically, a target block to be coded may be positioned inside the object in the high resolution frame while the corresponding region is positioned outside the object in the low resolution frame. Such a difference in the object's boundary between the high resolution frame and the low resolution frame arises because the transformation that generates the low resolution shape signal from the high resolution shape signal entails a signal change that deforms the shape, or because the information indicating the shape of the object changes when the shape signal is compressed. Therefore, at the coding end, the boundary blocks on the low resolution reference frame are padded in order to increase the efficiency of prediction using those blocks.
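The boundary mismatch can be reproduced with a toy shape signal. The sketch assumes that a low resolution shape pixel is inside the object when at least 2 of its 4 high resolution pixels are inside (a hypothetical downsampling rule) and compares the result, upsampled by replication, against the original shape; the function names are illustrative.

```python
from typing import List, Tuple

Shape = List[List[int]]
Positions = List[Tuple[int, int]]

def downsample_shape(shape: Shape) -> Shape:
    """Low resolution shape: a pixel is inside (1) if at least 2 of its 4 source pixels are inside."""
    return [[1 if (shape[2 * y][2 * x] + shape[2 * y][2 * x + 1]
                   + shape[2 * y + 1][2 * x] + shape[2 * y + 1][2 * x + 1]) >= 2 else 0
             for x in range(len(shape[0]) // 2)]
            for y in range(len(shape) // 2)]

def upsample_shape(shape: Shape) -> Shape:
    """Bring the low resolution shape back to full size by 2x2 replication."""
    out: Shape = []
    for row in shape:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out

def boundary_mismatch(high: Shape) -> Tuple[Positions, Positions]:
    """(row, col) positions inside only at high resolution, and inside only at low resolution."""
    low_up = upsample_shape(downsample_shape(high))
    inside_high_only = [(y, x) for y, row in enumerate(high)
                        for x, v in enumerate(row) if v and not low_up[y][x]]
    inside_low_only = [(y, x) for y, row in enumerate(high)
                       for x, v in enumerate(row) if not v and low_up[y][x]]
    return inside_high_only, inside_low_only

high = [[1, 1, 1, 0],
        [1, 1, 1, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
print(boundary_mismatch(high))
# ([(2, 2)], [(0, 3), (1, 3), (3, 0), (3, 1)])
```

Pixel (2, 2) lies inside the object at high resolution but outside it at low resolution, which is exactly the situation in which a padded low resolution boundary block helps prediction.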
However, if the boundary blocks included in the reference frame are padded at the coding end as described above, the boundary blocks included in the reference frame must be padded at the reproduction end as well. As a result, the number of padding processes at the reproduction end increases; in particular, many processes for detecting the boundary of the arbitrary shape are required. The time required for the padding process during reproduction increases in proportion to the number of boundary blocks. Consequently, in object-by-object scalability coding as well, padding the boundary blocks on the low resolution reference frame at the reproduction end increases the time required for decoding.
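Because the padding cost at the reproduction end scales with the number of boundary blocks, it may help to see how those blocks are identified. The following sketch classifies the blocks of a binary shape signal as inside, outside, or boundary (containing both values); the 8×8 block size and the function name are assumptions for illustration.

```python
from typing import List, Tuple

def classify_blocks(shape: List[List[int]], size: int = 8) -> Tuple[int, int, int]:
    """Count (inside, outside, boundary) blocks of a binary shape signal."""
    inside = outside = boundary = 0
    height, width = len(shape), len(shape[0])
    for by in range(0, height, size):
        for bx in range(0, width, size):
            values = {shape[y][x]
                      for y in range(by, min(by + size, height))
                      for x in range(bx, min(bx + size, width))}
            if values == {1}:
                inside += 1
            elif values == {0}:
                outside += 1
            else:
                boundary += 1  # both 0 and 1 present: this block would need padding
    return inside, outside, boundary

# 32x32 frame with a rectangular object spanning pixels 5..26 in both directions.
shape = [[1 if 5 <= x < 27 and 5 <= y < 27 else 0 for x in range(32)] for y in range(32)]
print(classify_blocks(shape))  # (4, 0, 12)
```

Even for this simple rectangle, 12 of the 16 blocks are boundary blocks, each of which must be scanned for the object boundary and padded at the reproduction end.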