In order to store or transmit digital image data with efficiency, it is necessary to compressively encode the digital image data. As a typical method for compressively coding digital image data, there is discrete cosine transformation (DCT) represented by JPEG (Joint Photographic Experts Group) or MPEG (Moving Picture Experts Group). Besides, there are waveform coding methods such as sub-band coding, wavelet coding, and fractal coding.
Further, in order to eliminate redundant image data between adjacent frames (images), inter-frame predictive coding using motion compensation is carried out. To be specific, a pixel value (pixel data) of a pixel in the present frame is expressed by using a difference between this pixel value and a pixel value (pixel data) of a pixel in the previous frame, and this difference value (difference data) is subjected to waveform coding.
A brief description will be given of an image coding method and an image decoding method, based on MPEG1 or the like, including DCT with motion compensation.
In the image coding method, initially, input image data corresponding to one frame to be coded (image space corresponding to one frame) is divided into image data corresponding to a plurality of macroblocks (image spaces each having the size of 16×16 pixels), and the image data are compressively coded macroblock by macroblock. To be specific, the image data corresponding to one macroblock is further divided into image data corresponding to four subblocks (image spaces each having the size of 8×8 pixels), and the image data are subjected to DCT and quantization, subblock by subblock, to generate quantized coefficients. This coding process is called "intra-frame coding".
At the receiving end, the quantized coefficients corresponding to the respective subblocks are subjected to inverse quantization and inverse DCT to reproduce image data corresponding to each macroblock.
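For illustration only, the intra-frame path described above (DCT and quantization of a subblock at the coding end, inverse quantization and inverse DCT at the receiving end) may be sketched as follows. The sketch assumes an orthonormal 8×8 DCT and a single uniform quantization step; the function names and the step value are hypothetical, and an actual MPEG1 coder uses quantization matrices and a quantization scale rather than one step.

```python
import math

N = 8  # subblock size used in the text (8x8 pixels)

def _scale(k):
    # Orthonormal DCT-II scaling factor (an assumption of this sketch).
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# COS[u][x] = cos((2x + 1) * u * pi / (2N)), shared by both transforms.
COS = [[math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
       for u in range(N)]

def dct_2d(block):
    """Forward 2-D DCT of an N x N block given as block[row][col]."""
    return [[_scale(u) * _scale(v)
             * sum(block[y][x] * COS[u][x] * COS[v][y]
                   for y in range(N) for x in range(N))
             for u in range(N)]
            for v in range(N)]

def idct_2d(coef):
    """Inverse 2-D DCT of N x N coefficients given as coef[v][u]."""
    return [[sum(_scale(u) * _scale(v) * coef[v][u] * COS[u][x] * COS[v][y]
                 for v in range(N) for u in range(N))
             for x in range(N)]
            for y in range(N)]

def quantize(coef, step):
    """Uniform quantization with one step size (hypothetical simplification)."""
    return [[round(c / step) for c in row] for row in coef]

def dequantize(levels, step):
    """Inverse quantization matching quantize()."""
    return [[q * step for q in row] for row in levels]

# Coding end: pixel block -> quantized coefficients.
flat = [[100] * N for _ in range(N)]
levels = quantize(dct_2d(flat), 16)
# Receiving end: quantized coefficients -> reproduced pixel block.
reproduced = idct_2d(dequantize(levels, 16))
```

For a flat block, only the DC coefficient survives quantization, so a single nonzero value represents all 64 pixels.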
Meanwhile, there is an image data coding method called "inter-frame coding". In this coding method, initially, from a frame (reference frame) which is temporally adjacent to a frame (target frame) including a target macroblock to be subjected to coding, an area comprising 16×16 pixels and having the smallest error in image data from the target macroblock is detected as a prediction macroblock, by a motion detecting method such as block matching. At this time, displacement data indicating a displacement of the prediction macroblock from the target macroblock is detected as a motion vector. Then, image data of the prediction macroblock is obtained from image data of a past frame (i.e., a frame which has already been coded) by motion compensation based on the detected motion vector.
Next, a difference in image data between the target macroblock and the prediction macroblock is obtained as difference data, and the difference data is subjected to DCT in units of 8×8 pixels to obtain DCT coefficients, and further, the DCT coefficients are quantized to obtain quantized coefficients.
Then, the quantized coefficients and the motion vector are transmitted or stored. This coding process is called "inter-frame coding".
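For illustration only, the motion detection by block matching and the generation of the difference data may be sketched as follows. The sketch assumes an exhaustive search at whole-pixel positions with the sum of absolute differences as the error measure; the function names, the search range, and the error measure are hypothetical choices, since the description above does not fix them.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between an n x n area of the reference
    frame at (rx, ry) and the target block of the current frame at (cx, cy)."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(n) for i in range(n))

def block_match(ref, cur, cx, cy, n, search):
    """Exhaustive whole-pixel search within +/-search pixels.
    Returns ((dx, dy), error) for the candidate with the smallest error."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(ref, cur, rx, ry, cx, cy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]), best[0]

def difference_data(ref, cur, cx, cy, mv, n):
    """Target block minus prediction block displaced by the motion vector."""
    dx, dy = mv
    return [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
             for i in range(n)]
            for j in range(n)]

# Small synthetic frames: the current frame is the reference frame
# displaced by (2, 1), so the search should recover that displacement.
SIZE = 12
ref = [[x * x + 3 * y * y + x * y for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[y + 1][x + 2] if y + 1 < SIZE and x + 2 < SIZE else 0
        for x in range(SIZE)]
       for y in range(SIZE)]
mv, err = block_match(ref, cur, 4, 4, 4, 2)
diff = difference_data(ref, cur, 4, 4, mv, 4)
```

When the prediction is exact, as in this synthetic case, the difference data is all zeros and only the motion vector carries information.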
The inter-frame coding has two prediction modes as follows: a prediction mode in which image data of a target macroblock included in a frame which is presently processed (present frame) is predicted only from image data of a previous frame which is previous to the present frame in the display order; and a prediction mode in which image data of a target macroblock is predicted from image data of two frames which are previous and subsequent to the present frame in the display order. The former is called "forward prediction mode" and the latter is called "bidirectional prediction mode".
At the receiving end, the quantized coefficients are restored to the difference data in the space domain by inverse quantization and inverse DCT. Thereafter, image data of the prediction macroblock is obtained by motion compensation based on the motion vector, and the difference data and the image data of the prediction macroblock are added to reproduce image data of the target macroblock.
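For illustration only, the addition performed at the receiving end may be sketched as follows. The function name is hypothetical, and the clipping of the result to the range 0 to 255 is an assumption for 8-bit pixel data which is not stated in the description above.

```python
def reconstruct_block(prediction, difference, lo=0, hi=255):
    """Add difference data to prediction data, pixel by pixel.
    Clipping to [lo, hi] is an assumption for 8-bit pixel data."""
    return [[min(hi, max(lo, p + d)) for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(prediction, difference)]

reproduced = reconstruct_block([[100, 250], [3, 40]],
                               [[5, 10], [-8, 0]])
```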
In order to increase the prediction efficiency, in other words, in order to minimize the difference (prediction error) between the image data of the target macroblock and the image data of the prediction macroblock, the motion compensation, i.e., the process to obtain the image data of the prediction macroblock in accordance with the motion vector, is performed with precision of 1/2 pixel.
However, since the input image data is composed of pixel values (pixel data) in units of whole pixels, prediction data of 1/2 pixel precision must be generated by interpolation of pixel values between adjacent pixels within the reference frame. Further, when generating the prediction data of 1/2 pixel precision, the value of the motion vector has 0.5 pixel precision.
Although it is assumed that the quantization, DCT and the like are performed in units of 8×8 pixels in the above description, the processing unit is not restricted to 8×8 pixels. For example, those processes may be performed in units of 7×1 pixels. In general, the quantization, DCT, and the like can be performed in units of g×h pixels (g,h=positive integers). Further, although the macroblock comprises 16×16 pixels in the above description, the macroblock may generally comprise M×N pixels (M,N=positive integers).
However, in the following description, for simplification, both the macroblock and the subblock are regarded as image spaces each comprising K×K pixels (K=positive integer). That is, it is premised that the coding, decoding, quantization, inverse quantization, DCT, and inverse DCT are performed in units of K×K pixels. Therefore, hereinafter a macroblock is simply referred to as "a block".
FIG. 17 is a flowchart for explaining process steps in the conventional image decoding method including motion compensation.
First of all, coded image data which has been obtained by compressively coding image data by the above-mentioned coding method and then variable-length coding the compressed data, is input block by block (step S71).
Next, the coded image data corresponding to a target block is analyzed to be separated into quantized DCT coefficients (quantized coefficients), quantization scale, and motion vector, and these are respectively converted from variable-length codes to corresponding numerical values to be output (step S72).
Thereafter, the quantized coefficients are subjected to inverse quantization and inverse DCT in units of K×K pixels, and difference data in a space domain corresponding to the target block and comprising K×K values (pixel data) are output (step S73).
Next, prediction data for the target block is generated from image data of the reference frame by motion compensation. When generating prediction data of 1/2 pixel precision, more than K×K reference pixel values are obtained from the reference frame.
That is, in the conventional decoding method, prediction data having 1/2 pixel precision in both the horizontal and vertical directions is generated as follows. Initially, K'×K' pixels are obtained from the position of a pixel specified according to the integer parts of the values of the motion vector in the reference frame (step S74), and the K'×K' pixel values so obtained are subjected to interpolation, such as bilinear interpolation, to generate prediction data of 1/2 pixel precision (step S75). In this method, K'=K+1.
Then, the prediction data is added to the difference data to generate reproduced image data of the target block (step S76).
Thereafter, it is decided whether or not the target block is the last block in the last frame among the frames composing the image (step S77). When the target block is not the last block, the processes in steps S71 to S77 are carried out again. When the target block is the last block, decoding of the coded image data is ended.
Next, the pixel value interpolation process in steps S74 and S75 will be described in more detail by using FIGS. 18(a) to 18(c).
For simplification, it is assumed that the unit of decoding (K×K pixels) is 8×8 pixels, and the motion vector MVt of the target block has, as its values, positional vectors (a,b) on the coordinates of the present frame and the previous frame (reference frame) which are image spaces of the same size. The value a is composed of an integer part x and a fraction part u, and the value b is composed of an integer part y and a fraction part v. Further, since the horizontal and vertical components of the motion vector MVt of the target block have 1/2 pixel precision, the fraction parts u and v can take 0 or 0.5.
To generate prediction data specified by the motion vector MVt, the values (a,b) of the motion vector are added to the coordinates (a0,b0) of the upper-left corner Pt0 of the target block Tb on the target frame TF (refer to FIG. 18(a)), and the coordinates (a0+a,b0+b) of the reference point Pt1, which are obtained as the result of the addition, are regarded as the coordinates of the upper-left corner Py of the prediction block Yb in the reference frame SF (refer to FIGS. 18(b) and 18(c)).
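For illustration only, the separation of each motion vector component into an integer part and a fraction part, and the addition to the upper-left corner of the target block, may be sketched as follows. The function names are hypothetical, and the use of the floor function for negative components is an assumption of this sketch, since only positive integer parts are treated in the present description.

```python
import math

def split_half_pel(component):
    """Split one motion vector component of 1/2 pixel precision into an
    integer part and a fraction part (0 or 0.5).
    Using floor for negative values is an assumption of this sketch."""
    integer_part = math.floor(component)
    fraction_part = component - integer_part
    if fraction_part not in (0.0, 0.5):
        raise ValueError("component is not a multiple of 0.5")
    return integer_part, fraction_part

def prediction_origin(a0, b0, a, b):
    """Whole-pixel position from which reference pixels are fetched,
    together with the fraction parts (u, v) that select the interpolation."""
    x, u = split_half_pel(a)
    y, v = split_half_pel(b)
    return (a0 + x, b0 + y), (u, v)

# Target block at (16, 8), motion vector (3.5, 2.0).
origin, fractions = prediction_origin(16, 8, 3.5, 2.0)
```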
Hereinafter, a description is given of the case where the integer parts x and y of the motion vector MVt are positive integers, and the fraction parts u and v are 0.5.
Initially, the integer parts (x,y) of the motion vector are added to the coordinates (a0,b0) of the upper-left corner Pt0 of the target block Tb to generate the coordinates (a0+x,b0+y) of the reference position Pt1 on the target frame TF. Next, by using, as a reference, the position Ps on the reference frame SF which corresponds to the reference position Pt1 on the target frame TF, a reference region Sr which comprises (K+1)×(K+1) pixels and has the position Ps at the upper-left corner is obtained. Since K=8, the reference region Sr includes 9×9 original pixels (pixels originally included in the reference frame), which are shown by ○ in FIG. 18(c).
Further, since both of the fraction parts u and v of the motion vector MVt are 0.5, the reference region Sr needs interpolation pixels (fractional pixels) shown by X, which are arranged among the original pixels, at intervals of 0.5 pixel, along the horizontal and vertical directions.
So, by using two-dimensional interpolation for averaging the pixel values of four original pixels 806 to 809 positioned at apexes of a rectangle, the pixel value of an interpolation pixel 801 positioned in the center of the rectangle is generated. In this way, K×K (K=8) interpolation pixels are generated in the reference region Sr, and the pixel values of these interpolation pixels are obtained as prediction data for the target block Tb (i.e., pixel data of the prediction block Yb specified by the motion vector MVt of fractional pixel precision). In this case, the tap length of a filter used for the interpolation is 2 in both of the horizontal and vertical directions. Generally, the number of pixels in each of the horizontal and vertical directions in the reference region, which pixels are required for interpolation, is represented by K+(filter's tap length)/2.
Further, when only one of the fraction parts u and v of the motion vector MVt is 0.5, the pixel values of interpolation pixels are obtained by one-dimensional interpolation (linear interpolation). To be specific, the pixel value of one interpolation pixel is generated from the pixel values of two adjacent original pixels. In this case, only the number of pixels in one of the horizontal and vertical directions of the reference region Sr becomes K+(filter's tap length)/2, while the number of pixels in the other direction becomes K.
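For illustration only, the generation of prediction data by a 2-tap averaging filter may be sketched as follows for the three cases described above (both fraction parts nonzero, only one nonzero, and neither nonzero). The function name is hypothetical, and the sketch returns unrounded averages; the rounding rule applied by an actual decoder is not specified here and is left out as an assumption.

```python
def predict_block(ref, px, py, u, v, k):
    """Prediction data for a k x k block from reference frame ref[row][col].
    (px, py) is the whole-pixel upper-left position; u and v are the
    horizontal and vertical fraction parts, each 0 or 0.5.
    Returns (prediction, (region_width, region_height))."""
    du = 1 if u == 0.5 else 0  # extra column needed for horizontal averaging
    dv = 1 if v == 0.5 else 0  # extra row needed for vertical averaging
    # Fetch the reference region: (k + du) columns by (k + dv) rows.
    region = [row[px:px + k + du] for row in ref[py:py + k + dv]]
    prediction = []
    for j in range(k):
        out_row = []
        for i in range(k):
            # With du = dv = 1 this averages four distinct pixels; when an
            # offset is 0 the repeated terms reduce it to a two-pixel
            # average or to a plain copy of the original pixel.
            total = (region[j][i] + region[j][i + du]
                     + region[j + dv][i] + region[j + dv][i + du])
            out_row.append(total / 4)
        prediction.append(out_row)
    return prediction, (k + du, k + dv)

# 10 x 10 synthetic reference frame, K = 8.
ref = [[10 * y + x for x in range(10)] for y in range(10)]
both, size_both = predict_block(ref, 1, 1, 0.5, 0.5, 8)
horiz, size_horiz = predict_block(ref, 1, 1, 0.5, 0.0, 8)
whole, size_whole = predict_block(ref, 1, 1, 0.0, 0.0, 8)
```

The returned region size makes the extra fetch visible: 9×9 pixels when both fraction parts are 0.5, 9×8 when only the horizontal one is, and 8×8 for a whole-pixel motion vector.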
In the above-described motion compensation including generation of pixel values of interpolation pixels, high-speed processing and high-speed access to memory are demanded.
That is, in order to generate pixel data of a prediction block comprising K×K pixels and having the same size as a block being the unit of decoding or coding, the pixel values (pixel data) of K'×K' pixels (K'=K+(filter's tap length)/2) must be obtained and, therefore, it is necessary to achieve high-speed access to the memory or to increase the access bandwidth of the memory (i.e., the bit number in parallel access wherein plural bits in the memory are simultaneously accessed).
Further, since the interpolation is performed by using K'×K' pixel values, which is larger than the pixel number (K×K) of the unit of decoding or coding, the quantity of operations in these processes increases.
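For illustration only, the increase in the number of reference pixels may be computed as follows from the relation K'=K+(filter's tap length)/2. The function names are hypothetical.

```python
def reference_region_side(k, tap_length):
    """Side length K' of the reference region: K + (tap length) / 2."""
    return k + tap_length // 2

def fetch_overhead(k, tap_length):
    """Return (pixels fetched, pixels in the block, ratio of the two)."""
    side = reference_region_side(k, tap_length)
    fetched = side * side
    needed = k * k
    return fetched, needed, fetched / needed

# K = 8 with the 2-tap filter of the description: 81 pixels for a 64-pixel block.
fetched, needed, ratio = fetch_overhead(8, 2)
```

For K=8 and a tap length of 2, 81 reference pixels are read to produce 64 prediction pixels, i.e., about 27% more pixels than the block itself contains.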
Meanwhile, besides the image processing technique based on MPEG1 as described above, there has recently been proposed a compressive coding method as an image processing technique based on MPEG4. In the coding method, image data corresponding to a plurality of objects composing an image of one frame are compressively coded object by object for transmission, to improve the compression efficiency of the image data and to realize object by object reproduction of the image data.
Coded image data obtained by this coding method are subjected to a decoding process adapted to the coding method, at the reproduction end. More specifically, in the decoding process, the coded image data corresponding to the respective objects are decoded, and the resultant decoded image data corresponding to the respective objects are composited to generate reproduced image data. Then, the image corresponding to one frame comprising the respective objects is displayed according to the reproduced image data.
As described above, the object-by-object coding method enables the reproduction (decoding) end to generate a composite image by combining arbitrary objects as desired, whereby editing a moving picture is facilitated. Further, it is possible to display a moving picture comprising highly important objects without reproducing relatively unimportant objects, according to the congestion of the transmission line, the performance of the reproduction apparatus, and the preference of the viewer.
However, even the image processing technique based on MPEG4 has the same problem as that of the image processing technique based on MPEG1, which processes an image of one frame without dividing it into image data corresponding to objects.