1. Field of the Invention
The present invention relates to an image encoding apparatus, an image decoding apparatus, and related apparatus used to encode and decode moving image data in applications such as videophones.
2. Related Art of the Invention
In recent years, to transmit or record moving image data at low bit rates in videophone and videoconferencing systems, image compression techniques have been required to provide higher compression ratios for higher efficiency. These techniques have been standardized, for example, as MPEG 1/2 by ISO/IEC and H.261 and H.263 by ITU-T.
FIG. 17 is a block diagram of a prior art image encoding apparatus, and FIG. 18 is a block diagram of the corresponding image decoding apparatus, both implementing the ITU-T standard H.263.
Referring to FIG. 17, when intraframe-coding an input image, an intraframe/interframe coding selection switch 11 is switched to the upper position. The input image is discrete-cosine-transformed by a DCT 5, quantized by a quantizer 6, variable-length encoded by a Huffman encoder 12, and then multiplexed by a MUX (multiplexer) 14 and output as a bit stream. In this process, part of the signal quantized by the quantizer 6 is inverse-quantized by an inverse quantizer 7, inverse-discrete-cosine-transformed by an inverse DCT 8, and then stored as a reference image in a frame delay memory 3 via an adder 9. The illustrated example shows an advanced motion compensation mode; that is, the coding unit is an 8×8 block.
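A minimal sketch of the intraframe loop for one 8×8 block follows. The orthonormal DCT, the uniform quantizer controlled by a single hypothetical `step` parameter, and all function names are assumptions for illustration; H.263 defines its own quantization and inverse-DCT accuracy rules. The point it shows is why the encoder contains the inverse quantizer 7 and inverse DCT 8: the reference it stores is the same quantized reconstruction the decoder will obtain, not the original input.

```python
import math

N = 8  # 8x8 coding unit, as in the advanced mode described above


def _scale(k):
    # Normalisation factor of the orthonormal DCT-II
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(k, n):
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT of an 8x8 block (role of DCT 5)."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(u, x) * _basis(v, y)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coef):
    """Inverse 2-D DCT (role of inverse DCT 8); exact inverse of dct2."""
    return [[sum(_scale(u) * _scale(v) * coef[u][v] * _basis(u, x) * _basis(v, y)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize(coef, step):
    # Hypothetical uniform quantizer (quantizer 6); not the exact H.263 rule
    return [[round(c / step) for c in row] for row in coef]


def reconstruct(levels, step):
    """Inverse quantizer 7 + inverse DCT 8, clipped to 8-bit pixels.
    Encoder and decoder both run this same computation."""
    pix = idct2([[lv * step for lv in row] for row in levels])
    return [[min(255, max(0, round(p))) for p in row] for row in pix]


def encode_intra_block(block, step):
    """Intraframe path: return the quantized levels (which would go on to the
    Huffman encoder 12) and the locally decoded block that is stored as the
    reference in frame delay memory 3 (adder 9 adds nothing in intra mode)."""
    levels = quantize(dct2(block), step)
    return levels, reconstruct(levels, step)
```

With a coarse `step` the stored reference differs from the input block, but it always equals what `reconstruct` produces from the transmitted levels, which is the property that keeps the encoder's and decoder's reference images identical.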
On the other hand, when interframe-coding an input image, the intraframe/interframe coding selection switch 11 is switched to the lower position. The motion estimator 1 compares the input image with the reference image stored in the frame delay memory 3, detects a motion vector for each block, and stores it in a motion vector memory 2. Based on these motion vectors, a motion compensator 4 locates, for each block, the corresponding region of the reference image and thereby creates a predicted image from the reference image; that is, motion compensation is performed relative to the reference image. A subtractor 10 obtains the residual between the predicted image thus created and the input image. The resulting residual signal is transformed by the DCT 5, quantized by the quantizer 6, variable-length encoded by the Huffman encoder 12, and then multiplexed by the MUX 14 and output as a bit stream. In this process, the quantized signal is inverse-quantized by the inverse quantizer 7, inverse-discrete-cosine-transformed by the inverse DCT 8, added in the adder 9 to the predicted image output from the motion compensator 4, and stored in the frame delay memory 3 as a reference image. The motion vectors obtained by the motion estimator 1 are encoded by a motion vector encoder 13 and output after being multiplexed by the MUX 14 with the residual signal output from the Huffman encoder 12.
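The motion estimation, motion compensation, and residual steps can be sketched for a single 8×8 block as follows. The full search over a ±`search` window with a sum-of-absolute-differences (SAD) criterion is an assumption standing in for "highest correlation"; the document does not specify the matching criterion or the search range. Frames are plain lists of rows of integer pixels, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n=8):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))


def estimate_motion(cur, ref, bx, by, search=4, n=8):
    """Motion estimator 1 (assumed full search): return the (dx, dy) with
    the smallest SAD, ignoring candidates that leave the reference frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= by + dy and by + dy + n <= h
                    and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def predict_block(ref, bx, by, mv, n=8):
    """Motion compensator 4: copy the displaced region of the reference."""
    dx, dy = mv
    return [[ref[by + dy + i][bx + dx + j] for j in range(n)] for i in range(n)]


def residual_block(cur, pred, bx, by, n=8):
    """Subtractor 10: input block minus predicted block."""
    return [[cur[by + i][bx + j] - pred[i][j] for j in range(n)]
            for i in range(n)]
```

When a block's content has simply moved between the reference and the input image, the search finds the displacement exactly, the residual is all zeros, and only the motion vector carries information for that block.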
Referring next to FIG. 18, when the encoded bit stream output from the above image encoding apparatus is input to the image decoding apparatus, a DMUX (demultiplexer) 15 demultiplexes it into the encoded image signal and the encoded motion vector signal. The image signal is decoded by a Huffman decoder 16 and further decoded by an inverse quantizer 7 and an inverse DCT 8. If the image signal is an intraframe-encoded signal, an intraframe/interframe coding selection switch 18 is connected to the upper position so that the image signal is output directly as an output image. The output image is also stored in a frame delay memory 3 as a reference image.
On the other hand, the motion vectors demultiplexed by the DMUX 15 are decoded by a motion vector decoder 17 and stored in a motion vector memory 2. Based on these motion vectors, a motion compensator 4 creates a predicted image from the reference image fed from the frame delay memory 3, and the thus created predicted image is added in an adder 9 to the image signal output from the inverse DCT 8. At this time, if the image signal is an interframe-encoded signal (that is, the residual signal), the intraframe/interframe coding selection switch 18 is connected to the lower position so that the sum signal is output as an output image.
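On the decoder side, reconstructing one block comes down to the switch 18 decision and the adder 9. A minimal sketch, assuming the block has already been Huffman-decoded, inverse-quantized, and inverse-transformed into integers, and assuming reconstructed inter pixels are clipped to the 8-bit range 0 to 255; the function name `decode_block` is hypothetical.

```python
def decode_block(ref, bx, by, mv, decoded, intra=False, n=8):
    """Reconstruct one n x n block at (bx, by).

    decoded: output of the inverse DCT 8 for this block (pixels if intra,
    residual if inter). mv: decoded motion vector (dx, dy); ignored if intra.
    """
    if intra:
        # Switch 18 in the upper position: the decoded signal is the image
        return [row[:] for row in decoded]
    dx, dy = mv
    # Switch 18 in the lower position: adder 9 adds the motion-compensated
    # prediction taken from the reference in frame delay memory 3,
    # then the sum is clipped to 0..255 (assumed)
    return [[min(255, max(0, ref[by + dy + i][bx + dx + j] + decoded[i][j]))
             for j in range(n)] for i in range(n)]
```

Because the decoder forms its prediction from its own frame delay memory 3 with the transmitted vectors, it reproduces exactly the prediction the encoder subtracted, provided both hold the same reference image.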
Here, as shown in FIG. 19(a), for each of the 8×8 blocks into which the input image is divided, the motion estimator 1 searches the reference image for the region having the highest correlation with the target block and obtains the motion vector for that block from the displacement of that region. As shown in FIG. 19(b), however, the regions of the reference image corresponding to the blocks T, B, L, and R surrounding a given block C may overlap the region corresponding to block C, or may be separated from it by some distance. When a predicted image is assembled from these regions, overlapping or discontinuous portions therefore appear in the image, degrading image quality. To prevent this degradation, a known practice is to correct the predicted image using the motion vectors of the blocks horizontally and vertically adjacent to each target block, so that pixels in the neighborhood of the target block are taken into account. More specifically, to obtain the prediction values for block C, the regions of the reference image corresponding to block C and its horizontally and vertically adjacent blocks T, B, L, and R, as shown in FIG. 19(b), are located using the motion vectors of these five blocks; pixels are then read out from the five regions, multiplied by the coefficients set for each block as shown in FIG. 20, summed, and finally divided by 8 for normalization. The same processing is repeated for every block, and the predicted image is obtained from the results. The neighbor motion vectors shown at the output of the motion vector memory 2 in FIGS. 17 and 18 represent this processing.
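The five-region weighting described above (overlapped block motion compensation) can be sketched as follows. The weight tables are an assumption: they follow the layout used in H.263 Annex F as we understand it and are not copied from FIG. 20. What the code relies on is that, at every pixel, the weights of C, T, B, L, and R add up to 8, so dividing by 8 normalizes the result; here the T and B weights are nonzero only in the upper and lower halves of the block, and the L and R weights only in the left and right halves. Rounding by adding 4 before the integer division is also an assumption, and the names are hypothetical.

```python
H0 = [[4, 5, 5, 5, 5, 5, 5, 4],  # weights for block C's own vector
      [5, 5, 5, 5, 5, 5, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 6, 6, 6, 6, 5, 5],
      [5, 5, 5, 5, 5, 5, 5, 5],
      [4, 5, 5, 5, 5, 5, 5, 4]]
H1 = [[2, 2, 2, 2, 2, 2, 2, 2],  # weights for the T / B neighbours
      [1, 1, 2, 2, 2, 2, 1, 1],
      [1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 1, 1, 1, 1, 1, 1],
      [1, 1, 2, 2, 2, 2, 1, 1],
      [2, 2, 2, 2, 2, 2, 2, 2]]
H2 = [[2, 1, 1, 1, 1, 1, 1, 2],  # weights for the L / R neighbours
      [2, 2, 1, 1, 1, 1, 2, 2],
      [2, 2, 1, 1, 1, 1, 2, 2],
      [2, 2, 1, 1, 1, 1, 2, 2],
      [2, 2, 1, 1, 1, 1, 2, 2],
      [2, 2, 1, 1, 1, 1, 2, 2],
      [2, 2, 1, 1, 1, 1, 2, 2],
      [2, 1, 1, 1, 1, 1, 1, 2]]


def _half(w, keep):
    # Zero the weights outside the half of the block a neighbour influences
    return [[w[i][j] if keep(i, j) else 0 for j in range(8)] for i in range(8)]


WEIGHTS = {
    "C": H0,
    "T": _half(H1, lambda i, j: i < 4),
    "B": _half(H1, lambda i, j: i >= 4),
    "L": _half(H2, lambda i, j: j < 4),
    "R": _half(H2, lambda i, j: j >= 4),
}


def obmc_block(ref, bx, by, mvs):
    """Predict the 8x8 block at (bx, by) from five displaced regions.

    mvs maps 'C', 'T', 'B', 'L', 'R' to (dx, dy). All regions are assumed to
    lie inside ref (no frame-edge handling).
    """
    out = []
    for i in range(8):
        row = []
        for j in range(8):
            acc = sum(WEIGHTS[k][i][j] * ref[by + dy + i][bx + dx + j]
                      for k, (dx, dy) in mvs.items())
            row.append((acc + 4) // 8)  # divide by 8, rounding to nearest
        out.append(row)
    return out
```

A quick check of the design: if all five vectors are equal, every weighted sum is 8 times the same pixel, so the result collapses to ordinary motion compensation; the smoothing only takes effect where neighboring vectors disagree, which is where the overlaps and gaps of FIG. 19(b) arise.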
With the above method, the discontinuous portions occurring in an image are alleviated, and image quality improves. In cases where prediction errors (residuals) cannot be encoded sufficiently because of a low bit rate, only motion vectors, and hence only the predicted image, are transmitted; even in that case, since the predicted image is constructed in overlapping fashion as depicted in FIG. 2D, a sharp image is obtained for areas of coherent translation and a smooth or blurred image is obtained for areas of nonuniform motion.
However, because the above prior art method constructs the predicted image using the motion vectors of the blocks horizontally and vertically adjacent to each target block, prediction integrity cannot be preserved unless the same reference image is used, and the reference image is limited to a single frame (e.g., the previous frame). The prior art has thus had the problem of lacking extensibility.
For example, in MPEG 1/2, with the picture structure IBBBPBBPBBP, P-picture data can be encoded without referring to B-pictures. Two parallel bit streams can therefore be considered: one consisting of IPPP . . . , and an additional stream consisting of the B-pictures. However, MPEG 1/2 does not define these as parallel streams; the data arrive sequentially. Consequently, if a template (i.e., a reference frame) requires a very high bit rate, the I-frame may not be decodable within the prescribed time. This is called the "latency" problem.
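The picture structure in this example can be made concrete with a short sketch. It assumes the usual MPEG-style rules (a P-picture predicts from the preceding I- or P-picture, a B-picture from the nearest anchors on either side) and a transmission order in which each anchor is sent before the B-pictures that precede it in display order; the function names are hypothetical. It shows that the I/P pictures form a self-contained IPPP sub-stream, while the single transmitted stream interleaves the two.

```python
def split_streams(pattern):
    """Indices of anchor pictures (I/P) and of B-pictures, in display order."""
    anchors = [i for i, t in enumerate(pattern) if t in "IP"]
    b_pics = [i for i, t in enumerate(pattern) if t == "B"]
    return anchors, b_pics


def references(pattern):
    """Assumed MPEG-style prediction sources for each picture index."""
    anchors, _ = split_streams(pattern)
    refs = {}
    for i, t in enumerate(pattern):
        prev_a = max((a for a in anchors if a < i), default=None)
        next_a = min((a for a in anchors if a > i), default=None)
        if t == "I":
            refs[i] = []  # intra: no reference
        elif t == "P":
            refs[i] = [prev_a]  # forward prediction from the previous anchor
        else:
            # B: predicted from the surrounding anchors
            refs[i] = [a for a in (prev_a, next_a) if a is not None]
    return refs


def coded_order(pattern):
    """Transmission order: each anchor precedes the B-pictures displayed before it."""
    order, pending_b = [], []
    for i, t in enumerate(pattern):
        if t == "B":
            pending_b.append(i)
        else:
            order.append(i)
            order.extend(pending_b)
            pending_b = []
    return order + pending_b
```

For IBBBPBBPBBP, no anchor refers to a B-picture, so dropping all seven B-pictures leaves a decodable IPPP stream; `coded_order` gives the type sequence I P B B B P B B P B B, i.e. the two logical sub-streams travel interleaved in one sequential stream.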
FIG. 21 is a diagram showing decode timing in the prior art image decoding apparatus. In FIG. 21, t0 and t1 each indicate a template, and p0, p1, . . . , p6 indicate ordinary frames. In the upper part of the diagram, decoding of the templates t0 and t1 takes little time and is completed in time for display, so the frames are displayed in sequence starting with p0. In the lower part of the diagram, on the other hand, decoding of the template t1 (p4) takes a long time and is not completed in time for the display timing of p4. Since the decoding of p3 and p4 cannot be skipped under any circumstances, a buffer is provided to allow a margin in frame display timing; when the decoding of p4 is delayed, the decoding of p5 and p6 is skipped and the decoding process proceeds to the next template. The above problem can thus be addressed.
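The skip policy of FIG. 21 can be approximated by a toy timing simulation. Everything quantitative here is an assumption: frames are due for display one frame period apart after a fixed buffer `margin`, decode times are given in frame periods, all bits are treated as already available, and an ordinary frame is skipped whenever decoding it would miss its display time, while templates are always decoded. The hypothetical `skip_cost` parameter models time still spent parsing a skipped frame's bits in a single bit stream, the issue raised in the following paragraph.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    name: str
    is_template: bool
    decode_time: float  # in frame periods (hypothetical unit)


def simulate(frames, margin, skip_cost=0.0):
    """Return (name, status) pairs, status in {'shown', 'late', 'skipped'}.
    Frame i is due for display at time margin + i (one frame period apart)."""
    clock = 0.0
    result = []
    for i, f in enumerate(frames):
        deadline = margin + i
        if f.is_template:
            # Templates are never skipped, even if they will be late
            clock += f.decode_time
            result.append((f.name, "shown" if clock <= deadline else "late"))
        elif clock + f.decode_time > deadline:
            # Behind schedule: skip this ordinary frame to catch up
            clock += skip_cost
            result.append((f.name, "skipped"))
        else:
            clock += f.decode_time
            result.append((f.name, "shown"))
    return result
```

For example, with `margin=2`, ordinary frames costing 0.5 period, templates t0 and t2 costing 1 period, and t1 (standing in for p4) costing 5.5 periods, t1 is late, p5 and p6 are skipped, and t2 is back on time. With `skip_cost=0.5` the skipped frames still consume time and t2 is late as well.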
However, because the above method transmits everything in a single bit stream, the decoder must still interpret the bits one by one until the bit stream of the next template arrives. As a result, the processing time cannot be shortened sufficiently even when the processing of ordinary frames is skipped.