Conventional image coding systems divide image frames into blocks of a fixed size and perform coding processing in units of these blocks. A typical example of a conventional image coding system is the MPEG (Moving Picture Experts Group) 1 coding system described in Le Gall, D.: “MPEG: A Video Compression Standard for Multimedia Applications”, Commun. ACM, April 1991.
MPEG1 performs motion-compensated interframe prediction (MC: Motion Compensation) by dividing image frames into fixed block units known as macroblocks, detecting the amount of movement (the motion vector) of each unit by referencing a locally decoded image of an already encoded frame, specifying similar blocks within that reference image, and employing these similar blocks as predictive data. By means of this technique, even when there is motion in an image, the prediction efficiency can be improved by tracking the motion, and redundancy in the temporal direction can be reduced. Furthermore, redundancy that remains in the spatial direction can be reduced by applying the DCT (Discrete Cosine Transform) to the prediction residual signal in units of blocks consisting of 8×8 pixels. A variety of standard image coding systems, beginning with MPEG1, perform data compression of image signals by combining MC and the DCT.
FIG. 20 is a block diagram showing the constitution of a conventional image coding apparatus based on an MPEG1 image coding system. An input image signal 1 supplied to the image coding apparatus shown in FIG. 20 is a temporal sequence of frame images; hereinafter it denotes the signal of a single frame image. An example of a frame image that is to be encoded is shown in FIG. 21. The current frame 601 is divided into fixed square regions of 16 pixels×16 lines (called macroblocks), and the processing that follows is performed in these units.
The macroblock data of the current frame (current macroblocks) produced from the input image signal 1 are first outputted to a motion detection section 2, where detection of motion vectors 5 is carried out. A motion vector 5 is detected by referencing a predetermined search region of a previously encoded frame image 4 (called a local decoding image 4 hereinafter) stored in a frame memory 3, locating a pattern similar to the current macroblock (called a prediction image 6 hereinafter), and determining the amount of spatial displacement between this pattern and the current macroblock.
Here, the local decoding image 4 is not limited to the previous frame. A future frame can also be used, provided that it is encoded in advance and stored in the frame memory 3. Although the use of a future frame requires the coding order to be changed and in turn increases the processing delay, it has the merit that variations in the image content between previous and future frames are easily predicted, thus making it possible to reduce temporal redundancy still further.
Generally, in MPEG1, three coding types can be used selectively: bidirectional prediction (B frame prediction), which uses both previous and future frames; forward prediction (P frame prediction), which uses previous frames alone; and I frame coding, which does not perform interframe prediction and instead performs coding only within the frame. FIG. 21 is limited to P frame prediction alone, with a previous frame 602 stored as the local decoding image 4.
The motion vector 5 shown in FIG. 20 is expressed as a two-dimensional translation amount. Block matching as represented in FIGS. 22A to 22D is generally used as the method of detecting the motion vector 5. A motion search range 603 centered on the spatial position of the current macroblock is established. Then, from the image data 604 within the motion search range 603 of the previous frame 602, the block for which the sum of the squares of the differences or the sum of the absolute values of the differences is minimum is determined as the motion predictive data, and the displacement of the motion predictive data relative to the current macroblock is determined as the motion vector 5.
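By way of illustration only, the block matching described above might be sketched as follows. This is a minimal sketch and not the motion detection section 2 itself: the function names `sad` and `block_match`, the search radius parameter, the use of the sum of absolute differences rather than the sum of squares, and the skipping of candidates that fall outside the frame are all assumptions made for the example.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def block_match(current, previous, x0, y0, size=16, search=7):
    """Full search over +/-search pixels around (x0, y0).

    Returns (mvx, mvy, cost) for the candidate block in `previous`
    with the smallest SAD relative to the current macroblock.
    """
    height, width = len(previous), len(previous[0])
    best = None
    for mvy in range(-search, search + 1):
        for mvx in range(-search, search + 1):
            rx, ry = x0 + mvx, y0 + mvy
            # Assumption: candidates outside the reference frame are skipped.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, x0, y0, previous, rx, ry, size)
            if best is None or cost < best[2]:
                best = (mvx, mvy, cost)
    return best
```

Frames are assumed here to be lists of rows of integer sample values. A full search of this kind examines every candidate in the range; practical encoders commonly use faster search strategies.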
The motion predictive data is determined for all the macroblocks in the current frame; rendered as a frame image, this data is equivalent to the motion prediction frame 605 in FIG. 21. The difference 606 between the motion prediction frame 605 obtained by way of the above MC processing and the current frame 601 is obtained by the subtraction section 21 shown in FIG. 20, and this residual signal (called the prediction residual signal 8 hereinafter) undergoes DCT coding. Specifically, the processing to extract the motion predictive data for every macroblock (the prediction image 6) is performed by a motion compensation section 7, which uses the motion vector 5 to extract the prediction image 6 from the local decoding image 4 stored in the frame memory 3.
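The extraction of the prediction image and the subtraction that yields the prediction residual might be sketched as follows. This is an illustrative sketch only; the function names `extract_prediction` and `prediction_residual`, the list-of-rows frame representation, and the error raised for out-of-frame displacements are assumptions and do not appear in the figures.

```python
def extract_prediction(reference, x0, y0, motion_vector, size=16):
    """Fetch the block of `reference` displaced by motion_vector from (x0, y0)."""
    mvx, mvy = motion_vector
    left, top = x0 + mvx, y0 + mvy
    # Assumption: the displaced block must lie entirely inside the frame.
    if (left < 0 or top < 0 or top + size > len(reference)
            or left + size > len(reference[0])):
        raise ValueError("displaced block lies outside the reference frame")
    return [row[left:left + size] for row in reference[top:top + size]]


def prediction_residual(current, x0, y0, prediction):
    """Subtract the prediction from the current macroblock, sample by sample."""
    size = len(prediction)
    return [
        [current[y0 + dy][x0 + dx] - prediction[dy][dx] for dx in range(size)]
        for dy in range(size)
    ]
```

When the prediction tracks the motion well, most residual samples are close to zero, which is what makes the subsequent transform coding effective.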
The prediction residual signal 8 is converted into DCT coefficient data 10 (also called DCT coefficients 10 hereinafter) by a DCT section 9. As shown in FIG. 23, the DCT expresses the spatial pixel vectors denoted by 610 in terms of a set of orthonormal bases that represent fixed frequency components denoted by 611. 8×8 pixel blocks (‘DCT blocks’ below) are normally adopted for the spatial pixel vectors. Because the DCT is a separable transform, it is actually carried out on each of the 8-dimensional horizontal row vectors and vertical column vectors of the DCT block.
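The row-and-column application of the transform might be sketched as follows. This is a minimal sketch that assumes the orthonormal DCT-II as the one-dimensional transform; the function names `dct_1d` and `dct_2d` are hypothetical, and the direct summation is chosen for clarity rather than speed.

```python
import math


def dct_1d(vector):
    """Orthonormal DCT-II of a one-dimensional sequence."""
    n = len(vector)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            vector[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """Two-dimensional DCT computed as row transforms followed by column transforms."""
    rows = [dct_1d(row) for row in block]            # transform every row
    cols = [dct_1d(list(col)) for col in zip(*rows)]  # transform every column
    return [list(row) for row in zip(*cols)]          # transpose back to [v][u]
```

With this normalization a flat 8×8 block maps to a single non-zero DC coefficient, and the total energy of the coefficients equals that of the samples.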
The DCT exploits the correlation between pixels in a spatial region to concentrate the signal power in a small number of coefficients of the DCT block. The higher the power concentration, the better the conversion efficiency, and for a natural image signal the performance of the DCT is comparable to that of the KL transform, which is the optimum transform. Particularly in the case of a natural image, the power is concentrated in the low-frequency region, chiefly the DC component, and there is barely any power in the high-frequency region. Therefore, as shown in FIG. 24, the quantized coefficients denoted by 612 are scanned from the low-frequency region to the high-frequency region, as indicated by the arrows in the DCT block, to form the sequence denoted by 613. Because this sequence includes long zero runs, the overall coding efficiency, including the results of entropy coding, is improved.
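A low-to-high frequency scan of this kind might be sketched as follows. The sketch assumes the zigzag pattern commonly used with 8×8 DCT blocks, which may differ in detail from the arrows of FIG. 24; the function names `zigzag_order` and `zigzag_scan` are hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order.

    Positions are grouped by anti-diagonal (row + col); the traversal
    direction alternates from one anti-diagonal to the next.
    """
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def zigzag_scan(block):
    """Flatten a square block into a list, low frequencies first."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

For a block whose non-zero quantized coefficients lie near the DC position, the scanned list begins with those values and ends in one long run of zeros.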
The DCT coefficients 10 are quantized by a quantization section 11, and the quantized coefficients 12 obtained thereby are scanned, run-length encoded, and multiplexed into a compressed stream 14 by a variable length coding section 13 before being transmitted. Further, the motion vectors 5 detected by the motion detection section 2 are also multiplexed into the compressed stream 14 and transmitted, one macroblock at a time, because the image decoding apparatus described subsequently requires these vectors in order to generate a prediction image that is the same as that of the image coding apparatus.
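Quantization followed by run-length encoding of the scanned coefficients might be sketched as follows. This is a simplified illustration: it assumes a single uniform step size for all coefficients and a hypothetical output format of (zero_run, level) pairs terminated by an "EOB" marker, whereas an actual MPEG1 encoder uses per-coefficient quantization weights and variable length code tables. The function names are likewise assumptions.

```python
def quantize(coeffs, step):
    """Uniform quantization of a block with one step size (an assumption).

    Python's round() resolves exact ties to the nearest even integer.
    """
    return [[int(round(c / step)) for c in row] for row in coeffs]


def run_length_encode(scanned):
    """Encode a scanned coefficient list as (zero_run, level) pairs plus 'EOB'."""
    pairs = []
    run = 0
    for level in scanned:
        if level == 0:
            run += 1          # extend the current run of zeros
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")       # trailing zeros are implied by the end-of-block marker
    return pairs
```

Because the trailing zeros of each block collapse into the single end-of-block marker, blocks with few significant coefficients produce very few symbols.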
In addition, the quantized coefficients 12 are decoded locally via a reverse quantization section 15 and a reverse DCT section 16, and the decoded results are added to the prediction image 6 by an addition section 22, thereby generating a decoding image 17 identical to that produced in the image decoding apparatus. Because the decoding image 17 is used in the prediction for the next frame, it is stored in the frame memory 3.
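The local decoding path (reverse quantization, reverse DCT, and addition of the prediction) might be sketched as follows. The sketch assumes a single uniform step size, an orthonormal inverse DCT, and 8-bit samples clipped to the range 0 to 255; these choices and all function names are assumptions made for illustration.

```python
import math


def idct_1d(coeffs):
    """Inverse of the orthonormal DCT-II for a one-dimensional sequence."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def idct_2d(coeffs):
    """Two-dimensional inverse DCT: column transforms, then row transforms."""
    cols = [idct_1d(list(col)) for col in zip(*coeffs)]
    return [idct_1d(list(row)) for row in zip(*cols)]


def dequantize(levels, step):
    """Scale quantized levels back to coefficient values (uniform step assumed)."""
    return [[level * step for level in row] for row in levels]


def reconstruct_block(levels, step, prediction):
    """Decode the residual and add it to the prediction, clipping to 0..255."""
    residual = idct_2d(dequantize(levels, step))
    return [
        [min(255, max(0, int(round(p + r)))) for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(prediction, residual)
    ]
```

Running the same reconstruction in the encoder and the decoder keeps both frame memories identical, so that quantization error does not accumulate from frame to frame.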
Next, the constitution of a conventional image decoding apparatus based on an MPEG1 image decoding system is described with reference to FIG. 25. After receiving the compressed stream 14, the image decoding apparatus detects a sync word indicating the start of each frame by means of a variable length decoding section 18, and subsequently decodes the motion vectors 5 and the quantized DCT coefficients 12 in macroblock units. The motion vectors 5 are outputted to a motion compensation section 7d, which extracts, as a prediction image 6, the image parts displaced by the motion vectors 5 from a frame memory 19 (used in the same way as the frame memory 3), in the same manner as in the above-mentioned image coding apparatus. The quantized DCT coefficients 12 are decoded via a reverse quantization section 15d and a reverse DCT section 16d, and then added by the addition section 23 to the prediction image 6 to form the final decoding image 17. The decoding image 17 is outputted at a predetermined display timing to a display device (not shown), where the image is played back.
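On the decoding side, the recovery of a block of quantized DCT coefficients from its run-length representation might be sketched as follows. The sketch assumes a hypothetical symbol format of (zero_run, level) pairs terminated by an "EOB" marker and a zigzag scan order; the actual variable length code tables and sync word handling of the variable length decoding section 18 are not modeled, and the function names are assumptions.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag order (assumed scan)."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def run_length_decode(pairs, n=8):
    """Rebuild an n x n block of quantized coefficients from run-length symbols."""
    scanned = []
    for item in pairs:
        if item == "EOB":
            break
        run, level = item
        scanned.extend([0] * run)   # restore the zeros preceding this level
        scanned.append(level)
    # Zeros implied by the end-of-block marker.
    scanned.extend([0] * (n * n - len(scanned)))
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(scanned, zigzag_order(n)):
        block[r][c] = value
    return block
```

The block recovered in this way is what would then be passed to the reverse quantization section 15d and the reverse DCT section 16d.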