To store or transmit a digital image efficiently, the image must be compression-coded. Methods for compression-coding a digital image include the discrete cosine transform (referred to as a DCT transform hereinafter), represented by JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group), as well as waveform coding methods such as sub-band coding, wavelet coding and fractal coding. To remove redundant signals between images, inter-image prediction with motion compensation is executed, and the resulting differential signal is subjected to waveform coding.
According to the MPEG system, an input image is processed while being divided into a plurality of 16×16 macro blocks. One macro block is further divided into 8×8 blocks and quantized after undergoing 8×8 DCT transform. This is called an intra-frame coding.
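The intra-frame path described above can be sketched as follows. This is a minimal illustration, not the MPEG procedure itself: the function names are hypothetical, the DCT is a direct orthonormal DCT-II, and a single uniform step size stands in for the quantization matrices that the actual standards define.

```python
import math

N = 8  # block size of the 8x8 DCT transform


def split_macroblock(mb):
    """Split a 16x16 macro block into four 8x8 blocks in raster order."""
    return [[row[c0:c0 + N] for row in mb[r0:r0 + N]]
            for r0 in (0, N) for c0 in (0, N)]


def dct_2d(block):
    """Direct (slow) orthonormal 2-D DCT-II of an 8x8 block."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def quantize(coeffs, step=16):
    """Uniform quantization; the single step size is an assumption."""
    return [[int(round(v / step)) for v in row] for row in coeffs]


def code_intra_macroblock(mb, step=16):
    """Divide, transform and quantize one macro block."""
    return [quantize(dct_2d(b), step) for b in split_macroblock(mb)]


# A flat macro block leaves only the DC coefficient of each 8x8 block.
flat = [[128] * 16 for _ in range(16)]
blocks = code_intra_macroblock(flat)
```

For the flat macro block, all of the energy is concentrated in the DC coefficient of each 8×8 block, which is the property that makes the subsequent entropy coding effective.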
On the other hand, according to a motion detection method such as block matching, a prediction macro block having the minimum error with respect to the target macro block is detected from other frames adjacent in time, the detected prediction macro block is subtracted from the target macro block thereby forming a differential macro block, and this differential macro block is quantized after undergoing 8×8 DCT transform. This is called an inter-frame coding, and the prediction macro block is called a prediction signal of the time domain.
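Block matching of this kind can be sketched as an exhaustive search. The sketch below assumes the sum of absolute differences as the error measure and a small square search window; neither the error measure, the window size nor the function names are specified in the description above.

```python
def sad(frame, top, left, block):
    """Sum of absolute differences between `block` and a region of `frame`."""
    n = len(block)
    return sum(abs(frame[top + y][left + x] - block[y][x])
               for y in range(n) for x in range(n))


def block_match(ref, target, top, left, size=16, search=7):
    """Full search in `ref` for the block that best predicts the target block.

    Returns the motion vector (dy, dx) and the differential block
    (target block minus prediction block).
    """
    h, w = len(ref), len(ref[0])
    block = [row[left:left + size] for row in target[top:top + size]]
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > h or c + size > w:
                continue  # candidate lies partly outside the reference frame
            err = sad(ref, r, c, block)
            if best is None or err < best[0]:
                best = (err, dy, dx)
    _, dy, dx = best
    pred = [row[left + dx:left + dx + size]
            for row in ref[top + dy:top + dy + size]]
    diff = [[block[y][x] - pred[y][x] for x in range(size)]
            for y in range(size)]
    return (dy, dx), diff


# Small synthetic example: the target frame is the reference frame
# displaced by one row and minus two columns.
ref = [[y * 100 + x for x in range(12)] for y in range(12)]
target = [[(y + 1) * 100 + (x - 2) for x in range(12)] for y in range(12)]
mv, diff = block_match(ref, target, top=4, left=4, size=4, search=2)
```

In this synthetic case the search recovers the displacement exactly, so the differential block is all zeros and costs almost nothing to code after the DCT transform and quantization.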
A normal image has spatially similar regions, and a region of an image can be approximated from another region of the same image by utilizing this characteristic. In a manner similar to that of the prediction signal of the time domain, a prediction signal can also be obtained from an identical frame. This is called a spatial prediction signal.
Since two spatially adjacent pixel values are close to each other, the prediction signal of the spatial region is generally located close to the target signal. On the other hand, on the receiving side or the reproducing side, a signal which has been coded and reproduced in the past is required to be used as the prediction signal since the original image is absent. From these two factors, the prediction signal of the spatial region is required to be generated at high speed. This is because the signal used for the generation of the prediction signal must first be decoded and reproduced.
Therefore, the prediction signal of the spatial region is required to be generated in a simple manner as well as with high accuracy. Furthermore, a coding apparatus and a decoding apparatus are required to have a construction capable of operating at high speed.
The coding of image data is used in many international standards such as JPEG, MPEG1, H.261, MPEG2 and H.263. Each successive standard achieves a higher coding efficiency. That is, much effort has been devoted to expressing the same image quality with fewer bits than in the preceding standards.
The coding of moving image data consists of intra-frame coding and prediction frame coding. In a representative hybrid coding system such as the MPEG1 standard, consecutive frames can be classified into the following three types:
(a) intra-frame (referred to as an “I-frame” hereinafter);
(b) prediction frame (referred to as a “P-frame” hereinafter); and
(c) bidirectional prediction frame (referred to as a “B-frame” hereinafter).
An I-frame is coded independently of the other frames, i.e., the I-frame is compressed without referring to the other frames. A P-frame is coded through motion detection and compensation by using the preceding frame for predicting the contents of the frame being coded (i.e., the P-frame). A B-frame is coded through motion detection and compensation by using information from the preceding frame and information from the subsequent frame for predicting the contents of the B-frame. The preceding frame and the subsequent frame may each be an I-frame or a P-frame. The I-frame is coded in intra mode. The P-frame and the B-frame are coded in intra and prediction modes.
As the characteristics of the coding of the I-frame, P-frame and B-frame are different from one another, the compressing methods thereof differ from one another. The I-frame uses no temporal prediction for the purpose of reducing the redundancy, and therefore, it requires more bits than the P-frame and the B-frame.
A description will be herein made taking MPEG2 as an example. It is assumed that the bit rate is 4 Mbits/sec and an image having 30 frames/sec is used. In general, the ratio of the number of bits used for the I-, P- and B-frames is 6:3:1. Therefore, one I-frame uses about 420 kbits, and one B-frame uses about 70 kbits. The B-frame requires the fewest bits because it is sufficiently predicted from both directions.
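The figures above can be reproduced if they are read as bits per frame and a frame pattern is assumed. The sketch below assumes a repeating group of 12 frames containing one I-frame, three P-frames and eight B-frames; this pattern is an assumption chosen for illustration and is not stated in the description, and the function name is hypothetical.

```python
def bits_per_frame(bit_rate, fps, counts, weights):
    """Split the bit budget of one frame group among the frame types.

    counts  : number of frames of each type in one group (assumed pattern)
    weights : relative number of bits per frame of each type (6:3:1 here)
    """
    group_len = sum(counts.values())
    group_bits = bit_rate * group_len / fps          # bits available per group
    units = sum(counts[t] * weights[t] for t in counts)
    return {t: group_bits * weights[t] / units for t in counts}


alloc = bits_per_frame(
    bit_rate=4_000_000,                 # 4 Mbits/sec
    fps=30,                             # 30 frames/sec
    counts={"I": 1, "P": 3, "B": 8},    # assumed 12-frame group
    weights={"I": 6, "P": 3, "B": 1},   # ratio from the description
)
for frame_type, bits in alloc.items():
    print(frame_type, round(bits / 1000), "kbits per frame")
```

Under this assumed pattern one I-frame receives roughly 417 kbits and one B-frame roughly 70 kbits, which agrees with the approximate values given above. A different frame pattern would shift these numbers somewhat while keeping the 6:3:1 ratio.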
FIG. 14 is a block diagram showing a construction of a prior art image predictive coding apparatus. Since a DCT transform is executed on a block basis, the recent image coding methods are all based on the division of an image into smaller blocks. According to the intra-frame coding, an inputted digital image signal is first of all subjected to a block sampling process 1001 as shown in FIG. 14. Next, the blocks obtained after the block sampling process 1001 are subjected to a DCT transform process 1004 and thereafter subjected to a quantizing process 1005 and a run length Huffman variable length coding (VLC: Variable Length Coding; entropy coding) process 1006. On the other hand, according to the prediction frame coding, an inputted digital image is subjected to a motion compensating process 1003, and the motion-compensated block (i.e., the predicted block) is subjected to the DCT transform process 1004. Next, the quantizing process 1005 and the run length Huffman VLC coding (entropy coding) process 1006 are executed.
The fact that the block-based DCT transform process 1004 removes or reduces a spatial redundancy inside the target block to be processed and the fact that the motion detecting and compensating processes 1002 and 1003 remove or reduce a temporal redundancy between adjacent frames are known from the conventional image coding techniques. Further, the run length Huffman VLC coding or other entropy coding process 1006 executed after the DCT transform process 1004 and the quantizing process 1005 removes statistical redundancy between quantized DCT transform coefficients. However, these processes are executed only within the individual blocks of an image.
A digital image inherently has a great spatial redundancy. This redundancy exists not only inside the blocks of a frame but also between blocks. However, it is apparent from the above description that no existing method uses a process for removing the redundancy between the blocks of an image.
According to the existing image coding method, the DCT transform process 1004 or another transform process is executed on the block basis due to restrictive conditions in terms of hardware implementation and computation.
Although the spatial redundancy is reduced through the block-based transform process, the reduction is restricted to the inside of one block. The redundancy between two adjacent blocks is not satisfactorily considered. This redundancy, however, can be further reduced, particularly in the intra-frame coding, which consistently consumes a great number of bits.
Furthermore, the fact that the block-based DCT transform process removes or reduces the spatial redundancy inside the target block to be processed and the fact that the motion predicting and compensating processes remove or reduce the temporal redundancy between two adjacent frames are known from the existing image coding techniques. A zigzag scan and the run length Huffman VLC coding or another entropy coding process, which are executed after the DCT transform process and the quantizing process, remove the statistical redundancy in quantized DCT transform coefficients. However, they are still restricted to the inside of one block.
A digital image inherently includes a great spatial redundancy. This redundancy exists not only inside a block but also between the blocks of an image. No existing method uses a process for removing the redundancy between the blocks of one image, except for the DC coefficient prediction of JPEG, MPEG1 and MPEG2.
According to MPEG1 and MPEG2, the DC coefficient prediction is executed by subtracting the DC value of the preceding coded block from the DC value of the currently coded block. This is a simple predicting method which has neither adaptiveness nor mode switching for the case where the prediction is inappropriate. Further, it covers only the DC coefficients.
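The DC coefficient prediction described above amounts to differential coding of successive DC values. The sketch below is an illustration only: the function names and the initial predictor value of 0 are assumptions, and the predictor reset rules defined by the actual standards are not modelled.

```python
def dc_differences(dc_values, initial=0):
    """Replace each DC value by its difference from the preceding block's DC.

    `initial` is a hypothetical starting predictor for the first block.
    """
    prev = initial
    diffs = []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc          # the predictor is always the preceding block
    return diffs


def dc_reconstruct(diffs, initial=0):
    """Inverse operation performed on the decoding side."""
    prev = initial
    values = []
    for d in diffs:
        prev = prev + d
        values.append(prev)
    return values


dc = [64, 66, 65, 65, 20]
diffs = dc_differences(dc)
```

The last value in the example shows the limitation noted above: when the preceding block is a poor predictor, the difference (here −45) is larger in magnitude than the DC value itself (20), and the method has no means of switching the prediction off.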
According to the current state of the concerned technical field, the zigzag scan is used for all blocks prior to the run length coding. No attempt has been made to make the scan adaptive on the basis of the contents of the block.
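The fixed zigzag scan followed by run length coding can be sketched as follows. The function names are hypothetical, the DC coefficient is returned separately, and the Huffman table lookup that would follow is omitted; the trailing zeros are assumed to be signalled by an end-of-block code.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zigzag order."""
    def key(rc):
        r, c = rc
        s = r + c                       # index of the anti-diagonal
        # Odd diagonals are traversed downward, even diagonals upward.
        return (s, r if s % 2 else c)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_length(block):
    """Zigzag-scan a quantized block and form (zero run, level) pairs.

    Returns the DC coefficient and the pairs for the AC coefficients.
    """
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for level in scan[1:]:              # scan[0] is the DC coefficient
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return scan[0], pairs               # trailing zeros are left implicit


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 64, 3, -1
dc, pairs = run_length(block)
```

Because the same scan order is applied to every block, a block whose nonzero coefficients lie mainly along one column or one row produces longer zero runs than necessary, which is the lack of adaptiveness pointed out above.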
FIG. 22 is a block diagram showing a construction of a prior art image predictive coding apparatus. In FIG. 22, the prior art image predictive coding apparatus is provided with a block sampling unit 2001, a DCT transform unit 2003, a quantizing unit 2004, a zigzag scan unit 2005 and an entropy coding unit 2006. In this specification, the term “unit” means a circuit device.
According to the intra-frame coding (i.e., coding inside a frame), an inputted image signal is subjected to a block sampling process 2001 and thereafter subjected directly to a DCT transform process 2003. Then, a quantizing process 2004, a zigzag scan process 2005 and an entropy coding process 2006 are sequentially executed. On the other hand, according to the inter-frame coding (i.e., coding between frames, or prediction frame coding), a motion detecting and compensating process is executed in a unit 2011 after the block sampling process 2001, and then a prediction error is obtained from an adder 2002 by subtracting a prediction value obtained from the unit 2011 from the image data obtained from the block sampling process 2001. Further, this prediction error is subjected to the DCT transform process 2003 and then to the quantizing process 2004, zigzag scan process 2005 and entropy coding process 2006 in the same manner as in the intra-frame coding.
In a local decoder provided in the image predictive coding apparatus shown in FIG. 22, an inverse quantizing process and an inverse DCT transform process are executed in units 2007 and 2008. According to the inter-frame coding, a prediction value obtained through motion detection and compensation is added by an adder 2009 to the prediction error reconstructed by the units 2007 and 2008, and the resulting sum forms the locally decoded image data. The decoded image data is stored into a frame memory 2010 of the local decoder. Finally, a bit stream is outputted from the entropy coding unit 2006 and transmitted to the image predictive decoding apparatus of the other party.
FIG. 23 is a block diagram showing a construction of a prior art image predictive decoding apparatus. The bit stream is decoded by a variable length decoder (VLD: Variable Length Decoding) unit (or an entropy decoding unit) 2021, and the decoded image data is then subjected to an inverse quantizing process and an inverse DCT transform process in units 2023 and 2024. According to the inter-frame coding, a prediction value which is obtained through motion detection and compensation and formed by a unit 2027 is added by an adder 2025 to the prediction error reconstructed by the units 2023 and 2024, thereby forming locally decoded image data. The locally decoded image data is stored into a frame memory 2026 of the local decoder.
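The reason the coding apparatus contains a local decoder can be illustrated with a heavily simplified closed prediction loop. In the sketch below the DCT transform, zigzag scan, entropy coding and motion search are all omitted, the prediction is simply the previously reconstructed frame, and the function names and step size are assumptions; only the relationship between the two frame memories is modelled.

```python
def quantize(value, step):
    return int(round(value / step))


def dequantize(level, step):
    return level * step


def encode(frames, step=8):
    """Code each frame as a quantized prediction error.

    The prediction comes from the encoder's own frame memory, which holds
    the locally decoded (reconstructed) frame, never the original frame.
    """
    memory = None
    stream = []
    for frame in frames:
        pred = memory if memory is not None else [0] * len(frame)
        levels = [quantize(x - p, step) for x, p in zip(frame, pred)]
        stream.append(levels)
        # Local decoder: inverse quantization plus addition of the prediction.
        memory = [p + dequantize(q, step) for p, q in zip(pred, levels)]
    return stream


def decode(stream, step=8):
    """Decoder side: rebuilds the same frame memory contents as the encoder."""
    memory = None
    frames = []
    for levels in stream:
        pred = memory if memory is not None else [0] * len(levels)
        memory = [p + dequantize(q, step) for p, q in zip(pred, levels)]
        frames.append(memory)
    return frames


frames = [[100, 104, 96], [103, 107, 99]]
decoded = decode(encode(frames))
```

Because both sides predict from identical reconstructed data, the quantization error of one frame does not accumulate in the following frames; each decoded sample stays within half a quantization step of the original.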
According to the existing image coding techniques, the DCT transform process or other transform process is executed on the block basis due to the restrictive conditions in terms of hardware implementation and computation. The spatial redundancy is reduced through the block-based transform. However, the reduction is restricted to the inside of a block. The redundancy between adjacent blocks is not satisfactorily considered. In particular, it is not satisfactorily considered in the intra-frame coding, which consistently consumes a great number of bits.