Conventionally, an image encoding apparatus and an image decoding apparatus based on the MPEG-1 encoding method are disclosed in Le Gall, D., “MPEG: A Video Compression Standard for Multimedia Applications” (Communications of the ACM, April 1991). The image encoding apparatus is constructed as shown in FIG. 1, and the image decoding apparatus is constructed as shown in FIG. 2.
The image encoding apparatus shown in FIG. 1 compresses an image signal by reducing redundancy in the temporal direction through motion-compensated inter-frame prediction, and by further reducing the redundancy remaining in the spatial directions through the DCT (Discrete Cosine Transform). FIG. 3 shows motion-compensated inter-frame prediction; FIG. 4 shows the block matching method frequently used to detect a motion vector; FIG. 5 shows the concept of the DCT; and FIG. 6A shows the principle of encoding DCT coefficients. The operations of the image encoding apparatus and the image decoding apparatus shown in FIGS. 1 and 2, respectively, will be described with reference to these drawings.
An input image signal 1 is a time series of frame images and, hereinafter, refers to the signal of a single frame image. A frame image to be encoded will be called a current frame, as shown in FIG. 3. The current frame is divided into, for example, square regions of 16 pixels × 16 lines (hereinafter referred to as “macro blocks”), which are processed as follows.
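The division into macro blocks can be illustrated by the following minimal sketch. The function name `split_into_macroblocks`, the list-of-rows frame representation, and the assumption that the frame dimensions are multiples of 16 are illustrative only and are not taken from the cited reference.

```python
def split_into_macroblocks(frame, size=16):
    """Yield (top, left, block) for each size x size region of a frame.

    `frame` is a list of rows of luminance samples. Its dimensions are
    assumed to be multiples of `size`; padding is not handled here.
    """
    height, width = len(frame), len(frame[0])
    for top in range(0, height, size):
        for left in range(0, width, size):
            block = [row[left:left + size] for row in frame[top:top + size]]
            yield top, left, block


# A synthetic frame of 48 pixels x 32 lines yields 2 x 3 = 6 macro blocks.
frame = [[(x + y) % 256 for x in range(48)] for y in range(32)]
blocks = list(split_into_macroblocks(frame))
print(len(blocks))  # 6
```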
The macro block data of the current frame (the current macro block) are sent to a motion detection unit 2, which detects a motion vector 5. A pattern similar to the current macro block is selected from the patterns in a predetermined search region of the previously encoded frame images 4 (hereinafter called partially decoded images) stored in a frame memory 3, and the motion vector 5 is generated from the spatial displacement between the selected pattern and the current macro block.
The partially decoded image is not limited to past frames. Future frames can also be used by encoding them in advance and storing them in the frame memory. Using future frames increases the processing time because the encoding order must be changed, but it reduces the temporal redundancy further. In MPEG-1, the following encoding types are generally selectable: bi-directional prediction using both a past frame and a future frame (B-frame prediction); forward prediction using only a prior frame (P-frame prediction); and the I-frame, which is encoded without prediction. In FIG. 3, which shows the case of P-frame prediction, the partially decoded image is indicated as a prior frame.
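The change of encoding order required by B-frame prediction can be sketched as follows. This is a simplified illustration, not the procedure of the cited reference: the function name `coding_order` and the frame-type string are hypothetical, and each B frame is simply held back until the next I frame or P frame has been emitted.

```python
def coding_order(display_types):
    """Return display indices in the order in which they would be encoded.

    B frames are held back until the following anchor frame (I or P)
    has been emitted, because they are predicted from it.
    """
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    # Trailing B frames without a future anchor are emitted last.
    order.extend(pending_b)
    return order


print(coding_order("IBBPBBP"))  # [0, 3, 1, 2, 6, 4, 5]
```

In this example the P frame at display position 3 is encoded before the B frames at positions 1 and 2, which is the source of the additional delay.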
The motion vector 5 is represented as a two-dimensional translation and is usually detected by the block matching method shown in FIG. 4. A motion search range centered at the spatial position of the current macro block is provided, and motion is searched for within that range. The motion prediction datum is defined as the block, among the image data in the motion search range of the prior frame, that minimizes the sum of squared differences or the sum of absolute differences with respect to the current macro block. The motion vector 5 is determined as the positional displacement between the current macro block and the motion prediction datum. A motion prediction datum is obtained for each macro block of the current frame; represented as a frame image, these data correspond to the motion prediction frame of FIG. 3. For motion-compensated inter-frame prediction, the difference between the motion prediction frame and the current frame is obtained, and the resulting residual signal (hereinafter referred to as prediction remainder signal 8) is encoded by the DCT encoding method as shown in FIG. 3.
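A minimal sketch of block matching by exhaustive search is given below, using the sum of absolute differences as the matching criterion. The function names, the search range of ±4 pixels, and the synthetic test frames are assumptions made for illustration; practical encoders typically use larger ranges and faster search strategies.

```python
import random


def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def find_motion_vector(cur, ref, top, left, size=16, search=4):
    """Exhaustively search ref for the best match to the block at (top, left).

    Returns ((dy, dx), cost), where (dy, dx) is the displacement from the
    current block position to the best-matching block in ref.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, top, left, top, left, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sad(cur, ref, top, left, ry, rx, size)
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


# Synthetic prior frame with random texture, and a current frame whose
# block at (16, 16) is a copy of the prior-frame block at (18, 13).
rng = random.Random(0)
H = W = 48
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[(y + 2) % H][(x - 3) % W] for x in range(W)] for y in range(H)]

mv, cost = find_motion_vector(cur, ref, top=16, left=16)
print(mv, cost)  # (2, -3) 0
```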
Specifically, a motion compensation unit 7 identifies the motion prediction datum of each macro block (hereinafter referred to as a prediction image). That is, the motion compensation unit 7 generates a prediction image 6 from the partially decoded image 4 stored in the frame memory 3 using the motion vector 5.
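The generation of the prediction image and of the prediction remainder can be sketched as follows. The names `predict_block` and `residual_block`, the 4 × 4 block size, and the sample values are hypothetical and chosen only to keep the example short; boundary handling is omitted.

```python
def predict_block(ref, top, left, mv, size=16):
    """Extract from ref the block displaced by mv = (dy, dx) from (top, left)."""
    dy, dx = mv
    return [
        row[left + dx:left + dx + size]
        for row in ref[top + dy:top + dy + size]
    ]


def residual_block(cur_block, pred_block):
    """Pixel-wise difference between the current block and its prediction."""
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(cur_block, pred_block)
    ]


# Stored (partially decoded) frame of 8 x 8 samples.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
pred = predict_block(ref, top=2, left=2, mv=(1, -1), size=4)
print(pred[0])  # [31, 32, 33, 34]

# A current block that is one level brighter than the prediction
# leaves a small, flat residual to be transform-coded.
cur_block = [[p + 1 for p in row] for row in pred]
res = residual_block(cur_block, pred)
print(res[0])  # [1, 1, 1, 1]
```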
The prediction remainder signal 8 is converted into DCT coefficient data by a DCT unit 9. As shown in FIG. 5, the DCT converts a spatial pixel vector into a combination of orthonormal bases, each representing a fixed frequency component. A block of 8×8 pixels (hereinafter referred to as a DCT block) is usually employed as the spatial pixel vector. Since the DCT is a separable transform, each eight-dimensional horizontal row vector of a DCT block is transformed separately, and then each eight-dimensional vertical column vector is transformed separately. The DCT exploits the inter-pixel correlation existing in the spatial domain to concentrate the signal power into a small number of coefficients of the DCT block. The higher the degree of power concentration, the more efficient the transform. For a natural image signal, the performance of the DCT is comparable to that of the Karhunen-Loève (KL) transform, which is the optimum transform. In particular, the power of a natural image is concentrated mainly in the low-frequency range, and little is distributed to the high-frequency range. Accordingly, as shown in FIG. 6B, the quantized coefficients are scanned within the DCT block from low frequency to high frequency. Since the scanned data include many zero runs, the total encoding efficiency, including the effect of entropy encoding, is improved.
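The separable row-column structure and the concentration of power in the low-frequency coefficients can be illustrated by the following sketch of an orthonormal 8-point DCT-II. The function names and the smooth test block are assumptions for illustration; this is a direct, unoptimized evaluation rather than a fast algorithm.

```python
import math

N = 8


def dct_1d(v):
    """Orthonormal DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(
            v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for n in range(N)
        ))
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    # cols is indexed [horizontal][vertical]; transpose back to [vertical][horizontal].
    return [list(r) for r in zip(*cols)]


# A smooth block, as is typical of natural images.
block = [[100 + 4 * x + 2 * y for x in range(N)] for y in range(N)]
coeffs = dct_2d(block)

total = sum(c * c for row in coeffs for c in row)
low = sum(coeffs[v][u] ** 2 for v in range(2) for u in range(2))
print(round(coeffs[0][0], 3))  # 968.0
print(low / total > 0.99)      # True
```

For this block, more than 99% of the power falls in the four lowest-frequency coefficients, which is why a low-to-high frequency scan tends to end in long runs of zeros after quantization.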
A quantization unit 11 quantizes the DCT coefficients 10. The quantized coefficients 12 are scanned by a variable length encoding unit 13 and converted into run-length codes, which are multiplexed into a compressed stream 14 and transmitted. In addition, the motion vector 5 detected by the motion detection unit 2 is multiplexed into the compressed stream 14 for each macro block and transmitted, so that the image decoding apparatus can generate the same prediction image as that generated by the image encoding apparatus.
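Quantization, the low-to-high frequency scan, and run-length conversion can be sketched as follows. This is a simplified illustration under stated assumptions: a single uniform step size with truncation toward zero is used in place of the quantization matrices of MPEG-1, the (run, level) pairs are left as Python tuples instead of being mapped to variable length codes, and the names `zigzag_order`, `quantize`, and `run_length` are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk the anti-diagonals in turn, alternating direction on each one.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def quantize(coeffs, step):
    """Uniform quantization with truncation toward zero."""
    return [[int(c / step) for c in row] for row in coeffs]


def run_length(levels):
    """Encode a scanned sequence as (zero_run, level) pairs plus 'EOB'."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block mark
    return pairs


# Example coefficients: power concentrated at low frequencies.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0], coeffs[2][1] = 968.0, -73.0, -36.0, 17.0

levels = quantize(coeffs, step=16)
scanned = [levels[r][c] for r, c in zigzag_order()]
print(run_length(scanned))
# [(0, 60), (0, -4), (0, -2), (5, 1), 'EOB']
```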
The quantized coefficients 12 are also partially decoded via an inverse quantization unit 15 and an inverse DCT unit 16. The result is added to the prediction image 6 to generate a decoded image 17 that is identical to the decoded image data generated by the image decoding apparatus. The decoded image 17 is stored in the frame memory 3 as the partially decoded image to be used for the prediction of the next frame.
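The partial decoding path (inverse quantization, inverse DCT, and addition of the prediction image) can be sketched as follows. The sketch assumes a single uniform quantization step and an orthonormal 8×8 DCT evaluated by direct matrix multiplication; all function names and sample values are illustrative. Because only the quantized levels and the prediction are used, an encoder and a decoder running the same `reconstruct` step obtain identical images.

```python
import math

N = 8
# Orthonormal DCT-II basis matrix: C[k][n].
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)]
     for k in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct_2d(block):
    """Forward transform: C * X * C^T."""
    return matmul(matmul(C, block), transpose(C))


def idct_2d(coeffs):
    """Inverse transform: C^T * Y * C."""
    return matmul(matmul(transpose(C), coeffs), C)


def quantize(coeffs, step):
    return [[round(c / step) for c in row] for row in coeffs]


def reconstruct(levels, prediction, step):
    """Inverse-quantize, inverse-transform, add the prediction, clip to 8 bits."""
    dequantized = [[level * step for level in row] for row in levels]
    residual = idct_2d(dequantized)
    return [[min(255, max(0, round(r + p))) for r, p in zip(res_row, pred_row)]
            for res_row, pred_row in zip(residual, prediction)]


# Flat prediction block and a current block that differs by a gentle ramp.
prediction = [[120] * N for _ in range(N)]
current = [[120 + x + y for x in range(N)] for y in range(N)]
residual = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(current, prediction)]

levels = quantize(dct_2d(residual), step=4)        # sent in the stream
encoder_copy = reconstruct(levels, prediction, 4)  # stored in the encoder's frame memory
decoder_copy = reconstruct(levels, prediction, 4)  # produced by the decoder
print(encoder_copy == decoder_copy)  # True
```

The reconstructed block generally differs slightly from the original current block because of quantization, but the encoder predicts the next frame from the same reconstructed data that the decoder holds, so the two do not drift apart.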
The operation of the image decoding apparatus shown in FIG. 2 will be described below.
This image decoding apparatus, after receiving the compressed stream 14, detects a sync word indicating the top of each frame by a variable length decoding unit 18 and restores the motion vector 5 and the quantized DCT coefficients 12 for each macro block. The motion vector 5 is transferred to the motion compensation unit 7, which extracts, as the prediction image 6, the portion of the image stored in a frame memory 19 (used in the same manner as the frame memory 3) that is displaced by the motion vector 5. The quantized DCT coefficients 12 are restored through an inverse quantization unit 15 and an inverse DCT unit 16 and then added to the prediction image 6 to produce the final decoded image 17. The decoded image 17 is output to a display device at a predetermined timing to reproduce the image.
Encoding algorithms such as MPEG motion picture encoding, which exploit the correlation with a signal that has already been decoded (hereinafter referred to as a reference image or a prediction image), are widely employed, as described in connection with the conventional example above. The DCT is frequently used as the transformation base for the reasons described above. The DCT is effective for encoding signal waveforms whose prior probability distribution is unknown. However, media signals such as audio signals and image signals are generally non-stationary and are biased both spatially and temporally. Accordingly, with the fixed transformation base described above in connection with the conventional example, the number of bases (that is, the number of coefficients) cannot be reduced, which limits the achievable compression.