1. Field of the Invention
The present invention relates to an image processing apparatus and method for coding or decoding images.
2. Description of the Related Art
To transmit and store information efficiently, apparatuses that digitally compress image data by exploiting the redundancy inherent in image information are becoming widespread, both in information transmitters, for example in broadcast stations, and in information receivers, for example in households. Such apparatuses comply with Moving Picture Experts Group (MPEG) standards and use an orthogonal transform, such as the discrete cosine transform (DCT), together with motion compensation.
In particular, MPEG2 (ISO/IEC 13818-2) is defined as a general-purpose image coding method, and is currently used in a wide range of professional and consumer applications, covering interlaced scanning images, sequential scanning images, standard-resolution images, and high-definition images.
By employing the MPEG2 compression method, a high compression ratio and high image quality can be achieved by allocating a coding amount (bit rate) of 4 to 8 Mbps to, for example, standard-resolution interlaced scanning images having 720×480 pixels, and a coding amount of 18 to 22 Mbps to, for example, high-resolution interlaced scanning images having 1920×1088 pixels.
MPEG2 is mainly used for coding high-quality images for broadcasting, and does not support a coding amount (bit rate) lower than that of MPEG1; that is, it does not support a higher compression ratio. Because of the widespread use of cellular telephones, there is an increasing demand for a coding method for a lower coding amount (higher compression ratio). To meet this demand, the MPEG4 coding method was standardized, and the MPEG4 image coding method was approved as the International Standard ISO/IEC 14496-2 in December 1998.
H.26L (ITU-T Q6/16 VCEG) is being standardized for image coding, initially for videoconferencing. Although H.26L requires a greater amount of computation for coding and decoding than known coding methods, such as MPEG2 and MPEG4, it achieves a higher coding efficiency. Currently, as part of the activities of MPEG4, a method for achieving an even higher coding efficiency is being standardized as the Joint Model of Enhanced-Compression Video Coding, which is based on H.26L and further incorporates features that are not supported by the H.26L standard.
In H.26L-standard coding and decoding, motion prediction/compensation is performed with high pixel precision, such as 1/4 or 1/8 pixel precision, for increasing the coding efficiency.
In this case, in motion prediction/compensation, a plurality of integer-precision pixel signals (pixel data) are read from a frame memory and are interpolated to generate interpolation pixel signals with 1/4 or 1/8 pixel precision. Motion vectors are then generated by using the 1/4- or 1/8-pixel-precision image data formed by the pixel signals and the interpolation pixel signals.
However, when interpolation pixel signals with high pixel precision are generated from pixel signals read from a frame memory, the pixel signals must be read from the frame memory very frequently, depending on how the motion vectors are processed. Accordingly, a large, expensive frame memory with a wide bandwidth and a high-performance computation circuit are required, and power consumption increases accordingly.
The above-described problem is described in detail in the context of a specific example of a known coding apparatus and a known decoding apparatus.
FIG. 1 is a functional block diagram illustrating a known coding apparatus 101. In the coding apparatus 101, an input image signal is first converted into a digital signal in an analog-to-digital (A/D) conversion circuit 501. Then, the frames of the digital signal output from the A/D conversion circuit 501 are rearranged in a frame rearranging circuit 502 according to the GOP (Group of Pictures) structure of the image compression information.
For pictures to undergo intra-coding, image information of the overall frame is input into an orthogonal transform circuit 504, and undergoes orthogonal transform, such as DCT or Karhunen-Loeve transform.
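As an illustration of the orthogonal transform step, the following is a minimal sketch of a one-dimensional orthonormal DCT-II and its inverse in plain Python. The function names `dct_1d` and `idct_1d` and the 8-sample input are assumptions made for this example only; they do not describe the internals of the orthogonal transform circuit 504, and practical coders apply a two-dimensional transform to pixel blocks.

```python
import math

def _scale(k, n):
    # Orthonormal scaling: the DC term uses sqrt(1/n), all others sqrt(2/n).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct_1d(samples):
    """Forward orthonormal DCT-II of a list of samples."""
    n = len(samples)
    return [
        _scale(k, n) * sum(
            samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        for k in range(n)
    ]

def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]

# A flat row of samples concentrates all of its energy in the first coefficient.
flat_coeffs = dct_1d([10.0] * 8)
```

Because a flat input yields a single non-zero coefficient, the example also shows why such a transform helps compression: smooth image regions are represented by very few significant coefficients.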
A transform coefficient output from the orthogonal transform circuit 504 is quantized in a quantizing circuit 505.
The quantized transform coefficient output from the quantizing circuit 505 is input into a reversible coding circuit 506, and undergoes reversible coding, such as variable-length coding or arithmetic coding. Then, the resulting coded data is stored in a buffer 507, and is output as compressed image data.
The quantizing rate employed in the quantizing circuit 505 is controlled by a rate control circuit 512. Meanwhile, the quantized transform coefficient output from the quantizing circuit 505 is also input into a dequantizing circuit 508, and further undergoes inverse orthogonal transform in an inverse orthogonal transform circuit 509, resulting in a decoded image signal. The decoded image signal is stored in a frame memory 510.
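The quantizing and dequantizing steps can be sketched as follows. This is a minimal illustration that assumes a uniform scalar quantizer with an integer step size; the names `quantize`, `dequantize`, and `step` are hypothetical, and the actual quantization rule of the quantizing circuit 505 and the dequantizing circuit 508 is not specified in the text.

```python
def quantize(coeff, step):
    """Map an integer transform coefficient to a quantization level.

    Assumption: uniform quantizer, rounding half away from zero.
    """
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + step // 2) // step)

def dequantize(level, step):
    """Reconstruct an approximate coefficient from its level."""
    return level * step

# A larger step (as a rate controller might select to lower the bit rate)
# produces smaller levels and a coarser reconstruction.
fine = dequantize(quantize(37, 4), 4)
coarse = dequantize(quantize(37, 16), 16)
```

The reconstruction error introduced here is the reason the coder decodes its own output into the frame memory: the reference image it predicts from then matches the one the decoder will have.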
For pictures to undergo inter-coding, the corresponding image signal is input into a motion prediction/compensation circuit 511. Simultaneously, a reference image signal is read from the frame memory 510, and undergoes motion prediction/compensation in the motion prediction/compensation circuit 511, thereby generating a predictive image signal. The predictive image signal is output to a computation circuit 503, which generates a difference signal indicating the difference between the image signal output from the frame rearranging circuit 502 and the predictive image signal output from the motion prediction/compensation circuit 511, and outputs the difference signal to the orthogonal transform circuit 504.
The motion prediction/compensation circuit 511 outputs a motion vector MV to the reversible coding circuit 506. The motion vector MV undergoes reversible coding, such as variable-length coding or arithmetic coding, in the reversible coding circuit 506, and is inserted into the header of the image signal. The rest of the processing is similar to that of intra-coding.
FIG. 2 is a functional block diagram illustrating a decoding apparatus 102 corresponding to the coding apparatus 101 shown in FIG. 1.
In the decoding apparatus 102 shown in FIG. 2, input image data is stored in a buffer 613, and is then output to a reversible decoding circuit 614. The image data undergoes variable-length decoding or arithmetic decoding in the reversible decoding circuit 614 according to a predetermined image compression information format. If the frame is an inter-coded frame, the motion vector MV stored in the header of the image signal is also decoded in the reversible decoding circuit 614, and the motion vector MV is output to a motion prediction/compensation circuit 620.
A quantized transform coefficient output from the reversible decoding circuit 614 is input into a dequantizing circuit 615 so as to generate a transform coefficient. The transform coefficient undergoes inverse orthogonal transform, such as inverse DCT or inverse Karhunen-Loeve transform, in an inverse orthogonal transform circuit 616 according to a predetermined image compression information format. If the frame is an intra-coded frame, the image information which has undergone inverse orthogonal transform is stored in a frame rearranging circuit 618, is converted into an analog signal in a digital-to-analog (D/A) conversion circuit 619, and is then output.
If the frame is an inter-coded frame, a predictive image signal is generated in the motion prediction/compensation circuit 620 based on the motion vector MV and a reference image signal stored in a frame memory 621. This predictive image signal and the image signal output from the inverse orthogonal transform circuit 616 are added in an adder 617. The rest of the processing is similar to that performed on the intra-coded frame.
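The relationship between the difference signal formed in the coding apparatus and the addition performed in the decoding apparatus can be sketched as follows. This is an illustration only: the helper names `residual` and `reconstruct` are hypothetical, the signals are modeled as flat lists of 8-bit samples, and clipping the sum to [0, 255] is an assumption not stated in the text.

```python
def clip255(value):
    """Limit a sample to the 8-bit range [0, 255]."""
    return max(0, min(255, value))

def residual(current, predicted):
    """Coder side: difference between the input and the predictive signal."""
    return [c - p for c, p in zip(current, predicted)]

def reconstruct(predicted, decoded_residual):
    """Decoder side: add the predictive signal to the decoded difference."""
    return [clip255(p + r) for p, r in zip(predicted, decoded_residual)]

current = [120, 130, 125, 90]
predicted = [118, 133, 125, 100]
diff = residual(current, predicted)
restored = reconstruct(predicted, diff)
```

In this sketch the difference is passed to the decoder unchanged, so `restored` equals `current`; in an actual codec the difference is transformed and quantized first, so the reconstruction is only approximate.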
In H.26L standards, motion prediction/compensation having high precision, such as 1/4 pixel precision and 1/8 pixel precision, is defined.
1/4-pixel-precision motion prediction/compensation is as follows.
It is now assumed that integer-precision image signals (pixel values) are present at the pixel positions (phases) indicated by A in FIG. 3.
Interpolation pixel signals having 1/2 pixel precision corresponding to interpolation positions b are generated by a 6-tap finite impulse response (FIR) filter {1, −5, 20, 20, −5, 1}, and the resulting signals are clipped to the range [0, 255].
Then, the interpolation pixel signals at the interpolation positions b are input into the above-described FIR filter, and interpolation pixel signals having 1/2 pixel precision corresponding to interpolation positions c are generated. The resulting signals are likewise clipped to the range [0, 255].
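The half-pixel filtering described above can be sketched as follows. This is a minimal illustration, not the circuit's actual implementation: the function name `half_pel` is hypothetical, and the normalization (adding 16 and shifting right by 5, that is, a rounded division by the tap sum of 32) is an assumption, since the text gives only the tap values and the clipping range.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap FIR filter from the text; taps sum to 32

def clip255(value):
    """Clip a filtered value to the range [0, 255]."""
    return max(0, min(255, value))

def half_pel(samples):
    """Interpolate the half-pixel value between samples[2] and samples[3].

    `samples` holds six consecutive samples along one row or column.
    Assumption: the result is normalized by (acc + 16) >> 5.
    """
    if len(samples) != len(TAPS):
        raise ValueError("expected exactly six samples")
    acc = sum(t * s for t, s in zip(TAPS, samples))
    return clip255((acc + 16) >> 5)

ramp = half_pel([10, 20, 30, 40, 50, 60])      # midpoint of a linear ramp
edge = half_pel([0, 0, 255, 255, 0, 0])        # overshoot, clipped at 255
```

The same routine can be applied to six samples at positions A to obtain a value at a position b, and to six values at positions b to obtain a value at a position c. The negative taps can push the result outside the 8-bit range near sharp edges, which is why the clipping step is needed.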
Interpolation pixel signals corresponding to interpolation positions d, g, e, and f are then generated by linear interpolation computation.
Subsequently, an interpolation pixel signal corresponding to interpolation position h is generated by averaging interpolation pixel signals at two interpolation positions b located on a diagonal passing through interpolation position h.
Then, an interpolation pixel signal corresponding to interpolation position i is generated by computation by using the pixel signals of the surrounding four pixel positions A.
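The remaining quarter-pixel positions can be illustrated with simple averaging. The text does not give the exact weights or rounding for these computations, so this sketch assumes a rounded mean throughout; the names `average2` and `average4` are hypothetical.

```python
def average2(p, q):
    """Rounded mean of two samples.

    Assumed form of the linear interpolation for positions d, e, f, and g,
    and of the average of the two diagonal b values for position h.
    """
    return (p + q + 1) >> 1

def average4(p, q, r, s):
    """Rounded mean of four samples.

    Assumed form of the computation for position i from the four
    surrounding integer positions A.
    """
    return (p + q + r + s + 2) >> 2

quarter = average2(100, 103)
center = average4(10, 20, 30, 41)
```

Unlike the 6-tap filter, these averages never leave the range of their inputs, so no clipping is required in this sketch.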
When motion prediction/compensation is performed using a 6-tap FIR filter, pixel signals must be extracted from a frame memory not only for the motion compensation (MC) block, which serves as the unit for motion prediction/compensation, but also, as shown in FIG. 4, for five extra pixels in each row and column of the MC block, i.e., two upper pixels, two left pixels, three lower pixels, and three right pixels.
The resulting overhead, expressed as the ratio of pixels read to pixels in the MC block, is 1.72265625 (=(21×21)/(16×16)) for the largest 16×16 MC block, and 5.0625 (=(9×9)/(4×4)) for the smallest 4×4 MC block. That is, the smaller the MC block, the greater the overhead, and the wider the memory bandwidth that is required.
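The overhead figures above can be reproduced with a short calculation. This is a sketch for checking the arithmetic; the function name `fetch_overhead` and its `taps` parameter are hypothetical.

```python
def fetch_overhead(block_w, block_h, taps=6):
    """Ratio of pixels read from frame memory to pixels in the MC block.

    A filter with `taps` taps needs taps - 1 extra pixels per row and
    column (for 6 taps: 2 before and 3 after the block).
    """
    extra = taps - 1
    fetched = (block_w + extra) * (block_h + extra)
    return fetched / (block_w * block_h)

largest = fetch_overhead(16, 16)   # 21*21 / 16*16
smallest = fetch_overhead(4, 4)    # 9*9 / 4*4
```

Intermediate block sizes fall between the two extremes; for example, an 8×8 block reads 13×13 pixels, a ratio of about 2.64.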