1. Field of the Invention
The present invention relates to an image sequence coding and decoding method which performs interframe prediction using quantized values for chrominance or luminance intensity.
2. Description of Related Art
In high efficiency coding of image sequences, interframe prediction (motion compensation), which exploits the similarity of temporally adjacent frames, is known to be a highly effective technique for data compression. Today's most frequently used motion compensation method is block matching with half pixel accuracy, which is used in the international standards H.263, MPEG1, and MPEG2. In this method, the image to be coded is segmented into blocks, and the horizontal and vertical components of the motion vectors of these blocks are estimated as integral multiples of half the distance between adjacent pixels. This process is described using the following equation:
[Equation 1]
$$P(x, y) = R(x + u_i,\, y + v_i), \quad (x, y) \in B_i, \quad 0 \le i < N \qquad (1)$$
where P(x, y) and R(x, y) denote the sample values (luminance or chrominance intensity) of pixels located at coordinates (x, y) in the predicted image P of the current frame and the reference image (decoded image of a frame which has been encoded before the current frame) R, respectively. x and y are integers, and it is assumed that all the pixels are located at points where the coordinate values are integers. Additionally it is assumed that the sample values of the pixels are quantized to non-negative integers. N, Bi, and (ui, vi) denote the number of blocks in the image, the set of pixels included in the i-th block of the image, and the motion vectors of the i-th block, respectively.
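Equation 1 can be illustrated with a minimal Python sketch for the case of integer motion vectors. The function name `predict_frame`, the list-of-rows image layout, and the clamping of out-of-range coordinates to the image border are assumptions made for this illustration only; the text above does not specify how references outside the image are handled.

```python
def predict_frame(reference, blocks, motion_vectors):
    """Build P(x, y) = R(x + u_i, y + v_i) for every pixel (x, y) in block B_i.

    reference      : list of rows, reference[y][x] is a non-negative integer
    blocks         : list of pixel sets B_i, each an iterable of (x, y) tuples
    motion_vectors : list of integer (u_i, v_i) pairs, one per block
    """
    height = len(reference)
    width = len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for block, (u, v) in zip(blocks, motion_vectors):
        for x, y in block:
            # Assumption for this sketch: clamp to the image border when the
            # displaced coordinate falls outside the reference image.
            rx = min(max(x + u, 0), width - 1)
            ry = min(max(y + v, 0), height - 1)
            predicted[y][x] = reference[ry][rx]
    return predicted
```

Here each block carries its own displacement, so the predicted image is assembled piecewise from shifted copies of regions of the reference image.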
When the values for ui and vi are not integers, it is necessary to find the intensity value at the point where no pixels actually exist in the reference image. Currently, bilinear interpolation using the adjacent four pixels is the most frequently used method for this process. This interpolation method is described using the following equation:
[Equation 2]
$$R\left(x + \frac{p}{d},\, y + \frac{q}{d}\right) = \bigl((d - q)\bigl((d - p)R(x, y) + pR(x + 1, y)\bigr) + q\bigl((d - p)R(x, y + 1) + pR(x + 1, y + 1)\bigr)\bigr) \mathbin{//} d^2 \qquad (2)$$
where d is a positive integer, and p and q are smaller than d but not smaller than 0. "//" denotes integer division which rounds the result of normal division (division using real numbers) to the nearest integer.
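The bilinear interpolation of equation 2 can be sketched in Python as follows. The names `div_round` and `interpolate` are hypothetical. Two choices are assumptions of this sketch rather than statements of the text at this point: half-integer quotients are rounded away from zero, and neighbor coordinates are clamped to the image border. Note that the rounding division written "//" in equation 2 is not Python's `//` operator, which rounds down.

```python
def div_round(numerator, denominator):
    """Divide and round to the nearest integer (non-negative numerator only).

    Assumption: exact halves are rounded away from zero.
    """
    return (2 * numerator + denominator) // (2 * denominator)


def interpolate(ref, x, y, p, q, d):
    """Intensity at (x + p/d, y + q/d) by bilinear interpolation (equation 2)."""
    if not (0 <= p < d and 0 <= q < d):
        raise ValueError("p and q must satisfy 0 <= p, q < d")
    # Assumption: clamp the right and lower neighbors at the image border.
    x1 = min(x + 1, len(ref[0]) - 1)
    y1 = min(y + 1, len(ref) - 1)
    top = (d - p) * ref[y][x] + p * ref[y][x1]
    bottom = (d - p) * ref[y1][x] + p * ref[y1][x1]
    return div_round((d - q) * top + q * bottom, d * d)
```

With d = 2 this reduces to half pixel accuracy: p and q each take the value 0 or 1, giving the four sampling positions of one pixel cell.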
An example of the structure of an H.263 video encoder is shown in FIG. 1. As the coding algorithm, H.263 adopts a hybrid coding method (adaptive interframe/intraframe coding method) which is a combination of block matching and DCT (discrete cosine transform). A subtractor 102 calculates the difference between the input image (current frame base image) 101 and the output image 113 (described later) of the interframe/intraframe coding selector 119, and then outputs an error image 103. This error image is converted into DCT coefficients in a DCT converter 104 and then quantized in a quantizer 105 to form quantized DCT coefficients 106. These quantized DCT coefficients are transmitted through the communication channel and, at the same time, are used to synthesize the interframe predicted image in the encoder. The procedure for synthesizing the predicted image is explained next. The above-mentioned quantized DCT coefficients 106 form the reconstructed error image 110 (the same as the reconstructed error image on the receiver side) after passing through a dequantizer 108 and an inverse DCT converter 109. This reconstructed error image and the output image 113 of the interframe/intraframe coding selector 119 are added at the adder 111, and the decoded image 112 of the current frame (the same image as the decoded image of the current frame reconstructed on the receiver side) is obtained. This image is stored in a frame memory 114 and delayed for a time equal to the frame interval. Accordingly, at the current point, the frame memory 114 outputs the decoded image 115 of the previous frame. This decoded image of the previous frame and the original image 101 of the current frame are input to the block matching section 116, and block matching is performed between these images.
In the block matching process, the original image of the current frame is segmented into multiple blocks, and the predicted image 117 of the current frame is synthesized by extracting the section most resembling each of these blocks from the decoded image of the previous frame. In this process, it is necessary to estimate the motion between the previous frame and the current frame for each block. The motion vector for each block estimated in the motion estimation process is transmitted to the receiver side as motion vector data 120. On the receiver side, the same prediction image as on the transmitter side is synthesized using the motion vector information and the decoded image of the previous frame. The prediction image 117 is input along with a "0" signal 118 to the interframe/intraframe coding selector 119. This switch 119 selects interframe coding or intraframe coding by selecting either of these inputs. Interframe coding is performed when the prediction image 117 is selected (this case is shown in FIG. 2). On the other hand, when the "0" signal is selected, intraframe coding is performed, since the input image itself is converted to DCT coefficients and output to the communication channel. In order for the receiver side to correctly reconstruct the coded image, the receiver must be informed whether intraframe coding or interframe coding was performed on the transmitter side. Consequently, an identifier flag 121 is output to the communication circuit. Finally, an H.263 coded bitstream 123 is acquired by multiplexing the quantized DCT coefficients, motion vectors, and the interframe/intraframe identifier flag information in a multiplexer 122.
The structure of a decoder 200 for receiving the coded bit stream output from the encoder of FIG. 1 is shown in FIG. 2. The received H.263 coded bit stream 217 is demultiplexed into quantized DCT coefficients 201, motion vector data 202, and an interframe/intraframe identifier flag 203 in the demultiplexer 216. The quantized DCT coefficients 201 become a decoded error image 206 after being processed by an inverse quantizer 204 and an inverse DCT converter 205. This decoded error image is added to the output image 215 of the interframe/intraframe coding selector 214 in an adder 207, and the sum of these images is output as the decoded image 208. The output of the interframe/intraframe coding selector is switched according to the interframe/intraframe identifier flag 203. A prediction image 212 utilized when performing interframe coding is synthesized in the prediction image synthesizer 211. In this synthesizer, the position of the blocks in the decoded image 210 of the previous frame stored in frame memory 209 is shifted according to the motion vector data 202. On the other hand, for intraframe coding, the interframe/intraframe coding selector outputs the "0" signal 213 as is.
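The role of the selector and the adder in the decoder can be illustrated with a short Python sketch. The function name `reconstruct_block` and the list-of-rows block layout are hypothetical, and inverse quantization, the inverse DCT, and any clipping of the result are outside the scope of this sketch.

```python
def reconstruct_block(decoded_error, prediction, is_inter):
    """Add the selector output to the decoded error image.

    decoded_error : rows of signed integers (output of the inverse DCT stage)
    prediction    : rows of non-negative integers (motion compensated prediction)
    is_inter      : True if the identifier flag signals interframe coding
    """
    if is_inter:
        selector_output = prediction
    else:
        # Intraframe coding: the selector passes the "0" signal.
        selector_output = [[0] * len(row) for row in decoded_error]
    return [
        [e + s for e, s in zip(error_row, selector_row)]
        for error_row, selector_row in zip(decoded_error, selector_output)
    ]
```

Because the encoder performs the same addition on the same inputs, both sides hold identical decoded images, which is why a rounding error in the prediction is carried from frame to frame until the error image corrects it.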
The image encoded by H.263 is composed of a luminance plane (Y plane) containing luminance information and two chrominance planes (U plane and V plane) containing chrominance information. Characteristically, when the image has 2m pixels in the horizontal direction and 2n pixels in the vertical direction (m and n are positive integers), the Y plane has 2m pixels horizontally and 2n pixels vertically, while the U and V planes have m pixels horizontally and n pixels vertically. The low resolution of the chrominance planes is due to the fact that the human visual system is comparatively insensitive to spatial variations in chrominance. With such an image as input, H.263 performs coding and decoding in block units referred to as macroblocks. The structure of a macroblock is shown in FIG. 3. The macroblock is composed of three blocks: a Y block, a U block, and a V block. The size of the Y block 301 containing the luminance information is 16×16 pixels, and the size of the U block 302 and V block 303 containing the chrominance information is 8×8 pixels.
In H.263, half pixel accuracy block matching is applied to each block. Accordingly, when the estimated motion vector is defined as (u, v), u and v are both integral multiples of half the distance between pixels. In other words, ½ is used as the minimum unit. The configuration of the interpolation method used for the intensity values (hereafter, "luminance" and "chrominance" intensity values are referred to by the general term "intensity value") is shown in FIG. 4. When performing the interpolation described in equation 2, the quotients of division are rounded off to the nearest integer, and further, when the quotient has a half integer value (i.e. 0.5 added to an integer), it is rounded to the next integer in the direction away from zero. In other words, in FIG. 4, when the intensity values for 401, 402, 403, 404 are respectively La, Lb, Lc, and Ld (La, Lb, Lc, and Ld are non-negative integers), the interpolated intensity values Ia, Ib, Ic, and Id (Ia, Ib, Ic, and Id are non-negative integers) at positions 405, 406, 407, 408 are expressed by the following equation:
[Equation 3]
$$I_a = L_a$$
$$I_b = [(L_a + L_b + 1)/2]$$
$$I_c = [(L_a + L_c + 1)/2]$$
$$I_d = [(L_a + L_b + L_c + L_d + 2)/4] \qquad (3)$$
where "[ ]" denotes truncation to the nearest integer towards 0 (i.e. the fractional part is discarded). The expectation of the errors caused by this rounding to integers is estimated as follows. It is assumed that the intensity values at positions 405, 406, 407, and 408 of FIG. 4 are each used with a probability of 25 percent. When finding the intensity value Ia for position 405, the rounding error will clearly be zero. When finding the intensity value Ib for position 406, the error will be zero when La+Lb is an even number, and ½ when it is an odd number. If the probabilities that La+Lb is an even number and an odd number are both 50 percent, then the expectation for the error will be 0 × ½ + ½ × ½ = ¼. Further, when finding the intensity value Ic for position 407, the expectation for the error is ¼, as for Ib. When finding the intensity value Id for position 408, the errors when the residual of La+Lb+Lc+Ld divided by four is 0, 1, 2, and 3 are respectively 0, −¼, ½, and ¼. If we assume that the residuals 0, 1, 2, and 3 are equally probable (i.e. 25 percent each), the expectation for the error is 0 × ¼ − ¼ × ¼ + ½ × ¼ + ¼ × ¼ = ⅛. As described above, assuming that the intensity values at positions 405-408 are all equally likely to be used, the final expectation for the error is 0 × ¼ + ¼ × ¼ + ¼ × ¼ + ⅛ × ¼ = 5/32. This indicates that each time motion compensation is performed by means of block matching, an error of 5/32 occurs on average in the pixel intensity value. Generally, in low rate coding, a sufficient number of bits cannot be used for encoding the interframe error difference, so the quantization step size of the DCT coefficients tends to be large. Accordingly, errors occurring due to motion compensation are corrected only when they are very large.
When interframe coding is performed continuously without performing intraframe coding under such conditions, the errors tend to accumulate and degrade the reconstructed image.
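The expectation derived above can be checked numerically with a short Python sketch. The function names are hypothetical. The sketch enumerates all combinations of four intensity values drawn from 0 to 3, which is an assumption chosen so that the parities and the residuals modulo four are uniformly distributed, as the derivation assumes.

```python
from fractions import Fraction
from itertools import product


def half_pel(la, lb, lc, ld):
    """Interpolated values Ia, Ib, Ic, Id of equation 3 (inputs non-negative)."""
    ia = la
    ib = (la + lb + 1) // 2  # truncation equals floor for non-negative values
    ic = (la + lc + 1) // 2
    i_d = (la + lb + lc + ld + 2) // 4
    return ia, ib, ic, i_d


def expected_errors(levels=range(4)):
    """Mean signed rounding error (rounded minus exact) at each position."""
    totals = [Fraction(0)] * 4
    count = 0
    for la, lb, lc, ld in product(levels, repeat=4):
        exact = (
            Fraction(la),
            Fraction(la + lb, 2),
            Fraction(la + lc, 2),
            Fraction(la + lb + lc + ld, 4),
        )
        rounded = half_pel(la, lb, lc, ld)
        totals = [t + r - e for t, r, e in zip(totals, rounded, exact)]
        count += 1
    return [t / count for t in totals]


def overall_expectation():
    """Average over the four positions, each used with probability 1/4."""
    return sum(expected_errors()) / 4
```

Under these assumptions `expected_errors()` yields 0, ¼, ¼, and ⅛ for the four positions, and `overall_expectation()` yields 5/32. The error is positive on average, so repeated prediction biases the intensity values upward rather than averaging out.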
As explained above, the chrominance planes have about half as many pixels as the luminance plane in both the vertical and horizontal directions. Therefore, for the motion vectors of the U block and V block, half the value of the motion vector for the Y block is used for the vertical and horizontal components. Since the horizontal and vertical components of the motion vector for the Y block are integral multiples of ½, the motion vector components for the U and V blocks will be integral multiples of ¼ (quarter pixel accuracy) if ordinary division is implemented. However, due to the high computational complexity of the intensity interpolation process for motion vectors with quarter pixel accuracy, the motion vectors for U and V blocks are rounded to half pixel accuracy in H.263. The rounding method utilized in H.263 is as follows. According to the definition described above, (u, v) denotes the motion vector of the macroblock (which is equal to the motion vector for the Y block). Assuming that r is an integer and s is a non-negative integer smaller than 4, u/2 can be rewritten as u/2 = r + s/4. When s is 0 or 2, no rounding is required since u/2 is already an integral multiple of ½. However, when s is equal to 1 or 3, the value of s is rounded to 2. By increasing the probability that s takes the value of 2 using this rounding method, the filtering effect of motion compensation can be emphasized. When the probabilities that the value of s prior to rounding is 0, 1, 2, and 3 are all 25 percent, the probabilities that s will be 0 or 2 after rounding will be 25 percent and 75 percent, respectively. The process explained above for the horizontal component u of the motion vector is also applied to the vertical component v.
Accordingly, in the U block and V block, the probability for using the intensity value of the 401 position is ¼ × ¼ = 1/16, and the probability for using the intensity value of the 402 and 403 positions is both ¼ × ¾ = 3/16, while the probability for using the intensity value of position 404 is ¾ × ¾ = 9/16. By utilizing the same method as above, the expectation for the error of the intensity value is 0 × 1/16 + ¼ × 3/16 + ¼ × 3/16 + ⅛ × 9/16 = 21/128. Just as explained above for the Y block, when interframe coding is continuously performed, the problem of accumulated errors occurs.
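The chrominance rounding rule and the resulting figure of 21/128 can be reproduced with the following Python sketch. All names are hypothetical, and representing motion vector components as integers in half pixel units is a convention chosen for this illustration. The per-position error expectations 0, ¼, ¼, and ⅛ are taken from the luminance analysis above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def chroma_mv_component(luma_half_pel):
    """Map a Y-block vector component to the U/V-block component.

    Both values are integers in half pixel units, so u = luma_half_pel / 2.
    Then u/2 = r + s/4 with integer r and s in {0, 1, 2, 3}.
    """
    r, s = divmod(luma_half_pel, 4)  # divmod keeps s non-negative
    if s in (1, 3):
        s = 2  # quarter pixel positions are rounded to the half pixel position
    return 2 * r + s // 2


def position_probabilities():
    """Probability of each (horizontal, vertical) half pixel flag pair,
    assuming s is uniform over 0..3 in both directions."""
    counts = Counter()
    for su, sv in product(range(4), repeat=2):
        flags = (chroma_mv_component(su) % 2, chroma_mv_component(sv) % 2)
        counts[flags] += 1
    return {flags: Fraction(n, 16) for flags, n in counts.items()}


def chroma_error_expectation():
    """Weight the per-position error expectations by the probabilities."""
    error = {
        (0, 0): Fraction(0),
        (1, 0): Fraction(1, 4),
        (0, 1): Fraction(1, 4),
        (1, 1): Fraction(1, 8),
    }
    return sum(p * error[flags] for flags, p in position_probabilities().items())
```

Under the uniform assumption, the flag pairs (0, 0), (1, 0), (0, 1), and (1, 1) occur with probabilities 1/16, 3/16, 3/16, and 9/16, and `chroma_error_expectation()` evaluates to 21/128, in agreement with the text.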
As described above, in image sequence coding and decoding methods in which interframe prediction is performed and the luminance or chrominance intensity is quantized, the problem of accumulated rounding errors occurs. This rounding error is generated when the luminance or chrominance intensity value is quantized during the generation of the interframe prediction image.
In view of the above problems, it is therefore an object of this invention to improve the quality of the reconstructed image by preventing error accumulation.
In order to achieve the above object, the accumulation of errors is prevented by limiting the occurrence of errors or performing an operation to cancel out errors that have occurred.