1. Technical Field
The present invention relates to a motion compensation method using motion vectors, and to picture coding and decoding methods using the motion compensation method.
2. Background Art
In recent years, with the development of multimedia applications, it has become common to handle media information such as pictures, voice and text in an integrated manner. Digitizing all of the media makes it possible to handle them uniformly. However, a digitized picture has an extremely large amount of data, and therefore a technology for compressing picture information is indispensable. It is also important to standardize compression technology so that compressed picture data can be interoperated. Standards for picture compression technology include H.261 and H.263 of ITU-T (International Telecommunication Union, Telecommunication Standardization Sector), and MPEG (Moving Picture Experts Group)-1, MPEG-2, MPEG-4 and the like of ISO/IEC (International Organization for Standardization/International Electrotechnical Commission).
Generally, in moving picture coding, the amount of information is compressed by reducing redundancy in the temporal and spatial directions. In interpicture prediction coding, which aims at reducing temporal redundancy, motion estimation and generation of a predictive picture are performed on a block-by-block basis by referring to a preceding picture and a following picture. Coding is then performed on the difference value between the obtained predictive image and the image of the current macroblock to be coded. Here, a picture is a term that represents one screen; it means a frame in a progressive picture and a frame or a field in an interlaced picture. An interlaced picture is a picture in which one frame is formed of two fields that differ in time. In the coding and decoding of an interlaced picture, one frame can be processed as a frame “as is” or as two fields. It is also possible to select a frame structure or a field structure for each block in the frame.
A picture that does not have a reference picture and in which intrapicture prediction coding is performed is called an “I picture.” A picture in which interpicture prediction coding is performed by referring to only one picture is called a “P picture.” A picture in which interpicture prediction coding can be performed by referring to two pictures at the same time is called a “B picture.” In a B picture, the two pictures referred to can be an arbitrary combination of forward pictures or backward pictures in display order. Appropriate reference pictures can be selected for each block, which is the basic unit for coding and decoding. The two reference pictures are distinguished as follows: the reference picture that is described earlier in a coded bit stream is the first reference picture, and the reference picture that is described later in the coded bit stream is the second reference picture. As a condition for coding and decoding these pictures, however, the reference pictures must already be coded or decoded.
To code a P picture or a B picture, interpicture prediction coding using motion compensation is used, that is, a coding method in which motion compensation is applied to interpicture prediction. Motion compensation does not simply predict from the pixel values of the block in a reference frame co-located with the current block; instead, it estimates a motion amount (hereinafter called a “motion vector”) of each part of the picture and performs prediction that takes this motion amount into account, thereby improving predictive accuracy and reducing the data amount. For example, the data amount is reduced by estimating a motion vector of the current picture to be coded, obtaining a predictive value that has been shifted by the amount of the motion vector, and coding the predictive residual (that is, the difference between the predictive value and the pixel value of each pixel in the current picture to be coded). With this method, the motion vector is needed at the time of decoding, and therefore the motion vector is also coded and then recorded or transmitted.
The motion vector is estimated on a block-by-block basis, the blocks having a predetermined size. Specifically, for each block in the current picture to be coded, a corresponding block in the reference picture is moved within a search area, and the location of the reference block that is most similar to the current block to be coded is detected.
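The block-by-block search described above can be sketched as follows. This is only an illustrative full search, not the estimation method of any particular apparatus: the similarity measure (sum of absolute differences), the function names `sad` and `estimate_motion_vector`, and the parameter `search_range` are all assumptions introduced for the example, since the text does not specify how similarity is measured.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy). Assumed cost measure."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def estimate_motion_vector(cur, ref, bx, by, size, search_range):
    """Full search: try every integer displacement in the search area and
    keep the one whose reference block is most similar to the current block."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that would fall outside the reference picture.
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Calling `estimate_motion_vector(cur, ref, 4, 4, 4, 2)` searches displacements of up to two pixels in each direction for the 4×4-pixel block whose top-left corner is at (4, 4), and returns the best displacement together with its cost.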
FIG. 1 is a block diagram showing the structure of a conventional picture coding apparatus 100. The picture coding apparatus 100 includes a difference unit 101, an image coding unit 102, a variable length coding unit 103, an image decoding unit 104, an addition unit 105, a picture memory 106, a picture memory 107, a motion compensation unit 108, a motion vector estimation unit 109 and a motion vector storage unit 110. Here, as for motion compensation, an appropriate block size is selected on a macroblock-by-macroblock basis from seven block sizes and used for coding and decoding, the seven block sizes being 4×4 pixels, 4×8 pixels, 8×4 pixels, 8×8 pixels, 8×16 pixels, 16×8 pixels and 16×16 pixels according to ITU-T H.26L TML8, which is currently under standardization.
The picture memory 107 stores image data “Img” that represents moving pictures inputted in display order on a picture-by-picture basis. The difference unit 101 calculates the difference between the image data “Img” read out from the picture memory 107 and predictive image data “Pred” inputted from the motion compensation unit 108, and generates predictive residual image data “Res”. The image coding unit 102 performs coding processes such as frequency conversion and quantization on the inputted predictive residual image data “Res” and generates coded residual data “CodedRes”. In the case of intrapicture coding, interpicture motion compensation is not performed, and therefore the value of the predictive image data “Pred” is treated as “0.”
The motion vector estimation unit 109 estimates the motion vector that shows the location predicted to be optimum within the search area in the reference picture, using reference picture data “Ref,” which is decoded picture data of an already coded picture stored in the picture memory 106, and outputs a motion parameter “MotionParam” that represents the estimated motion vector. At that time, the motion vector estimation unit 109 switches reference pictures according to whether the current picture to be coded is a P picture or a B picture. The coding mode “Mod” shows in which way (for example, which one of a bi-predictive mode, a unidirectional mode and a direct mode) motion compensation is performed. For example, in the direct mode, the motion vector estimation unit 109 calculates bi-predictive motion vectors of the current block to be motion-compensated by using a motion vector derived from another block. Here, a picture referred to in order to derive a motion vector in the direct mode is called a standard picture, and the block in the standard picture co-located with the current block is called a standard block. In this case, the values of the motion vectors in the direct mode are calculated with a 16×16-pixel macroblock as the unit regardless of the block size that is actually the unit for motion compensation, and the calculated motion vectors are not coded. Then, the motion vector estimation unit 109 chooses either the calculated motion vector or the motion vector (0, 0) to be used for each 4×4-pixel block. The motion compensation unit 108 generates the predictive image data “Pred” based on the coding mode “Mod” of the current block to be coded and the motion vectors estimated by the motion vector estimation unit 109.
Further, when a motion vector indicates a sub-pixel location such as a half pixel or a quarter pixel, the motion compensation unit 108 interpolates the pixel values at the sub-pixel locations by using a low-pass filter and the like. The motion vector storage unit 110 stores the motion parameters “MotionParam” outputted from the motion vector estimation unit 109. The variable length coding unit 103 performs variable length coding and the like on the inputted coded residual data “CodedRes” and the motion parameters “MotionParam” outputted from the motion vector estimation unit 109, and generates coded data “Bitstream” by further adding the coding mode “Mod”.
The image decoding unit 104 performs decoding processes such as inverse quantization and inverse frequency conversion on the inputted coded residual data “CodedRes” and generates decoded residual data “ReconRes.” The addition unit 105 adds the decoded residual data “ReconRes” outputted from the image decoding unit 104 to the predictive image data “Pred” inputted from the motion compensation unit 108 and generates decoded image data “Recon.” The picture memory 106 stores the generated decoded image data “Recon.”
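The data flow through the difference unit 101, the image coding unit 102, the image decoding unit 104 and the addition unit 105 can be sketched for a single block as below. This is a simplified illustration under stated assumptions: the frequency conversion is omitted and a plain uniform quantizer with a hypothetical step size `qstep` stands in for the coding and decoding processes, and the function name `code_block` is introduced only for this example.

```python
def code_block(img, pred, qstep):
    """Trace one block through the local decoding loop of the coding apparatus.
    Quantization alone stands in for "frequency conversion and quantization"."""
    # Difference unit 101: Res = Img - Pred
    res = [[i - p for i, p in zip(row_i, row_p)]
           for row_i, row_p in zip(img, pred)]
    # Image coding unit 102 (stand-in): uniform quantization of the residual
    coded_res = [[round(r / qstep) for r in row] for row in res]
    # Image decoding unit 104 (stand-in): inverse quantization
    recon_res = [[c * qstep for c in row] for row in coded_res]
    # Addition unit 105: Recon = Pred + ReconRes
    recon = [[p + r for p, r in zip(row_p, row_r)]
             for row_p, row_r in zip(pred, recon_res)]
    return coded_res, recon
```

With `qstep` equal to 1 the reconstruction matches the input exactly; with a larger step the reconstruction differs from the input by at most half a step per pixel. Passing an all-zero `pred` corresponds to the intrapicture case, in which “Pred” is treated as “0.”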
When the motion amount of a photographic subject is smaller than an integer pixel unit, the predictive effect may improve if the prediction is performed with a movement that is smaller than the integer pixel unit. Generally, pixel interpolation is used to calculate the pixel values of a predictive image with such a sub-pixel movement. This pixel interpolation is performed by filtering pixel values of a reference picture with a linear filter (a low-pass filter). Increasing the number of taps of this linear filter makes it easier to realize a filter with good frequency characteristics, so the predictive effect improves, but the processing amount increases. Conversely, when the number of taps of this linear filter is small, the frequency characteristics become worse, so the predictive effect deteriorates, but the processing amount decreases.
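The trade-off between tap count and processing amount can be illustrated with a one-dimensional half-pixel interpolation. The sketch below is an assumption-laden example rather than the filter of any apparatus described here: the function name `interpolate_half_pel`, the border clamping, and both coefficient sets are chosen for illustration. The 2-tap set is a simple average, and the 6-tap set (1, -5, 20, 20, -5, 1) is the one commonly associated with H.264 half-sample luma interpolation.

```python
def interpolate_half_pel(row, x, taps):
    """Value halfway between row[x] and row[x + 1], computed with a symmetric
    even-length linear (FIR) filter. No clipping to the pixel range is applied."""
    half = len(taps) // 2
    acc = 0
    for k, coeff in enumerate(taps):
        # The taps are centred on the half-pixel position;
        # indices are clamped at the picture borders (an assumed policy).
        idx = min(max(x - half + 1 + k, 0), len(row) - 1)
        acc += coeff * row[idx]
    total = sum(taps)
    # Normalise by the sum of the coefficients, rounding to nearest.
    return (acc + total // 2) // total


BILINEAR_TAPS = (1, 1)                  # reads 2 pixels per output value
SIX_TAPS = (1, -5, 20, 20, -5, 1)       # reads 6 pixels per output value
```

Each output value costs one multiply-accumulate and one pixel read per tap, so the 6-tap filter needs three times the work of the 2-tap filter for every interpolated pixel.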
FIG. 2 is a diagram showing the structure of a conventional picture decoding apparatus 200 that performs pixel interpolation. The picture decoding apparatus 200 includes a variable length decoding unit 201, an image decoding unit 202, an addition unit 203, a picture memory 204, a motion vector storage unit 205 and a motion compensation unit 206.
The variable length decoding unit 201 extracts various data such as the coded residual data “CodedRes”, the motion parameters “MotionParam” and information on the coding mode “Mod” used at the time of coding from the inputted coded data “Bitstream”. The image decoding unit 202 decodes the inputted coded residual data “CodedRes” and generates predictive residual image data “Res”. The motion vector storage unit 205 stores the motion parameters “MotionParam” extracted by the variable length decoding unit 201. The motion compensation unit 206 includes an internal pixel interpolation unit (not illustrated) that interpolates pixel values at sub-pixel locations such as a half pixel and a quarter pixel by using a linear filter and the like. The motion compensation unit 206 generates predictive image data “Pred”, which is motion compensation data, from the decoded image data “Recon” in the picture memory 204 based on the coding mode “Mod” used at the time of coding, the motion parameters “MotionParam” and the like. At this time, in the case of the direct mode, the motion compensation unit 206 generates the predictive image data “Pred” of the current block to be motion-compensated in the same size as the block size of motion compensation of a standard block in a standard picture read out from the picture memory 204. The addition unit 203 adds the predictive residual image data “Res” outputted from the image decoding unit 202 to the predictive image data “Pred”, which is motion compensation data outputted from the motion compensation unit 206, and generates the decoded image data “Recon.” The picture memory 204 stores the generated decoded image data “Recon.” Refer to the MPEG-4 Visual written standard (ISO/IEC 14496-2:1999, Information technology - Coding of audio-visual objects - Part 2: Visual).
To perform motion compensation of sub-pixel precision, however, it is necessary to obtain the pixel values of not only the current block to be motion-compensated but also some adjacent pixels. In other words, to generate pixel values of sub-pixel precision, it is necessary to obtain the pixel values of a larger area than the actual block to be predicted. It is common practice to use a low-pass filter to generate pixel values by an interpolation process, and to use the low-pass filter it is necessary to access (read out) some pixels adjacent to a target pixel (as many pixels as the low-pass filter has coefficients). FIGS. 3A and 3B are diagrams showing examples of a current block to be motion-compensated and its adjacent pixels, whose pixel values must be read out in order to generate a predictive image when performing pixel interpolation. FIG. 3A shows the current block to be motion-compensated and its adjacent pixels when the current block is small. FIG. 3B shows the current block to be motion-compensated and its adjacent pixels when the current block is large. In FIGS. 3A and 3B, the central square shows one current block to be motion-compensated, while the surrounding hatched area shows the adjacent pixels whose pixel values are read out from a reference memory in order to perform pixel interpolation. Here, for example, when a filter of 9 taps (requiring the pixel values of nine pixels) is assumed to be used as the low-pass filter, the low-pass filter process for pixels in the border area of the block requires the pixel values of at least four pixels outside the block. Therefore, the memory must be accessed to read out an area that includes a margin of four pixels surrounding the central current block to be motion-compensated.
For example, in a 4×4-pixel block, it is necessary to read out the pixel values of (4+4+4)×(4+4+4)=144 pixels for each block. In an 8×8-pixel block, it is necessary to read out the pixel values of (4+8+4)×(4+8+4)=256 pixels. When motion-compensating a 16×16-pixel macroblock with an 8×8-pixel block as the unit, it is enough to read out the pixel values of 256 pixels×4=1024 pixels, but when motion-compensating the 16×16-pixel macroblock with a 4×4-pixel block as the unit, it is necessary to read out the pixel values of 144 pixels×16=2304 pixels. Consequently, the memory access amount of one motion compensation with an 8×8-pixel block as the unit (256 pixels) is about half of that of four motion compensations with a 4×4-pixel block as the unit (144 pixels×4=576 pixels), even though both cover the same 8×8-pixel area.
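The arithmetic above can be reproduced with a short calculation. The sketch assumes an odd-length symmetric filter, as in the 9-tap example, so that the margin on each side of the block is half the tap count rounded down; the function names `pixels_read` and `pixels_read_per_macroblock` are introduced only for this illustration.

```python
def pixels_read(block_w, block_h, taps):
    """Pixels fetched from the reference memory for one block, including the
    margin needed by an odd-length symmetric interpolation filter."""
    margin = taps // 2  # a 9-tap filter needs 4 extra pixels on each side
    return (block_w + 2 * margin) * (block_h + 2 * margin)


def pixels_read_per_macroblock(block_w, block_h, taps, mb_size=16):
    """Pixels fetched when a whole macroblock is motion-compensated
    with blocks of the given size as the unit."""
    blocks = (mb_size // block_w) * (mb_size // block_h)
    return blocks * pixels_read(block_w, block_h, taps)


for w, h in [(4, 4), (8, 8), (16, 16)]:
    print(w, h, pixels_read(w, h, 9), pixels_read_per_macroblock(w, h, 9))
```

For the 9-tap case this gives 144 and 256 pixels per 4×4 and 8×8 block, and 2304 and 1024 pixels per macroblock, matching the figures in the text.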
As is apparent from the above-mentioned example, when the pixel values of the same number of external pixels surrounding one current block to be motion-compensated are read out, the smaller the size of the current block, the larger the ratio of the number of adjacent pixels to the number of pixels in the current block (in terms of the number of pixels read out from a reference memory). As a result, when the pixel values of the current block to be motion-compensated are read out from the reference memory, there is a problem that the load of memory access (access for reading out) becomes large because adjacent pixels that are not the target of motion compensation must also be read. In particular, in the bi-predictive motion compensation of a B picture, in which pixel values are calculated by motion-compensating the current picture to be coded or decoded with reference to two pictures at the same time, the access to the reference memory is about double that of unidirectional predictive motion compensation. Therefore, the problem of overhead becomes more prominent when the size of the current block to be motion-compensated is small.
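The growth of this overhead can be made explicit with a short worked calculation. The symbols are assumptions introduced for illustration only: w and h are the block width and height, m is the margin read on each side (m = 4 for the 9-tap example), and R is the number of pixels read per pixel of the block.

```latex
\[
R(w, h) = \frac{(w + 2m)(h + 2m)}{w\,h}, \qquad m = 4
\]
\[
R(4, 4) = \frac{144}{16} = 9, \qquad
R(8, 8) = \frac{256}{64} = 4, \qquad
R(16, 16) = \frac{576}{256} = 2.25
\]
```

Under these assumptions, a 4×4-pixel block reads nine pixels for every pixel it predicts, compared with 2.25 for a 16×16-pixel block, and bi-predictive motion compensation roughly doubles each figure because two reference pictures are read.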