Many video processing systems require knowledge of the way that parts of the image move between one frame and the next. The process of determining the motion is known as motion estimation. A common motion estimator is the block-based type, in which a frame of video is divided into a number of blocks, and for each block a vector is found that represents the motion of the pixels in that block.
Motion estimation commonly uses what may be referred to as single-ended motion vectors. FIG. 1 shows an example block-based single-ended motion estimator. An image 100 is divided into a regular array of blocks 105, and motion estimation proceeds for each block in turn.
FIG. 1 shows a moving object 110 at a certain position in one frame of a video sequence, and, superimposed onto the same figure, the same object 115, at its position in the previous frame in the sequence. The image data in block 120 contains a number of pixels representing part of object 110. Motion estimation for block 120 involves searching the previous frame in the sequence to find the area of image data with contents most similar to the contents of block 120. Assuming that the motion estimation performs well, the area 125 is found. It can be seen that area 125 is the same size as block 120, but is not aligned to the grid 105. The position of the area of matching pixels 125, relative to block 120, determines motion vector 130 which reflects the motion of object 110, and is said to be the motion vector of block 120.
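The search described for block 120 can be illustrated with a minimal sketch. It assumes frames held as lists of rows of integer luminance values, an exhaustive search over a small window, and the sum of absolute differences (SAD) as the similarity measure. None of these choices is specified above, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in cur
    and the equally sized area displaced by (dx, dy) in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def single_ended_vector(cur, prev, bx, by, size, search):
    """Return the offset (dx, dy) of the best-matching area in prev,
    relative to the block at (bx, by) in cur (assumed sign convention)."""
    height, width = len(prev), len(prev[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate areas that fall outside the previous frame.
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, prev, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def make_frame(width, height, ox, oy):
    """Blank frame with a small textured 2x2 object at (ox, oy)."""
    frame = [[0] * width for _ in range(height)]
    pattern = [[9, 7], [5, 3]]
    for y in range(2):
        for x in range(2):
            frame[oy + y][ox + x] = pattern[y][x]
    return frame


prev_frame = make_frame(8, 8, 1, 2)   # object position in the previous frame
cur_frame = make_frame(8, 8, 4, 4)    # object position in the current frame
vector = single_ended_vector(cur_frame, prev_frame, 4, 4, 2, 3)
print(vector)
```

As with area 125 in FIG. 1, the matching area found in the previous frame need not be aligned to the block grid; only the block being estimated is.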
Single-ended motion estimation works well in some applications, such as video encoding, since it produces one vector for each block, such as 120, in each frame 100 that is encoded.
Another application for motion estimation is a motion compensated frame rate converter. In this application it is necessary to produce an interpolated frame at an intermediate position between two existing source frames in a video sequence. FIG. 2 shows the single-ended motion estimation result from FIG. 1, being used to interpolate image data in a new frame mid-way between two source frames from the original video sequence. Motion estimation for block 120 determines motion vector 130, and pixels for a new area of image 200, positioned at the midpoint of the vector, are derived from the pixels in block 120 and from the pixels in area 125. It should be noted that the interpolated area 200 is not necessarily aligned to the block grid.
FIG. 3 illustrates a problem that arises when using single-ended vectors in a frame rate converter. Objects 300 and 305 are moving at different speeds, giving rise to unequal motion vectors 320 and 325 for the blocks 310 and 315 respectively. In this example the vectors are converging. Interpolation of a new frame involves the creation of pixel data at positions 330 and 335, the mid-points of the two vectors. Blocks 310 and 315 are adjacent, but the interpolated areas 330 and 335 are not. This leads to a hole, 340, in the interpolated image. An alternative situation exists where vectors diverge, leading to overlap of interpolated areas. In either case, some effort is required to resolve holes and overlap areas, in order to produce an output image with one value at each pixel position.
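The hole and overlap problem can be made concrete by counting how many interpolated areas land on each pixel of the new frame. The sketch below is illustrative only: the function name, the frame dimensions and the vectors are hypothetical, and vector components are assumed to be even so that each midpoint falls on a whole pixel.

```python
def coverage_map(width, height, size, vectors):
    """Count how many projected areas write each pixel of the new frame.

    vectors maps the (bx, by) origin of a source block to its (dx, dy)
    single-ended motion vector. Each block is projected to the midpoint
    of its vector, as in the mid-way interpolation described above.
    """
    counts = [[0] * width for _ in range(height)]
    for (bx, by), (dx, dy) in vectors.items():
        ox, oy = bx + dx // 2, by + dy // 2   # midpoint of the vector
        for y in range(oy, oy + size):
            for x in range(ox, ox + size):
                if 0 <= x < width and 0 <= y < height:
                    counts[y][x] += 1
    return counts


# One row of four adjacent 4x4 blocks with unequal horizontal vectors.
vectors = {(0, 0): (0, 0), (4, 0): (-2, 0), (8, 0): (2, 0), (12, 0): (0, 0)}
counts = coverage_map(16, 4, 4, vectors)

holes = sum(row.count(0) for row in counts)                 # no value written
overlaps = sum(1 for row in counts for c in row if c > 1)   # several values
print(counts[0])
print(holes, overlaps)
```

Pixels with a count of zero correspond to holes such as 340, and pixels with a count above one correspond to overlap areas. Both must be resolved before the output frame has exactly one value per pixel.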
FIG. 4 shows an example of double-ended motion estimation. When used in the example application of a frame rate converter, this type of motion estimation has the significant advantage of producing exactly one value for each pixel position in the interpolated frame. The frame to be interpolated, 400, is divided into a regular array of blocks, 405, and motion estimation takes place for each block in turn. Motion estimation for block 405 involves searching the previous and next frames in the sequence for areas of image data that are most similar to each other. The search is constrained, in this example, by requiring that the offsets of the areas tested are equal in magnitude and opposite in direction with respect to the position of the block in the interpolated frame. In this example, the best match for the motion of the round object is found between area 410 in the previous frame and area 415 in the next frame, both of which are shown superimposed onto the grid of blocks in the interpolated frame. Note that neither area is necessarily aligned with the grid. The forward offset 420 is equal in magnitude, and opposite in direction, to the backward offset 425. In combination the two offsets may be said to be the motion vector of block 405, and represent the motion of an object in the interval between source frames. In the figures, where double-ended motion vectors are shown, the component corresponding to the forward offset (such as 420) is shown with an open arrowhead, and the component corresponding to the backward offset (such as 425) is shown with a solid arrowhead.
Interpolation of pixel data in block 405 requires that pixel data be derived from pixel data in one or both of the areas 410 and 415. The alignment of the grid to the interpolated frame means that exactly one value is produced for each pixel position.
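A minimal sketch of the constrained search and the subsequent interpolation follows. It assumes frames held as lists of rows, SAD as the similarity measure, and a simple average of the two matched areas for interpolation. These are illustrative choices rather than details given above, and the function names and the synthetic test scene are hypothetical.

```python
def double_ended_vector(prev, nxt, bx, by, size, search):
    """Return the forward offset (dx, dy) for the block at (bx, by) of the
    interpolated frame. The backward offset is (-dx, -dy) by construction."""
    height, width = len(prev), len(prev[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            fx, fy = bx + dx, by + dy   # area tested in the next frame
            px, py = bx - dx, by - dy   # mirrored area in the previous frame
            if min(fx, fy, px, py) < 0:
                continue
            if max(fx, px) + size > width or max(fy, py) + size > height:
                continue
            cost = sum(abs(prev[py + y][px + x] - nxt[fy + y][fx + x])
                       for y in range(size) for x in range(size))
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def interpolate_block(prev, nxt, bx, by, size, offset):
    """Average the two matched areas to fill the grid-aligned block."""
    dx, dy = offset
    return [[(prev[by - dy + y][bx - dx + x] + nxt[by + dy + y][bx + dx + x]) / 2
             for x in range(size)] for y in range(size)]


def scene(x, y):
    """Synthetic, non-repeating texture used only for this example."""
    return x * x + 5 * y * y


# The whole scene moves two pixels to the right between the source frames.
prev_frame = [[scene(x, y) for x in range(12)] for y in range(12)]
next_frame = [[scene(x - 2, y) for x in range(12)] for y in range(12)]

offset = double_ended_vector(prev_frame, next_frame, 4, 4, 2, 2)
block = interpolate_block(prev_frame, next_frame, 4, 4, 2, offset)
print(offset, block)
```

Because the loop runs over blocks of the interpolated frame rather than blocks of a source frame, every output pixel is written exactly once, which is the advantage noted above.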
The example of FIG. 4 shows interpolation occurring at the temporal mid-point between two source frames. In frame rate conversion it is common that other interpolation phases are required, for example interpolation at one quarter of the interval between source frames. In such a situation several possibilities exist, one of which is illustrated in FIG. 5. A block 500 is motion estimated and interpolated using a method similar to that illustrated in FIG. 4. However, it is known that interpolation at one quarter of the frame interval is required, and so the forward offset 505 is scaled such that it is three times the size of the backward offset 510. The scaled offsets are then used in motion estimation and interpolation. This gives the correct interpolated position of object 515 at one quarter of the temporal interval between source frames. Should further interpolations be required, for example at half and three-quarter intervals, further motion estimations are performed with forward and backward offset sizes adjusted accordingly.
Occluded and revealed areas of images present a problem for any motion estimation system, and particularly for a system using double-ended vectors. A common example occurs where an object moves across a background. At the leading edge of the moving object, parts of the background are occluded, and at the trailing edge of the moving object, parts of the background are revealed.
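The scaling of the two offsets with interpolation phase can be sketched as follows. The function name and the notion of a "full vector" (the displacement of an object over the whole interval from the previous source frame to the next) are assumptions introduced for illustration, as is the example vector.

```python
def scaled_offsets(full_vector, phase):
    """Split a full-interval motion vector into forward and backward offsets.

    phase is the temporal position of the interpolated frame, from 0.0
    (at the previous source frame) to 1.0 (at the next source frame).
    """
    vx, vy = full_vector
    backward = (-phase * vx, -phase * vy)           # towards the previous frame
    forward = ((1 - phase) * vx, (1 - phase) * vy)  # towards the next frame
    return forward, backward


# Interpolation at one quarter of the interval between source frames.
forward, backward = scaled_offsets((8, 4), 0.25)
print(forward, backward)

# The mid-point case gives offsets of equal magnitude.
mid_forward, mid_backward = scaled_offsets((8, 4), 0.5)
print(mid_forward, mid_backward)
```

At a phase of one quarter the forward offset is three times the size of the backward offset, and at the mid-point the two offsets are equal in magnitude and opposite in direction.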