Many video processing systems require knowledge of the way that parts of the image move between one frame and the next. The process of determining the motion is known as motion estimation. A common motion estimator is the block-based type, in which a frame of video is divided into a number of blocks, and for each block a vector is found that represents the motion of the pixels in that block.
FIG. 1 shows an example block-based single-ended motion estimator. An image 100 is divided into a regular array of blocks 105, and motion estimation proceeds for each block in turn. Also shown is a moving object 110 at a certain position in one frame of a video sequence, and, superimposed onto the same figure, the same object 115, at its position in the previous frame in the sequence. The image data in block 120 contains a number of pixels representing part of object 110. Motion estimation for block 120 involves searching the previous frame in the sequence to find the area of image data with contents most similar to the contents of block 120. Assuming that the motion estimation performs well, the area 125 is found. It can be seen that area 125 is the same size as block 120, but is not aligned to the grid 105. The position of the area of matching pixels 125, relative to block 120, determines motion vector 130 which reflects the motion of object 110, and is said to be the motion vector of block 120.
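The search described for block 120 can be sketched as an exhaustive block match. This is a minimal illustration, not a definitive implementation: the sum of absolute differences (SAD) as the similarity measure, the full search over a square window, and the names `sad`, `estimate_block` and `search` are all assumptions made for the example. Frames are taken to be lists of rows of greyscale values.

```python
def sad(cur, prev, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the area displaced by (dx, dy) in `prev`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
    return total


def estimate_block(cur, prev, bx, by, size, search):
    """Return the offset (dx, dy), relative to the block, of the best-matching
    area in `prev`. The match need not be aligned to the block grid."""
    height, width = len(prev), len(prev[0])
    best_vec, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate areas that fall outside the previous frame.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, prev, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_vec, best_cost = (dx, dy), cost
    return best_vec
```

Calling `estimate_block` once for every block in the grid yields one vector per block, as in the single-ended estimator of FIG. 1.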
Single-ended motion estimation works well in some applications, such as video encoding, since it produces one vector for each block, such as 120, in each frame 100 that is encoded.
Another application for motion estimation is a motion compensated frame rate converter. In this application it is necessary to produce an interpolated frame at an intermediate position between two existing source frames in a video sequence. FIG. 2 shows the motion estimation result from FIG. 1, being used to interpolate image data in a new frame mid-way between two source frames from the original video sequence. Motion estimation for block 120 determines motion vector 130, and pixels for a new area of image 200, positioned at the midpoint of the vector, are derived from the pixels in block 120 and from the pixels in area 125. Notice that the interpolated area 200 is not necessarily aligned to the block grid.
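A minimal sketch of this mid-point interpolation follows. It assumes that the new pixels are a simple average of the block and its matched area, and that the vector components are even so that the mid-point falls on whole pixels; neither assumption is stated in the text, and the function name `interpolate_midpoint` is hypothetical.

```python
def interpolate_midpoint(cur, prev, bx, by, vec, size, out):
    """Write the average of a block and its matched area into `out`,
    positioned at the mid-point of the motion vector."""
    dx, dy = vec
    # Assumption: even vector components, so the mid-point is a whole pixel.
    mx, my = bx + dx // 2, by + dy // 2
    for y in range(size):
        for x in range(size):
            a = cur[by + y][bx + x]            # pixel from the block
            b = prev[by + dy + y][bx + dx + x]  # pixel from the matched area
            out[my + y][mx + x] = (a + b) / 2
    return mx, my
```

The returned position `(mx, my)` is generally not a multiple of the block size, which is why the interpolated area is not aligned to the block grid.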
FIG. 3 illustrates a problem that may arise when using single-ended vectors in a frame rate converter. Objects 300 and 305 are moving at different speeds, giving rise to unequal motion vectors 320 and 325 for the blocks 310 and 315 respectively. In this example the vectors are converging. Interpolation of a new frame involves the creation of pixel data in areas 330 and 335, at the mid-points of the two vectors. Blocks 310 and 315 are adjacent, but the interpolated areas 330 and 335 are not. This leads to a hole 340 in the interpolated image. An alternative situation exists where vectors diverge, leading to overlap of interpolated areas. In either case, some effort is required to resolve holes and overlap areas, in order to produce an output image with exactly one value at each pixel position.
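The effect of unequal vectors can be illustrated by counting how many interpolated areas land on each pixel. The sketch below is a simplified one-dimensional model with hypothetical names (`coverage`, `vectors`); it assumes even vectors and places each block's interpolated area at the mid-point of its vector.

```python
def coverage(width, block_size, vectors):
    """Count how many interpolated areas write to each pixel of a 1-D row.

    `vectors[i]` is the single-ended vector of block i. A count of 0 marks
    a hole; a count above 1 marks an overlap."""
    counts = [0] * width
    for i, v in enumerate(vectors):
        start = i * block_size + v // 2  # area sits at the vector mid-point
        for x in range(start, start + block_size):
            if 0 <= x < width:
                counts[x] += 1
    return counts


# Three adjacent blocks of four pixels; only the middle block moves.
print(coverage(12, 4, [0, -4, 0]))
# [1, 1, 2, 2, 1, 1, 0, 0, 1, 1, 1, 1]
```

In this example the middle block's area overlaps its left neighbour and leaves a two-pixel hole on its right, so a further step is needed before every output pixel has exactly one value.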
FIG. 4 shows an example of double-ended motion estimation. When used in the example application of a frame rate converter, this type of motion estimation has the significant advantage of producing exactly one value for each pixel position in the interpolated frame. The frame to be interpolated, 400, is divided into a regular array of blocks, 405, and motion estimation takes place for each block in turn. Motion estimation for block 405 involves searching the previous and next frames in the sequence for areas of image data that are most similar to each other. The search is constrained, in this example, by requiring that the offsets of the areas tested are equal in magnitude and opposite in direction with respect to the position of the block in the interpolated frame. In this example, the best match is found between area 410 in the previous frame and area 415 in the next frame, both of which are shown superimposed onto the grid of blocks in the interpolated frame. Note that neither area is necessarily aligned with the grid. The forward offset 420 is equal in magnitude and opposite in direction to the backward offset 425. In combination the two offsets may be said to be the motion vector of block 405, and represent the motion of an object in the interval between the two source frames. In the figures, where double-ended motion vectors are shown, the component corresponding to the forward offset (such as 420) is shown with an open arrow head, and the component corresponding to the backward offset (such as 425) is shown with a solid arrow head.
Interpolation of pixel data in block 405 requires that pixel data be derived from pixel data in one or both of the areas 410 and 415. The alignment of the grid to the interpolated frame means that exactly one value is produced for each pixel position.
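The constrained search and the grid-aligned interpolation can be sketched as follows. This is an illustrative sketch only: the SAD cost, the exhaustive search, the averaging of the two areas, and all function names are assumptions rather than details given in the text.

```python
def area_sad(a, ax, ay, b, bx, by, size):
    """Sum of absolute differences between two equally sized areas."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(a[ay + y][ax + x] - b[by + y][bx + x])
    return total


def inside(frame, x, y, size):
    """True if the size-by-size area at (x, y) lies within the frame."""
    return 0 <= x and x + size <= len(frame[0]) and 0 <= y and y + size <= len(frame)


def estimate_double_ended(prev, nxt, bx, by, size, search):
    """Return the forward offset (dx, dy) for the block at (bx, by) in the
    interpolated frame. The backward offset is (-dx, -dy)."""
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = bx - dx, by - dy  # area in the previous frame
            nx, ny = bx + dx, by + dy  # area in the next frame
            if not (inside(prev, px, py, size) and inside(nxt, nx, ny, size)):
                continue
            cost = area_sad(prev, px, py, nxt, nx, ny, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def interpolate_block(prev, nxt, bx, by, size, forward):
    """Average the two matched areas to fill the grid-aligned block."""
    dx, dy = forward
    return [[(prev[by - dy + y][bx - dx + x] + nxt[by + dy + y][bx + dx + x]) / 2
             for x in range(size)] for y in range(size)]
```

Because `interpolate_block` always fills the block at `(bx, by)` itself, running it over every block of the grid writes each output pixel exactly once, without holes or overlaps.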
The example of FIG. 4 shows interpolation occurring at the temporal mid-point between two source frames. In frame rate conversion it is common that other interpolation phases are required, for example interpolation at one quarter of the interval between source frames. In such a situation several possibilities exist, one of which is illustrated in FIG. 5. A block 500 is motion estimated and interpolated using a method similar to that illustrated in FIG. 4. However, it is known that interpolation at one quarter of the frame interval is required, and so the offsets are scaled, before they are tested, such that the forward offset 505 is three times the size of the backward offset 510. The scaled offsets are then used in motion estimation and interpolation. This gives correct interpolation of object 515. Should further interpolations be required, for example at half and three-quarter intervals, further motion estimations are performed with forward and backward offset sizes adjusted accordingly.
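The scaling of the offsets for a given interpolation phase can be written out as a short sketch. It assumes that the phase is measured as a fraction of the frame interval starting from the previous source frame, and that each candidate is expressed as the full motion between the two source frames; the name `scaled_offsets` is hypothetical.

```python
def scaled_offsets(vec, phase):
    """Split a candidate motion `vec` (previous frame to next frame) into
    forward and backward offsets for an interpolated frame at `phase`.

    `phase` is the fraction of the frame interval measured from the
    previous frame (an assumption made for this sketch)."""
    vx, vy = vec
    forward = ((1 - phase) * vx, (1 - phase) * vy)  # towards the next frame
    backward = (-phase * vx, -phase * vy)           # towards the previous frame
    return forward, backward


print(scaled_offsets((8, 4), 0.25))
# ((6.0, 3.0), (-2.0, -1.0))
```

At a phase of one quarter the forward offset is three times the size of the backward offset, as in FIG. 5, and at a phase of one half the two offsets are equal in magnitude, as in FIG. 4.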
Occluded and revealed areas of image present a problem for any motion estimation system, and particularly for a system using double-ended vectors. A common example occurs where an object moves across a background. At the leading edge of the moving object parts of the background are occluded, and at the trailing edge of the moving object parts of the background are revealed.
In a video encoder, it is not always necessary for motion vectors to reflect the actual motion of objects in the scene, provided that the vectors provide good pixel matches and therefore allow effective compression of the video. In a frame rate converter, however, an interpolated frame is created by rendering image data at intermediate positions determined by the motion vectors. It is therefore much more important that the motion vectors represent the true motion of the objects in the scene.