High compression gains for video sequences can be achieved by removing temporal redundancies between images. For example, to encode an image, a temporal prediction of the image to be encoded is generated based on previously encoded images. The temporal prediction is compared with the actual image to determine the prediction error, and the prediction error is encoded. The prediction can be made with conventional block-based motion estimation and compensation methods.
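A minimal sketch of the prediction-error step follows. It assumes luma frames stored as Python lists of rows and the simplest possible temporal predictor: the previously encoded frame reused unchanged (zero motion). The function name `prediction_error` and the sample frames are illustrative only. When little changes between frames, most of the prediction error is zero, which is why encoding the error is cheaper than encoding the image itself.

```python
from typing import List

Frame = List[List[int]]

def prediction_error(target: Frame, prediction: Frame) -> Frame:
    """Pixel-wise prediction error (residual): target minus prediction."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, prediction)]

previous = [[10, 10, 10, 10],
            [10, 50, 50, 10],
            [10, 10, 10, 10]]
current = [[10, 10, 10, 10],
           [10, 50, 52, 10],
           [10, 10, 10, 10]]

# Simplest temporal prediction: reuse the previously encoded frame unchanged.
residual = prediction_error(current, previous)
nonzero = sum(1 for row in residual for v in row if v != 0)
print(residual[1])  # [0, 0, 2, 0]
print(nonzero)      # 1
```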
Motion estimation and compensation methods find a reference block in one or more reference images that predicts a corresponding target block in the target image, so that only the motion vectors and the prediction residual of the target need to be coded. These methods perform block matching to identify the block of pixels in the reference image that is most similar to the corresponding target block in the target image. The displacement, in pixels, between the position of the reference block and the position of the corresponding target block is the motion vector for the target block.
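The block-matching step can be sketched as an exhaustive (full) search that minimizes the sum of absolute differences (SAD) over a square search window. SAD, the window radius `search`, and the helper names are assumptions chosen for illustration; practical encoders use other cost functions and faster search patterns. Here the motion vector `(dy, dx)` is taken to point from the target block to its reference block, so the reference block starts at `(ty + dy, tx + dx)`.

```python
from typing import List, Tuple

Frame = List[List[int]]

def sad(target: Frame, reference: Frame, ty: int, tx: int,
        ry: int, rx: int, bs: int) -> int:
    """Sum of absolute differences between a target block and a reference block."""
    return sum(abs(target[ty + i][tx + j] - reference[ry + i][rx + j])
               for i in range(bs) for j in range(bs))

def best_motion_vector(target: Frame, reference: Frame, ty: int, tx: int,
                       bs: int, search: int) -> Tuple[Tuple[int, int], int]:
    """Full search within +/- search pixels; returns ((dy, dx), cost)."""
    h, w = len(reference), len(reference[0])
    # Start from the zero-motion candidate (frames are assumed to be the same size).
    best_vec, best_cost = (0, 0), sad(target, reference, ty, tx, ty, tx, bs)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = ty + dy, tx + dx
            # Skip candidate blocks that would fall outside the reference image.
            if ry < 0 or rx < 0 or ry + bs > h or rx + bs > w:
                continue
            cost = sad(target, reference, ty, tx, ry, rx, bs)
            if cost < best_cost:
                best_vec, best_cost = (dy, dx), cost
    return best_vec, best_cost

# A 2x2 object moves down 1 row and right 2 columns between the frames.
reference = [[0] * 6 for _ in range(6)]
target = [[0] * 6 for _ in range(6)]
for i in range(2):
    for j in range(2):
        reference[1 + i][1 + j] = 100
        target[2 + i][3 + j] = 100

print(best_motion_vector(target, reference, ty=2, tx=3, bs=2, search=2))
# ((-1, -2), 0)
```

Because the search starts from the zero vector and accepts only strictly lower costs, the zero vector wins any tie with it, which avoids spurious motion when the match is ambiguous.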
The motion compensation procedure begins by tiling the target image and the reference image into fixed-size blocks. FIG. 1 shows two standard shape definitions that are typically applied to the reference and target images. The first definition 110 tiles an image using non-overlapping 16-by-16 blocks of pixels. The second definition 120 uses 4-by-4 blocks. These fixed-size blocks, which are located at pre-defined positions in the target and reference images, are unrelated to the shapes and locations of objects in the image.
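A sketch of fixed-size tiling, assuming for simplicity that the frame dimensions are exact multiples of the block size (real coders pad the frame or handle partial edge blocks). `tile` and `coverage` are hypothetical helpers. The grid depends only on the frame size and block size, never on image content, and the coverage check confirms that non-overlapping blocks assign every pixel to exactly one block.

```python
from typing import List, Tuple

def tile(height: int, width: int, block_size: int) -> List[Tuple[int, int]]:
    """Top-left (row, col) origins of non-overlapping block_size x block_size blocks."""
    if height % block_size or width % block_size:
        raise ValueError("frame dimensions must be multiples of block_size")
    return [(y, x)
            for y in range(0, height, block_size)
            for x in range(0, width, block_size)]

def coverage(height: int, width: int, block_size: int) -> List[List[int]]:
    """Count how many blocks cover each pixel (1 everywhere for a valid tiling)."""
    counts = [[0] * width for _ in range(height)]
    for y, x in tile(height, width, block_size):
        for i in range(block_size):
            for j in range(block_size):
                counts[y + i][x + j] += 1
    return counts

print(len(tile(32, 48, 16)))  # 6 blocks of 16x16
print(len(tile(32, 48, 4)))   # 96 blocks of 4x4
```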
Generally, each pixel in the target image is assigned to exactly one block. This assignment, in which every pixel belongs to exactly one block, is referred to as tiling. Each block is assigned a motion vector that maps it to a corresponding portion of a reference image. A compensated image, which approximates the target image, is then formed by using the motion vectors to locate the corresponding reference blocks and copying the pixel values from each reference block into the area defined by its target block.
The error between the desired target image and the compensated image is determined, and a residual correction for this error is then encoded. It is assumed that both the encoder and decoder have access to the same reference images. Therefore, only the motion vectors and residual corrections are transmitted to accomplish video coding.
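The compensation and residual steps described in the two preceding paragraphs can be sketched end to end. This assumes integer-pixel motion vectors keyed by block origin, reference blocks that lie entirely inside the frame, and a residual sent without transform or quantization, so reconstruction is exact; real coders quantize the residual and only approximate the target. All names are illustrative.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]
Vectors = Dict[Tuple[int, int], Tuple[int, int]]  # block origin -> (dy, dx)

def compensate(reference: Frame, vectors: Vectors, bs: int) -> Frame:
    """Copy each displaced reference block into the area of its target block."""
    h, w = len(reference), len(reference[0])
    out = [[0] * w for _ in range(h)]
    for (ty, tx), (dy, dx) in vectors.items():
        for i in range(bs):
            for j in range(bs):
                out[ty + i][tx + j] = reference[ty + dy + i][tx + dx + j]
    return out

def encode(target: Frame, reference: Frame, vectors: Vectors, bs: int):
    """Encoder: transmit only the motion vectors and the residual correction."""
    comp = compensate(reference, vectors, bs)
    residual = [[t - c for t, c in zip(tr, cr)] for tr, cr in zip(target, comp)]
    return vectors, residual

def decode(reference: Frame, vectors: Vectors, residual: Frame, bs: int) -> Frame:
    """Decoder: rebuild the same compensated image, then add the residual."""
    comp = compensate(reference, vectors, bs)
    return [[c + r for c, r in zip(cr, rr)] for cr, rr in zip(comp, residual)]

reference = [[4 * r + c for c in range(4)] for r in range(4)]
target = [[0, 1, 2, 3],
          [4, 5, 6, 7],
          [0, 1, 10, 11],
          [4, 6, 14, 15]]
# The bottom-left block is predicted from the top-left block of the reference.
vectors = {(0, 0): (0, 0), (0, 2): (0, 0), (2, 0): (-2, 0), (2, 2): (0, 0)}

sent_vectors, residual = encode(target, reference, vectors, 2)
print(residual)  # [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]
print(decode(reference, sent_vectors, residual, 2) == target)  # True
```

The decoder reproduces the target only because it runs `compensate` on the same reference image the encoder used; with a different reference, the transmitted residual would no longer correct the right errors.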
A successful video coder balances many factors to generate a high-quality target image using limited computational resources. Of all these factors, the selection of reference data is possibly the most critical to video quality and the most costly in terms of computational resources. For example, if an object moves from one location in the reference image to another location in the target image, a motion vector should be able to map the object from the reference image to the target image.
However, conventional motion compensation methods use motion vectors that map pre-determined blocks, which rarely correspond to the boundaries of moving objects. Because the shapes of the pre-determined blocks are unrelated to the natural contours of moving objects in the reference image, the motion vector of a block that straddles an object boundary maps to a reference block that is partially related to the moving object and partially unrelated to it. Since the object and the surrounding region generally move differently, a single motion vector cannot correctly predict both parts of such a block, which increases the error of the compensated image.
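A small numeric illustration of this effect, using hypothetical 2-row frames: a textured static background and a two-pixel object that moves one pixel to the right. The fixed 2-by-2 block at column 2 straddles the object's leading edge. The two candidate vectors below are chosen by hand; neither reproduces that block exactly, whereas a block lying wholly inside the background is matched perfectly. (An exhaustive search over this small frame also finds no zero-error vector for the straddling block.)

```python
from typing import List, Tuple

Frame = List[List[int]]

def block_sad(target: Frame, reference: Frame, ty: int, tx: int,
              mv: Tuple[int, int], bs: int) -> int:
    """SAD between the target block at (ty, tx) and the reference block it maps to."""
    dy, dx = mv
    return sum(abs(target[ty + i][tx + j] - reference[ty + dy + i][tx + dx + j])
               for i in range(bs) for j in range(bs))

# Static background 10*x, and an object (100, 200) at columns 0-1 that moves
# right by one pixel; target column 0 shows the uncovered background value 0.
ref_row = [100, 200, 20, 30, 40, 50, 60, 70]
tgt_row = [0, 100, 200, 30, 40, 50, 60, 70]
reference = [list(ref_row), list(ref_row)]
target = [list(tgt_row), list(tgt_row)]

object_mv = (0, -1)     # follows the object
background_mv = (0, 0)  # follows the static background

# Block at column 2 holds one object pixel (200) and one background pixel (30).
print(block_sad(target, reference, 0, 2, object_mv, 2))      # 20: background pixel wrong
print(block_sad(target, reference, 0, 2, background_mv, 2))  # 360: object pixel wrong
# Block at column 4 lies entirely in the background.
print(block_sad(target, reference, 0, 4, background_mv, 2))  # 0
```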
Therefore, there is a need for a method to identify natural contours of moving objects in the reference images to improve the quality of motion compensation.