This invention relates to signal encoding. More particularly, this invention relates to coding of television image signals with the aid of motion compensated prediction.
In television signals, successive image frames are often almost identical. Scenes frequently do not change from frame to frame and, hence, there is a substantial correlation between the signals of the two frames. Many systems have therefore been devised to take advantage of this correlation by coding the image so as to reduce the overall number of bits required to describe it.
Typically, these coding approaches predict the image from the previously received frame and encode the difference between the actual frame and its prediction.
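The difference coding just described can be illustrated with a minimal sketch. It assumes, purely for illustration, that a frame is a list of rows of integer pixel values and that the previous frame itself serves as the prediction; the function names `frame_difference` and `reconstruct` are hypothetical and do not come from the text.

```python
def frame_difference(previous, current):
    """Per-pixel prediction error when the previous frame is used as the prediction."""
    return [[c - p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]


def reconstruct(previous, error):
    """Rebuild the current frame from the prediction plus the coded error."""
    return [[p + e for p, e in zip(prev_row, err_row)]
            for prev_row, err_row in zip(previous, error)]


# Two nearly identical frames: only two pixels differ.
previous = [[10, 10, 10], [20, 20, 20]]
current = [[10, 11, 10], [20, 20, 19]]

error = frame_difference(previous, current)
```

When the frames are highly correlated, most entries of `error` are zero or small, which is what allows the difference to be described with fewer bits than the frame itself.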
When a television scene contains moving objects and an estimate of their translation is available, a reasonably good prediction can still be performed on the television frame signal using the elements in the previous frame that are appropriately spatially displaced. Such prediction is called motion compensated prediction. In real scenes, motion can be a complex combination of translation and rotation. Different parts of a frame may be moved in different directions. Such motion is very difficult to estimate and may require large amounts of processing. Nevertheless, translational motion is easily estimated and has been used successfully for motion compensated coding. Its success depends on the amount of translational motion in the scene and the ability of an algorithm to estimate translation with the accuracy that is necessary for good prediction. The crucial problem is the algorithm used for motion estimation.
Most prior art algorithms for motion compensation in interframe coding make the following assumptions: (i) objects move in translation in a plane that is parallel to the camera plane; (ii) illumination is spatially and temporally uniform; and (iii) the covering and uncovering of one object by another can be neglected. Stated in other words, the assumption is that one frame is merely a translation of the previous frame by a certain distance in the x direction and a certain other distance in the y direction. The motion compensation algorithm merely concentrates on selecting the x and y direction values that minimize the prediction error.
To improve the prediction, a number of block matching methods have been developed. In accordance with these methods, an image is divided into a fixed set of blocks, typically a matrix of square blocks. A translation vector is determined for each of the blocks independently. Since each block is smaller than the full frame, it follows that a better translation vector (smaller proportionate prediction error) can be found for each block, and that the overall prediction error can also be small.
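Fixed-size block matching of this kind can be sketched as follows. The sketch assumes an exhaustive search over a small window and the sum of absolute differences as the error measure; neither choice is specified in the text, and the names `sad`, `best_vector`, and `block_match` are hypothetical. A vector (dy, dx) here means that the block at (top, left) in the current frame is predicted from the block at (top + dy, left + dx) in the previous frame.

```python
def sad(prev, curr, top, left, size, dy, dx):
    """Sum of absolute differences between a current block and a displaced previous block."""
    total = 0
    for y in range(top, top + size):
        for x in range(left, left + size):
            total += abs(curr[y][x] - prev[y + dy][x + dx])
    return total


def best_vector(prev, curr, top, left, size, search):
    """Return (error, dy, dx) minimizing the error within +/- search pixels."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements that would read outside the previous frame.
            if top + dy < 0 or top + dy + size > height:
                continue
            if left + dx < 0 or left + dx + size > width:
                continue
            err = sad(prev, curr, top, left, size, dy, dx)
            if best is None or err < best[0]:
                best = (err, dy, dx)
    return best


def block_match(prev, curr, size, search):
    """Find one translation vector per block, each block handled independently."""
    vectors = {}
    for top in range(0, len(curr) - size + 1, size):
        for left in range(0, len(curr[0]) - size + 1, size):
            vectors[(top, left)] = best_vector(prev, curr, top, left, size, search)
    return vectors


# Previous frame is a ramp; the current frame is its content shifted right by one pixel.
prev = [[y * 8 + x for x in range(8)] for y in range(8)]
curr = [[row[max(x - 1, 0)] for x in range(8)] for row in prev]

vectors = block_match(prev, curr, size=4, search=2)
```

In this toy example the blocks in the right half of the frame are matched exactly by a displacement of one pixel to the left in the previous frame, while the blocks touching the left edge cannot be matched exactly because their source pixels lie outside the frame.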
A fairly extensive discussion of motion estimation is found in "Digital Pictures" by A. N. Netravali and B. G. Haskell, Plenum Press, 1988, pg. 334 et seq.
One problem with the prior art block matching methods is that fixed-size blocks are used. This presents a problem in situations where some areas of the encoded image have a lot of change (such as when a hidden image is suddenly uncovered) while other areas of the encoded image exhibit little change or very regular motion. In other words, a very good prediction of motion can be had for some blocks, while a good prediction of motion cannot be had for others. Of course, one could select a block size that ensures a performance level that is acceptable under all image circumstances. However, HDTV transmission, particularly in terrestrial communication, offers only a limited bandwidth. That means that selecting a block size that permits a prediction error that is "good enough" is likely to exceed the bit budget that is allocated to the motion vectors.
It is an object of this invention to create a collection of motion vectors that translate large blocks or small blocks based on a measure of the prediction error. It is another object of this invention to select the areas where small translation blocks and large translation blocks are used to minimize the prediction errors within the bit budget allotted to motion vectors.
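The trade-off stated in these objects can be made concrete with a small sketch. It is not the method of the invention, which is not described in this section; it is one simple greedy allocation, under the assumptions that the budget is counted in motion vectors, that each large block may be split into four small blocks, and that the prediction error of each large block with and without splitting has already been measured. The name `choose_splits` and its parameters are hypothetical.

```python
def choose_splits(errors, vector_budget, vectors_per_split=4):
    """Pick which large blocks to split into small blocks within a vector budget.

    errors: list of (large_block_error, split_error) pairs, one per large block.
    Returns the set of indices of the large blocks chosen for splitting.
    """
    # Every large block needs at least one vector.
    remaining = vector_budget - len(errors)
    if remaining < 0:
        raise ValueError("budget too small for one vector per large block")

    # Splitting replaces one vector with several, so it costs the difference.
    extra = vectors_per_split - 1

    # Rank blocks by how much splitting reduces their prediction error.
    gains = sorted(
        ((large - small, index) for index, (large, small) in enumerate(errors)),
        reverse=True,
    )

    chosen = set()
    for gain, index in gains:
        # Stop when splitting no longer helps or the budget is exhausted.
        if gain <= 0 or remaining < extra:
            break
        chosen.add(index)
        remaining -= extra
    return chosen


# Four large blocks; splitting helps blocks 0 and 2 the most.
errors = [(100, 20), (50, 45), (80, 10), (30, 30)]
chosen = choose_splits(errors, vector_budget=10)
```

With a budget of ten vectors, four are spent on the large blocks and the remaining six allow two splits, which go to the two blocks whose error falls the most. Areas with regular motion keep a single large-block vector.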