Due to the huge size of raw digital video data, compression must be applied to video signals so that they may be transmitted and stored. There are many international standards for video compression, including ISO MPEG-1, MPEG-2 and MPEG-4, and ITU-T H.261, H.263 and H.263+. These are commonly used in video distribution applications such as VCD, DVD, DVB, HDTV, video conferencing, video editing and video streaming over the Internet.
One common feature among these standards is that motion estimation is used to reduce the temporal redundancy inherent in video sequences. In motion estimation, each frame (or picture) is divided into square blocks, typically containing 16×16 picture elements (or pixels). For each such present block, a full exhaustive search is typically performed within a predefined search area of a reference frame, which may be a previous frame or a future frame in the video sequence, to find the block in the reference frame which is most similar to the present block according to a predefined matching criterion. (In fact, there are algorithms which use a plurality of reference frames, "multiframe algorithms"; in MPEG, we use a previous frame (possibly with a distance greater than 1) and a forward (future) frame (again possibly with a distance greater than 1).) The matching criterion is typically the sum of the absolute differences (SAD):
$$\mathrm{SAD} = \sum_{i=1}^{16} \sum_{j=1}^{16} \left| X_{ij} - Y_{ij} \right|$$
for a 16×16 block, where $X_{ij}$ is the value at pixel (i,j) of the block of the image frame to be encoded, and $Y_{ij}$ is the value at pixel (i,j) of the candidate block of the reference frame. Other common matching criteria include the sum of square differences (SSD):
$$\mathrm{SSD} = \sum_{i=1}^{16} \sum_{j=1}^{16} \left( X_{ij} - Y_{ij} \right)^{2},$$
the mean absolute difference (MAD), which is the SAD divided by 256, and the mean square error (MSE) or mean square difference (MSD), which are both equal to the SSD divided by 256.
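The matching criteria above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the representation of a block as a list of rows are assumptions made for the example, not part of any standard.

```python
def sad(x, y):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_x, row_y in zip(x, y)
               for a, b in zip(row_x, row_y))


def ssd(x, y):
    """Sum of square differences between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_x, row_y in zip(x, y)
               for a, b in zip(row_x, row_y))


def mad(x, y):
    """Mean absolute difference: SAD divided by the number of pixels."""
    return sad(x, y) / (len(x) * len(x[0]))


def mse(x, y):
    """Mean square error: SSD divided by the number of pixels."""
    return ssd(x, y) / (len(x) * len(x[0]))


# Two flat 16x16 blocks whose pixel values differ by 2 everywhere.
x = [[10] * 16 for _ in range(16)]
y = [[12] * 16 for _ in range(16)]
print(sad(x, y), ssd(x, y), mad(x, y), mse(x, y))  # 512 1024 2.0 4.0
```

For a 16×16 block the divisor in `mad` and `mse` is 256, as stated above.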
The difference in location between the present block and the most similar block of the reference frame is called the motion vector of the present block. Thus, the present block may be compressed as (i) the motion vector and (ii) the difference between the present block and the block of the reference frame to which the motion vector points.
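As an illustration of the two outputs just described, the following sketch performs an exhaustive search and returns the motion vector together with the residual block. The helper names, the small frame size, the block size of 4 and the search range of ±2 are assumptions chosen to keep the example short; candidates that fall outside the reference frame are simply skipped.

```python
def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(x, y):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_x, row_y in zip(x, y)
               for a, b in zip(row_x, row_y))


def full_search(cur, ref, top, left, size=16, search_range=7):
    """Exhaustive search; returns (motion vector, residual block, SAD)."""
    height, width = len(ref), len(ref[0])
    block = get_block(cur, top, left, size)
    best = None  # (cost, dy, dx) of the best candidate found so far
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue  # candidate block lies partly outside the frame
            cost = sad(block, get_block(ref, r, c, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    cost, dy, dx = best
    match = get_block(ref, top + dy, left + dx, size)
    residual = [[a - b for a, b in zip(row_b, row_m)]
                for row_b, row_m in zip(block, match)]
    return (dy, dx), residual, cost


# A single bright pixel at (5, 5) in the present frame and at (6, 3)
# in the reference frame, i.e. a displacement of (+1, -2).
cur = [[0] * 12 for _ in range(12)]
ref = [[0] * 12 for _ in range(12)]
cur[5][5] = 100
ref[6][3] = 100
mv, residual, cost = full_search(cur, ref, top=4, left=4, size=4, search_range=2)
print(mv, cost)  # (1, -2) 0
```

Here the residual is all zeros because the match is exact; in general it is the block of pixel differences that is coded together with the motion vector.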
It is well known that the exhaustive full search (FS) is very slow, typically requiring on the order of $10^9$ operations per second for standard television signals. It is highly desirable to develop fast motion estimation algorithms which do not significantly affect the visual quality of the image that can be reproduced from the compressed image signal.
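The cost of full search can be estimated with simple arithmetic. The sketch below counts one absolute-difference operation per pixel per candidate position. The frame size of 720×480, the frame rate of 30 frames per second and the search range of ±7 pixels are assumptions chosen for illustration, and frame-border effects are ignored.

```python
def full_search_ops_per_second(width, height, fps, search_range):
    """Approximate number of absolute-difference operations per second.

    Every pixel of every block is compared once for each candidate
    position in a (2 * search_range + 1) ** 2 search window.
    """
    candidates_per_block = (2 * search_range + 1) ** 2
    pixels_per_second = width * height * fps
    return pixels_per_second * candidates_per_block


# Assumed parameters: 720x480 frames, 30 frames/s, +/-7 pixel search.
print(full_search_ops_per_second(720, 480, 30, 7))  # 2332800000
```

Under these assumptions the count is about 2.3 × 10^9 per second, before the additions needed to accumulate each sum are included.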
U.S. Pat. No. 5,757,668 proposed a motion estimation method in which the search for a matching block terminated as soon as the matching criterion (e.g. SAD) for a block of the reference signal was below a threshold. The threshold was adaptive.
U.S. Pat. No. 4,838,685, U.S. Pat. No. 4,661,849, U.S. Pat. No. 4,853,775, and U.S. Pat. No. 5,089,887 start with an initial motion vector estimate and refine the estimate by a velocity field gradient descent. Such iterative gradient descent algorithms tend to be slow.
U.S. Pat. No. 5,583,580, U.S. Pat. No. 5,635,994, U.S. Pat. No. 5,610,658, U.S. Pat. No. 5,717,470, U.S. Pat. No. 5,926,231, U.S. Pat. No. 5,594,504, U.S. Pat. No. 5,754,237, U.S. Pat. No. 5,731,850, U.S. Pat. No. 5,608,458 and U.S. Pat. No. 5,742,710 propose motion vector estimation using a hierarchical search. Generally, in such techniques, a first-level search is performed using a subsampled version of the frame; in a second level, a local search is performed. Such techniques are particularly inefficient for small-size video formats such as QCIF.
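A generic two-level hierarchical search of the kind described can be sketched as follows. This is not the method of any of the cited patents: the subsampling by simple decimation, the factor of 2, the ±1 refinement radius and all function names are assumptions for illustration, and the block position, block size and frame dimensions are assumed to be even.

```python
def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(x, y):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_x, row_y in zip(x, y)
               for a, b in zip(row_x, row_y))


def subsample(frame, step=2):
    """Keep every step-th pixel in each direction (plain decimation)."""
    return [row[::step] for row in frame[::step]]


def search(cur, ref, top, left, size, centre, radius):
    """Exhaustive search within +/-radius of a centre displacement."""
    height, width = len(ref), len(ref[0])
    block = get_block(cur, top, left, size)
    best = None  # (cost, dy, dx)
    for dy in range(centre[0] - radius, centre[0] + radius + 1):
        for dx in range(centre[1] - radius, centre[1] + radius + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue  # candidate lies partly outside the frame
            cost = sad(block, get_block(ref, r, c, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


def hierarchical_search(cur, ref, top, left, size=16, search_range=8):
    """Coarse search on half-resolution frames, then a local refinement."""
    # Level 1: search the subsampled frames with halved block and range.
    coarse = search(subsample(cur), subsample(ref),
                    top // 2, left // 2, size // 2, (0, 0), search_range // 2)
    # Level 2: refine at full resolution around the scaled-up coarse vector.
    centre = (2 * coarse[1], 2 * coarse[2])
    cost, dy, dx = search(cur, ref, top, left, size, centre, 1)
    return (dy, dx), cost


# A single bright pixel displaced by (+2, -4) between the two frames.
cur = [[0] * 32 for _ in range(32)]
ref = [[0] * 32 for _ in range(32)]
cur[14][14] = 100
ref[16][10] = 100
print(hierarchical_search(cur, ref, top=12, left=12, size=8, search_range=4))
# ((2, -4), 0)
```

In this example the coarse level evaluates 25 candidates on 4×4 blocks and the refinement evaluates 9 candidates on 8×8 blocks, compared with 81 full-resolution candidates for an exhaustive ±4 search.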
U.S. Pat. No. 5,818,969 proposes estimating motion vectors from immediately adjacent blocks and those in previous images. If the region is uniform, such a search may be fast. Otherwise, a more extensive search is required, and a local fine-scale search is performed. This approach does not take into account the bit rate requirement of the motion vectors. For complex scenes, the complexity of the algorithm is great. Similar techniques, which are accompanied by the same weaknesses, are proposed in U.S. Pat. No. 5,81,969 and U.S. Pat. No. 5,428,403.
U.S. Pat. No. 5,764,787 proposes motion estimation by loading consecutive pixel values into fields of a register. Byte-based SIMD architecture and instructions can help speed up motion estimation.
U.S. Pat. No. 5,812,199 and U.S. Pat. No. 5,739,872 perform motion estimation by pixel subsampling and search area subsampling. These methods do not take into account the bit rate requirement of the motion vectors, and are likely to achieve a poor complexity-quality trade-off.
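The two kinds of subsampling can be illustrated generically as follows. This sketch does not reproduce the cited patents: the regular grid patterns, the step of 2 and the function names are assumptions for illustration.

```python
def subsampled_sad(x, y, step=2):
    """SAD computed on every step-th pixel in each direction only."""
    return sum(abs(a - b)
               for row_x, row_y in zip(x[::step], y[::step])
               for a, b in zip(row_x[::step], row_y[::step]))


def subsampled_offsets(search_range, step=2):
    """Candidate displacements on a regular grid of the search area."""
    axis = range(-search_range, search_range + 1, step)
    return [(dy, dx) for dy in axis for dx in axis]


# Pixel subsampling: only 64 of the 256 pixels of a 16x16 block are used.
x = [[10] * 16 for _ in range(16)]
y = [[12] * 16 for _ in range(16)]
print(subsampled_sad(x, y))  # 128

# Search area subsampling: 64 candidates instead of 225 for a +/-7 range.
print(len(subsampled_offsets(7)))  # 64
```

Both reductions lower the computation per block, at the risk of missing the best match when it falls on a skipped pixel or candidate position.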
The disclosure of all of the above documents is incorporated herein by reference.
The present inventors have proposed an algorithm, circular zonal search (CZS), in which a search is first performed using a block at the centre of the image, searching the reference image starting with blocks having a central pixel at the centre of the reference image. Specifically, a number of zones in the reference image are defined as circular regions one pixel in radial extent and centred on the centre of the image. The search is performed for each of these zones in turn, working radially outwards, and in each zone the blocks centred on the pixels of that zone are compared with the block being coded. Instead of searching for the best possible matching block in the reference frame, the search terminates when the mismatch between a block of the reference frame and the block being coded is less than a threshold. This technique exploits the fact that most video sequences are centre biased, so the block at the centre of the search area is the most likely to be the optimal block. The remaining search points are decreasingly likely to be optimal the farther they are from the centre. A drawback of CZS is that its advantage over FS is small if the threshold is too low, since the search then rarely terminates early.
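The zone-by-zone search with early termination may be sketched as follows. Several details are assumptions made for this example rather than statements of the CZS algorithm itself: the zones are centred on the zero-displacement position of the block being coded, zone k holds the displacements whose Euclidean length lies in (k-1, k], SAD is used as the mismatch measure, and the search stops at the first candidate below the threshold rather than at the end of a zone.

```python
import math


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(x, y):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_x, row_y in zip(x, y)
               for a, b in zip(row_x, row_y))


def zone_index(dy, dx):
    """Smallest integer k such that dy^2 + dx^2 <= k^2."""
    d2 = dy * dy + dx * dx
    k = math.isqrt(d2)
    return k if k * k == d2 else k + 1


def circular_zonal_search(cur, ref, top, left, size=16, search_range=7,
                          threshold=0):
    """Returns (motion vector, SAD, number of candidates checked)."""
    height, width = len(ref), len(ref[0])
    block = get_block(cur, top, left, size)
    offsets = [(dy, dx)
               for dy in range(-search_range, search_range + 1)
               for dx in range(-search_range, search_range + 1)]
    offsets.sort(key=lambda o: zone_index(*o))  # innermost zones first
    best, checked = None, 0
    for dy, dx in offsets:
        r, c = top + dy, left + dx
        if r < 0 or c < 0 or r + size > height or c + size > width:
            continue  # candidate lies partly outside the frame
        cost = sad(block, get_block(ref, r, c, size))
        checked += 1
        if best is None or cost < best[0]:
            best = (cost, dy, dx)
        if cost < threshold:
            break  # good enough: stop without visiting outer zones
    return (best[1], best[2]), best[0], checked


# A single bright pixel displaced by (0, +1) between the two frames.
cur = [[0] * 12 for _ in range(12)]
ref = [[0] * 12 for _ in range(12)]
cur[5][5] = 100
ref[5][6] = 100
print(circular_zonal_search(cur, ref, 4, 4, size=4, search_range=2, threshold=1))
# ((0, 1), 0, 4)
print(circular_zonal_search(cur, ref, 4, 4, size=4, search_range=2, threshold=0))
# ((0, 1), 0, 25)
```

With a threshold of 1 the search stops after 4 candidates, whereas with a threshold of 0 the termination test never succeeds and all 25 candidates are examined, which is the same work as a full search.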