Identification of motion in video sequences using block based matching techniques is well known. These methods generally consider two consecutive frames from the video sequence and subdivide them into multiple regions known as blocks or macroblocks. In a motion search procedure, each block is compared with pixel data from various candidate locations in the previous frame. The relative position of the best match gives a vector that describes the motion in the scene at that block position. Collectively, the set of motion vectors at each block position in a frame is known as the motion vector field for that frame. Note that use of the term “vector field” should not be confused with the use of “field” or “video field” to describe the data in an interlaced video sequence, as described below.
Video sequences typically comprise a series of non-interlaced frames of video data, or a series of interlaced fields of video data. An interlaced sequence is made up of fields which carry data on alternate lines of a display, such that a first field carries data for one set of alternate lines, and a second field carries data for the lines missing from the first. The fields are thus spaced both temporally and spatially. Every alternate field in a sequence carries data at the same spatial locations.
FIG. 1 illustrates a typical example of a block matching motion estimator. In all the figures, including FIG. 1, motion vectors are shown with the head of the arrow at the centre of the block to which the vector corresponds. The frames are divided into blocks, and an object 101 in the previous frame has moved to position 102 in the current frame. The previous position of the object is shown superimposed on the current frame as 103. Motion estimation is performed for blocks rather than for objects, where a block of pixels in the current frame is matched with a block sized pixel area in the previous frame which is not necessarily block aligned. For example, block 104 is partially overlapped by the moving object 102, and has contents as illustrated at 105. Motion estimation for block 104, if it performs well, will find the pixel data area 106 in the previous frame, which can also be seen to contain the pixels illustrated in 105, i.e. a good match has been found. Superimposed back onto the current frame, the matching pixel data area is at 107. The motion vector associated with block 104 is therefore as illustrated by arrow 108.
Many block based motion estimators select their output motion vector by testing a set of motion vector candidates with a method such as a sum of absolute differences (SAD) or mean of squared differences (MSD), to identify motion vectors which give the lowest error block matches. FIG. 2 illustrates the candidate evaluation process for the block 201 in the current frame which has pixel contents shown in 211. In this simple example system, three motion vector candidates 206, 207 and 208 are considered which correspond to candidate pixel data areas at locations 202, 203 and 204 in the previous frame. The pixel contents of these pixel data areas can be seen in 212, 213 and 214 respectively. It is apparent that the pixel data at location 202 provides the best match for block 201 and should therefore be selected as the best match/lowest difference candidate. Superimposed back onto the current frame, the matching pixel data area is at 205 and the associated motion vector is 206.
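The candidate evaluation described above can be illustrated with a minimal Python sketch that scores a small set of candidate motion vectors using SAD and keeps the one with the lowest error. The function names (`sad`, `extract_area`, `best_candidate`), the list-of-rows frame layout and the sign convention (a vector points from the matching area in the previous frame to the block in the current frame) are assumptions made for this example only, and are not taken from any particular motion estimator.

```python
def sad(block, area):
    """Sum of absolute differences between two equally sized pixel arrays."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block, area)
               for a, b in zip(row_a, row_b))


def extract_area(frame, x, y, size):
    """Return the size x size pixel area whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def best_candidate(current, previous, bx, by, size, candidates):
    """Return (error, vector) for the candidate with the lowest SAD.

    Assumed convention: a candidate (mvx, mvy) says the block content moved
    by (mvx, mvy) pixels from the previous frame to the current frame, so the
    matching area in the previous frame is at (bx - mvx, by - mvy).
    Candidates that fall outside the previous frame are skipped.
    Returns None if no candidate could be evaluated.
    """
    block = extract_area(current, bx, by, size)
    height, width = len(previous), len(previous[0])
    best = None
    for mvx, mvy in candidates:
        x, y = bx - mvx, by - mvy
        if x < 0 or y < 0 or x + size > width or y + size > height:
            continue
        error = sad(block, extract_area(previous, x, y, size))
        if best is None or error < best[0]:
            best = (error, (mvx, mvy))
    return best


# Hypothetical 8x8 frames: a 2x2 pattern moves from (1, 1) to (4, 3).
previous = [[0] * 8 for _ in range(8)]
current = [[0] * 8 for _ in range(8)]
for dy, row in enumerate([[10, 20], [30, 40]]):
    for dx, value in enumerate(row):
        previous[1 + dy][1 + dx] = value
        current[3 + dy][4 + dx] = value

result = best_candidate(current, previous, 4, 3, 2, [(0, 0), (3, 2), (1, 1)])
print(result)
```

In this toy data the candidate (3, 2) matches exactly, so it is selected with an error of zero; a mean of squared differences could be substituted for `sad` without changing the selection logic.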
Different systems have different requirements of the motion estimation. In a video encoder, the requirement is to form the most compact representation of a frame, by reference to a previous frame from the sequence. The requirement is generally to find motion vectors which give the lowest error block matches, and while the resulting motion vectors are usually representative of the actual motion of objects in the scene, there is no requirement that this is always the case. In other applications, such as de-interlacing or frame rate conversion, it is more important that the motion vectors represent the true motion of the scene, even if other distortions in the video mean that the block matches do not always give the lowest error. By applying appropriate constraints to the candidate motion vectors during motion search, the results can be guided towards “lowest error” or “true motion” as necessary.
Motion vectors are known to be highly correlated both spatially and temporally with vectors in adjacent blocks, so these neighbouring vectors are often used as the basis for the candidates in the motion estimator. A random element may also be incorporated into the candidates to allow the system to adapt as the motion in the video changes. Where a block has motion that is not simply predicted by its neighbours, a conventional system relies on random perturbation of vector candidates. This works well for slowly changing vector fields, but tends not to allow the motion estimator to converge rapidly to a new vector where it is very different to its neighbours. A system relying on randomness may wander towards the correct motion over time, but is prone to becoming stuck in local minima, or converging so slowly that the motion has changed again by the time it gets there. The number of candidate motion vectors tested for each block is often a compromise between choosing a set large enough to identify true motion and/or provide good matches with a low residual error, while being small enough to minimize computational expense.
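A conventional candidate set of the kind described above might be assembled as in the following Python sketch. It is illustrative only: the function name `build_candidates`, its parameters, the jitter range and the fallback to a zero vector when no neighbours are available are all assumptions for this example, not features of any specific system.

```python
import random


def build_candidates(spatial_neighbours, temporal_neighbours,
                     num_random=2, max_jitter=1, rng=None):
    """Build a small candidate list from neighbouring vectors plus jitter.

    spatial_neighbours / temporal_neighbours: lists of (vx, vy) tuples taken
    from adjacent blocks in the current and previous vector fields.
    num_random: how many randomly perturbed candidates to try to add.
    max_jitter: largest per-component perturbation, in pixels (assumed).
    """
    rng = rng or random.Random()
    candidates = []
    # Neighbouring vectors form the basis of the set; drop duplicates so
    # that no block match is evaluated twice.
    for vector in list(spatial_neighbours) + list(temporal_neighbours):
        if vector not in candidates:
            candidates.append(vector)
    if not candidates:
        candidates.append((0, 0))  # assumed fallback: no motion
    base = list(candidates)
    # Random element: perturb randomly chosen neighbour vectors slightly.
    for _ in range(num_random):
        vx, vy = rng.choice(base)
        jittered = (vx + rng.randint(-max_jitter, max_jitter),
                    vy + rng.randint(-max_jitter, max_jitter))
        if jittered not in candidates:
            candidates.append(jittered)
    return candidates


print(build_candidates([(2, 0), (2, 0), (3, 1)], [(2, 1)]))
```

Because every random candidate here lies within `max_jitter` pixels of an existing neighbour vector, a block whose true motion differs greatly from its neighbours can only be reached over many frames, which is the slow convergence noted above.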
The present invention presents an efficient method of generating candidate motion vectors that are derived from the physical momentum and acceleration present in real world objects. As such, they are highly likely to be representative of the true motion of the scene. Such candidates may be unavailable through other vector propagation techniques using temporally and spatially derived candidates, and provide a more efficient method of tracking motion and adapting to changing motion than a system that relies entirely on randomness. The present invention may not remove the need for randomness entirely, but a single candidate motion vector that predicts the motion accurately is clearly better than several random guesses which may or may not include the correct vector. The present invention may allow fewer random candidates to be used or, more likely, may allow faster convergence in areas of rapid or changing motion.
Many motion estimators (e.g. de Haan et al., “True-Motion Estimation with 3-D Recursive Search Block Matching”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 3, No. 5, October 1993) use a temporal vector as one of the candidate motion vectors. The temporal vector candidate is taken from a block in the same position as, or in a similar position to, the current block, but using the motion estimation result that was derived for that block during the motion estimation processing of a previous frame. The use of the temporal vector candidate is based on the assumption that objects are larger than blocks and that, if an object at a certain block location was moving with a particular velocity in the past, then new content arriving in the block is likely to continue to move with the same speed and direction. The assumption of continuing motion is reasonable because objects in the real world exhibit the physical property of momentum, and so the temporal vector provides a useful candidate motion vector.
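The lookup of a temporal vector candidate can be sketched in Python as follows. The function name `temporal_candidates`, the representation of a vector field as a list of rows of (vx, vy) tuples, and the reading of “similar position” as blocks within a small radius of the co-located block are assumptions made for this illustration.

```python
def temporal_candidates(previous_field, col, row, radius=0):
    """Collect temporal candidates from the previous frame's vector field.

    previous_field[row][col] holds the (vx, vy) result stored for that block
    when the previous frame was processed. With radius=0 only the co-located
    block is used; a larger radius (an assumed interpretation of "similar
    position") also includes nearby blocks. The co-located vector is listed
    first and duplicates are removed.
    """
    rows, cols = len(previous_field), len(previous_field[0])
    found = [previous_field[row][col]]
    for r in range(max(0, row - radius), min(rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(cols, col + radius + 1)):
            vector = previous_field[r][c]
            if vector not in found:
                found.append(vector)
    return found


# Hypothetical 3x3 vector field from the previous frame.
field = [
    [(0, 0), (1, 0), (1, 0)],
    [(0, 0), (2, 1), (1, 0)],
    [(0, 0), (0, 0), (0, 0)],
]
print(temporal_candidates(field, 1, 1))
print(temporal_candidates(field, 1, 1, radius=1))
```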
The concept of block acceleration has also been used in the prior art, for example, to generate predictors for a static block location in the Enhanced Predictive Zonal Search (EPZS) technique in MPEG-4 video encoding. In this method, a block's acceleration is calculated by considering the differentially increasing/decreasing motion vectors present at a fixed block location over two frames and storing the resulting ‘accelerator motion vector’ in the same block position for use in the following frame. FIG. 3 illustrates the formation of an ‘accelerator motion vector’, 306, for block 303 in the current frame. Blocks 303, 302 and 301 are at the same spatial block location in the current frame, the previous frame and the frame before that respectively, and can therefore be said to be co-located. The accelerator motion vector for the co-located block can be seen to be the sum of the motion vector 305 from frame n−1 and the differential change in motion between frames n−2 and n−1, that is, vector 305 minus vector 304. The vector 306 is therefore given by two times the vector 305, minus the vector 304.
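The arithmetic of the accelerator motion vector can be shown with a short Python sketch. It reproduces only the relationship stated above (two times the vector from frame n−1, minus the vector from frame n−2) and is not the EPZS implementation itself; the function names and the tuple and list-of-rows representations are assumptions for illustration.

```python
def accelerator_vector(v_prev, v_prev2):
    """Accelerator vector for one fixed block location.

    v_prev:  (vx, vy) found at this block location in frame n-1.
    v_prev2: (vx, vy) found at the same block location in frame n-2.
    Result:  v_prev + (v_prev - v_prev2) = 2 * v_prev - v_prev2.
    """
    return (2 * v_prev[0] - v_prev2[0], 2 * v_prev[1] - v_prev2[1])


def accelerator_field(field_prev, field_prev2):
    """Apply accelerator_vector at every co-located block of two fields."""
    return [[accelerator_vector(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(field_prev, field_prev2)]


# Hypothetical values: the block's vector grew from (2, 1) to (4, 1).
print(accelerator_vector((4, 1), (2, 1)))  # (6, 1)
```

With constant motion the two input vectors are equal and the result reduces to the ordinary temporal vector, so the accelerator candidate only differs from the temporal candidate where the motion at that block location is changing.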