FIG. 1 shows a known form of video coder. Video signals (commonly in digital form) are received by an input buffer 1. A subtractor 2 forms the difference between the input and a predicted signal from a frame store 3, and this difference is then further coded in box 4. The coding performed here is not material to the present invention, but may include, for example, thresholding (to suppress transmission of zero or minor differences), quantisation or transform coding. The input to the frame store is the sum, formed in an adder 5, of the prediction and the coded difference signal decoded in a local decoder 6 (so that loss of information in the coding and decoding process is included in the predictor loop).
The differential coding is essentially inter-frame, and the prediction could simply consist of a one-frame delay provided by the frame store 3; as shown however a motion estimator 7 is also included. This compares the frame of the picture being coded with the previous frame being supplied to the predictor. For each block of the current frame (into which the picture is regarded as divided) it identifies that region of the previous frame which the block most closely resembles. The vector difference in position between the identified region and the block in question is termed a motion vector (since it usually represents motion of an object within the scene depicted by the television picture) and is applied to a motion compensation unit 8 which serves to shift the identified region of the previous frame into the position of the relevant block in the current frame, thereby producing a better prediction. This results in the differences formed by the subtractor 2 being, on average, smaller and permits the coder 4 to encode the picture using a lower bit rate than would otherwise be the case. The motion vector is sent to a decoder along with the coded difference signal from 4.
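As a rough illustration of the effect of motion compensation on the output of the subtractor, the following Python sketch forms the difference between one block of the current frame and the region of the previous frame displaced by a motion vector. The names `residual_block` and `energy`, the representation of a frame as a list of rows, and the default 8×8 block size are assumptions made for this illustration only and are not taken from FIG. 1.

```python
def residual_block(cur, prev, x, y, mv=(0, 0), size=8):
    """Return the size x size difference block (current minus prediction).

    cur, prev : frames as lists of rows of pixel values (assumed layout)
    x, y      : upper left-hand corner of the block in the current frame
    mv        : motion vector (u, v); the prediction is read from prev at
                (x + u, y + v), i.e. the identified region is shifted into
                the position of the current block
    """
    u, v = mv
    return [[cur[y + j][x + i] - prev[y + v + j][x + u + i]
             for i in range(size)]
            for j in range(size)]


def energy(block):
    """Sum of absolute differences, a simple measure of residual size."""
    return sum(abs(d) for row in block for d in row)
```

With a vector that matches the true displacement, `energy` is typically smaller than with a plain one-frame delay (a vector of (0, 0)), which is what permits the lower bit rate described above.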
The motion estimator must typically compare each block with the corresponding block of the previous frame and regions positionally shifted from that block position; in practical systems, this search is limited to a defined search area rather than being conducted over the entire frame, but even so it involves a considerable amount of processing and often necessitates many accesses to stored versions of both frames. Note that this requires that the input buffer 1 introduces sufficient delay that the motion estimator 7 has access to the current block and its search area to complete its motion estimation for that block before it arrives at the subtractor 2.
Usually the motion estimator regards a “current” frame of a television picture which is being coded as being divided into 8×8 blocks, that is, eight picture elements (pixels) horizontally by eight lines vertically. Although the principles are equally applicable to interlaced systems, for simplicity of description a non-interlaced picture is assumed. The motion estimator is designed to generate for each block a motion vector which indicates the position of the 8×8 region, lying within a defined search area of the (or a) previous frame of the picture, which is most similar to the block in question. FIG. 2 illustrates a field with an 8×8 block N (shaded) and a typical associated 23×23 search area indicated by a rectangle SN. If the pixels horizontally and lines vertically are identified by co-ordinates x, y, with an origin at the top left-hand corner, then the search area for a block whose upper left-hand corner pixel has co-ordinates xN,yN is the area extending horizontally from (xN−8) to (xN+14) and vertically from (yN−8) to (yN+14).
In order to obtain the motion vector it is necessary to conduct a search in which the block is compared with each of the 256 possible 8×8 regions of the previous frame lying within the search area, i.e. those whose upper left pixel has co-ordinates xN+u, yN+v where u and v are in the range −8 to +7. The motion vector is the pair of values u, v for which the comparison indicates the greatest similarity. The test for similarity can be any of those conventionally used, e.g. the sum of the absolute values of the differences between each of the pixels in the “current” block and the corresponding pixel of the relevant region of the previous frame.
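The exhaustive search just described can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: frames are lists of rows of pixel values, the names `sad` and `full_search` are hypothetical, and candidate regions that would fall partly outside the previous frame are simply skipped, a boundary rule that the description above does not specify.

```python
def sad(cur, prev, x, y, u, v, size=8):
    """Sum of absolute differences between the block at (x, y) in cur and
    the region whose upper left pixel is (x + u, y + v) in prev."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[y + j][x + i] - prev[y + v + j][x + u + i])
    return total


def full_search(cur, prev, x, y, size=8, lo=-8, hi=7):
    """Compare the block with every region at offsets lo..hi in each
    direction (16 x 16 = 256 positions by default) and return
    ((u, v), measure) for the most similar region."""
    height, width = len(prev), len(prev[0])
    best = None
    for v in range(lo, hi + 1):
        for u in range(lo, hi + 1):
            # Assumption: skip regions not lying wholly inside the frame.
            if not (0 <= x + u <= width - size
                    and 0 <= y + v <= height - size):
                continue
            e = sad(cur, prev, x, y, u, v, size)
            if best is None or e < best[0]:
                best = (e, u, v)
    return (best[1], best[2]), best[0]
```

Each call evaluates up to 256 positions of 64 pixel differences each, which gives a sense of why the search is regarded as computationally intensive.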
As this is computationally intensive, one known approach (see, for example, J. Y. Tham, S. Ranganath, M. Ranganath, and A. A. Kassim, “A novel unrestricted center-biased diamond search algorithm for block motion estimation”, IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 4, pp. 369-377, August 1998) is to make an initial estimate of the motion vector, and use this to define the position of an offset search area. This search area can then be smaller and the search can be performed more quickly. One such method involves an iterative approach in which this search area consists of just five positions, i.e. the positions, relative to the estimated offset position, of (0,0), (−1,0), (1,0), (0,−1) and (0,1). An updated estimate is the position represented by that one of these five positions that gives the smallest value of the similarity measure E (for example, the sum of absolute differences described above). This is then repeated as necessary until no further improvement is obtained, i.e. until relative position (0,0) gives the smallest E.
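The five-position iteration can be sketched as below. This is an illustrative sketch rather than the algorithm of the cited paper: `cost` is a hypothetical callable returning the measure E for a candidate vector (u, v), the name `five_point_search` is invented here, and the `max_steps` limit is an added safety bound that the description does not mention.

```python
def five_point_search(cost, start=(0, 0), max_steps=64):
    """Refine an initial motion vector estimate by repeatedly testing the
    current position and its four horizontal and vertical neighbours.

    cost  : callable cost(u, v) giving the measure E (assumed interface)
    start : initial estimate of the motion vector
    Returns ((u, v), E) once relative position (0, 0) gives the smallest E.
    """
    u, v = start
    best = cost(u, v)
    for _ in range(max_steps):
        step = (0, 0)
        for du, dv in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            e = cost(u + du, v + dv)
            if e < best:          # strict test: ties keep the centre
                best, step = e, (du, dv)
        if step == (0, 0):        # no neighbour is better, so stop
            break
        u, v = u + step[0], v + step[1]
    return (u, v), best
```

Because only a neighbour with a strictly smaller E is accepted, the loop cannot cycle; it can, however, stop at a local minimum, which is one reason the quality of the initial estimate matters.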
Chen and Li, in “A Novel Flatted Hexagon Search Pattern for Fast Block Motion Estimation”, 2004 International Conference on Image Processing (ICIP), IEEE, vol. 3, pp. 1477-1480, note that the probability of horizontally-biased motion is greater than that of vertically-biased motion and seek to improve searching speed by selecting a search area in the shape of a flattened hexagon.
Possible options for the initial estimate include:
the motion vector already generated for the correspondingly positioned block of the preceding (or a preceding) frame;
the motion vector already generated for the block immediately above the block under consideration;
the motion vector already generated for the block immediately to the left of the block under consideration.
One useful method (Tourapis, A. et al. Predictive Motion Vector Field Adaptive Search Technique (PMVFAST). Dept. of Electrical Engineering, The Hong Kong University of Science and Technology, available online at: http://www.softlab.ntua.gr/~tourapis/papers/4310-92.pdf) involves evaluating all three of these candidates together with (0,0), choosing the one that gives the lowest E, and using the chosen one as the starting point for the iteration.
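The selection of a starting point from several candidate vectors can be sketched as follows. This is a simplified illustration of the candidate-selection step only, not an implementation of PMVFAST; `cost` is again a hypothetical callable returning E for a vector (u, v), and the convention that an unavailable candidate (for example, for a block on the top or left edge of the picture) is passed as `None` is an assumption made here.

```python
def choose_initial_estimate(cost, candidates):
    """Return the candidate vector with the lowest E, always including
    (0, 0) among the vectors tried.

    cost       : callable cost(u, v) giving the measure E (assumed)
    candidates : iterable of (u, v) tuples, e.g. the vectors of the
                 co-located block of a preceding frame, the block above
                 and the block to the left; None entries are ignored
    """
    tried = [(0, 0)]
    for mv in candidates:
        if mv is not None and mv not in tried:   # skip gaps and repeats
            tried.append(mv)
    # min() returns the first of equal-cost vectors, so (0, 0) wins ties.
    return min(tried, key=lambda mv: cost(*mv))
```

The returned vector would then serve as the starting point of the iterative search.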
Some proposals introduce the idea of a global motion vector, that is, a vector that identifies a single shift that, when applied to the whole frame of a television picture, produces a good prediction. See, for example, Hirohisa Jozawa, Kazuto Kamikura, Atsushi Sagata, Hiroshi Kotera, and Hiroshi Watanabe, “Two-Stage Motion Compensation Using Adaptive Global MC and Local Affine MC”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 7, No. 1, Feb. 1997, pp. 75-85. Again, the global vector can provide the starting point for a search for a better vector. Winger (US2004/0013199A) obtains two global motion estimates, which he uses to provide the initial offsets for two simultaneous searches.
Sun and Lei, in “Efficient Motion estimation using Global Motion Predictor”, Proceedings of the Fifth IASTED International Conference of Signal and Image Processing, 2003, pp. 177-182, propose a method of global motion estimation in which the amount of computation is reduced by considering only subregions of the image in the vicinity of the frame boundaries, amounting to roughly 5% of the frame.