Motion compensated prediction is a key element of the majority of video coding schemes. FIG. 1 is a schematic diagram of an encoder for compression of video sequences using motion compensation. Essential elements in the encoder are a motion compensated prediction block 1, a motion estimator 2 and a motion field coder 3. The operating principle of the motion compensating video encoder is to compress the prediction error E.sub.n (x,y), which is a difference between the incoming frame I.sub.n (x,y) to be coded, called the current frame, and a prediction frame P.sub.n (x,y), wherein:
$$E_n(x,y) = I_n(x,y) - P_n(x,y) \tag{1}$$
Compression of the prediction error E.sub.n (x,y) typically introduces some loss of information. The compressed prediction error, denoted Ẽ.sub.n (x,y), is sent to the decoder. The prediction frame P.sub.n (x,y) is constructed by the motion compensated prediction block 1 from pixel values of the previous frame, or of some other already coded frame, denoted R.sub.ref (x,y) and called a reference frame, together with motion vectors describing the estimated movement of pixels between the current frame and the reference frame. The motion vectors are calculated by the motion field estimator 2, and the resulting motion vector field is then coded by the motion field coder 3 before it is applied to the prediction block 1. The prediction frame is then:
$$P_n(x,y) = R_{ref}[x + \Delta x(x,y),\; y + \Delta y(x,y)] \tag{2}$$
The pair of numbers [.DELTA.x(x,y), .DELTA.y(x,y)] is called the motion vector of the pixel at location (x,y) in the current frame, where .DELTA.x(x,y) and .DELTA.y(x,y) are the horizontal and vertical displacements of this pixel. The set of motion vectors of all pixels in the current frame I.sub.n (x,y) is called the motion vector field. The coded motion vector field is also transmitted as motion information to the decoder.
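By way of illustration only, the construction of the prediction frame and of the prediction error may be sketched as follows. The sketch assumes integer motion vectors, frames stored as lists of rows, and clamping of displaced coordinates at the frame border; the function names `predict_frame` and `prediction_error` are hypothetical and not taken from any codec.

```python
def predict_frame(ref, dx, dy):
    """Build P_n(x, y) = R_ref[x + dx(x, y), y + dy(x, y)].

    Frames are lists of rows, indexed frame[y][x]. Displaced coordinates are
    clamped to the frame border, which is an assumption of this sketch.
    """
    height, width = len(ref), len(ref[0])
    pred = []
    for y in range(height):
        row = []
        for x in range(width):
            sx = min(max(x + dx[y][x], 0), width - 1)
            sy = min(max(y + dy[y][x], 0), height - 1)
            row.append(ref[sy][sx])
        pred.append(row)
    return pred


def prediction_error(cur, pred):
    """Compute E_n(x, y) = I_n(x, y) - P_n(x, y) pixel by pixel."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(cur, pred)]


# Toy data: the current frame is the reference shifted right by one pixel,
# so each motion vector points one pixel to the left in the reference.
ref = [[10, 20, 30, 40],
       [50, 60, 70, 80]]
cur = [[10, 10, 20, 30],
       [50, 50, 60, 70]]
dx = [[-1] * 4 for _ in range(2)]
dy = [[0] * 4 for _ in range(2)]

pred = predict_frame(ref, dx, dy)
err = prediction_error(cur, pred)
```

With this toy motion field the prediction frame equals the current frame, so the prediction error is zero everywhere and only the motion information would need to be transmitted.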
In the decoder shown in FIG. 2, pixels of the current frame I.sub.n (x,y) are reconstructed by finding the pixels' predictions P.sub.n (x,y) from the reference frame R.sub.ref (x,y). The motion compensated prediction block 21 generates the prediction frame using the received motion information and the reference frame R.sub.ref (x,y). The prediction error decoder 22 produces a decoded prediction error frame Ẽ.sub.n (x,y), which is then added to the prediction frame, the result being the approximated current frame Ĩ.sub.n (x,y).
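The decoder-side reconstruction may be sketched as follows. Uniform quantisation is used here only as an assumed stand-in for whatever lossy prediction error coding is actually employed; the function names and the pixel values are hypothetical.

```python
def quantise(err, step):
    """Lossy stand-in for prediction error coding: round to multiples of step."""
    return [[step * round(e / step) for e in row] for row in err]


def reconstruct(pred, decoded_err):
    """Approximated current frame = prediction frame + decoded prediction error."""
    return [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(pred, decoded_err)]


pred = [[100, 102],
        [98, 101]]
cur = [[103, 101],
       [98, 110]]

# Encoder side: prediction error, then lossy compression of that error.
err = [[c - p for c, p in zip(crow, prow)] for crow, prow in zip(cur, pred)]
decoded_err = quantise(err, 4)

# Decoder side: add the decoded error to the prediction frame.
recon = reconstruct(pred, decoded_err)
```

In this example the reconstructed frame differs from the current frame by at most one grey level per pixel, which illustrates why the result is only an approximation of the current frame.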
The general object of the motion compensated (MC) prediction encoder is to minimise the amount of information which needs to be transmitted to the decoder. It should minimise the amount of prediction error measured according to some criterion, e.g. the energy associated with E.sub.n (x,y), and minimise the amount of information needed to represent the motion vector field.
The document N. Nguen, E. Dubois, "Representation of motion information for image coding," Proc. Picture Coding Symposium '90, Cambridge, Mass., Mar. 26-28, 1990, pages 841-845, gives a review of motion field coding techniques.
As a rule of thumb, a reduction of prediction error requires a more sophisticated motion field, i.e. more bits must be spent on its encoding. Therefore the overall goal of video encoding is to encode the motion vector field as compactly as possible while keeping the measure of prediction error as low as possible.
Due to the very large number of pixels in the frame, it is not efficient to transmit a separate motion vector for each pixel. Instead, in most video coding schemes the current frame is divided into image segments, as shown for example in FIG. 3, so that all motion vectors of a segment can be described by a few parameters. Image segments can be square blocks, for example the 16.times.16 pixel blocks used in codecs in accordance with international standard ISO/IEC MPEG-1 or ITU-T H.261, or they can be completely arbitrarily shaped regions obtained for instance by a segmentation algorithm. In practice segments include at least a few tens of pixels.
The motion field estimation block 2 of FIG. 1 calculates the motion vectors of all the pixels of a given segment which minimise some measure of prediction error in this segment, for example the squared prediction error. Motion field estimation techniques differ both in the model of the motion field and in the algorithm for minimisation of the chosen measure of prediction error.
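As one simple illustration of such a minimisation, the following sketch performs a full search over integer translations for a single segment and keeps the displacement with the smallest squared prediction error. The purely translational model, the border clamping and all names are assumptions of the sketch; the estimation techniques referred to above are not limited to this form.

```python
def squared_error(cur, ref, pixels, dx, dy):
    """Sum of squared prediction errors over a segment for one displacement."""
    height, width = len(ref), len(ref[0])
    total = 0
    for x, y in pixels:
        sx = min(max(x + dx, 0), width - 1)   # clamp at the frame border
        sy = min(max(y + dy, 0), height - 1)
        total += (cur[y][x] - ref[sy][sx]) ** 2
    return total


def estimate_translation(cur, ref, pixels, search_range):
    """Full search over integer displacements; returns (error, dx, dy)."""
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            cand = (squared_error(cur, ref, pixels, dx, dy), dx, dy)
            if best is None or cand[0] < best[0]:
                best = cand
    return best


def clamp(v, lo, hi):
    return min(max(v, lo), hi)


# Toy frames: each current pixel equals the reference pixel displaced by
# dx = -1 and dy = +1.
SIZE = 6
ref = [[10 * y + x for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[clamp(y + 1, 0, SIZE - 1)][clamp(x - 1, 0, SIZE - 1)]
        for x in range(SIZE)] for y in range(SIZE)]

# A 2x2 segment in the interior of the frame, given as (x, y) pixel positions.
segment = [(x, y) for y in (2, 3) for x in (2, 3)]
best = estimate_translation(cur, ref, segment, 2)
```

For these toy frames the search recovers the displacement (-1, 1) with zero error. The cost of the full search grows with the square of the search range, which is one reason practical estimators use faster minimisation algorithms.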
In order to represent the motion vectors of the pixels in a segment compactly, it is desirable that their values are described by a function of a few parameters. Such a function is called a motion vector field model. A known group of models are linear motion models, in which motion vectors are approximated by linear combinations of motion field basis functions. In such models the motion vectors of image segments are described by a general formula:
$$\Delta x(x,y) = \sum_{i=1}^{N} c_i f_i(x,y), \qquad \Delta y(x,y) = \sum_{i=N+1}^{N+M} c_i f_i(x,y) \tag{3}$$
where parameters c.sub.i are called motion coefficients and are transmitted to the decoder. Functions f.sub.i (x,y) are called motion field basis functions and they have a fixed form known to both encoder and decoder.
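A linear motion model of this kind may be evaluated as sketched below. The sketch assumes that the first N coefficients weight the basis functions of the horizontal component and the remaining M coefficients those of the vertical component. The affine basis (1, x, y) and the coefficient values are chosen purely as an example, and the name `motion_vector` is hypothetical.

```python
def motion_vector(coeffs, basis_x, basis_y, x, y):
    """Evaluate a linear motion model at pixel (x, y).

    coeffs holds N + M motion coefficients: the first N belong to the
    horizontal basis functions, the remaining M to the vertical ones.
    """
    n = len(basis_x)
    dx = sum(c * f(x, y) for c, f in zip(coeffs[:n], basis_x))
    dy = sum(c * f(x, y) for c, f in zip(coeffs[n:], basis_y))
    return dx, dy


# Example basis known to both encoder and decoder: an affine model (N = M = 3).
affine_basis = [lambda x, y: 1.0, lambda x, y: x, lambda x, y: y]

# Only these six numbers would have to be transmitted for the segment.
coeffs = [1.0, 0.5, 0.0, -2.0, 0.0, 0.25]

vector = motion_vector(coeffs, affine_basis, affine_basis, 4, 8)
```

Here the pixel at (4, 8) receives the motion vector (3.0, 0.0). Because the basis functions are fixed, the decoder can regenerate the motion vector of every pixel in the segment from the coefficients alone.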
The problem with the linear motion model described above is how to minimise, in a computationally simple manner, the number of motion coefficients c.sub.i which are sent to the decoder while keeping some measure of distortion, e.g. a chosen measure of the prediction error E.sub.n (x,y), as low as possible.
The total amount of motion data which needs to be sent to the decoder depends both on the number of segments in the image and the number of motion coefficients per segment. Therefore, there exist at least two ways to reduce the total amount of motion data.
The first way is to reduce the number of segments by combining (merging) those segments which can be predicted with a common motion vector field model without causing a large increase of prediction error. The number of segments in the frame can be reduced because very often adjacent, i.e. neighbouring, segments can be predicted well with the same set of motion coefficients. The process of combining such segments is called motion assisted merging. FIG. 3 shows a frame divided into segments. The prior art techniques for motion coefficient coding include several techniques for motion assisted merging. Motion assisted merging is performed after the motion vectors of all the segments have been estimated. It is done by considering every pair of adjacent segments S.sub.i and S.sub.j with their respective motion coefficients c.sub.i and c.sub.j. The area of the combined segments S.sub.i and S.sub.j is denoted S.sub.ij. If the area S.sub.ij can be predicted with one set of motion coefficients c.sub.ij without causing an excessive increase of prediction error over the error resulting from separate predictions of S.sub.i and S.sub.j, then S.sub.i and S.sub.j are merged. The methods for motion assisted merging differ essentially in the way of finding a single set of motion coefficients c.sub.ij which allows a good prediction of the combined segments.
One method is known as merging by exhaustive motion estimation. This method estimates "from scratch" a new set of motion parameters c.sub.ij for every pair of adjacent segments S.sub.i and S.sub.j. If the prediction error for S.sub.ij is not excessively increased, then the segments S.sub.i and S.sub.j are merged. Although this method can select very well the segments which can be merged, it is not feasible for implementation because it would typically increase the complexity of the encoder by several orders of magnitude.
Another method is known as merging by motion field extension. This method tests whether the area S.sub.ij can be predicted using either motion parameters c.sub.i or c.sub.j without an excessive increase of the prediction error. This method is characterised by very low computational complexity because it does not require any new motion estimation. However, it very often fails to merge segments, because motion compensation with coefficients calculated for one segment rarely also predicts the adjacent segments well.
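The test performed in merging by motion field extension may be sketched as follows. For brevity the sketch assumes a purely translational model, so that each coefficient set is a pair (dx, dy), uses the squared prediction error as the measure, and takes the permitted error increase `max_increase` as a free parameter. All names and pixel values are hypothetical.

```python
def segment_error(cur, ref, pixels, motion):
    """Squared prediction error of a segment under a translational motion."""
    dx, dy = motion
    height, width = len(ref), len(ref[0])
    total = 0
    for x, y in pixels:
        sx = min(max(x + dx, 0), width - 1)
        sy = min(max(y + dy, 0), height - 1)
        total += (cur[y][x] - ref[sy][sx]) ** 2
    return total


def try_merge_by_extension(cur, ref, seg_i, seg_j, c_i, c_j, max_increase):
    """Return the coefficients to use for S_ij, or None if no merge."""
    separate = (segment_error(cur, ref, seg_i, c_i)
                + segment_error(cur, ref, seg_j, c_j))
    merged = seg_i + seg_j
    # No new estimation: only the two existing coefficient sets are tried.
    error, coeffs = min((segment_error(cur, ref, merged, c), c)
                        for c in (c_i, c_j))
    return coeffs if error - separate <= max_increase else None


seg_i = [(0, 0), (1, 0), (2, 0), (3, 0)]
seg_j = [(4, 0), (5, 0), (6, 0)]
c_i, c_j = (1, 0), (0, 0)

# Case 1: S_j is flat, so the motion of S_i predicts it equally well.
ref_a = [[0, 10, 20, 30, 40, 40, 40, 40, 40]]
cur_a = [[10, 20, 30, 40, 40, 40, 40, 40, 40]]
merged_a = try_merge_by_extension(cur_a, ref_a, seg_i, seg_j, c_i, c_j, 50)

# Case 2: S_j is textured and really moves differently from S_i.
ref_b = [[0, 10, 20, 30, 40, 50, 60, 70, 80]]
cur_b = [[10, 20, 30, 40, 40, 50, 60, 70, 80]]
merged_b = try_merge_by_extension(cur_b, ref_b, seg_i, seg_j, c_i, c_j, 50)
```

In the first case the segments are merged with the coefficients of S.sub.i; in the second case neither existing coefficient set predicts the combined area well and the merge is rejected, which is the typical failure described above.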
Still another method is known as merging by motion field fitting. In this method the motion coefficients c.sub.ij are calculated by the method of approximation. This is done by evaluating a few motion vectors in each of the segments. Some motion vectors in segments S.sub.i and S.sub.j are depicted in FIG. 4. The motion field for the segment S.sub.ij is determined by fitting a common motion vector field through these vectors using some known fitting method. The disadvantage of the method is that the motion field obtained by fitting is not precise enough and often leads to an unacceptable increase of prediction error.
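Merging by motion field fitting may be sketched as a least-squares fit of a common model through motion vectors sampled in both segments. The affine model, the least-squares criterion, the small Gauss-Jordan solver and the sample values below are assumptions made for illustration; the text above only requires "some known fitting method".

```python
def solve(a, b):
    """Solve a small linear system a * z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_affine(samples):
    """Least-squares fit of d(x, y) = c0 + c1*x + c2*y to (x, y, d) samples."""
    basis = [(1.0, x, y) for x, y, _ in samples]
    normal = [[sum(b[i] * b[j] for b in basis) for j in range(3)]
              for i in range(3)]
    rhs = [sum(b[i] * d for b, (_, _, d) in zip(basis, samples))
           for i in range(3)]
    return solve(normal, rhs)


def fit_motion_field(vectors):
    """Fit common coefficients c_ij through (x, y, dx, dy) motion vectors."""
    cx = fit_affine([(x, y, dx) for x, y, dx, _ in vectors])
    cy = fit_affine([(x, y, dy) for x, y, _, dy in vectors])
    return cx + cy


# A few motion vectors evaluated in S_i (x = 0, 2) and in S_j (x = 4, 6).
points = [(0, 0), (2, 0), (0, 2), (2, 2), (4, 0), (6, 0), (4, 2), (6, 2)]
vectors = [(x, y, 1.0 + 0.5 * x, -1.0 + 0.25 * y) for x, y in points]
c_ij = fit_motion_field(vectors)
```

In this example the sampled vectors lie exactly on one affine field, so the fit recovers its coefficients. When the two segments do not share a common field, the fitted coefficients only approximate the sampled vectors, which is the source of the imprecision noted above.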
The second way to minimise the number of motion coefficients is to select for each segment a motion model which allows achieving satisfactorily low prediction error with as few coefficients as possible. Since the amount and the complexity of the motion varies between frames and between segments it is not efficient to always use all N+M motion coefficients per segment. It is necessary to find out for every segment what is the minimum number of motion coefficients which yields a satisfactorily low prediction error. Such a process of adaptive selection of coefficients is called motion coefficient removal.
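The selection underlying motion coefficient removal may be sketched as choosing, among candidate models, the one with the fewest coefficients whose prediction error is acceptable. The candidate models, their error values and the threshold below are hypothetical numbers used only to exercise the selection rule, and the function name is likewise an assumption.

```python
def select_model(candidates, max_error):
    """Pick the model with the fewest coefficients whose error is acceptable.

    candidates is a list of (name, num_coefficients, prediction_error).
    If no candidate meets max_error, the lowest-error model is returned.
    """
    acceptable = [c for c in candidates if c[2] <= max_error]
    if not acceptable:
        return min(candidates, key=lambda c: c[2])
    return min(acceptable, key=lambda c: (c[1], c[2]))


# Hypothetical errors for one segment under three models.
candidates = [
    ("translational", 2, 180.0),
    ("affine", 6, 40.0),
    ("quadratic", 12, 35.0),
]

chosen = select_model(candidates, max_error=50.0)
fallback = select_model(candidates, max_error=10.0)
```

With a threshold of 50 the six-coefficient model is chosen, because the twelve-coefficient model reduces the error only slightly at twice the coefficient cost. The sketch does not address how the candidate errors are obtained, which is where the computational burden lies.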
Methods which adapt the motion model to the complexity of the motion, by performing motion estimation with different models and selecting the most suitable one, are proposed in H. Nicolas and C. Labit, "Region-based motion estimation using deterministic relaxation schemes for image sequence coding," Proc. 1994 International Conference on Acoustics, Speech and Signal Processing, pp. III265-268, and in P. Cicconi and H. Nicolas, "Efficient region-based motion estimation and symmetry oriented segmentation for image sequence coding," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 4, No. 3, June 1994, pp. 357-364. The main disadvantages of these methods are their high computational complexity and the small number of different motion field models which can be tested in practice.
Although the afore-described methods reduce the amount of motion information sent to the decoder to some extent while maintaining the accuracy of predicted image at a reasonable level, there is still a desire to further reduce that amount.