Motion compensated prediction is a key element of the majority of video coding schemes. FIG. 1 is a schematic diagram of an encoder for compression of video sequences using motion compensation. Essential elements of the encoder are a motion compensated prediction block 1, a motion estimator 2 and a motion field coder 3. The operating principle of motion compensated video coders is to compress the prediction error E_n(x,y), which is the difference between the incoming frame I_n(x,y) being coded, called the current frame, and a prediction frame Ĩ_n(x,y):

  E_n(x,y) = I_n(x,y) - Ĩ_n(x,y)    (1)
The prediction frame Ĩ_n(x,y) is constructed by the motion compensated prediction block 1 using the pixel values of the previous frame, or of some other already coded frame, denoted I_{n-1}(x,y) and called the reference frame, together with the motion vectors of pixels between the current frame and the reference frame. Motion vectors are calculated by the motion field estimator 2, and the resulting vector field is coded in some way before being applied to the predictor block 1. The prediction frame is then:

  Ĩ_n(x,y) = I_{n-1}[x + Δx(x,y), y + Δy(x,y)]    (2)
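As a concrete illustration of equations (1) and (2), the prediction and prediction-error steps can be sketched in Python. This is a minimal sketch only: the function names, the nearest-neighbour sampling and the clamping at frame borders are illustrative assumptions, not part of any coding standard.

```python
import numpy as np

def predict_frame(reference, dx, dy):
    """Build the prediction frame by sampling the reference frame at the
    motion-displaced positions (equation 2). Nearest-neighbour sampling,
    clamped at the frame borders (an illustrative choice)."""
    h, w = reference.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + dx).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + dy).astype(int), 0, h - 1)
    return reference[src_y, src_x]

def prediction_error(current, prediction):
    """Prediction error E_n = I_n - prediction frame (equation 1)."""
    return current.astype(float) - prediction.astype(float)
```

For example, a frame whose content has moved one pixel to the right relative to the reference is predicted exactly by a constant displacement field Δx = -1, Δy = 0, giving zero prediction error.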
The pair of numbers [Δx(x,y), Δy(x,y)] is called the motion vector of the pixel at location (x,y) in the current frame; Δx(x,y) and Δy(x,y) are the horizontal and vertical displacements of this pixel. The set of motion vectors of all pixels in the current frame I_n(x,y) is called the motion vector field. The coded motion vector field is also transmitted to the decoder as motion information.
In the decoder, FIG. 2, the pixels of the current frame I_n(x,y) are reconstructed by finding the pixels' predictions Ĩ_n(x,y) in the reference frame I_{n-1}(x,y). The motion compensated prediction block 21 generates the prediction frame using the received motion information and the reference frame I_{n-1}(x,y) (in this figure the reference frame is the same as in the encoder). The decoded prediction error E_n(x,y) produced by the prediction error decoder 22 is then added to the prediction frame, the result being the current frame I_n.
The general object of motion compensated (MC) prediction is to minimize the amount of information which needs to be transmitted to the decoder. It should minimize both the prediction error, measured e.g. as the energy of E_n(x,y), and the amount of information needed to represent the motion vector field.
The document H. Nguyen, E. Dubois, "Representation of motion information for image coding", Proc. Picture Coding Symposium '90, Cambridge, Mass., Mar. 26-28, 1990, pages 841-845, gives a review of motion field coding techniques. As a rule of thumb, reduction of the prediction error requires a more sophisticated motion field, i.e., more bits must be spent on its encoding. Therefore the overall goal of video encoding is to encode the motion vector field as compactly as possible while keeping the measure of prediction error as low as possible.
The motion field estimation block 2, FIG. 1, calculates the motion vectors of all the pixels of a given segment which minimize some measure of prediction error in this segment, for example the square prediction error. Motion field estimation techniques differ both in the model of the motion field and in the algorithm for minimisation of the chosen measure of prediction error.
Due to the very large number of pixels in the frame it is not efficient to transmit a separate motion vector for each pixel. Instead, in most video coding schemes the current frame is divided into larger image segments so that all motion vectors of a segment can be described by a few parameters. Image segments can be square blocks, e.g. blocks of 16×16 pixels are used in codecs in accordance with the international standards ISO/IEC MPEG-1 and ITU-T H.261, or they can comprise completely arbitrarily shaped regions obtained for instance by a segmentation algorithm. In practice, segments include at least a few tens of pixels.
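The block-based division mentioned above can be sketched as follows. The helper below is hypothetical; 16×16 is the macroblock size used by MPEG-1 and H.261, and border blocks are simply cropped here as one possible convention.

```python
def block_segments(height, width, size=16):
    """Divide a frame into square blocks of size x size pixels; blocks at
    the right and bottom borders may be smaller. Each segment is returned
    as a tuple (top, left, block_height, block_width)."""
    return [(y, x, min(size, height - y), min(size, width - x))
            for y in range(0, height, size)
            for x in range(0, width, size)]
```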
In order to represent the motion vectors of the pixels in a segment compactly, it is desirable that their values are described by a function of a few parameters. Such a function is called a motion vector field model. A known group of models are linear motion models, in which the motion vectors are linear combinations of motion field basis functions. In such models the motion vectors of image segments are described by the general formula:

  Δx(x,y) = Σ_{i=1}^{N} c_i f_i(x,y)
  Δy(x,y) = Σ_{i=N+1}^{N+M} c_i f_i(x,y)

where the parameters c_i are called motion coefficients and are transmitted to the decoder. The functions f_i(x,y) are called motion field basis functions; they are fixed and known both to the encoder and the decoder.
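For instance, choosing the basis functions {1, x, y} for each displacement component (N = M = 3) gives the familiar affine model. The evaluation of such a linear model can be sketched as below; the choice of basis and the function name are only an illustration of the general formula above.

```python
import numpy as np

def affine_motion_field(coeffs, xs, ys):
    """Evaluate an affine motion field: basis functions {1, x, y} for both
    components. Coefficients c1..c3 give the horizontal displacement dx,
    c4..c6 the vertical displacement dy (N = M = 3)."""
    c1, c2, c3, c4, c5, c6 = coeffs
    dx = c1 + c2 * xs + c3 * ys
    dy = c4 + c5 * xs + c6 * ys
    return dx, dy
```

With only c1 and c4 nonzero the model reduces to a pure translation; the linear terms describe rotation, zoom and shear.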
The problem when using a linear motion model of the above form is how to minimize the number of motion coefficients c_i which are sent to the decoder while keeping the measure of prediction error E_n(x,y) as low as possible. This process is performed in the encoder by the motion field coding block 3, see FIG. 1. It takes place after the computationally very complex motion field estimation performed by block 2. It is therefore crucial that motion field coding is computationally simple, so that it does not impose an additional burden on the encoder.
The total number of motion coefficients which needs to be sent to the decoder depends both on the number of segments in the image and on the number of motion coefficients per segment. There are therefore at least two ways to reduce the total number of motion coefficients.
The first way is to reduce the number of segments by combining (merging) those segments which can be predicted with a common motion vector field without causing a large increase of prediction error. The number of segments in the frame can be reduced because very often adjacent, i.e. neighbouring, segments can be predicted well with the same set of motion coefficients. The process of combining such segments is called motion assisted merging.
The second way is to select for each segment a motion model which achieves a satisfactorily low prediction error with as few coefficients as possible. Since the amount and the complexity of the motion vary between frames and between segments, it is not efficient to always use all N+M motion coefficients per segment. It is necessary to find, for every segment, the minimum number of motion coefficients which achieves a satisfactorily low prediction error. The process of such adaptive selection of coefficients is called motion coefficient removal.
FIG. 3 shows a frame divided into segments. The prior art includes several techniques for motion assisted merging. After the motion vectors of all the segments have been estimated, motion assisted merging is performed. It is done by considering every pair of adjacent segments S_i and S_j with their respective motion coefficients c_i and c_j. The area of the combined segments S_i and S_j is denoted S_ij. If the area S_ij can be predicted with one set of motion coefficients c_ij without causing an excessive increase of prediction error over the error resulting from the separate predictions of S_i and S_j, then S_i and S_j are merged. The methods for motion assisted merging differ essentially in how they find a single set of motion coefficients c_ij which allows a good prediction of the combined segments.
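The merging criterion just described amounts to a simple decision rule, sketched below. Here err_i, err_j and err_ij denote the prediction error measures of S_i, S_j and the combined area S_ij, and `tolerance` is a hypothetical threshold parameter controlling what counts as an "excessive" increase.

```python
def motion_assisted_merge(err_i, err_j, err_ij, tolerance):
    """Merge S_i and S_j when predicting the combined area S_ij with a
    single coefficient set c_ij does not increase the prediction error
    over the separate predictions by more than tolerance."""
    return err_ij <= err_i + err_j + tolerance
```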
One method is known as merging by exhaustive motion estimation. This method estimates "from scratch" a new set of motion parameters c_ij for every pair of adjacent segments S_i and S_j. If the prediction error for S_ij is not excessively increased, the segments S_i and S_j are merged. Although this method can select the segments which can be merged very well, it is not feasible to implement because it would typically increase the complexity of the encoder by several orders of magnitude.
Another method is known as merging by motion field extension. This method tests whether the area S_ij can be predicted using either the motion parameters c_i or c_j without an excessive increase of the prediction error. This method is characterised by very low computational complexity because it does not require any new motion estimation. However, it very often fails to merge segments, because motion compensation with coefficients calculated for one segment only rarely predicts the adjacent segment well.
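Merging by motion field extension can be sketched as follows. The `error_over_Sij` callback, which is assumed to measure the prediction error of the combined area for a given coefficient set, and the additive tolerance are illustrative assumptions.

```python
def merge_by_extension(error_over_Sij, c_i, c_j, err_i, err_j, tolerance):
    """Try to predict the combined area S_ij with either segment's own
    coefficients. Return the first acceptable coefficient set, or None if
    neither avoids an excessive error increase over the separate
    predictions. Note that no new motion estimation is performed."""
    baseline = err_i + err_j + tolerance
    for c in (c_i, c_j):
        if error_over_Sij(c) <= baseline:
            return c
    return None
```

Returning None corresponds to the frequent failure case described above: neither segment's coefficients extend well to its neighbour.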
Still another method is known as merging by motion field fitting. In this method the motion coefficients c_ij are calculated by approximation. This is done by evaluating a few motion vectors in each of the segments; some motion vectors in segments S_i and S_j are depicted in FIG. 4. The motion field for the segment S_ij is obtained by fitting a common motion vector field through these vectors using some known fitting method. The disadvantage of this method is that the motion field obtained by fitting is not precise enough and often leads to an unacceptable increase of prediction error.
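As a sketch of merging by motion field fitting, the snippet below fits one affine field through motion vectors sampled from both segments, using a least-squares fit as the "known fitting method". Both the affine model and the least-squares criterion are illustrative choices, not mandated by the technique.

```python
import numpy as np

def fit_common_field(xs, ys, dxs, dys):
    """Fit a single affine motion field (basis {1, x, y}) through motion
    vectors (dxs, dys) sampled at points (xs, ys) in both segments.
    Returns the six coefficients c_ij of the common field, in the
    least-squares sense."""
    A = np.column_stack([np.ones(len(xs)), xs, ys])
    cx, *_ = np.linalg.lstsq(A, np.asarray(dxs, float), rcond=None)
    cy, *_ = np.linalg.lstsq(A, np.asarray(dys, float), rcond=None)
    return np.concatenate([cx, cy])
```

If the sampled vectors happen to come from one consistent affine motion, the fit recovers it exactly; when they do not, the fitted field is only an approximation, which is exactly the source of the prediction error increase noted above.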
Methods for adapting the motion model to the complexity of the motion are proposed in the documents H. Nicolas and C. Labit, "Region-based motion estimation using deterministic relaxation schemes for image sequence coding," Proc. 1994 International Conference on Acoustics, Speech and Signal Processing, pp. III-265-268, and P. Cicconi and H. Nicolas, "Efficient region-based motion estimation and symmetry oriented segmentation for image sequence coding," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 4, No. 3, June 1994, pp. 357-364. These methods perform motion estimation with different models and select the most suitable one. Their main disadvantages are high computational complexity and the small number of different motion field models which can be tested in practice.
None of the afore-described methods alone solves the problem of how to minimize the number of motion coefficients c_i which are sent to the decoder while keeping the measure of prediction error E_n(x,y) as low as possible.