The present invention relates to video data encoding and decoding, and more particularly relates to the compression and decompression of video data using motion compensated prediction.
A schematic diagram of a video coding system using motion compensated prediction is shown in FIG. 1 and FIG. 2 of the accompanying drawings. FIG. 1 illustrates an encoder and FIG. 2 illustrates a corresponding decoder. Motion compensated prediction in such a system is outlined below.
In typical video sequences the change in the content of successive frames is to a great extent the result of the motion in the scene. This motion may be due to camera motion or due to motion of an object depicted in the scene. Therefore, typical video sequences are characterised by significant temporal correlation, which is highest along the trajectory of the motion, and efficient compression of video sequences requires exploitation of this property of video sequences. Motion compensated (MC) prediction is a widely recognised technique for compression of video. It utilises the fact that in a typical video sequence, the image intensity value in a particular frame can be predicted using image intensities of some other already coded frame, given the motion trajectory between these two frames.
In the encoder shown in FIG. 1 the Motion Estimation block calculates motion vectors (Δx(x,y), Δy(x,y)) of pixels between the frame being coded In(x,y), called the current frame, and a reference frame denoted Rref(x,y).
The reference frame is one of the previously coded frames (e.g. the frame preceding the one being coded) which at a given instant is available in the Frame Memory of the encoder and of the decoder. The pair of numbers (Δx(x,y), Δy(x,y)) is called the motion vector of the pixel at location (x,y) in the current frame, and Δx(x,y) and Δy(x,y) are the values of horizontal and vertical displacement, respectively.
The set of motion vectors of all pixels in the current frame, called a motion vector field, is compressed by the Motion Field Coding block and transmitted to the decoder. To indicate that the compression of the motion vector field is typically lossy the compressed motion vectors are denoted as (Δ̃x(x,y), Δ̃y(x,y)). In the Motion Compensated (MC) Prediction block, the compressed motion vectors (Δ̃x(x,y), Δ̃y(x,y)) and the reference frame are used to construct the prediction frame Pn(x,y):
Pn(x,y) = Rref(x + Δ̃x(x,y), y + Δ̃y(x,y)).   (1)
The prediction error, i.e., the difference between the current frame In(x,y) and the prediction frame Pn(x,y):
En(x,y) = In(x,y) − Pn(x,y),   (2)
is compressed and sent to the decoder. The compressed prediction error is denoted as Ẽn(x,y).
In the decoder shown in FIG. 2 pixels of the current coded frame Ĩn(x,y) are reconstructed by finding the prediction pixels in the reference frame Rref(x,y) using the received motion vectors and by adding the received prediction error Ẽn(x,y), i.e.,
Ĩn(x,y) = Rref(x + Δ̃x(x,y), y + Δ̃y(x,y)) + Ẽn(x,y).   (3)
Ĩn(x,y) is not identical to In(x,y), due to the loss introduced in coding. The difference between the coded frame and the original frame
Dn(x,y) = In(x,y) − Ĩn(x,y)   (4)
is called the reconstruction error.
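Equations (1)–(4) can be illustrated with a minimal sketch in Python. This is not part of the patent; the function names, the tiny 3×3 frames, and the coordinate clamping at the frame border are all illustrative assumptions.

```python
# Illustrative sketch of eqs. (1)-(4): motion compensated prediction on a tiny
# greyscale frame with per-pixel motion vectors. All names are hypothetical.

def mc_predict(ref, dx, dy):
    """Eq. (1): P_n(x,y) = R_ref(x + dx(x,y), y + dy(x,y))."""
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp the displaced coordinates to the frame (an assumption made
            # here for simplicity; real codecs handle borders in various ways).
            sx = min(max(x + dx[y][x], 0), w - 1)
            sy = min(max(y + dy[y][x], 0), h - 1)
            pred[y][x] = ref[sy][sx]
    return pred

def frame_diff(a, b):
    """Pixel-wise difference, as in eq. (2) E_n = I_n - P_n and eq. (4)."""
    return [[av - bv for av, bv in zip(ra, rb)] for ra, rb in zip(a, b)]

def reconstruct(ref, dx, dy, err):
    """Decoder side, eq. (3): prediction from the reference plus received error."""
    pred = mc_predict(ref, dx, dy)
    return [[p + e for p, e in zip(rp, re)] for rp, re in zip(pred, err)]
```

For a current frame that is the reference shifted by one pixel, a constant motion field (dx = 1, dy = 0) yields a zero prediction error, and the decoder reconstruction reproduces the current frame exactly.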
An objective of motion compensated prediction is to find an optimum trade-off between the amount of information which needs to be transmitted to the decoder and the loss introduced in encoding, i.e.,
1. minimize the amount of prediction error, and
2. minimize the amount of information needed to represent the motion vector field.
Due to the very large number of pixels in a frame it is not efficient to transmit a separate motion vector for each pixel. Instead, in most video coding schemes, the current frame is divided into larger image segments so that all motion vectors of a segment can be described by a few coefficients. Depending on the way the current frame is divided into segments, two types of motion compensated coders can be distinguished:
1. Block based coders, where the current frame is divided into fixed and known blocks, e.g. 16×16 pixel blocks in International Standard ISO/IEC MPEG-1 or ITU-T H.261 codecs (see FIG. 3a), or
2. Segmentation based, i.e. region based, coders, where the current frame is divided into arbitrarily shaped segments, e.g. obtained by a segmentation algorithm (see FIG. 3b).
A frame of a typical video sequence contains a number of objects with different motion. MC prediction is performed by dividing the frame In(x,y) into several segments Sk and estimating the motion of these segments between that frame and the reference frame Rref(x,y). In practice, a segment includes at least a few tens of pixels. In order to represent the motion vectors of these pixels compactly, it is desirable that their values are described by a function of few parameters. Such a function is called a motion vector field model. Motion compensated video coding schemes approximate the motion vectors of an image segment using a general formula:

Δ̂x(x,y) = Σ_{n=1..N} c_n f_n(x,y),   (5)

Δ̂y(x,y) = Σ_{n=N+1..N+M} c_n f_n(x,y),   (6)
where parameters c_n are called motion coefficients and are compressed and transmitted to the decoder. The compressed motion coefficients will be denoted as c̃_n. Functions f_n are called basis functions and have to be known both to the encoder and decoder. Segmentation information is an inherent part of motion representation and it also needs to be coded and transmitted to the decoder. In the decoder, segmentation information and coefficients c̃_n are used to obtain the compressed motion vector field for each segment:

Δ̃x(x,y) = Σ_{n=1..N} c̃_n f_n(x,y),   Δ̃y(x,y) = Σ_{n=N+1..N+M} c̃_n f_n(x,y).   (7)
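As a concrete illustration of eqs. (5)–(7), consider the common affine motion model, where the basis functions are the polynomials 1, x and y, with N = M = 3 and six coefficients per segment. The sketch below is illustrative only; the affine choice and the function names are assumptions, not something the patent mandates.

```python
# Hypothetical sketch of eqs. (5)-(6) for an affine motion model:
# basis functions f_n in {1, x, y}, N = M = 3, six coefficients c_1..c_6.

BASIS = [lambda x, y: 1.0, lambda x, y: float(x), lambda x, y: float(y)]

def motion_vector(coeffs, x, y):
    """Return (dx, dy) at pixel (x, y) from six motion coefficients."""
    dx = sum(c * f(x, y) for c, f in zip(coeffs[:3], BASIS))  # eq. (5)
    dy = sum(c * f(x, y) for c, f in zip(coeffs[3:], BASIS))  # eq. (6)
    return dx, dy
```

With coefficients [2, 0, 0, −1, 0, 0] every pixel gets the same vector (2, −1), i.e. a pure translation; non-zero x/y coefficients describe zoom, rotation and shear. Eq. (7) is the same evaluation performed in the decoder with the quantised coefficients c̃_n.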
In the encoder, the Motion Field Coding block aims to minimise the number of bits necessary for representation of the motion vector field while at the same time retaining low prediction error. The total number of bits needed to represent the motion vector field depends on:
the number of segments in the image,
the number of motion coefficients per segment,
the number of bits required to represent the motion coefficients.
A prior art system for performing Motion Field Coding is shown in FIG. 4, and consists of 4 main building blocks: a QR Motion Analyzer 1, a Segment Merging block 2, an Orthogonalization block 3 and a Motion Coefficient Removal block 4. Such a system is described in PCT publications WO97/16025 and WO97/40628.
The inputs to the Motion Field Coding block are:
the motion vector field (Δx(x,y), Δy(x,y)) found by the Motion Estimation block,
the current frame,
a reference frame, and
initial segmentation of the current frame. The initial segmentation can be obtained in the encoder before or during Motion Estimation. The segmentation can also be provided to the encoder by some external means.
The Motion Field Coding block can reduce the total number of bits which have to be sent to the decoder by:
Reducing the number of segments by combining (merging) together those segments which can be predicted with a common vector of motion coefficients without causing a large increase of prediction error. The process of combining such segments is called motion assisted merging, and is performed by the Segment Merging block 2.
Using basis functions which have low sensitivity to quantization (performed by the Quantisation block 5) of corresponding motion coefficients so that these coefficients can be represented with a small number of bits. It has been found that coefficients corresponding to discrete orthogonal functions are robust to quantization. Therefore, after segment merging, basis functions are orthogonalized with respect to the rectangle circumscribing the segment. This is done by the Orthogonalisation block 3.
Finding for each segment a minimal number of basis functions which achieve a satisfactorily low prediction error. Only coefficients corresponding to these selected basis functions have to be transmitted to the decoder. The process of such adaptive selection of basis functions and corresponding motion coefficients is performed by the Motion Coefficient Removal block 4.
The function of the QR Motion Analyzer 1 is to find a representation of the motion vector field which can be used downstream by the Segment Merging block 2 and Motion Coefficient Removal block 4 to efficiently calculate motion coefficients corresponding to different combinations of segments and basis functions. The QR Motion Analyzer 1 and Segment Merging block 2 operate as follows.
The QR Motion Analyzer 1 performs a plurality of steps comprising matrix operations. These are described in detail in PCT publications WO97/16025 and WO97/40628. In the first step the prediction frame is approximated so that the prediction frame becomes linear with respect to motion vectors. In the second step a matrix Ek and vector qk are constructed for each segment Sk of the prediction frame and are used for minimisation of the square prediction error. In the third step the well known QR factorization algorithm is used to decompose the matrix Ek into a product of two matrices Qk and Rk, where Qk is a unitary matrix, and Rk is an upper triangular matrix. In addition, an auxiliary vector zk is calculated from the factor matrix Qk and the matrix qk. Part of the matrix Rk and the auxiliary vector zk are applied to the Segment Merging block 2.
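The third analyzer step can be sketched as follows. This is a minimal illustration, assuming a small full-rank matrix Ek and classical Gram-Schmidt orthogonalization; the patent's publications do not prescribe a particular QR algorithm, and the function names here are hypothetical.

```python
# Sketch of the QR step: factor E into Q (orthonormal columns) and R (upper
# triangular), then form the auxiliary vector z = Q^T q. Classical Gram-Schmidt
# is used purely for illustration; production code would use Householder QR.

def qr_decompose(E):
    """Return Q, R with E = Q R, Q^T Q = I, R upper triangular."""
    rows, cols = len(E), len(E[0])
    Q = [[0.0] * cols for _ in range(rows)]
    R = [[0.0] * cols for _ in range(cols)]
    for j in range(cols):
        v = [E[i][j] for i in range(rows)]
        for k in range(j):
            R[k][j] = sum(Q[i][k] * E[i][j] for i in range(rows))
            v = [v[i] - R[k][j] * Q[i][k] for i in range(rows)]
        R[j][j] = sum(vi * vi for vi in v) ** 0.5
        for i in range(rows):
            Q[i][j] = v[i] / R[j][j]
    return Q, R

def auxiliary_vector(Q, q):
    """z = Q^T q, passed with R to the merging and coefficient-removal stages."""
    return [sum(Q[i][k] * q[i] for i in range(len(q)))
            for k in range(len(Q[0]))]
```

Because Q is unitary, the square prediction error of E c ≈ q can be evaluated from R and z alone, which is what makes the factorization useful downstream.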
The Segment Merging block performs a merging operation for pairs of neighbouring segments Si and Sj by determining whether the pixel values in the combined area can be predicted using a common motion coefficient vector. If the area of the combined segments can be coded using one vector of motion coefficients without an excessive increase of distortion (defined as prediction error), thus yielding a better trade-off between reconstruction error and the number of transmitted bits, then these segments are merged. In the matrix operations a matrix equation is first formed, after which the factor matrices are processed using known matrix computation methods. The result is a matrix equation in which one matrix includes terms from which the square prediction error in the area of the merged segments can easily be calculated. If the change of the square prediction error is acceptable according to a chosen criterion, the segments are merged.
After all pairs of segments are considered, the output of the Segment Merging block 2 is:
i. a new division of the image with a reduced number of segments,
ii. for each new segment, a matrix R1k and a vector z1k,
iii. merging information which is sent to the decoder and helps the decoder to identify the segments which were merged.
Overall, the outputs of Motion Field Coding block are:
information describing image segmentation,
information on which coefficients are transmitted to the decoder,
quantised values for the transmitted motion coefficients.
It is crucial that Motion Field Coding is computationally simple, so that the encoder can process the data at the incoming rate.
This invention proposes to introduce changes in the Motion Field Coding system described above.
Accordingly, in one aspect, the invention provides a video codec for motion compensated encoding of video data, the video codec providing a motion vector field of video pixels of a current frame to be coded relative to a reference frame, and having a motion field coder for coding the motion vector field to provide compressed motion information, the motion field coder including a motion analyzer comprising means for calculating and storing for each segment k of the current frame an approximation matrix Ek and an approximation vector qk such that a predefined measure Δek for distortion in each segment is a function of (Ek ck − qk), ck being a vector approximating said motion vector field as motion coefficients cn of a set of polynomial basis functions ƒn, and means for generating motion analyzer output parameters including an output matrix Ak and an output vector dk, wherein Ak is the product of Ek transpose and Ek, and dk is a product of Ek transpose and qk.
Here, the denomination k is a general denomination representative of any segment of a video frame.
By means of the invention, the representation of the motion vector field in the Motion Analyser is different to that in the prior art systems, and is obtained with substantially lower computational complexity, thus requiring less computing power, memory and enabling downsizing. More specifically, these changes substantially simplify the computations without compromising performance, thus speeding up the encoding process.
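The analyzer output of this aspect amounts to forming the normal-equation terms A = EᵀE and d = Eᵀq for each segment. A minimal sketch under that reading (function and variable names are assumptions, not taken from the claims):

```python
# Illustrative sketch of the motion analyzer output per segment:
# A_k = E_k^T E_k and d_k = E_k^T q_k, computed directly from E and q.

def normal_equation_terms(E, q):
    """Return A = E^T E (cols x cols) and d = E^T q (length cols)."""
    rows, cols = len(E), len(E[0])
    A = [[sum(E[i][r] * E[i][c] for i in range(rows)) for c in range(cols)]
         for r in range(cols)]
    d = [sum(E[i][r] * q[i] for i in range(rows)) for r in range(cols)]
    return A, d
```

Note that A and d have fixed small dimensions set by the number of basis functions, independent of the number of pixels in the segment, which is why storing them per segment is cheap.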
It is preferred that the motion field coder includes a Segment Merging block receiving the output of the Motion Analyzer, the Segment Merging block comprising merging means for merging pairs of neighbouring segments Si and Sj if the pixel values in the combined area can be predicted using a common motion coefficient vector, said merging means determining a common motion coefficient vector ck for a merged segment Sk by solving the linear equations
Ak ck = dk
where Ak is a merged matrix given by the sum of motion analyser output matrices Ai and Aj and dk is a merged vector given by the sum of motion analyser output vectors di and dj of segments Si and Sj respectively.
Such a solution for motion assisted merging provides a single vector of motion coefficients which allows good prediction of the combined segment.
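The merging step described above can be sketched as follows: the merged matrix and vector are simple element-wise sums, and the common coefficient vector is the solution of the resulting small linear system. The Gaussian-elimination solver below is illustrative only; any standard linear solver would do, and all names are hypothetical.

```python
# Sketch of motion assisted merging in the invention: A_k = A_i + A_j,
# d_k = d_i + d_j, then solve A_k c_k = d_k for the common coefficient vector.

def merge_and_solve(Ai, di, Aj, dj):
    """Merge two segments' analyzer outputs and solve for c_k."""
    n = len(di)
    A = [[Ai[r][c] + Aj[r][c] for c in range(n)] for r in range(n)]
    d = [di[r] + dj[r] for r in range(n)]
    # Plain Gaussian elimination with partial pivoting (illustrative only).
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        d[col], d[piv] = d[piv], d[col]
        for r in range(col + 1, n):
            m = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= m * A[col][c]
            d[r] -= m * d[col]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (d[r] - sum(A[r][k] * c[k] for k in range(r + 1, n))) / A[r][r]
    return c
```

The attraction of this formulation is that merging candidates cost only a matrix addition and the solution of a system whose size equals the (small) number of motion coefficients, rather than any per-pixel computation.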
In a second aspect of the invention, there is provided a method for motion compensated encoding of video data comprising providing a motion vector field of video pixels of a current frame to be coded relative to a reference frame, and coding the motion vector field to provide compressed motion information, said coding of the motion vector field comprising calculating and storing for each segment k an approximation matrix Ek and an approximation vector qk such that a predefined measure Δek for distortion in each segment is a function of (Ek ck − qk), ck being a vector approximating said motion vector field as motion coefficients cn of a set of polynomial basis functions ƒn, and generating motion analyzer output parameters including an output matrix Ak and an output vector dk, wherein Ak is the product of Ek transpose and Ek, and dk is a product of Ek transpose and qk.
The invention further includes a decoder operating in accordance with the principle of the present invention.