Motion compensated prediction is a key element of the majority of video coding schemes. To describe the operation of motion compensated prediction, it should be appreciated that each digital image contains a certain set of pixels corresponding to certain parts of the image. Each pixel may be represented, for example, as intensities of red, green and blue (RGB color system) or as intensities of the luminance and two chrominance components.
FIG. 1 shows illustratively two segments of an image, Sk and SI, each showing a set of pixels 10 to 15 at old locations, that is, in a previous image of a sequence of images. The new locations of these pixels in a current image are shown as positions 10′ to 15′. The change of their location, that is, their motion, defines respective motion vectors v1k to v3k and v1I to v3I of the pixels in these segments. At the simplest, the segments are squares or rectangles. Alternatively, and in legacy schemes, they may also be of an arbitrary form, as shown in FIG. 1.
FIG. 2 is a schematic diagram of an encoder for compression of video sequences using motion compensation. Essential elements in the encoder are a motion compensated prediction block 1, a motion field estimation block 2 and a motion field coder 3. The operating principle of motion compensating video coders is to compress the prediction error En(x, y), which is the difference between the incoming frame In(x, y) being coded, called the current frame, and a prediction frame În(x, y), wherein:

$$E_n(x, y) = I_n(x, y) - \hat{I}_n(x, y) \tag{1}$$

The prediction frame În(x, y) is constructed by the motion compensated prediction block 1. It is built using pixel values of the previous frame, or of some other already coded frame, denoted Ĩref(x, y) and called a reference frame, together with the motion vectors of pixels between the current frame and the reference frame. Motion vectors are calculated by the motion field estimation block 2, and the resulting vector field is then coded by the motion field coder 3 before being provided as an input to the prediction block 1. The prediction frame is then:

$$\hat{I}_n(x, y) = \tilde{I}_{ref}\left[\, x + \tilde{\Delta}x(x, y),\; y + \tilde{\Delta}y(x, y) \,\right] \tag{2}$$

Here Δ̃x(x, y) and Δ̃y(x, y) are the values of horizontal and vertical displacement of the pixel at location (x, y), and the pair of numbers [Δ̃x(x, y), Δ̃y(x, y)] is called the motion vector of that pixel. The set of motion vectors of all pixels in the current frame In(x, y) is called a motion vector field. The coded motion vector field is transmitted as motion information to the decoder together with encoded prediction error information.
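By way of illustration only, equations (1) and (2) can be sketched in a few lines of Python. The function names, the representation of frames as lists of rows, the restriction to integer displacements and the clamping of coordinates at the frame border are all assumptions made for this sketch and are not taken from the description above.

```python
def predict_frame(ref, dx, dy):
    """Equation (2): pred[y][x] = ref[y + dy[y][x]][x + dx[y][x]].

    Assumptions of this sketch: integer displacements, and source
    coordinates clamped to the frame border.
    """
    h, w = len(ref), len(ref[0])
    pred = []
    for y in range(h):
        row = []
        for x in range(w):
            sx = min(max(x + dx[y][x], 0), w - 1)
            sy = min(max(y + dy[y][x], 0), h - 1)
            row.append(ref[sy][sx])
        pred.append(row)
    return pred


def prediction_error(cur, pred):
    """Equation (1): E(x, y) = I(x, y) - predicted I(x, y)."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur, pred)]


# Reference frame and a current frame whose content has moved one pixel
# to the right, so every pixel is fetched from one column to the left.
ref = [[10, 20, 30, 40],
       [50, 60, 70, 80]]
cur = [[10, 10, 20, 30],
       [50, 50, 60, 70]]
dx = [[-1] * 4 for _ in range(2)]   # horizontal displacement field
dy = [[0] * 4 for _ in range(2)]    # vertical displacement field

pred = predict_frame(ref, dx, dy)
err = prediction_error(cur, pred)
```

In this toy case the motion vector field describes the motion exactly, so the prediction error is zero everywhere and only the motion information would need to be transmitted.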
In the decoder, shown in FIG. 3, the current output frame Ĩn(x, y) is reconstructed by finding the pixels' prediction În(x, y) in the reference frame Ĩref(x, y) and adding a decoded prediction error Ên(x, y). The motion compensated prediction block 21 generates the prediction frame using the received motion information and the reference frame Ĩref(x, y). The prediction error decoder 22 generates the decoded prediction error Ên(x, y) for adding to the prediction frame, the result being the current output frame Ĩn(x, y).
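The reconstruction step in the decoder amounts to a pixel-wise addition, which may be sketched as follows. The function name and the sample values are hypothetical; in particular, the decoded prediction error is simply assumed here to differ slightly from the true error because of lossy coding.

```python
def reconstruct_frame(pred, decoded_err):
    """Output frame = prediction frame + decoded prediction error."""
    return [[p + e for p, e in zip(pred_row, err_row)]
            for pred_row, err_row in zip(pred, decoded_err)]


pred = [[100, 102],
        [98, 101]]
# Hypothetical decoded error, assumed to be a coarse version of the
# true prediction error [[3, -5], [0, 6]].
decoded_err = [[4, -4],
               [0, 8]]

out = reconstruct_frame(pred, decoded_err)
```

Because the decoder only has the decoded error, the output frame is an approximation of the original current frame, which is why the encoder also predicts from reconstructed rather than original reference frames.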
The general object of motion compensated (MC) prediction is to minimize the amount of information which needs to be transmitted to the decoder together with the amount of prediction error measured, e.g., as the energy of En(x, y).
The document H. Nguyen, E. Dubois, “Representation of motion information for image coding”, Proc. Picture Coding Symposium '90, Cambridge, Mass., Mar. 26–28, 1990, pages 841–845, gives a review of motion field coding techniques. As a rule of thumb, reduction of prediction error requires a more sophisticated motion field model, that is, more bits must be used for its encoding. Therefore, the overall goal of video encoding is to encode the motion vector field as compactly as possible while keeping the measure of prediction error as low as possible.
The motion field estimation block 2, shown in FIG. 2, calculates the motion vectors of all the pixels of a given image segment so as to minimize some measure of prediction error in this segment, for example the squared prediction error.
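One simple way to picture such a minimization is an exhaustive search for a single translational vector per square block. This is only an illustrative sketch: the full-search strategy, the use of one vector for the whole block, the integer displacements and all function names are assumptions, and the description above does not prescribe any particular estimation algorithm.

```python
def squared_error(cur, ref, x0, y0, size, dx, dy):
    """Sum of squared differences between a block of the current frame
    and the block displaced by (dx, dy) in the reference frame."""
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            diff = cur[y][x] - ref[y + dy][x + dx]
            total += diff * diff
    return total


def estimate_block_motion(cur, ref, x0, y0, size, search):
    """Try every integer displacement within +/- search pixels and
    return (error, dx, dy) for the one with the lowest squared error."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if x0 + dx < 0 or x0 + dx + size > w:
                continue
            if y0 + dy < 0 or y0 + dy + size > h:
                continue
            err = squared_error(cur, ref, x0, y0, size, dx, dy)
            if best is None or err < best[0]:
                best = (err, dx, dy)
    return best


# Synthetic 6x6 reference frame and a current frame whose pixels are
# taken from one column to the left and one row below in the reference.
ref = [[10 * y + x for x in range(6)] for y in range(6)]
cur = [[ref[min(y + 1, 5)][max(x - 1, 0)] for x in range(6)]
       for y in range(6)]

best = estimate_block_motion(cur, ref, x0=2, y0=2, size=2, search=2)
```

Here the search recovers the displacement (dx, dy) = (−1, 1) with zero squared error. The cost grows with the square of the search range, which hints at why estimation complexity matters for the devices discussed later.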
Due to the very large number of pixels in the frame, it is not efficient to transmit a separate motion vector for each pixel. Instead, in most video coding schemes, the current frame is divided into larger image segments so that all motion vectors of the segment can be described by a few parameters. Image segments may be square blocks; for example, 16×16 and 8×8 pixel blocks are used in codecs in accordance with international standards ISO/IEC MPEG-1, MPEG-2, MPEG-4 or ITU-T H.261 and H.263. Alternatively, they may comprise arbitrarily shaped regions obtained, for instance, by a segmentation algorithm. In practice, segments include at least a few tens of pixels.
In order to compactly represent the motion vectors of the pixels in a segment, it is desirable that the motion vectors are described by a function of a few parameters. Such a function is called a motion vector field model. A known group of models is that of linear motion models, in which motion vectors are represented by linear combinations of motion field basis functions. In such models, the motion vectors of image segments are described by a general formula:
$$\begin{aligned}
\Delta x(x, y) &= \sum_{i=1}^{N} c_i f_i(x, y) \\
\Delta y(x, y) &= \sum_{i=N+1}^{N+M} c_i f_i(x, y)
\end{aligned} \tag{3}$$

where the parameters ci are called motion coefficients and are transmitted to the decoder. In general, the motion model for a segment is based on N+M motion coefficients. The functions fi(x, y) are called motion field basis functions and are known both to the encoder and to the decoder. Known motion field estimation techniques vary both in terms of the model used to represent the motion field and in the algorithm for minimization of a chosen measure of the prediction error.
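To make formula (3) concrete, the following sketch evaluates the motion vector of a pixel from a list of motion coefficients and basis functions. The affine choice of basis functions (1, x, y for each component, so N = M = 3), the function name and the coefficient values are assumptions picked for illustration; the formula itself allows any basis functions known to both encoder and decoder.

```python
def motion_vector(coeffs, basis, n, x, y):
    """Evaluate formula (3) at pixel (x, y).

    coeffs and basis both have N + M entries; the first n = N entries
    model the horizontal displacement, the remaining M the vertical one.
    """
    dx = sum(c * f(x, y) for c, f in zip(coeffs[:n], basis[:n]))
    dy = sum(c * f(x, y) for c, f in zip(coeffs[n:], basis[n:]))
    return dx, dy


# Assumed affine basis: f1 = f4 = 1, f2 = f5 = x, f3 = f6 = y.
affine_basis = [lambda x, y: 1.0, lambda x, y: x, lambda x, y: y] * 2

# Pure translation: only the two constant terms are non-zero.
translation = [2.0, 0.0, 0.0, -1.0, 0.0, 0.0]
# Uniform scaling about the origin: displacement grows with position.
zoom = [0.0, 0.5, 0.0, 0.0, 0.0, 0.5]

v_translation = motion_vector(translation, affine_basis, 3, 5, 7)
v_zoom = motion_vector(zoom, affine_basis, 3, 4, 2)
```

The translation needs only two non-zero coefficients, whereas more complex motion uses more of them, so the number of coefficients worth transmitting depends on the motion in the segment.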
Both the amount and the complexity of the motion vary between frames and between segments. In one case, some of the content of the image may be rotated, skewed and shifted from one side of the image to the opposite side. In another case, a video camera may turn slowly about its vertical axis so that all the pixels move slightly in the horizontal plane. Therefore, it is not efficient to always use all N+M motion coefficients per segment.
One way to reduce motion information is simply to reduce the number of motion coefficients in the motion field model that models the motion of pixels' locations from one image to another. However, the prediction error tends to increase as the motion field model becomes coarser.
For every segment, it is necessary to determine the minimum number of motion coefficients that yields a satisfactorily low prediction error. The process of such adaptive selection of motion coefficients is called motion coefficient removal. This process is performed in the encoder by the motion field coding block 3 (see FIG. 2), after the motion field estimation performed by the motion field estimation block 2.
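The idea of motion coefficient removal can be illustrated with a deliberately simplified greedy sketch. It is not the method of any of the documents cited in this description. As loudly stated assumptions: the error is measured as the squared deviation of the reduced motion field from the full one over the segment (a proxy for the prediction error), removed coefficients are simply set to zero without re-estimating the remaining ones, and the affine basis, threshold and function names are hypothetical.

```python
def field_error(coeffs, full_coeffs, basis, n, pixels):
    """Squared deviation between the reduced and the full motion field,
    summed over the segment (assumed proxy for prediction error)."""
    total = 0.0
    for x, y in pixels:
        for lo, hi in ((0, n), (n, len(coeffs))):
            reduced = sum(coeffs[i] * basis[i](x, y) for i in range(lo, hi))
            full = sum(full_coeffs[i] * basis[i](x, y) for i in range(lo, hi))
            total += (reduced - full) ** 2
    return total


def remove_coefficients(coeffs, basis, n, pixels, max_error):
    """Greedily zero the coefficient whose removal costs least, for as
    long as the accumulated error stays within max_error."""
    kept = list(coeffs)
    active = [i for i, c in enumerate(kept) if c != 0.0]
    while active:
        trials = []
        for i in active:
            trial = list(kept)
            trial[i] = 0.0
            trials.append((field_error(trial, coeffs, basis, n, pixels), i))
        err, i = min(trials)
        if err > max_error:
            break  # any further removal would exceed the threshold
        kept[i] = 0.0
        active.remove(i)
    return kept


# Assumed affine basis (1, x, y per component) on a 4x4 segment.
basis = [lambda x, y: 1.0, lambda x, y: x, lambda x, y: y] * 2
pixels = [(x, y) for y in range(4) for x in range(4)]
# Mostly translational motion with two very small higher-order terms.
coeffs = [2.0, 0.01, 0.0, -1.0, 0.0, 0.02]

reduced = remove_coefficients(coeffs, basis, 3, pixels, max_error=0.1)
```

With these values the two small terms are removed and the two translational coefficients survive, so two coefficients are transmitted instead of four. A practical coder would evaluate the actual prediction error and would typically recompute the remaining coefficients after each removal.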
In the future, digital video transmission will be possible between wireless mobile terminals. Such terminals usually have limited space for additional components and are battery powered, so they are unlikely to accommodate computing capacity comparable to that of fixed devices such as desktop computers. Therefore, it is crucial that the motion field coding performed in a video coder is computationally simple so that it does not impose an excessive burden on the processor of the device. Additionally, the encoded motion field model should be computationally simple to facilitate later decoding at a decoder in a receiving (mobile) terminal.
Methods that attempt to adapt the motion model to the complexity of the motion, by performing motion estimation with different models and selecting the most suitable one, are proposed in the documents H. Nicolas and C. Labit, “Region-based motion estimation using deterministic relaxation schemes for image sequence coding,” Proc. 1994 International Conference on Acoustics, Speech and Signal Processing, pp. III265–268 and P. Cicconi and H. Nicolas, “Efficient region-based motion estimation and symmetry oriented segmentation for image sequence coding,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 4, No. 3, June 1994, pp. 357–364. The main disadvantages of these methods are their high computational complexity and the small number of different motion field models that can be tested in practice.
Yet another method is described in WO97/16025. A video codec includes a motion field coder for minimizing the number of motion coefficients of a motion vector field. In the coder, a first block includes means for forming a new matrix representation of the motion vector field. The new coded motion vector field is linear. A second main block includes means for merging pairs of adjacent segments if the combined segment area can be predicted using a common motion field. Merging information is transmitted to a decoder. A third main block includes means for removing motion field basis functions. After each removal step, the squared prediction error is calculated, and removal continues until the magnitude of the error is no longer acceptable. Final motion coefficients are calculated by solving a linear matrix equation. As a result, a reduced number of motion coefficients is obtained for each segment. The motion coefficients are transmitted to the decoder. This approach allows removal of motion coefficients until a certain threshold of prediction error is reached.
However, there is still a need to further reduce the complexity of the motion encoding process as well as the amount of motion information that needs to be sent to the decoder while causing minimal deterioration in the quality of a decoded image.
An objective of the present invention is to reduce the amount of motion field vector information produced by the motion field estimation block 2 by a large factor without deteriorating the decoded image to a significant extent. Another objective is to keep the complexity of the motion field coder low to allow practical implementation using available signal processors or general-purpose microprocessors.