1. Field of the Invention
The invention relates to video compression systems, and in particular to compression in digital video systems.
2. Discussion of the Background
Transmission of moving pictures in real-time is employed in several applications such as video conferencing, net meetings, TV broadcasting and video telephony.
However, representing moving pictures requires a large amount of information, as digital video is typically described by representing each pixel in a picture with 8 bits (1 byte). Such uncompressed video data results in large bit volumes and cannot be transferred over conventional communication networks and transmission lines in real time due to limited bandwidth.
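The bit volume mentioned above can be illustrated with a short calculation. This is only a sketch: the 8 bits per pixel come from the text, while the picture size (352×288) and the picture rate (30 pictures per second) are assumptions chosen for illustration.

```python
# 8 bits per pixel is taken from the text above.
BITS_PER_PIXEL = 8

# Assumed values for illustration only (not from the text).
width, height = 352, 288       # pixels per line, lines per picture
pictures_per_second = 30

bits_per_picture = width * height * BITS_PER_PIXEL
bits_per_second = bits_per_picture * pictures_per_second

print(bits_per_picture)   # 811008
print(bits_per_second)    # 24330240
```

Under these assumptions, the uncompressed stream needs roughly 24 Mbit/s.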
Thus, enabling real-time video transmission requires a high degree of data compression. Data compression may, however, compromise the picture quality. Therefore, significant efforts have been made to develop compression techniques allowing real-time transmission of high-quality video over bandwidth-limited data connections.
In video compression systems, a main goal is to represent the video information with as little capacity as possible. Capacity is measured in bits, either as a constant value or as bits per time unit. In both cases, the goal is to reduce the number of bits.
The most common video coding method is described in the MPEG* and H.26* standards. The video data undergo four main processes before transmission, namely prediction, transformation, quantization and entropy coding.
The prediction process significantly reduces the number of bits required for each picture in a video sequence to be transferred. It takes advantage of the similarity of parts of the sequence with other parts of the sequence. Because the predictor part is known to both the encoder and the decoder, only the difference has to be transferred. This difference typically requires much less capacity for its representation. Further, the prediction is mainly based on picture content from previously reconstructed pictures, where the location of the content is defined by motion vectors. The prediction process is typically performed on square blocks (e.g., 16×16 pixels), as indicated by block M in FIG. 1. However, the size of the blocks may vary. This is indicated in the figure by the smaller adjacent blocks a, b, c and d.
In a typical video sequence, the content of the present block M would be similar to a corresponding block in a previously decoded picture. If no changes have occurred since the previously decoded picture, the content of M would be equal to the block at the same location in the previously decoded picture. In other cases, an object in the picture may have moved, so that the content of M is more similar to a block at a different location in the previously decoded picture. Such movements are represented by motion vectors (V). As an example, a motion vector of (3;4) means that the content of M has moved 3 pixels to the left and 4 pixels upwards since the previously decoded picture. For improved accuracy, the vector may also include fractional components, requiring interpolation between the pixels.
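The block-based prediction described above can be sketched as follows. The function names, the toy 8×8 picture and the 2×2 block size are assumptions made for illustration. The sign convention follows the example in the text: a vector of (3;4) means the content has moved 3 pixels left and 4 pixels up, so the matching block in the previous picture lies 3 pixels to the right and 4 pixels further down. Only integer vectors are handled; fractional vectors would require interpolation.

```python
def predict_block(ref, x, y, size, mv):
    """Fetch the size x size block in the reference picture that the
    motion vector points to, for a current block whose top-left is (x, y)."""
    dx, dy = mv
    # Content moved dx left and dy up, so it came from (x + dx, y + dy).
    return [row[x + dx : x + dx + size] for row in ref[y + dy : y + dy + size]]


def residual(cur_block, pred_block):
    """Difference between the current block and its prediction."""
    return [[c - p for c, p in zip(cr, pr)]
            for cr, pr in zip(cur_block, pred_block)]


# Toy previous picture with a unique value per pixel (row r, column c).
ref = [[10 * r + c for c in range(8)] for r in range(8)]

# Toy current picture: same content moved 3 pixels left and 4 pixels up
# (border pixels are clamped; they are not used below).
cur = [[ref[min(r + 4, 7)][min(c + 3, 7)] for c in range(8)] for r in range(8)]

pred = predict_block(ref, x=0, y=0, size=2, mv=(3, 4))
cur_block = [row[0:2] for row in cur[0:2]]
res = residual(cur_block, pred)

print(pred)  # [[43, 44], [53, 54]]
print(res)   # [[0, 0], [0, 0]]
```

When the motion vector matches the actual movement, the residual is zero and costs very little to represent.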
To reduce the data size of the motion vectors before transmission, it is assumed that the movements in one block are quite similar to the movements of the adjacent blocks. Thus, a prediction (Vpred) of V is created based on the actual motion vectors of the adjacent blocks. As the motion vectors of the adjacent blocks are already known at the receiving side, only the difference (Vdiff) between the actual motion vector and the corresponding prediction has to be represented: Vdiff=V−Vpred. At the receiving side, the motion vector can then be recreated by V=Vpred+Vdiff.
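The differential representation of motion vectors can be sketched in a few lines. The function names and the example values are assumptions for illustration. The encoder sends V minus Vpred, and the receiving side adds the difference back to the same prediction.

```python
def encode_mv(v, v_pred):
    """Encoder side: Vdiff = V - Vpred, component by component."""
    return (v[0] - v_pred[0], v[1] - v_pred[1])


def decode_mv(v_diff, v_pred):
    """Decoder side: V = Vpred + Vdiff, component by component."""
    return (v_pred[0] + v_diff[0], v_pred[1] + v_diff[1])


v = (3, 4)        # actual motion vector of the current block (illustrative)
v_pred = (2, 4)   # prediction derived from adjacent blocks (illustrative)

v_diff = encode_mv(v, v_pred)
v_rec = decode_mv(v_diff, v_pred)

print(v_diff)  # (1, 0)
print(v_rec)   # (3, 4)
```

The closer the prediction is to the actual vector, the smaller the difference that has to be represented.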
In the ITU standards H.261 and H.262 and the ISO standards MPEG1 and MPEG2, Vpred is set equal to the motion vector of the adjacent block corresponding to block a in FIG. 1, i.e., it is assumed that the movement of a block is the same as that of the adjacent block on the left-hand side. In H.263 and MPEG4, three adjacent blocks are used to derive a prediction motion vector. Each component of the vector (horizontal and vertical) is derived separately by selecting the respective median of the components of the three vectors.
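The two prediction rules described above can be sketched as follows. The function names and the example vectors are assumptions for illustration. The text does not state which three adjacent blocks are used, so the arguments are simply called a, b and c here.

```python
from statistics import median


def left_predictor(a):
    """Use the motion vector of the left neighbour (block a) as Vpred."""
    return a


def median_predictor(a, b, c):
    """Take the median of each component separately over three neighbours."""
    return (median([a[0], b[0], c[0]]), median([a[1], b[1], c[1]]))


# Illustrative motion vectors of three adjacent blocks.
a, b, c = (3, 4), (5, 1), (2, 2)

print(left_predictor(a))          # (3, 4)
print(median_predictor(a, b, c))  # (3, 2)
```

In this example the horizontal median is taken from one neighbour and the vertical median from another, since the two components are selected independently.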
The publication US 2002/0039386 A1 discloses a block matching processor and method for supporting block matching motion estimation at motion vector prediction modes using matching blocks of various sizes.
Further, US 2001/0031004 A1 discloses a method and an apparatus for updating motion vector memories used for prediction of motion vectors within a video coding/decoding regime. The main issue in this document is how to store motion vectors in an efficient way for later use as current blocks. The prediction of the motion vectors is based on median calculation; hence this document does not describe a sufficiently accurate measurement method for motion vector prediction.
Still further, a solution is known (WO 01/99437 A2) in which the prediction of motion vectors is based on median calculations. However, the main idea described in this document is to search within a smaller window in proximity to the predicted position.
The prediction vector derived according to the background art as described above has been shown not to be sufficiently accurate. In addition, by selecting the prediction vector on a component-by-component basis, the vector may be constructed of components from different vectors, resulting in a “fictional” motion vector.
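The “fictional” vector effect can be illustrated with a small example. The three neighbouring vectors below are assumptions chosen for illustration. Taking the median of each component separately yields a vector that none of the adjacent blocks actually has.

```python
# Illustrative motion vectors of three adjacent blocks.
neighbours = [(1, 8), (5, 2), (9, 5)]

# Component-wise median: the middle value of each sorted component list.
xs = sorted(v[0] for v in neighbours)
ys = sorted(v[1] for v in neighbours)
v_pred = (xs[1], ys[1])

# Check whether the result matches any real neighbouring vector.
is_fictional = v_pred not in neighbours

print(v_pred)        # (5, 5)
print(is_fictional)  # True
```

Here the horizontal component comes from the second neighbour and the vertical component from the third, so the resulting prediction does not describe the movement of any single adjacent block.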