Real-time transmission of moving pictures is employed in several applications, e.g. video conferencing, net meetings, TV broadcasting and video telephony.
However, representing moving pictures requires a large amount of information, as digital video is typically described by representing each pixel in a picture with 8 bits (1 byte) or more. Such uncompressed video data results in large bit volumes and cannot be transferred over conventional communication networks and transmission lines in real time due to limited bandwidth.
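To give a sense of scale, the raw bit rate can be estimated as width × height × frame rate × bits per pixel. The following sketch is illustrative only: the CIF resolution (352×288), the rate of 30 frames per second and the 384 kbit/s channel are assumptions chosen for the example, not values taken from the text.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel):
    """Bits per second needed to send every pixel of every picture uncompressed."""
    return width * height * fps * bits_per_pixel


def required_compression_ratio(raw_bps, channel_bps):
    """Factor by which the raw stream must shrink to fit the channel."""
    return raw_bps / channel_bps


# Assumed example values: CIF picture, 30 pictures/s, 8 bits per pixel,
# and a hypothetical 384 kbit/s connection.
raw = raw_bitrate_bps(352, 288, 30, 8)
ratio = required_compression_ratio(raw, 384_000)

print(raw)    # 24330240 bits per second
print(ratio)  # required compression factor for the assumed channel
```

Even at this modest resolution and with only 8 bits per pixel, the raw stream exceeds the assumed channel capacity by well over an order of magnitude.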
Thus, enabling real-time video transmission requires extensive data compression. Data compression may, however, compromise picture quality. Therefore, great efforts have been made to develop compression techniques allowing real-time transmission of high-quality video over bandwidth-limited data connections.
In video compression systems, the main goal is to represent the video information with as little capacity as possible. Capacity is measured in bits, either as a constant value or as bits per time unit. In both cases, the main goal is to reduce the number of bits.
The most common video coding method is described in the MPEG* and H.26* standards. The video data undergo four main processes before transmission, namely prediction, transformation, quantization and entropy coding.
The prediction process significantly reduces the number of bits required for each picture in a video sequence to be transferred. It takes advantage of the similarity of parts of the sequence with other parts of the sequence. Since the predictor part is known to both encoder and decoder, only the difference has to be transferred. This difference typically requires much less capacity for its representation. The prediction is mainly based on vectors representing movements. The prediction process is typically performed on square blocks (e.g. 16×16 pixels).
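The idea that only the difference has to be transferred can be sketched as follows. This is a minimal illustration, not the procedure of any particular standard: the 8×8 toy reference picture, the 4×4 block size, the block position and the helper names `predict_block`, `subtract` and `add` are all assumptions made for the example, and the motion vector is taken as already known rather than searched for.

```python
def predict_block(ref, top, left, mv, size):
    """Copy a size x size block from the reference picture, displaced by mv = (dy, dx)."""
    dy, dx = mv
    return [[ref[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]


def subtract(a, b):
    """Element-wise a - b for two equally sized blocks."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Element-wise a + b for two equally sized blocks."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy 8x8 reference picture (the previously decoded picture).
ref = [[10 * r + c for c in range(8)] for r in range(8)]

# Current 4x4 block at (top=2, left=2): the content has moved one pixel
# to the right since the reference picture, and one pixel changed by +1.
cur = [[ref[2 + r][1 + c] for c in range(4)] for r in range(4)]
cur[0][0] += 1

mv = (0, -1)                            # motion vector, assumed already found
pred = predict_block(ref, 2, 2, mv, 4)  # available to encoder and decoder alike
res = subtract(cur, pred)               # encoder: only mv and this residual are sent
rec = add(pred, res)                    # decoder: prediction plus residual

print(res)  # [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

The residual contains a single non-zero value, whereas the block itself contains sixteen distinct pixel values, which is why the difference is so much cheaper to represent.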
Note that in some cases, such as in H.264/AVC, pixels are predicted from adjacent pixels in the same picture rather than from pixels of preceding pictures. This is referred to as intra prediction, as opposed to inter prediction. In H.264/AVC, there are many different modes for such prediction, both for luminance blocks and for chrominance blocks. One of the prediction modes is called DC prediction. It predicts all pixels in a block to have the same value. Given the characteristics of the transform used for residual coding, this means that only the DC coefficient of the transformed residual block differs from the transform of the block data without prediction. All AC coefficients are unchanged. For this reason the prediction mode is named DC prediction.
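The claim that DC prediction affects only the DC coefficient follows from the linearity of the transform: a block of identical values transforms to a single non-zero coefficient. The sketch below checks this numerically using the 4×4 forward core matrix of the H.264 integer transform, with scaling and quantization omitted. The pixel values and the predicted value of 100 are assumptions chosen for illustration; a real encoder derives the predicted value from neighbouring pixels.

```python
# Forward core matrix of the H.264 4x4 integer transform (scaling omitted).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_transform(block):
    """Y = CF * X * CF^T."""
    cf_t = [list(row) for row in zip(*CF)]
    return matmul(matmul(CF, block), cf_t)


# Assumed example pixel block and an assumed DC prediction value.
block = [[102, 104, 101, 99],
         [103, 105, 100, 98],
         [101, 103, 102, 100],
         [100, 102, 103, 101]]
dc_pred = 100

residual = [[p - dc_pred for p in row] for row in block]

coef_block = forward_transform(block)
coef_resid = forward_transform(residual)

# Positions where the two coefficient blocks differ.
changed = [(i, j) for i in range(4) for j in range(4)
           if coef_block[i][j] != coef_resid[i][j]]
print(changed)                               # [(0, 0)]
print(coef_block[0][0] - coef_resid[0][0])   # 1600, i.e. 16 * dc_pred
```

Only position (0, 0), the DC coefficient, differs between the two results; with this unscaled transform it differs by 16 times the predicted value.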
The residual, represented as a block of data (e.g. 4×4 pixels), still contains internal correlation. A well-known method of taking advantage of this is to perform a two-dimensional block transform. In H.263 an 8×8 Discrete Cosine Transform (DCT) is used, whereas H.264 uses a 4×4 integer transform. This transforms 4×4 pixels into 4×4 transform coefficients, which can usually be represented by fewer bits than the pixel representation. Transforming a 4×4 array of pixels with internal correlation will probably result in a 4×4 block of transform coefficients with far fewer non-zero values than the original 4×4 pixel block.
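The reduction in non-zero values can be illustrated with a small sketch. It applies the 4×4 forward core matrix of the H.264 integer transform (scaling and quantization omitted) to a strongly correlated block, here an assumed horizontal ramp that is identical on every row. The input values and the helper names are chosen for illustration only.

```python
# Forward core matrix of the H.264 4x4 integer transform (scaling omitted).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_transform(block):
    """Y = CF * X * CF^T."""
    cf_t = [list(row) for row in zip(*CF)]
    return matmul(matmul(CF, block), cf_t)


def count_nonzero(block):
    """Number of non-zero entries in a block."""
    return sum(1 for row in block for v in row if v != 0)


# Assumed example: a smooth horizontal ramp, identical on every row.
pixels = [[10, 12, 14, 16] for _ in range(4)]
coeffs = forward_transform(pixels)

print(coeffs[0])              # [208, -56, 0, -8]
print(count_nonzero(pixels))  # 16
print(count_nonzero(coeffs))  # 3
```

All sixteen pixel values are non-zero, but after the transform only three coefficients in the first row remain non-zero, so the block can be described with far fewer values.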
A macro block is a part of the picture consisting of several sub blocks for luminance (luma) as well as for chrominance (chroma).
There are typically two chrominance components (Cr, Cb), each with half the resolution of luminance both horizontally and vertically. This is in contrast to, for instance, RGB (red, green, blue), which is typically the representation used in the camera sensor and the monitor display.
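The effect of chrominance subsampling on the amount of data per macro block can be sketched as follows. The 16×16 luminance size of the macro block and the function name `samples_per_macroblock` are assumptions made for the example. Halving the chrominance resolution in both directions corresponds to the format commonly written 4:2:0; 4:2:2 (halved horizontally only) and 4:4:4 (no subsampling) are included for comparison.

```python
def samples_per_macroblock(luma_size, h_sub, v_sub):
    """Return (luma samples, samples per chroma component, total samples).

    h_sub and v_sub are the horizontal and vertical chroma subsampling factors.
    """
    luma = luma_size * luma_size
    chroma_each = (luma_size // h_sub) * (luma_size // v_sub)
    return luma, chroma_each, luma + 2 * chroma_each


# (horizontal, vertical) chroma subsampling factors per format.
FORMATS = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}

for name, (h_sub, v_sub) in FORMATS.items():
    print(name, samples_per_macroblock(16, h_sub, v_sub))
# 4:2:0 (256, 64, 384)
# 4:2:2 (256, 128, 512)
# 4:4:4 (256, 256, 768)
```

With chrominance halved in both directions, a macro block carries half as many samples as the same area at full chrominance resolution (384 versus 768).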
The patent literature contains examples disclosing video encoding/decoding and methods of compression. In particular, U.S. Pat. No. 6,256,347 B1 (Yu et al.) should be mentioned, which discloses an image processor that receives prediction error values from decompressed MPEG coded digital video signals in the form of pixel blocks containing luminance and chrominance data in a 4:2:2 or 4:2:0 format and recompresses the pixel blocks to a predetermined resolution. Luminance and chrominance data are processed with different compression laws during recompression. Luminance data are recompressed to an average of six bits per pixel, whereas chrominance data are recompressed to an average of four bits per pixel. Thus Yu et al. disclose a method for bit compression of data in the 4:2:2 and 4:2:0 formats only, and hence not a general method applying to a plurality of formats.
Further, it should be mentioned that US 2003/0043921 A1 (Dufour et al.) discloses a method for video encoding applied to an input signal which includes a sequence of frames represented by a luminance matrix and two chrominance matrices.
Most video coding standards are mainly designed for 4:2:0. MPEG2 professional profile covers 4:2:2 using a special chrominance block arrangement. The same is true for H.263. Generally this means that each format needs a special solution.