1. Field of the Invention
The invention is related to video compression systems.
2. Discussion of the Background
Transmission of moving pictures in real time is employed in several applications, such as video conferencing, net meetings, TV broadcasting and video telephony.
However, representing moving pictures requires a large amount of information, as digital video is typically described by representing each pixel in a picture with 8 bits (1 byte). Such uncompressed video data results in large bit volumes and cannot be transferred over conventional communication networks and transmission lines in real time due to limited bandwidth.
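As a rough illustration of the bit volumes involved, the following sketch computes the raw bit rate of an uncompressed sequence. The picture size (352×288 pixels) and picture rate (30 pictures per second) are assumptions chosen for the example and do not come from this description; the 8 bits per pixel figure is the one given above.

```python
def raw_bitrate(width, height, bits_per_pixel, pictures_per_second):
    """Bits per second needed to send every pixel of every picture uncompressed."""
    return width * height * bits_per_pixel * pictures_per_second


# Assumed example format: 352x288 pixels, 30 pictures per second, 8 bits per pixel.
bitrate = raw_bitrate(352, 288, 8, 30)
print(f"{bitrate / 1e6:.1f} Mbit/s")
```

Under these assumptions the raw rate is about 24.3 Mbit/s, which suggests why substantial compression is needed on bandwidth limited connections.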
Thus, enabling real time video transmission requires extensive data compression. Data compression may, however, compromise picture quality. Therefore, great efforts have been made to develop compression techniques allowing real time transmission of high quality video over bandwidth limited data connections.
In video compression systems, one goal is to represent the video information with as little capacity as possible. Capacity is measured in bits, either as a constant value or as bits per time unit. In both cases, it is preferable to reduce the number of bits.
The most common video coding method is described in the MPEG* and H.26* standards. The video data undergoes four main processes before transmission, namely prediction, transformation, quantization and entropy coding.
The prediction process significantly reduces the number of bits required for each picture in a video sequence to be transferred. It takes advantage of the similarity of parts of the sequence with other parts of the sequence. Since the predictor is known to both encoder and decoder, only the difference has to be transferred. This difference typically requires much less capacity for its representation. The prediction is mainly based on picture content from previously reconstructed pictures, where the location of the content is defined by motion vectors. The prediction process is typically performed on square blocks (e.g., 16×16 pixels). Note that in some cases, pixels are predicted from adjacent pixels in the same picture rather than from pixels of preceding pictures. This is referred to as intra prediction, as opposed to inter prediction.
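The inter prediction step described above can be sketched as follows. This is a minimal illustration only, not the method of any particular standard: the helper names `get_block` and `residual`, the 8×8 test pictures and the 4×4 block size are all assumptions made for the example.

```python
def get_block(picture, top, left, size):
    """Cut a size x size block out of a picture stored as a list of rows."""
    return [row[left:left + size] for row in picture[top:top + size]]


def residual(current, reference, top, left, size, motion_vector):
    """Difference between a block of the current picture and the block of the
    reference picture displaced by motion_vector = (dy, dx)."""
    dy, dx = motion_vector
    cur = get_block(current, top, left, size)
    pred = get_block(reference, top + dy, left + dx, size)
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur, pred)]


# Hypothetical 8x8 reference picture, and a current picture in which the same
# content has moved one pixel to the right.
reference = [[10 * y + x * x for x in range(8)] for y in range(8)]
current = [[reference[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

with_motion = residual(current, reference, 2, 2, 4, (0, -1))
without_motion = residual(current, reference, 2, 2, 4, (0, 0))
print(with_motion)      # all zeros: the motion vector points at matching content
print(without_motion)   # non-zero differences remain when motion is ignored
```

In this constructed case the correct motion vector leaves nothing to transfer but the vector itself, whereas ignoring the motion leaves a residual in every pixel of the block.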
The residual, represented as a block of data (e.g., 4×4 pixels), still contains internal correlation. A well-known method of taking advantage of this is to perform a two-dimensional block transform. The ITU recommendation H.264 uses a 4×4 integer transform. This transforms 4×4 pixels into 4×4 transform coefficients, which can usually be represented by fewer bits than the pixels themselves. Transformation of a 4×4 array of pixels with internal correlation will typically result in a 4×4 block of transform coefficients with far fewer non-zero values than the original 4×4 pixel block.
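The effect of the transform can be illustrated with the 4×4 integer core transform matrix of H.264, applied as Y = C·X·Cᵀ. The sketch below omits the scaling that H.264 folds into the quantization stage, and the input block (a smooth horizontal ramp) is an assumption chosen for the example.

```python
# Core matrix of the H.264 4x4 forward integer transform (scaling omitted).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_transform(block):
    """Two-dimensional transform: rows and columns are both transformed."""
    return matmul(matmul(CF, block), transpose(CF))


# Assumed example block: strongly correlated pixels (same ramp in every row).
block = [[10, 12, 14, 16] for _ in range(4)]
coefficients = forward_transform(block)
nonzero = sum(1 for row in coefficients for v in row if v != 0)
print(coefficients)
print(nonzero, "non-zero coefficients out of 16")
```

For this block all sixteen pixels are non-zero, while only three transform coefficients are, all of them in the first row.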
Direct representation of the transform coefficients is still too costly for many applications. The transform coefficients therefore undergo quantization to further reduce the data representation. The possible value range of the transform coefficients is divided into value intervals, each bounded by an upper and a lower decision value and assigned a fixed quantization value. Each transform coefficient is then quantized to the quantization value associated with the interval in which it resides. Coefficients lower than the lowest decision value are quantized to zero. It should be mentioned that this quantization process results in a reconstructed video sequence that is somewhat different from the uncompressed sequence.
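The interval-based quantization described above can be sketched as follows. The decision values and quantization values are illustrative assumptions, not the tables of H.264 or any other standard, and applying the intervals to the magnitude of the coefficient while keeping its sign is likewise an assumption of this sketch.

```python
from bisect import bisect_right

# Hypothetical decision values bounding the intervals, and the fixed
# quantization value assigned to each interval (one more value than bounds).
DECISION_VALUES = [8, 24, 40, 56]
QUANT_VALUES = [0, 16, 32, 48, 64]


def quantize(coefficient):
    """Map a coefficient to the quantization value of the interval containing
    its magnitude; magnitudes below the lowest decision value become zero."""
    interval = bisect_right(DECISION_VALUES, abs(coefficient))
    value = QUANT_VALUES[interval]
    return -value if coefficient < 0 else value


for c in (5, -7, 30, -30, 100):
    print(c, "->", quantize(c))
```

Because many coefficients fall into the same interval, the original value cannot be recovered exactly, which is the source of the difference between the reconstructed and the uncompressed sequence.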
As already indicated, one characteristic of video content to be coded is that the number of bits required to describe the sequence varies strongly. In several applications, as is well known to a person skilled in the art, the content of a considerable part of the picture is unchanged from frame to frame. H.264 widens this definition so that parts of the picture with constant motion can also be coded without use of additional information. Regions with little or no change from frame to frame require a minimum number of bits to be represented. The blocks included in such regions are defined as “skipped”, reflecting that no changes, or only predictable motion relative to the corresponding previous blocks, occur. Hence no data is required for representing these blocks other than an indication that the blocks are to be decoded as “skipped”. This indication may be common to several macroblocks.
As H.264 is a decoding specification, it does not describe any methods for detecting regions of marginal or no change prior to the transformation and quantization process. As a result, these regions could undergo motion search, transformation and quantization even if they would finally be defined as skipped and not represented with any data. As these operations require processing capacity, this is an unnecessary consumption of resources in the encoder. Effective utilization of processing resources is particularly crucial in connection with H.264, as it requires large processing resources. At least for some applications, it is therefore very desirable to reduce encoder complexity.
Another problem associated with H.264 is that the coding is usually “lossy” in the sense that each picture is reconstructed with some error. This fact, together with noise in the source signal, means that there will always be a difference between an uncoded block and the collocated block in the previous picture, even if there is no real change in the picture content. A typical encoding procedure may therefore often code this difference as a residual signal. This will typically lead to a slight improvement in the objective reconstruction quality of the block, but will result in an annoying flickering effect in still regions of the picture.