Transmission of moving pictures in real time is employed in several applications, such as video conferencing, net meetings, TV broadcasting and video telephony.
However, representing moving pictures requires a large amount of information, as digital video is typically described by representing each pixel in a picture with 8 bits (1 byte). Such uncompressed video data results in large bit volumes and cannot be transferred in real time over conventional communication networks and transmission lines due to limited bandwidth.
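To give a sense of scale, the following sketch computes the raw bit rate of uncompressed video. The picture size (CIF, 352×288), the frame rate (30 pictures per second) and the use of two chroma components at half resolution in each direction are illustrative assumptions, not values taken from the text above.

```python
def raw_bitrate(width, height, fps, bits_per_sample=8, chroma_subsampled=True):
    """Bits per second of uncompressed video.

    One luma sample per pixel plus two chroma components (Cr, Cb).
    With chroma_subsampled=True each chroma plane has half the
    resolution horizontally and vertically (an assumption for this sketch).
    """
    luma = width * height
    if chroma_subsampled:
        chroma = 2 * (width // 2) * (height // 2)
    else:
        chroma = 2 * width * height
    bits_per_frame = (luma + chroma) * bits_per_sample
    return bits_per_frame * fps


# Hypothetical example: CIF (352x288) at 30 pictures per second.
rate = raw_bitrate(352, 288, 30)
print(f"{rate / 1e6:.1f} Mbit/s")  # prints "36.5 Mbit/s"
```

Even this modest picture size needs roughly 36 Mbit/s uncompressed, far beyond what many conventional connections can carry in real time.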
Thus, enabling real-time video transmission requires a large degree of data compression. Data compression may, however, compromise picture quality. Therefore, great efforts have been made to develop compression techniques allowing real-time transmission of high-quality video over bandwidth-limited data connections.
In video compression systems, the main goal is to represent the video information with as little capacity as possible.
Capacity is expressed in bits, either as a constant value or as bits per time unit. In both cases, the main goal is to reduce the number of bits.
The most common video coding method is described in the MPEG-x and H.26x standards. The video data undergo four main processes before transmission, namely prediction, transformation, quantization and entropy coding.
The prediction process significantly reduces the number of bits required for each picture in a video sequence to be transferred. It takes advantage of the similarity of parts of the sequence with other parts of the sequence. Since the predictor part is known to both encoder and decoder, only the difference has to be transferred. This difference typically requires much less capacity for its representation. The prediction is mainly based on vectors representing motion (motion vectors). The prediction process is typically performed on square blocks (e.g. 16×16 pixels). Note that in some cases, pixels are predicted from adjacent pixels in the same picture rather than from pixels of preceding pictures. This is referred to as intra prediction, as opposed to inter prediction.
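The following is a minimal sketch of inter prediction by full-search block matching: for a block in the current picture, every displacement within a small search range is tried in the reference picture, the one with the lowest sum of absolute differences (SAD) is kept as the motion vector, and only the residual (current block minus prediction) remains to be coded. The SAD criterion, the exhaustive search and the function names are illustrative choices for this sketch; the standards do not prescribe how an encoder finds its motion vectors.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_search(cur, ref, bx, by, n, search_range):
    """Full search: return the motion vector (dx, dy) with minimum SAD and
    the residual block (current block minus motion-compensated prediction)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that fall outside the reference picture.
            if not (0 <= by + dy and by + dy + n <= h and
                    0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    residual = [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
                 for x in range(n)] for y in range(n)]
    return (dx, dy), residual


# Hypothetical 8x8 pictures: the current picture is the reference shifted
# one pixel up and to the left, so the best match is at (dx, dy) = (1, 1).
ref = [[x + 8 * y for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 1] if y < 7 and x < 7 else 0 for x in range(8)]
       for y in range(8)]
mv, residual = motion_search(cur, ref, bx=2, by=2, n=4, search_range=2)
print(mv)  # prints "(1, 1)"; every residual value is 0
```

Here the prediction is perfect, so the residual is all zeros; in real video the residual is small but not zero, and it is what the transform stage receives.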
The residual, represented as a block of data (e.g. 4×4 pixels), still contains internal correlation. A well-known method of exploiting this is to perform a two-dimensional block transform. H.263 uses an 8×8 Discrete Cosine Transform (DCT), whereas H.264 uses a 4×4 integer transform. This transforms 4×4 pixels into 4×4 transform coefficients, which can usually be represented by fewer bits than the pixel representation. Transforming a 4×4 array of pixels with internal correlation will probably result in a 4×4 block of transform coefficients with far fewer non-zero values than the original 4×4 pixel block.
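As an illustration, the sketch below applies the forward core matrix of the H.264 4×4 integer transform, Y = C·X·Cᵀ, to two correlated blocks. The post-scaling that H.264 folds into the quantization step is deliberately omitted, so the output values are unnormalized; the helper names are hypothetical.

```python
# Forward core transform matrix of the H.264 4x4 integer transform.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Product of two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_4x4(block):
    """Y = CF . X . CF^T using integer arithmetic only (the scaling that
    H.264 combines with quantization is not applied here)."""
    return matmul(matmul(CF, block), transpose(CF))


flat = [[5] * 4 for _ in range(4)]           # perfectly correlated block
ramp = [[0, 1, 2, 3] for _ in range(4)]      # smooth horizontal gradient

print(forward_4x4(flat)[0])  # prints "[80, 0, 0, 0]"; all other rows are zero
print(forward_4x4(ramp)[0])  # prints "[24, -28, 0, -4]"; all other rows are zero
```

Sixteen non-zero pixel values become one (flat block) or three (gradient) non-zero coefficients, which is the energy compaction the text describes.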
Direct representation of the transform coefficients is still too costly for many applications. The transform coefficients therefore undergo a quantization process that further reduces the data representation. A simple version of quantization is to divide parameter values by a number, resulting in a smaller number that may be represented by fewer bits. Note that, as a result of this quantization process, the reconstructed video sequence differs somewhat from the uncompressed sequence. This phenomenon is referred to as “lossy coding”. The output of the quantization step is referred to as quantized transform coefficients.
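A minimal sketch of the "divide by a number" idea, assuming a single uniform step size for all integer coefficients and rounding to the nearest level. Real codecs use step sizes derived from a quantization parameter and per-position scaling, which this sketch leaves out.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization of integer coefficients: divide by step
    and round to the nearest integer, with ties rounded away from zero."""
    levels = []
    for c in coeffs:
        level = (abs(c) + step // 2) // step
        levels.append(level if c >= 0 else -level)
    return levels


def dequantize(levels, step):
    """Decoder-side reconstruction: multiply each level back by step."""
    return [level * step for level in levels]


coeffs = [80, 23, -7, 3]
levels = quantize(coeffs, 10)   # [8, 2, -1, 0]
recon = dequantize(levels, 10)  # [80, 20, -10, 0]
print(levels, recon)
```

The reconstruction differs from the input (23 becomes 20, 3 becomes 0), which is exactly why this stage makes the coding lossy; larger steps give smaller levels and bigger errors.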
Entropy coding implies lossless representation of different types of parameters, such as overhead data or system description, prediction data (typically motion vectors), and quantized transform coefficients from the quantization process. The latter typically account for the largest share of the bits.
The coding is performed on blockwise parts of the video picture. A macro block consists of several sub blocks for luminance (luma) as well as for chrominance (chroma). There are typically two chrominance components (Cr, Cb) with half the resolution both horizontally and vertically compared with luminance. In FIG. 1, the macro block consists of 16×16 luminance pixels and two chrominance components with 8×8 pixels each. Each of the components is further broken down into 4×4 blocks, which are represented by the small squares. For coding purposes, both luma and chroma 4×4 blocks are grouped into 8×8 sub blocks, designated Y0-Y3 and Cr, Cb.
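The partitioning can be sketched as a small data structure mapping each 8×8 sub block to the offsets of its four 4×4 blocks. The raster placement of Y0 (top left), Y1 (top right), Y2 (bottom left) and Y3 (bottom right) is an assumption made for this sketch, as is the function name.

```python
def macroblock_layout():
    """Return {sub block name: [(x, y) offsets of its 4x4 blocks]} for a
    macro block of 16x16 luma pixels and two 8x8 chroma components.
    Luma offsets are in luma pixels; chroma offsets are in chroma pixels."""
    layout = {}
    # Luma: four 8x8 sub blocks Y0-Y3, assumed to be in raster order.
    for i in range(4):
        ox, oy = (i % 2) * 8, (i // 2) * 8
        layout[f"Y{i}"] = [(ox + x, oy + y) for y in (0, 4) for x in (0, 4)]
    # Chroma: each half-resolution component is a single 8x8 sub block.
    for name in ("Cr", "Cb"):
        layout[name] = [(x, y) for y in (0, 4) for x in (0, 4)]
    return layout


layout = macroblock_layout()
print(len(layout), sum(len(blocks) for blocks in layout.values()))  # prints "6 24"
```

So one macro block carries six 8×8 sub blocks and twenty-four 4×4 blocks: sixteen for luma and four for each chroma component.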
H.263 and H.264 describe the prior art entropy coding of quantized transform coefficients in video compression.
H.263 is based on Variable Length Coding (VLC). A set of events is defined, where an event represents fixed values for one or more parameters. Each event is allocated a unique bit code. The code table is designed such that the length of each code matches the statistical probability of its event. Optimum efficiency is obtained if $\text{Bit\_number} = -\log_2(p)$, where p is the statistical probability of the event (0 < p ≤ 1). An example of a VLC code is shown in table 1.
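The sketch below illustrates the relation between probability and code length with a tiny prefix-free code. The four events and their probabilities are hypothetical and chosen as powers of 1/2 so that −log₂(p) is a whole number and the code lengths can match it exactly; it is not the H.263 table.

```python
import math

# Hypothetical events and probabilities (not taken from any standard).
PROBS = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
# Prefix-free code whose lengths equal -log2(p) for each event.
CODES = {"A": "0", "B": "10", "C": "110", "D": "111"}


def ideal_bits(p):
    """Optimum code length in bits for an event of probability p."""
    return -math.log2(p)


def encode(events):
    return "".join(CODES[e] for e in events)


def decode(bits):
    """Read bits until they form a complete code word; because no code is a
    prefix of another, the first match identifies the event."""
    inverse = {code: event for event, code in CODES.items()}
    events, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            events.append(inverse[current])
            current = ""
    return events


for e, p in PROBS.items():
    print(e, len(CODES[e]), ideal_bits(p))

bits = encode(["A", "B", "A", "D"])  # "0100111"
print(bits, decode(bits))
```

With these probabilities the average code length is 0.5·1 + 0.25·2 + 2·0.125·3 = 1.75 bits per event, equal to the entropy of the source; when probabilities are not powers of 1/2, integer-length VLC codes can only approximate −log₂(p).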
In H.264, improved methods are introduced, among them the concept of “Context Adaptivity” (CA). The concept applies a model that is changed dynamically based on previous coding. As an example, a more suitable VLC table may be chosen based on the occurrence of previous events, thereby making the coding more efficient. Two such CA based methods are disclosed in H.264:
1. CAVLC (Context Adaptive Variable Length Coding), with moderate Context Adaptivity, using VLC tables for coding.
2. CABAC (Context Adaptive Binary Arithmetic Coding), with more complex Context Adaptivity and using arithmetic coding at the end. This results in the largest compression gain, but at the price of higher complexity.
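As an example of choosing a VLC table from previous events, the sketch below follows our reading of how CAVLC selects the code table for a block's coefficient count: a predicted count nC is formed from the numbers of non-zero coefficients in the left and upper neighbouring 4×4 blocks, and nC picks one of four tables. It is simplified (special cases such as chroma DC blocks are ignored, and availability is modelled as `None`), and the function names are hypothetical; the H.264 specification should be consulted for the exact rules.

```python
def predict_nc(n_a, n_b):
    """Predicted number of non-zero coefficients for the current 4x4 block,
    from the counts in the left (n_a) and upper (n_b) neighbouring blocks.
    None marks a neighbour that is unavailable (simplified model)."""
    if n_a is not None and n_b is not None:
        return (n_a + n_b + 1) >> 1  # rounded average of both neighbours
    if n_a is not None:
        return n_a
    if n_b is not None:
        return n_b
    return 0


def select_table(nc):
    """Map the predicted count to one of four code tables: blocks in busy
    neighbourhoods get tables tuned for many non-zero coefficients."""
    if nc < 2:
        return 0
    if nc < 4:
        return 1
    if nc < 8:
        return 2
    return 3  # fixed-length codes for very busy neighbourhoods


print(select_table(predict_nc(0, 1)))   # quiet neighbours -> table 0
print(select_table(predict_nc(3, 4)))   # moderate activity -> table 2
print(select_table(predict_nc(10, 9)))  # busy neighbours -> table 3
```

Because the decoder has already decoded the neighbouring blocks, it can form the same nC and select the same table without any extra signalling, which is what makes this kind of adaptivity cheap.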
CAVLC is considered to have moderate adaptivity and complexity. CABAC uses more elaborate adaptivity but is considered too complex for certain applications, in particular real-time applications.