Transmission of moving pictures in real time is employed in several applications, such as video conferencing, net meetings, TV broadcasting and video telephony.
However, representing moving pictures requires a large amount of information, because digital video is typically described by representing each pixel in a picture with 8 bits (1 byte). Such uncompressed video data results in large bit volumes and cannot be transferred in real time over conventional communication networks and transmission lines because of their limited bandwidth.
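As a rough illustration of the bit volumes involved, the sketch below computes the uncompressed bit rate for a given picture size and frame rate, using the 8 bits per pixel mentioned above. The function name and the `samples_per_pixel` factor are illustrative assumptions: 1.0 counts luminance only, while 1.5 would correspond to 4:2:0 chroma subsampling.

```python
def raw_bitrate_bps(width: int, height: int, fps: float,
                    bits_per_sample: int = 8,
                    samples_per_pixel: float = 1.0) -> float:
    """Uncompressed video bit rate in bits per second (illustrative helper).

    samples_per_pixel = 1.0 counts luminance only; 1.5 would add 4:2:0 chroma.
    """
    return width * height * samples_per_pixel * bits_per_sample * fps


# CIF resolution (352x288) at 30 pictures per second, luminance only.
rate = raw_bitrate_bps(352, 288, 30)
print(f"{rate / 1e6:.1f} Mbit/s")  # 24.3 Mbit/s
```

Even this small, luminance-only format needs roughly 24 Mbit/s before any compression is applied.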
Thus, enabling real-time video transmission requires extensive use of data compression. Data compression may, however, compromise picture quality. Therefore, great efforts have been made to develop compression techniques that allow real-time transmission of high-quality video over bandwidth-limited data connections.
In video compression systems, the main goal is to represent the video information with as little capacity as possible. Capacity is measured in bits, either as a constant value or as bits per time unit. In both cases, the main goal is to reduce the number of bits.
Many video compression methods have been developed, and many of them are standardized through ISO (the International Organization for Standardization) or ITU (the International Telecommunication Union). In addition, a number of proprietary methods have been developed. The main standards are:
    ITU: H.261, H.262, H.263, H.264; and
    ISO: MPEG1, MPEG2, MPEG4/AVC.
The first step in the coding process according to these standards is to divide the picture into square blocks of pixels, for instance 16×16 or 8×8 pixels. This is done for luminance information as well as for chrominance information.
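A minimal sketch of the block division step, assuming a single plane of samples stored as a list of rows whose height and width are multiples of the block size (real encoders pad the picture when they are not). The function name `split_into_blocks` is hypothetical; the same routine would be applied to the luminance plane and to each chrominance plane.

```python
from typing import List

Block = List[List[int]]


def split_into_blocks(picture: Block, n: int = 16) -> List[List[Block]]:
    """Split a 2-D sample array into n x n blocks, returned as a grid
    (rows of blocks, each row ordered left to right)."""
    height, width = len(picture), len(picture[0])
    if height % n or width % n:
        raise ValueError("picture dimensions must be multiples of the block size")
    grid = []
    for by in range(0, height, n):
        row_of_blocks = []
        for bx in range(0, width, n):
            # Take n rows starting at by, and n columns starting at bx.
            row_of_blocks.append([row[bx:bx + n] for row in picture[by:by + n]])
        grid.append(row_of_blocks)
    return grid


picture = [[(x + y) % 256 for x in range(48)] for y in range(32)]
blocks = split_into_blocks(picture, 16)
print(len(blocks), len(blocks[0]))  # 2 3
```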
The prediction process that follows significantly reduces the number of bits required for each picture in a video sequence to be transferred. It takes advantage of the similarity between parts of the sequence and produces a prediction for the pixels in the block. The prediction may be based on pixels in an already coded/decoded picture (called inter prediction) or on already coded/decoded pixels in the same picture (called intra prediction). Inter prediction is mainly based on motion vectors describing the movement of picture content between pictures.
Since the prediction is known to both the encoder and the decoder, only the difference between the pixels to be coded and the predicted pixels has to be transferred. This difference, often referred to as a residual, typically requires much less capacity to represent.
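The sketch below illustrates inter prediction and the residual under simplifying assumptions: whole-pixel motion vectors written as (dy, dx), a displaced block that lies entirely inside the reference picture, and hypothetical function names. Real codecs also support fractional-pixel motion and intra prediction modes, which are not modeled here.

```python
from typing import List, Tuple

Block = List[List[int]]


def predict_block(ref: Block, top: int, left: int,
                  mv: Tuple[int, int], n: int) -> Block:
    """Inter prediction: copy the n x n block of the reference picture that
    the motion vector mv = (dy, dx) points to from position (top, left)."""
    dy, dx = mv
    return [[ref[top + dy + i][left + dx + j] for j in range(n)]
            for i in range(n)]


def residual(current: Block, prediction: Block) -> Block:
    """Encoder side: pixels to be coded minus predicted pixels."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]


def reconstruct(prediction: Block, res: Block) -> Block:
    """Decoder side: the same prediction plus the transmitted residual."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, res)]


ref = [[10 * y + x for x in range(8)] for y in range(8)]
# The current 4x4 block at (2, 2) matches the reference content one row down
# and two columns right, except for one slightly changed pixel.
current = [[ref[3 + i][4 + j] for j in range(4)] for i in range(4)]
current[0][0] += 3
pred = predict_block(ref, 2, 2, (1, 2), 4)
res = residual(current, pred)
print(res[0][0], sum(map(sum, res)))  # 3 3
```

Only the single nonzero residual value needs to be conveyed, and `reconstruct(pred, res)` recovers `current` exactly.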
The residual, represented as a block of data (e.g. 4×4 pixels), still contains internal correlation. A well-known way of exploiting this is to perform a two-dimensional block transform. H.263 uses an 8×8 Discrete Cosine Transform (DCT), whereas H.264 uses an N×N (where N can be 4 or 8) integer transform. This transforms N×N pixels into N×N transform coefficients, which can usually be represented by fewer bits than the pixels. A transform of an N×N array of pixels with internal correlation will probably result in an N×N block of transform coefficients with far fewer nonzero values than the original N×N pixel block.
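To illustrate why the transform helps, the sketch below applies the forward core of the H.264 4×4 integer transform, Y = CF * X * CF^T, to a residual block. It deliberately omits the post-scaling that H.264 folds into quantization, so the coefficient values are unnormalized; the helper names are illustrative.

```python
from typing import List

Matrix = List[List[int]]

# Core matrix of the H.264 forward 4x4 integer transform.
CF: Matrix = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def forward_core_transform(x: Matrix) -> Matrix:
    """Y = CF * X * CF^T (no scaling, so values are not normalized)."""
    return matmul(matmul(CF, x), transpose(CF))


# A horizontally ramped residual: 16 nonzero pixels...
ramp = [[1, 2, 3, 4] for _ in range(4)]
# ...becomes only 3 nonzero coefficients, all in the first row.
print(forward_core_transform(ramp))
# [[40, -28, 0, -4], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```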
Direct representation of the transform coefficients is still too costly for many applications, so the transform coefficients undergo quantization to further reduce the data representation. A simple version of quantization is to divide parameter values by a number, resulting in a smaller number that may be represented by fewer bits. Quantization is the major tool for controlling the bit production and the reconstructed picture quality. As a result of quantization, the reconstructed video sequence differs somewhat from the uncompressed sequence. This phenomenon is referred to as “lossy coding”, and it means that the reconstructed pictures typically have lower quality than the original pictures. The outputs of the quantization process are integers, which do not represent the original transform coefficients exactly. These integers, together with integers representing the side information, are coded losslessly and transmitted to the decoder.
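A minimal sketch of the "divide by a number" quantizer described above, assuming a single fixed step size `qstep` and rounding to the nearest integer. Real codecs derive the step size from a quantization parameter and may use dead zones or weighting matrices; none of that is modeled, and the example coefficient values are arbitrary.

```python
from typing import List


def quantize(coeffs: List[int], qstep: int) -> List[int]:
    """Uniform scalar quantization: divide by qstep and round to the nearest
    integer (ties rounded away from zero)."""
    levels = []
    for c in coeffs:
        level = (abs(c) + qstep // 2) // qstep
        levels.append(level if c >= 0 else -level)
    return levels


def dequantize(levels: List[int], qstep: int) -> List[int]:
    """Decoder-side reconstruction: multiply each level by the step size."""
    return [level * qstep for level in levels]


coeffs = [40, -28, 0, -4]
levels = quantize(coeffs, 10)
print(levels)                  # [4, -3, 0, 0]
print(dequantize(levels, 10))  # [40, -30, 0, 0]  (differs from coeffs: lossy)
```

A larger `qstep` yields smaller levels (fewer bits) at the cost of a larger reconstruction error, which is how quantization trades bit production against quality.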
Finally, a so-called scan converts the two-dimensional transform coefficient data into a one-dimensional set of data, which is then coded according to an entropy coding scheme. Entropy coding implies lossless representation of the quantized transform coefficients.
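The text does not specify the scan order; a common choice is a zig-zag scan, which visits low-frequency coefficients first so that nonzero values tend to cluster at the start of the one-dimensional sequence. The sketch below generates that order for an n×n block; the function names are illustrative, and individual standards define their own scan tables (including alternative scans for interlaced material).

```python
from typing import List, Tuple


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """(row, col) positions of an n x n block in zig-zag order, walking the
    anti-diagonals from the top-left (lowest frequency) corner and
    alternating direction on each diagonal."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies the anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def scan(block: List[List[int]]) -> List[int]:
    """Convert a 2-D coefficient block into a 1-D list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


block = [[4, 3, 0, 0],
         [2, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(scan(block))  # [4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```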
The above steps are listed in the natural order for the encoder. The decoder will, to some extent, perform the operations in the opposite order and apply “inverse” operations, such as inverse transform instead of transform and de-quantization instead of quantization.
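To illustrate the "inverse" operation on the decoder side, the sketch below inverts the forward core of the H.264 4×4 integer transform exactly, using the fact that its rows are orthogonal. This exact rational-arithmetic inverse is for illustration only; the actual H.264 decoder uses its own integer inverse transform, with scaling folded into de-quantization, which is not reproduced here. Function names are hypothetical.

```python
from fractions import Fraction

# Core matrix of the H.264 forward 4x4 integer transform. Its rows are
# orthogonal with squared norms 4, 10, 4, 10, so CF * CF^T = diag(4, 10, 4, 10).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]
NORMS = [4, 10, 4, 10]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward(x):
    """Encoder: Y = CF * X * CF^T."""
    return matmul(matmul(CF, x), transpose(CF))


def inverse(y):
    """Decoder: X = CF^T * D^-1 * Y * D^-1 * CF, with D = diag(NORMS)."""
    scaled = [[Fraction(y[i][j], NORMS[i] * NORMS[j]) for j in range(4)]
              for i in range(4)]
    return matmul(matmul(transpose(CF), scaled), CF)


x = [[5, -2, 0, 1], [3, 3, -1, 0], [0, 2, 4, -3], [1, 0, 0, 7]]
print(inverse(forward(x)) == x)  # True
```

Without quantization in between, the round trip is exact; with quantization, the decoder would apply `inverse` to de-quantized coefficients and obtain only an approximation of the residual.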
Lossless coding is conventionally used for entropy coding of quantized transform coefficients and for coding side information such as motion vectors, coding mode and Coded Block Pattern (CBP). Typically, a set of “events” is defined in an event table. Then a Variable Length Code (VLC) table is defined, and each event is coupled to a code in the VLC table. Below are some examples of event tables to be coded.
A motion vector typically consists of a horizontal and a vertical component. Consider only one such component, whose value is an integer that may be positive or negative. The most probable value is 0, followed by ±1, ±2, ±3, etc. with descending probability. A good combination of vector values and code values turns out to be:
    Value    Code
    . . .    . . .
    −3       00110
    −2       00100
    −1       010
     0       1
     1       011
     2       00101
     3       00111
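A table-driven encoder and decoder for the motion vector component VLC above, written as a sketch with hypothetical names. The seven listed codewords happen to coincide with Exp-Golomb codewords (with negative values mapped to the odd code numbers), but the text does not say how values beyond ±3 are coded, so this sketch leaves them out and `encode` raises `KeyError` for them.

```python
# The motion vector component VLC from the table above (values -3..3 only).
MV_VLC = {-3: "00110", -2: "00100", -1: "010", 0: "1",
          1: "011", 2: "00101", 3: "00111"}
MV_DECODE = {code: value for value, code in MV_VLC.items()}
MAX_CODE_LEN = max(len(code) for code in MV_VLC.values())


def encode(values):
    """Concatenate the codewords of a sequence of vector component values."""
    return "".join(MV_VLC[v] for v in values)


def decode(bits):
    """Parse a bit string back into values. Reading bit by bit works because
    no codeword is a prefix of another (the table is a prefix code)."""
    values, current = [], ""
    for bit in bits:
        current += bit
        if current in MV_DECODE:
            values.append(MV_DECODE[current])
            current = ""
        elif len(current) >= MAX_CODE_LEN:
            raise ValueError(f"no codeword matches {current!r}")
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return values


bits = encode([0, 1, -2, 3])
print(bits)          # 10110010000111
print(decode(bits))  # [0, 1, -2, 3]
```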
For an optimal solution, the most probable event should have the shortest code. More specifically, the code length in bits should equal the information content of the event, i.e. Code_length = −log2(probability_of_event).
Hence the VLC above is optimal if the probabilities of . . . −3, −2, −1, 0, 1, 2, 3 . . . are . . . 1/32, 1/32, 1/8, 1/2, 1/8, 1/32, 1/32 . . ., respectively.
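The optimality condition can be checked numerically. The sketch below compares each codeword length from the motion vector table with −log2 of the corresponding probability stated above; only the seven listed events are included, since the remaining probabilities are not given.

```python
import math

# Codeword lengths of the motion vector VLC and the probabilities for which,
# according to the text, that VLC is optimal.
CODE_LENGTH = {-3: 5, -2: 5, -1: 3, 0: 1, 1: 3, 2: 5, 3: 5}
PROBABILITY = {-3: 1/32, -2: 1/32, -1: 1/8, 0: 1/2, 1: 1/8, 2: 1/32, 3: 1/32}


def ideal_length(p: float) -> float:
    """Code_length = -log2(probability_of_event)."""
    return -math.log2(p)


# Each printed line shows: value, actual code length, ideal code length.
for value in sorted(CODE_LENGTH):
    print(value, CODE_LENGTH[value], ideal_length(PROBABILITY[value]))
```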
Another kind of side information that is typically coded with VLC tables is the CBP. It turns out to be beneficial to signal by VLC which of the four 8×8 luminance blocks and two collocated 8×8 chrominance blocks in a macroblock have nonzero coefficients. Therefore, an event table with the 2^6 = 64 possible events is defined, together with a corresponding VLC table that matches the probabilities of the 64 events.
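A sketch of how the 64-event CBP index could be formed, assuming the six blocks are given in the order Y0, Y1, Y2, Y3, Cb, Cr and that block i sets bit i. This bit layout is an illustrative assumption; standards such as H.263 and H.264 define their own CBP layouts and VLC tables.

```python
from typing import Sequence


def coded_block_pattern(blocks: Sequence[Sequence[int]]) -> int:
    """Return an event index 0..63 whose bit i is set when block i has at
    least one nonzero quantized coefficient. Assumed block order:
    Y0, Y1, Y2, Y3 (luminance), Cb, Cr (chrominance)."""
    if len(blocks) != 6:
        raise ValueError("expected 4 luminance + 2 chrominance blocks")
    cbp = 0
    for i, coeffs in enumerate(blocks):
        if any(c != 0 for c in coeffs):
            cbp |= 1 << i
    return cbp


zero = [0] * 64
y0 = [3] + [0] * 63                 # one nonzero coefficient in Y0
cr = [0] * 10 + [-1] + [0] * 53     # one nonzero coefficient in Cr
print(coded_block_pattern([y0, zero, zero, zero, zero, cr]))  # 33
```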
The quantized transform coefficients are also coded by lossless VLC. There are many ways of coding the transform coefficients efficiently, which may lead to different event tables and associated VLCs. Conventionally, the quantized transform coefficients in a block are expressed by the number of nonzero transform coefficients in the block, the position of the last nonzero transform coefficient in the block, and the actual magnitudes of the transform coefficients. Combinations of this information then create events defined in VLC tables. For example, a combined event can be coded that indicates both the position of the last nonzero coefficient and whether the magnitude of that coefficient is 1 or greater than 1. Other combinations can also be used, but the basic principle is still to select combinations and corresponding VLC tables that minimize the number of bits required, based on the likelihood of the events.
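As a sketch, assuming the coefficients have already been scanned into a one-dimensional list, the quantities mentioned above and one such combined event can be derived as follows. The event numbering is a hypothetical choice for illustration; an actual codec would map these events to codewords through its own VLC tables.

```python
from typing import List, Optional, Tuple


def coefficient_summary(scanned: List[int]) -> Tuple[int, Optional[int]]:
    """Number of nonzero coefficients and position of the last nonzero one
    (None for an all-zero block) in a scanned coefficient list."""
    positions = [i for i, c in enumerate(scanned) if c != 0]
    return len(positions), (positions[-1] if positions else None)


def last_event(scanned: List[int]) -> int:
    """Combined event: position of the last nonzero coefficient together with
    whether its magnitude is 1 or greater than 1.
    Hypothetical numbering: 2 * last_position + (0 if |level| == 1 else 1)."""
    _, last = coefficient_summary(scanned)
    if last is None:
        raise ValueError("all-zero block: no last nonzero coefficient")
    return 2 * last + (0 if abs(scanned[last]) == 1 else 1)


block = [7, 0, -1, 0, 1] + [0] * 11   # a scanned 4x4 block
print(coefficient_summary(block))     # (3, 4)
print(last_event(block))              # 8
```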
This results in low bit usage as long as the data to be coded fits reasonably well with the underlying statistics. In the opposite case, when very atypical data is to be coded, the bit usage may become too high. In situations where the data to be coded fails to fit the “normal” statistics, events that are represented by a large number of bits become more frequent. This may be the case during rapid and lasting light changes in the environment where the video images are captured. This harms the quality of the encoded/decoded images, because the coding process will automatically adjust the quantization intervals (making them coarser) to compensate for the frequent occurrence of long code words.
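The effect can be illustrated with the motion vector VLC shown earlier, used here only as a stand-in for any table designed around “normal” statistics (the light-change scenario above concerns other data, such as transform coefficients). The two input sequences are made up for illustration.

```python
# Codeword lengths of the motion vector component VLC (values -3..3).
CODE_LENGTH = {-3: 5, -2: 5, -1: 3, 0: 1, 1: 3, 2: 5, 3: 5}


def average_bits(values):
    """Average number of bits per value when coded with CODE_LENGTH."""
    return sum(CODE_LENGTH[v] for v in values) / len(values)


typical = [0, 0, 0, 0, 1, -1, 0, 2]       # mostly the most probable value
atypical = [3, -3, 2, -2, 3, -3, 2, -2]   # only the least probable values
print(average_bits(typical))   # 2.0
print(average_bits(atypical))  # 5.0
```

The same table costs two and a half times as many bits per value when the data no longer matches the statistics it was designed for.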