Spatial-domain DPCM (differential pulse code modulation), the prototype of coding methods for digital image and video signals, can be traced back to the 1950s. Transform coding and motion-compensated prediction followed in the 1970s. In 1974, Ahmed introduced the block-based two-dimensional discrete cosine transform (DCT), which has since become a key technique in advanced video coding structures. Once these techniques had matured and become practical in the 1980s, the block-based hybrid coding structure, consisting of predictive coding, transform coding and entropy coding, took shape. Over the next two decades, a series of international coding standards was established, such as H.261, H.263 and H.26L by the ITU, and MPEG-1, MPEG-2 and MPEG-4 by MPEG of the ISO. In the 21st century, as technology advanced, more efficient coding techniques and network adaptation methods were required to satisfy customers' needs for multimedia communication. Under these circumstances, the new-generation video coding standard H.264/MPEG-4 AVC (H.264) was formulated and issued by the end of 2003. Meanwhile, AVS Part 2, a video coding standard based on Chinese-owned intellectual property, was formulated at the end of 2003 and issued as a formal national standard of China (GB/T 20090.2) in February 2006. The coding efficiency of AVS and H.264 is almost twice that of MPEG-2, at the cost of higher computational complexity. Both AVS and H.264 are based on the traditional hybrid coding structure.
The essential objective of video coding is to compress the video signal by eliminating redundant data, so that storage space and transmission bandwidth can be saved. The amount of data in an original video signal is enormous. One CIF-size YUV frame, for instance, which has 352×288 pixels in 4:2:0 format, takes 1216512 bits to represent if each luma or chroma sample is denoted by 8 bits. If the video is displayed at 25 frames per second, the bit rate is as high as 30.4 Mbps, and it is higher still for standard-definition and high-definition video sequences. Such a high bit rate is very difficult to sustain in transmission and storage, so efficient video compression techniques are necessary to make video communication and storage feasible. Fortunately, a video signal contains a huge amount of redundant information, which mainly comprises spatial redundancy, temporal redundancy, data redundancy and visual redundancy. The first three concern only redundancy between pixels and can be categorized as pixel-based statistical redundancy, whereas the last one mainly reflects the characteristics of the human visual system. Consequently, to eliminate the redundancy between pixels, a hybrid coding structure based on predictive coding, transform coding and entropy coding was introduced. Its features include:
1) Utilize predictive coding to eliminate temporal and spatial redundancy;
2) Utilize transform coding to further eliminate spatial redundancy;
3) Utilize entropy coding to eliminate data redundancy.
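The raw-data figures quoted above for CIF video follow from simple arithmetic, and they show how much data the three coding stages have to remove. The following is a minimal sketch that reproduces them; the function names and parameters are illustrative and do not come from any standard.

```python
def raw_frame_bits(width, height, bits_per_sample=8):
    """Bits in one uncompressed 4:2:0 frame (illustrative helper)."""
    luma_samples = width * height
    # 4:2:0 subsamples each of the two chroma planes by 2 in both directions.
    chroma_samples = 2 * (width // 2) * (height // 2)
    return (luma_samples + chroma_samples) * bits_per_sample


def raw_bit_rate(width, height, fps, bits_per_sample=8):
    """Uncompressed bit rate in bits per second."""
    return raw_frame_bits(width, height, bits_per_sample) * fps


cif_bits = raw_frame_bits(352, 288)
cif_rate = raw_bit_rate(352, 288, fps=25)
print(cif_bits)        # 1216512 bits per frame
print(cif_rate / 1e6)  # 30.4128 Mbps
```

Calling the same helpers with other frame sizes shows how quickly the raw bit rate grows for standard-definition and high-definition sequences.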
Traditional predictive coding in the hybrid coding structure includes intra-prediction and inter-prediction (see the H.264/AVC and AVS standards). Intra-prediction consists of pixel-domain prediction and transform-domain prediction, which together constitute spatial-domain prediction. A video frame coded with intra-prediction is called an intra-coded frame (I frame). It is coded as follows: first, the frame is divided into blocks (one form of coding unit); intra-prediction is applied to each coding block, and a prediction error is obtained according to the block size and prediction mode; the prediction error is transformed; the transform coefficients are quantized in the transform domain; the two-dimensional signal is converted into a one-dimensional signal by scanning; and the result is entropy coded. A video frame coded with inter-prediction is called an inter-coded frame; inter-prediction includes forward, backward and bidirectional prediction (P frames and B frames), each of which can be applied to various block sizes. It is coded as follows: first, the frame is divided into blocks; motion estimation, comprising motion search and motion prediction, is applied to obtain a motion vector and a reference block; motion compensation is applied to obtain the prediction error of inter-prediction (temporal prediction); the prediction error is transformed; the transform coefficients are quantized in the transform domain; the two-dimensional signal is converted into a one-dimensional signal by scanning; and the result is entropy coded. In addition, there are several combined temporal-spatial coding techniques (see Kenneth Andersson, “Combined Intra Inter-prediction Coding Mode”, VCEG-AD11, 18 October 2006). The spatial and temporal redundancy remaining in the prediction error is drastically reduced compared with that of the original video signal.
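The intra-coding procedure described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method of H.264 or AVS: it uses a single DC prediction mode, a floating-point orthonormal DCT, a uniform quantizer with a hypothetical step size qstep, and (run, level) pairs as a stand-in for the input to a real entropy coder. All function names are illustrative.

```python
import math

N = 4  # block size assumed for this sketch


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def zigzag_order(n):
    """(row, col) positions of an n x n block in zigzag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))


def encode_intra_block(block, top, left, qstep):
    # 1) DC intra-prediction: mean of the neighbouring samples above and left.
    pred = round((sum(top) + sum(left)) / (len(top) + len(left)))
    # 2) Prediction error (residual).
    residual = [[block[r][c] - pred for c in range(N)] for r in range(N)]
    # 3) Transform the residual, 4) quantize the coefficients uniformly.
    coeffs = dct_2d(residual)
    levels = [[round(coeffs[r][c] / qstep) for c in range(N)] for r in range(N)]
    # 5) Scan the 2-D levels into a 1-D sequence.
    scanned = [levels[r][c] for r, c in zigzag_order(N)]
    # 6) Form (zero-run, level) pairs, the kind of symbols an entropy coder takes.
    pairs, run = [], 0
    for level in scanned:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pred, scanned, pairs


# A flat block slightly brighter than its neighbours.
block = [[100] * N for _ in range(N)]
pred, scanned, pairs = encode_intra_block(block, top=[98] * N, left=[98] * N, qstep=4)
print(pred, pairs)  # 98 [(0, 2)]
```

In this example the 16 samples of the block reduce to a single nonzero quantized coefficient. An inter-coded block would follow the same transform, quantization, scan and entropy-coding steps, with the prediction taken from a motion-compensated reference block instead of neighbouring samples.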
If this spatial and temporal redundancy is quantified mathematically as correlation, the spatial and temporal correlation of the prediction error is small compared with that of the original video signal. Applying a two-dimensional transform further reduces the spatial correlation, and finally the transform coefficients are quantized and entropy coded to eliminate data redundancy. Consequently, more accurate predictive coding techniques help reduce the spatial and temporal correlation of the prediction error and thus lead to more efficient compression; more efficient transform coding techniques are needed to further reduce the spatial correlation; and, following predictive coding and transform coding, more suitable scanning, quantization and entropy coding techniques should be designed.
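The claim that prediction lowers correlation can be illustrated numerically. The sketch below is an assumption-laden toy, not a measurement on real video: it uses a synthetic one-dimensional random-walk signal as a stand-in for a row of pixels, previous-sample (DPCM-style) prediction, and the lag-1 correlation coefficient as the measure of redundancy. The helper name is illustrative.

```python
import math
import random


def lag1_correlation(x):
    """Correlation coefficient between each sample and its successor."""
    a, b = x[:-1], x[1:]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic "row of pixels": each sample is close to its neighbour.
rng = random.Random(0)
signal = [128.0]
for _ in range(1999):
    signal.append(signal[-1] + rng.uniform(-1.0, 1.0))

# Previous-sample prediction: the residual is the difference to the neighbour.
residual = [signal[n] - signal[n - 1] for n in range(1, len(signal))]

rho_signal = lag1_correlation(signal)
rho_residual = lag1_correlation(residual)
print(f"original signal:  {rho_signal:.3f}")
print(f"prediction error: {rho_residual:.3f}")
```

For this synthetic signal the original samples are almost perfectly correlated with their neighbours, while the prediction error is nearly uncorrelated. On real video the prediction error typically retains some correlation, which the subsequent transform stage is meant to reduce.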
The bottleneck of the traditional hybrid coding structure is that redundancy still exists in the prediction error after spatial and temporal prediction. Further elimination of this redundancy will help improve coding efficiency.