Video coding systems are widely used to compress digital video signals in order to reduce their storage requirements and/or transmission bandwidth. Among the various types of video coding systems, such as block-based, wavelet-based, and object-based systems, block-based hybrid video coding systems are currently the most widely used and deployed. Examples of block-based video coding systems include international video coding standards such as MPEG-1/2/4 Part 2, H.264/MPEG-4 Part 10 AVC [1][3], and VC-1 [2].
FIG. 1 is a block diagram of a generic block-based hybrid video encoding system. The input video signal 102 is processed block by block. In all existing video coding standards, the video block unit consists of 16×16 pixels; such a block unit is also commonly referred to as a macroblock or MB. Currently, JCT-VC (Joint Collaborative Team on Video Coding) of ITU-T/SG16/Q.6/VCEG and ISO/IEC/MPEG is developing the next generation video coding standard called High Efficiency Video Coding or HEVC [4]. In HEVC, extended block sizes (called a “coding unit” or CU) are used to efficiently compress high resolution (1080p and beyond) video signals. In HEVC, a CU can be up to 64×64 pixels. A CU can be further partitioned into prediction units or PUs, to which separate prediction methods are applied. For each input video block (MB or CU), spatial prediction (160) and/or temporal prediction (162) may be performed. Spatial prediction (or “intra prediction”) uses pixels from the already coded neighboring blocks in the same video picture/slice to predict the current video block. Spatial prediction reduces the spatial redundancy inherent in the video signal. Temporal prediction (also referred to as “inter prediction” or “motion compensated prediction”) uses pixels from the already coded video pictures (commonly referred to as “reference pictures”) to predict the current video block. Temporal prediction reduces the temporal redundancy inherent in the video signal. The temporal prediction signal for a given video block is usually signaled by one or more motion vectors that indicate the amount and the direction of motion between the current block and its prediction block in the reference picture. Also, if multiple reference pictures are supported (as is the case for recent video coding standards such as H.264/AVC or HEVC), then, for each video block, its reference picture index is additionally sent.
The reference picture index identifies from which reference picture in the reference picture store (164) (also referred to as the “decoded picture buffer” or DPB) the temporal prediction signal is to be obtained in order to generate the prediction of the current video block that is to be reconstructed. After spatial and/or temporal prediction, the mode decision block (180) in the encoder chooses the best prediction mode, for example, based on the rate-distortion optimization method. The prediction block is then subtracted from the current video block (116), and the prediction residual is transformed (104) and quantized (106). The quantized residual coefficients are inverse quantized (110) and inverse transformed (112) to form the reconstructed residual, which is then added back to the prediction block (126) to form the reconstructed video block. Further in-loop filtering, such as deblocking filters, Sample Adaptive Offset, and Adaptive Loop Filters, may be applied (166) to the reconstructed video block before it is put in the reference picture store (164) and used to code future video blocks. To form the output video bitstream 120, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit (108) to be further compressed and packed to form the bitstream.
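As an illustration only, the subtract, quantize, and reconstruct path of the encoder described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular standard: the transform (104) and inverse transform (112) are omitted for brevity, quantization is a plain uniform scalar quantizer, samples are assumed to be 8-bit, and the names `encode_block` and `qstep` are hypothetical.

```python
def encode_block(current, prediction, qstep):
    """Return (quantized levels, reconstructed block) for one video block.

    Assumptions for illustration: blocks are lists of rows of integer
    samples, the transform stage is skipped, and qstep is a positive
    integer quantizer step size.
    """
    # Prediction residual: current block minus prediction block (116).
    residual = [[c - p for c, p in zip(crow, prow)]
                for crow, prow in zip(current, prediction)]

    # Quantization (106). A real encoder would transform (104) first.
    levels = [[int(round(r / qstep)) for r in row] for row in residual]

    # Inverse quantization (110) yields the reconstructed residual.
    recon_residual = [[level * qstep for level in row] for row in levels]

    # Add the reconstructed residual back to the prediction (126),
    # clipping to the assumed 8-bit sample range.
    reconstructed = [[min(255, max(0, p + r)) for p, r in zip(prow, rrow)]
                     for prow, rrow in zip(prediction, recon_residual)]
    return levels, reconstructed


current = [[100, 104], [97, 120]]
prediction = [[100, 100], [100, 100]]
levels, reconstructed = encode_block(current, prediction, qstep=4)
```

The encoder keeps `reconstructed`, rather than `current`, as the basis for future prediction because the decoder only has access to the quantized data; using the same reconstruction on both sides keeps the two in step.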
FIG. 2 gives a general block diagram of a block-based video decoder. The video bitstream 202 is first unpacked and entropy decoded at entropy decoding unit 208. The coding mode and prediction information are sent to either the spatial prediction unit 260 (if intra coded) or the temporal prediction unit 262 (if inter coded) to form the prediction block. If inter coded, the prediction information includes prediction block sizes, one or more motion vectors (indicating direction and amount of motion), and one or more reference indices (indicating from which reference picture the prediction signal is to be obtained). Motion compensated prediction is then applied by the temporal prediction unit 262 to form the temporal prediction block. The residual transform coefficients are sent to inverse quantization unit 210 and inverse transform unit 212 to reconstruct the residual block. The prediction block and the residual block are then added together at 226. The reconstructed block may further go through in-loop filtering before it is stored in the reference picture store 264. The reconstructed video in the reference picture store is then sent out to drive a display device and is also used to predict future video blocks.
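The inter-coded decoding path above can likewise be sketched in a few lines. This is a simplified illustration under stated assumptions: motion vectors are integer-pel only (real codecs also interpolate fractional positions), references that fall outside the picture are handled by clamping coordinates to the picture border, samples are 8-bit, and the names `motion_compensate`, `reconstruct_block`, and `dpb` are hypothetical.

```python
def motion_compensate(dpb, ref_idx, mv, x0, y0, width, height):
    """Fetch the temporal prediction block for one inter-coded block.

    dpb is a list of reference pictures (each a list of sample rows),
    ref_idx selects one of them, and mv = (dx, dy) is an integer-pel
    motion vector (an assumption made for this sketch).
    """
    ref = dpb[ref_idx]
    max_y, max_x = len(ref) - 1, len(ref[0]) - 1
    dx, dy = mv
    # Read the displaced block, clamping coordinates at picture borders.
    return [[ref[min(max_y, max(0, y0 + dy + j))]
                [min(max_x, max(0, x0 + dx + i))]
             for i in range(width)]
            for j in range(height)]


def reconstruct_block(prediction, residual):
    """Add the residual block to the prediction block (226), with clipping."""
    return [[min(255, max(0, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


# A 4x4 reference picture whose sample at (x, y) equals 10*y + x.
reference = [[10 * y + x for x in range(4)] for y in range(4)]
dpb = [reference]

prediction = motion_compensate(dpb, ref_idx=0, mv=(1, 0),
                               x0=1, y0=1, width=2, height=2)
decoded = reconstruct_block(prediction, [[1, -1], [0, 2]])
```

Here the reference index selects the picture and the motion vector selects the displaced position within it, mirroring the two pieces of prediction information described for inter-coded blocks.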