A video coding standard project called High Efficiency Video Coding (HEVC) has been started by a Joint Collaborative Team on Video Coding (JCT-VC). One goal of this standard is to improve coding performance over the older H.264/AVC standard across a broader set of applications and a wider range of bit rates. The initial framework of the HEVC standard does not differ significantly from previous video coding standards: it retains a block based prediction technique, a 2-D Discrete Cosine Transform (DCT), and context based entropy coding. The new coding tools are more advanced and flexible, but come with increased computational complexity. As in conventional coding techniques, the encoder and decoder operate on a sequence of frames of the video. The frames in the video are partitioned into macroblocks of pixels. The macroblocks can be spatially adjacent within a frame (for intra mode coding) or temporally adjacent in successive frames (for inter mode coding).
Orthogonal and bi-orthogonal complete dictionaries, such as the DCT or wavelets, have been the dominant transform domain representations in image and video coding. Sparse and redundant representations of signals over an overcomplete dictionary have been successfully applied to various applications, such as image denoising.
An overcomplete video coding technique can achieve a competitive coding gain at very low bit rates compared with conventional video coding standards. In overcomplete video coding, the block based 2-D DCT is replaced with an expansion over a larger and more suitable set of basis functions. In low bit rate video coding, residual signals are represented with fewer nonzero DCT coefficients because of a larger quantization parameter (QP), and thus only low frequency components appear in a macroblock. In this scenario, an overcomplete set of dictionaries can provide a more varied and faithful representation of residual signals than a complete set of dictionaries. Thus, the residual signal can be approximated better with fewer coefficients.
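The effect of coarser quantization on the number of nonzero DCT coefficients can be illustrated with a minimal sketch. The 8x8 block size, the synthetic residual block, and the quantizer step sizes below are assumptions chosen for illustration only. The step size stands in for the quantizer step that grows with QP, and no particular standard's QP-to-step mapping is implied.

```python
import math

N = 8  # block size (assumed for illustration)

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def dct2(block):
    """Separable 2-D DCT computed as C * block * C^T."""
    n = len(block)
    c = dct_matrix(n)
    # Transform along the vertical direction: tmp = C * block
    tmp = [[sum(c[k][i] * block[i][j] for i in range(n)) for j in range(n)]
           for k in range(n)]
    # Transform along the horizontal direction: tmp * C^T
    return [[sum(tmp[k][j] * c[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]

def quantize(coeffs, step):
    """Uniform scalar quantization with a hypothetical step size."""
    return [[int(round(v / step)) for v in row] for row in coeffs]

def count_nonzero(levels):
    return sum(1 for row in levels for v in row if v != 0)

# Hypothetical residual block: a smooth ramp plus a small texture term.
residual = [[2.0 * x + 1.0 * y + 3.0 * ((x * y) % 3 - 1) for x in range(N)]
            for y in range(N)]

coeffs = dct2(residual)
for step in (2.0, 8.0, 32.0):
    levels = quantize(coeffs, step)
    print("step", step, "nonzero coefficients:", count_nonzero(levels))
```

Because a coefficient survives quantization only when its magnitude exceeds about half the step size, the count of nonzero levels cannot increase as the step grows, and the coefficients that remain tend to be the large, low frequency ones.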
Conventional overcomplete video coding constructs a set of dictionaries with modulated Gabor functions. Matching pursuits (MP) is used to select the most appropriate dictionary elements for the representation. Because MP selects elements greedily, it determines a suboptimal solution for the sparse signal representation. The set of dictionaries can be varied by concatenating dictionaries generated by several analytic functions such as wavelets, curvelets, and discrete Fourier transforms (DFT). Wavelets generalize the Fourier transform by using a basis that represents both location and spatial frequency. For 2D or 3D signals, directional wavelet transforms use basis functions that are also localized in orientation. Curvelets are an extension of the wavelet concept and use a non-adaptive technique for multi-scale object representation. A curvelet transform differs from other directional wavelet transforms in that the degree of localization in orientation varies with scale.
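The greedy selection performed by matching pursuits can be sketched as follows. This is a minimal 1-D illustration under stated assumptions, not the Gabor dictionary described above: the dictionary is a hypothetical concatenation of DCT atoms and unit spikes, and the function names `build_dictionary` and `matching_pursuit` are introduced here for illustration only.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def build_dictionary(n):
    """Overcomplete dictionary (2n atoms for length-n signals):
    n unit-norm DCT-II atoms concatenated with n unit spikes."""
    atoms = []
    for k in range(n):
        atoms.append(normalize(
            [math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]))
    for p in range(n):
        atoms.append([1.0 if i == p else 0.0 for i in range(n)])
    return atoms

def matching_pursuit(signal, atoms, max_atoms):
    """Greedy MP: at each step pick the atom with the largest absolute
    inner product with the residual, then subtract its contribution."""
    residual = list(signal)
    selected = []  # list of (atom index, coefficient) pairs
    for _ in range(max_atoms):
        best_idx, best_coef = 0, 0.0
        for idx, atom in enumerate(atoms):
            coef = sum(r * a for r, a in zip(residual, atom))
            if abs(coef) > abs(best_coef):
                best_idx, best_coef = idx, coef
        if best_coef == 0.0:
            break  # residual is orthogonal to every atom
        selected.append((best_idx, best_coef))
        residual = [r - best_coef * a
                    for r, a in zip(residual, atoms[best_idx])]
    return selected, residual

def energy(v):
    return sum(x * x for x in v)

# Hypothetical test signal: one smooth DCT atom plus one isolated spike.
n = 16
atoms = build_dictionary(n)
signal = [3.0 * a + 5.0 * s for a, s in zip(atoms[2], atoms[n + 4])]

selected, residual = matching_pursuit(signal, atoms, max_atoms=4)
print("selected (index, coefficient):", selected)
print("residual energy fraction:", energy(residual) / energy(signal))
```

Since the atoms have unit norm, each iteration removes the squared inner product from the residual energy, so the residual never grows. The result is still suboptimal in general, because an atom chosen early is never revisited even when a different combination would give a sparser representation.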
However, those models have drawbacks despite their simplicity. Natural images or videos often contain features that are not well-represented by these models. In these cases, poor reconstruction or artifacts such as ringing can be introduced into the decoded image or video.
Dictionary training can be used because residual signals tend to have a directional orientation after the prediction. Therefore, a set of dictionaries can be well designed by reflecting the characteristics of the residual signals. A mode dependent directional transform can be used for intra coding. A complete dictionary can be constructed using intra prediction residuals corresponding to the directional prediction. Dictionary training can also be adapted to an intra prediction in image coding applications.