The demand for digital video products continues to increase. Some examples of applications for digital video include video communication, security and surveillance, industrial automation, and entertainment (e.g., DV, HDTV, satellite TV, set-top boxes, Internet video streaming, video gaming devices, digital cameras, cellular telephones, video jukeboxes, high-end displays and personal video recorders). Further, video applications are becoming increasingly mobile as a result of higher computation power in handsets, advances in battery technology, and high-speed wireless connectivity.
Video compression, i.e., video coding, is an essential enabler for digital video products as it enables the storage and transmission of digital video. In general, video compression techniques apply prediction, transformation, quantization, and entropy coding to sequential blocks of pixels, i.e., coding blocks, in a video sequence to compress, i.e., encode, the video sequence. A coding block is a subset of a frame or a portion of a frame, e.g., a slice or a block of 64×64 pixels, in a video sequence, and both coding blocks and frames may be inter-coded or intra-coded. For encoding, a coding block may be divided into prediction blocks, e.g., 4×4, 8×8, or 16×16 blocks of pixels. Prediction blocks may be inter-coded or intra-coded as well. In an intra-coded coding block, all prediction blocks are intra-coded. In an inter-coded coding block, the prediction blocks may be either intra-coded or inter-coded.
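The division of a coding block into prediction blocks can be sketched as follows. The 64×64 coding-block size and 8×8 prediction-block size are illustrative choices, and the `partition` helper is hypothetical rather than part of any standard.

```python
import numpy as np

def partition(coding_block, size=8):
    """Split a square coding block into size x size prediction blocks,
    scanned left to right, top to bottom."""
    n = coding_block.shape[0]
    return [coding_block[r:r + size, c:c + size]
            for r in range(0, n, size)
            for c in range(0, n, size)]

# A 64x64 coding block yields 64 prediction blocks of 8x8 pixels each.
cb = np.arange(64 * 64, dtype=np.int32).reshape(64, 64)
blocks = partition(cb)
print(len(blocks), blocks[0].shape)  # 64 (8, 8)
```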
For intra-coded prediction blocks, spatial prediction is performed using different spatial prediction modes that specify the direction, e.g., horizontal, vertical, diagonal, etc., in which pixels are predicted. For example, the H.264/AVC video coding standard provides nine 4×4 spatial prediction modes, nine 8×8 spatial prediction modes, and four 16×16 spatial prediction modes for spatial prediction in the luminance space, and four 8×8 prediction modes in the chrominance space. Future standards may provide more spatial prediction modes and/or larger sizes of prediction blocks. In general, spatial prediction predicts a current prediction block, i.e., an actual prediction block in a coding block of a frame, from surrounding pixels in the same frame using each of the spatial prediction modes, and selects for output the prediction mode, and corresponding predicted prediction block, that most closely resembles the pixels in the current prediction block. The predicted prediction block is then subtracted from the current prediction block to compute a residual prediction block, and transform coding is applied to the residual prediction block to reduce redundancy.
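A minimal sketch of this mode search for a 4×4 block, assuming three of the H.264/AVC-style modes (vertical, horizontal, DC); the function names are illustrative, and the sum-of-absolute-differences (SAD) selection criterion is a simplification, as real encoders typically use a rate-distortion cost.

```python
import numpy as np

def predict_vertical(top):
    # Each column repeats the reconstructed pixel above it.
    return np.tile(top, (4, 1))

def predict_horizontal(left):
    # Each row repeats the reconstructed pixel to its left.
    return np.tile(left.reshape(4, 1), (1, 4))

def predict_dc(top, left):
    # Flat prediction from the rounded mean of the neighboring pixels.
    return np.full((4, 4), (top.sum() + left.sum() + 4) // 8)

def best_intra_mode(block, top, left):
    """Try each mode and pick the one whose prediction is closest by SAD."""
    candidates = {
        "vertical": predict_vertical(top),
        "horizontal": predict_horizontal(left),
        "dc": predict_dc(top, left),
    }
    mode, pred = min(candidates.items(),
                     key=lambda kv: int(np.abs(block - kv[1]).sum()))
    return mode, pred, block - pred  # residual goes on to transform coding

top = np.array([10, 20, 30, 40])
left = np.array([10, 10, 10, 10])
block = np.tile(top, (4, 1))  # block exactly matches the vertical prediction
mode, pred, residual = best_intra_mode(block, top, left)
print(mode, int(np.abs(residual).sum()))  # vertical 0
```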
Prediction mode dependent directional transforms may be used in transform coding of spatially predicted, i.e., intra-coded, prediction blocks. In one technique for using prediction mode dependent transforms, referred to as Mode-Dependent Directional Transform (MDDT), a set of predetermined, trained transform matrix pairs (Bi, Ai), i = 0, . . . , n−1, is provided, one pair for each of the n spatial prediction modes. The transform coding selects which of the transform matrices to use based on the spatial prediction mode selected by the spatial prediction. More specifically, if a residual prediction block X results from using prediction mode i, the transformed version of X, i.e., the 2D transform coefficients of X, is given by Y = Bi X Ai^T, where Bi and Ai are the column and row transforms, respectively. In H.264/AVC, Bi = Ai = M, where M is a Discrete Cosine Transform (DCT) matrix. Further, a form of the Karhunen-Loève Transform (KLT) is used to determine Bi and Ai. More specifically, singular value decomposition (SVD) is performed on cross-correlated residual blocks of each prediction mode i collected from training video sequences to determine Bi and Ai.
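The mode-dependent lookup and the separable transform Y = Bi X Ai^T can be illustrated as below. Since the trained KLT-derived matrices are not reproduced here, an orthonormal DCT matrix stands in for every (Bi, Ai) pair; the per-mode selection structure, not the matrix values, is the point of this sketch.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] /= np.sqrt(2)
    return M * np.sqrt(2.0 / n)

# One (B_i, A_i) pair per spatial prediction mode. In MDDT these are trained
# per mode via SVD; here the DCT stands in for all nine 4x4 modes.
transforms = {i: (dct_matrix(4), dct_matrix(4)) for i in range(9)}

def mddt_forward(residual, mode):
    B, A = transforms[mode]
    return B @ residual @ A.T  # Y = B_i X A_i^T

def mddt_inverse(coeffs, mode):
    B, A = transforms[mode]
    return B.T @ coeffs @ A    # orthonormal matrices: inverse is transpose

X = np.arange(16.0).reshape(4, 4)  # a residual prediction block
Y = mddt_forward(X, mode=0)
print(np.allclose(mddt_inverse(Y, mode=0), X))  # True
```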
To use MDDT, two transform matrices must be stored for each spatial prediction mode. For example, if there are twenty-two spatial prediction modes as in H.264/AVC, forty-four transform matrices are required. Further, applying the transform matrices generated for MDDT is computationally complex, especially as compared to the more commonly used DCT, since it may require full matrix multiplications. That is, transform coding of an N×N block may require 2×N×N×N multiplications and 2×N×N×(N−1) additions. Thus, using these transform matrices may not be well suited for encoding on resource-limited devices. Additional information regarding MDDT may be found in the following documents published by the Video Coding Experts Group (VCEG) of the ITU-T Telecommunications Standardization Sector: VCEG-AG11, VCEG-AM20, and VCEG-AF15, and in JCTVC-B024 published by the Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11.
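The operation counts above follow from computing Y = Bi X Ai^T as two full N×N matrix multiplications, where each of the N×N output elements of a multiply costs N multiplications and N−1 additions. A quick arithmetic check:

```python
def full_transform_ops(n):
    """Multiplications and additions for two full n x n matrix multiplies,
    i.e., transform coding one n x n block with untrained-structure matrices."""
    mults = 2 * n * n * n        # 2 x N x N x N
    adds = 2 * n * n * (n - 1)   # 2 x N x N x (N - 1)
    return mults, adds

print(full_transform_ops(4))  # (128, 96)
print(full_transform_ops(8))  # (1024, 896)
```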