In video coding, frames are typically encoded in two ways: i) intra coding, and ii) inter coding.
In intra coding, the spatial correlation of blocks within a frame is exploited to generate prediction residuals. A prediction residual is the difference between an original block and its prediction, and has significantly less energy than the corresponding original image block. Hence, fewer bits are required to encode the block at a given level of fidelity.
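The energy reduction described above can be illustrated with a minimal sketch; the block values and the flat "DC" prediction below are hypothetical, not taken from any real codec.

```python
# Hypothetical 4x4 luma block and a flat DC prediction (all values
# are illustrative only).
original = [
    [120, 122, 121, 119],
    [121, 123, 122, 120],
    [119, 121, 120, 118],
    [120, 122, 121, 119],
]
prediction = [[120] * 4 for _ in range(4)]  # flat prediction from neighbors

# Residual: element-wise difference between original block and prediction.
residual = [
    [o - p for o, p in zip(orow, prow)]
    for orow, prow in zip(original, prediction)
]

def energy(block):
    """Sum of squared sample values."""
    return sum(v * v for row in block for v in row)

print(energy(original))  # large: samples are near 120
print(energy(residual))  # small: samples are near 0
```

Because the residual samples cluster near zero, they can be represented with far fewer bits than the original samples at the same fidelity.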
For inter coding, motion-compensated prediction residuals are generated using blocks within temporally adjacent frames.
FIG. 1 shows a conventional encoder. The input is a macroblock 101 and the output is a bit stream 109. The macroblock is transformed 110 by a transform that is selected based upon the prediction mode 102. The transformed data are then quantized 120 to produce a quantized signal. The quantized signal is entropy coded 130 to produce the bit stream 109. The quantized signal is also inverse quantized 140 and inverse transformed 150, and used for intra prediction 160 when combined with the input macroblock 101.
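The transform/quantize path of FIG. 1 and its reconstruction path can be sketched minimally as follows; a 4-point Hadamard transform stands in for the codec's actual integer DCT, and the quantization step `QP` is illustrative.

```python
# Self-inverse (up to a 1/4 scale) 4-point Hadamard transform, used here
# only as a stand-in for the encoder's real transform 110.
H = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]

def transform(x):
    return [sum(H[i][j] * x[j] for j in range(4)) for i in range(4)]

def inverse_transform(X):
    return [sum(H[j][i] * X[j] for j in range(4)) / 4 for i in range(4)]

QP = 8  # illustrative quantization step size

def quantize(X):
    return [round(c / QP) for c in X]

def dequantize(levels):
    return [lev * QP for lev in levels]

residual = [10, 12, 9, 11]               # e.g. an intra prediction residual
levels = quantize(transform(residual))   # forward path: 110 then 120
recon = inverse_transform(dequantize(levels))  # reconstruction: 140 then 150
print(levels)  # mostly zero levels remain for the entropy coder 130
print(recon)   # close to, but not exactly, the input residual
```

Quantization discards detail, so the reconstruction only approximates the input; the encoder uses this same reconstruction for prediction so that encoder and decoder stay in sync.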
FIG. 2 shows a conventional decoder. The input is a bit stream 109 and the output is a macroblock 208. The bit stream is entropy decoded 201 and inverse quantized 203. The decoded transform coefficients are inverse transformed 204, where the transform is selected based upon the prediction mode 202. The inverse transformed residual is combined 205 with an intra or inter prediction 207 to produce a decoded macroblock. This macroblock is output and stored in a buffer 206 to be used for reconstructing future decoded macroblocks.
In state-of-the-art video encoders/decoders (codecs), such as codecs designed according to the H.264/AVC standard, the prediction for an intra coded block is determined from previously coded spatially neighboring blocks in the same frame. Several directional predictions are generated, and a fitness measure such as sum of absolute differences (SAD), sum of squared error (SSE), or sum of absolute transformed differences (SATD) is determined for each direction. In H.264/AVC, the best prediction direction or “mode” is selected, and the corresponding prediction residual is transformed via the conventional integer Discrete Cosine Transform (DCT) prior to quantization. Because the residuals of the same mode possess common patterns of correlation, one can design transforms that further exploit these patterns to reduce the bit rate. The prior art defines a set of transforms called Mode Dependent Directional Transforms (MDDT). MDDT utilizes the Karhunen-Loève Transform (KLT) as a trained set of transforms for residuals of each intra prediction mode.
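The mode selection described above can be sketched as follows; the block and the two candidate predictions are hypothetical (a real H.264/AVC encoder derives each directional prediction from previously coded neighboring pixels), and SAD is used as the fitness measure.

```python
def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p)
               for brow, prow in zip(block, pred)
               for b, p in zip(brow, prow))

# Hypothetical 4x4 block with a strong horizontal structure.
block = [
    [100, 100, 100, 100],
    [110, 110, 110, 110],
    [120, 120, 120, 120],
    [130, 130, 130, 130],
]

# Two illustrative directional candidates, given directly as sample values:
# "vertical" repeats the row above, "horizontal" repeats the left column.
candidates = {
    "vertical":   [[100] * 4 for _ in range(4)],
    "horizontal": [[100] * 4, [110] * 4, [120] * 4, [130] * 4],
}

# Select the mode whose prediction minimizes the fitness measure.
best_mode = min(candidates, key=lambda m: sad(block, candidates[m]))
print(best_mode)  # "horizontal": its residual is all zeros
```

The residual of the winning mode is what is then transformed (by the integer DCT in H.264/AVC, or by a mode-dependent transform in MDDT) and quantized.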
The KLT, however, is vulnerable to outliers in the training data. The outliers can skew the trained transforms in such a way that they become suboptimal when subsequently used to transform and code video data. Additionally, the KLTs may not be as sparse as desired for practical video coding. Hence, there is a need for a method to train and utilize transforms in a manner that remains accurate in the presence of outliers.
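The outlier sensitivity described above can be demonstrated with a minimal sketch. Residual samples are reduced here to 2-vectors so the leading KLT basis vector of the 2x2 covariance matrix has a closed form; a real MDDT design trains full block transforms per prediction mode, and all sample values below are illustrative.

```python
import math

def principal_angle(samples):
    """Angle (radians) of the leading KLT basis vector of 2-D samples."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    # Covariance entries of the symmetric 2x2 matrix [[cxx, cxy], [cxy, cyy]].
    cxx = sum((x - mx) ** 2 for x, _ in samples) / n
    cyy = sum((y - my) ** 2 for _, y in samples) / n
    cxy = sum((x - mx) * (y - my) for x, y in samples) / n
    # Closed-form eigenvector angle for a symmetric 2x2 matrix.
    return 0.5 * math.atan2(2 * cxy, cxx - cyy)

# Correlated "training residuals" clustered along the 45-degree direction.
train = [(1, 1), (2, 2), (-1, -1), (3, 3), (-2, -2)]
clean_angle = math.degrees(principal_angle(train))
print(round(clean_angle))  # 45: the KLT aligns with the true correlation

# A single large outlier skews the trained transform away from that
# direction, making it suboptimal for the typical residuals.
skewed_angle = math.degrees(principal_angle(train + [(40, -35)]))
print(round(skewed_angle))  # far from 45 degrees
```

One outlier among five well-behaved training vectors is enough to rotate the trained basis far from the direction along which the typical residuals actually correlate, which is the weakness the background identifies.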