Video encoders are becoming increasingly complex. Newer video compression standards promise higher compression ratios and better visual quality, and encoder implementations vary across data processor architectures. Conventional encoder implementations, however, suffer from increased system overhead and severe degradation of overall performance.
Conventional encoder designs organize the encoding loop around one macroblock at a time. These encoders typically run the loop filtering process over all macroblocks only after the entire frame has been encoded. In intra prediction mode, the reconstructed pixels of each macroblock, taken before loop filtering, are used to predict subsequent macroblocks. This intra prediction dependency makes it difficult to process multiple macroblocks at a time. As a result, conventional implementations incur cache-miss penalties and generate many small, scattered data transfers.
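The following minimal Python sketch makes this dependency concrete with a toy version of the conventional per-macroblock loop. It is illustrative only and rests on several assumptions not stated in the text: 4x4 macroblocks (real H.264 luma macroblocks are 16x16), DC-only intra prediction, scalar quantization in place of a transform, and a simple edge-averaging filter standing in for a real deblocking filter. The names `MB`, `QSTEP`, `dc_predict`, `loop_filter`, and `encode_frame` are hypothetical.

```python
MB = 4      # hypothetical toy macroblock size (H.264 luma macroblocks are 16x16)
QSTEP = 8   # hypothetical scalar quantizer step, standing in for transform + quantization


def dc_predict(recon, x0, y0):
    """DC intra prediction from UNFILTERED reconstructed neighbor pixels."""
    samples = []
    if y0 > 0:
        samples += recon[y0 - 1][x0:x0 + MB]                        # row above
    if x0 > 0:
        samples += [recon[y][x0 - 1] for y in range(y0, y0 + MB)]   # column to the left
    return sum(samples) // len(samples) if samples else 128         # mid-gray if no neighbors


def loop_filter(img, threshold=16):
    """Toy stand-in for deblocking: average the two pixels straddling each MB edge."""
    h, w = len(img), len(img[0])
    for y in range(h):                      # vertical macroblock edges
        for x in range(MB, w, MB):
            p, q = img[y][x - 1], img[y][x]
            if abs(p - q) < threshold:
                img[y][x - 1] = img[y][x] = (p + q + 1) // 2
    for y in range(MB, h, MB):              # horizontal macroblock edges
        for x in range(w):
            p, q = img[y - 1][x], img[y][x]
            if abs(p - q) < threshold:
                img[y - 1][x] = img[y][x] = (p + q + 1) // 2


def encode_frame(frame):
    """Conventional loop: one macroblock at a time, loop filter deferred to frame end."""
    h, w = len(frame), len(frame[0])
    recon = [[0] * w for _ in range(h)]
    order = []
    for y0 in range(0, h, MB):              # raster order
        for x0 in range(0, w, MB):
            # Prediction reads neighbors that were just reconstructed, so this
            # macroblock cannot start until its left and top neighbors finish.
            pred = dc_predict(recon, x0, y0)
            for y in range(y0, y0 + MB):
                for x in range(x0, x0 + MB):
                    q = round((frame[y][x] - pred) / QSTEP)
                    recon[y][x] = min(255, max(0, pred + q * QSTEP))
            order.append((x0, y0))
    unfiltered = [row[:] for row in recon]  # snapshot used for intra prediction
    loop_filter(recon)                      # filtering runs only after the whole frame
    return unfiltered, recon, order
```

Because `dc_predict` reads `recon` while it still holds unfiltered samples, macroblock `(4, 0)` cannot begin until `(0, 0)` is complete, and `loop_filter` can only run once every macroblock in the frame has been reconstructed. This is the serialization described above, which prevents batching several macroblocks together.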