A typical microprocessor used in video processing has internal memory designed to cache data and instructions. When a program starts to execute on the microprocessor, all of its data and instructions are stored in external memory (e.g., random access memory (RAM)). As the program executes, the internal memory stores recently used data and instructions retrieved from the external memory. If the same data and/or instructions are required again during execution, the processor can quickly access them in the internal memory, or cache, provided they are still present there (i.e., have not been overwritten by other, e.g., more recently used, data). Because the cache is small, the processor automatically replaces old data and/or instructions in the cache with more recent ones according to the microprocessor's built-in cache management algorithms. If data and/or instructions required for execution of the program are not currently in the cache, a cache miss occurs, and the microprocessor has to access the external memory to retrieve them. Program execution is slowed as a result; the added time required to access data and/or instructions external to the cache is referred to as a cache miss penalty.
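The caching behavior described above can be illustrated with a small simulation. The following is a minimal sketch, not a model of any particular microprocessor: it assumes a least-recently-used (LRU) replacement policy, whereas real cache management algorithms vary, and the function name `run_accesses`, the `hit_cost` of 1 cycle, and the `miss_penalty` of 100 cycles are hypothetical values chosen only for illustration.

```python
from collections import OrderedDict


def run_accesses(addresses, cache_lines, hit_cost=1, miss_penalty=100):
    """Replay a sequence of memory-line addresses through a toy LRU cache.

    Assumptions (illustrative only): LRU replacement, a fixed cost per
    access, and a fixed extra penalty for every access that misses.
    Returns (hits, misses, total_cycles).
    """
    cache = OrderedDict()  # keys are line addresses, oldest first
    hits = 0
    misses = 0
    for addr in addresses:
        if addr in cache:
            # Cache hit: the line is still present; mark it most recent.
            hits += 1
            cache.move_to_end(addr)
        else:
            # Cache miss: fetch from external memory, evicting the
            # least recently used line if the cache is full.
            misses += 1
            if len(cache) >= cache_lines:
                cache.popitem(last=False)
            cache[addr] = True
    total_cycles = hits * hit_cost + misses * (hit_cost + miss_penalty)
    return hits, misses, total_cycles


# A working set that fits in the cache is reused without further misses.
fits = run_accesses([0, 1, 0, 1], cache_lines=2)

# A working set one line larger than the cache misses on every access.
thrashes = run_accesses([0, 1, 2, 0, 1, 2], cache_lines=2)
```

In this sketch the second access pattern never hits because each line is evicted just before it is needed again, so every access pays the assumed miss penalty.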
Video decoding and filtering involve large amounts of data and complex processing. Simple strategies for processing video are not optimal in terms of processing speed and/or cache usage. For example, decoding an entire video frame before filtering the same frame can lead to a large number of data cache misses: because an entire video frame does not fit in the cache, parts of the video frame data need to be swapped in and out of the cache for each processing step. It would be beneficial to make cache utilization as efficient as possible in order to reduce cache miss penalties.
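The effect of frame size on a decode-then-filter strategy can be sketched with the same kind of toy model. The example below is illustrative only: it assumes an LRU replacement policy, treats each step as one sequential pass that touches every cache line of the frame once, and uses hypothetical sizes (a 512-line cache, frames of 256 and 4096 lines) that do not correspond to any real device or video format.

```python
from collections import OrderedDict


def count_misses(accesses, cache_lines):
    """Count misses for a sequence of line accesses in a toy LRU cache."""
    cache = OrderedDict()  # keys are line indices, oldest first
    misses = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= cache_lines:
                cache.popitem(last=False)  # evict least recently used
            cache[line] = True
    return misses


def decode_then_filter(frame_lines):
    """Access pattern for decoding a whole frame, then filtering it.

    Each pass is assumed to touch every line of the frame once, in order.
    """
    decode_pass = list(range(frame_lines))
    filter_pass = list(range(frame_lines))
    return decode_pass + filter_pass


CACHE_LINES = 512  # hypothetical cache capacity, in lines

# Frame fits in the cache: only the decode pass misses.
small_frame_misses = count_misses(decode_then_filter(256), CACHE_LINES)

# Frame exceeds the cache: the filter pass misses on every line again.
large_frame_misses = count_misses(decode_then_filter(4096), CACHE_LINES)
```

Under these assumptions, the frame that fits in the cache incurs misses only while it is first decoded, whereas the larger frame incurs a miss on every line in both passes, because each line has been evicted by the time the filter step reaches it.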