Conventional inter-frame video compression techniques use motion-compensated reference frames to reconstruct a current frame. In a conventional video decoder system, the reference frames are typically stored in an external dynamic random access memory (DRAM) chip. The DRAM chip is separate from the decoder processor chip because the required memory capacity is too large to fit on the decoder processor chip economically.
The power consumed in transferring reference data to and from the DRAM is a problem in portable high-definition video decoder implementations. The power consumption is especially noticeable with complex motion compensation, such as with H.264 encoded video. Other functions that access the DRAM, such as graphical user interface processing, are often bottlenecked by the reference data transfers. A reduction in the average DRAM bandwidth can be achieved by caching recently used motion compensation data on the processor chip. However, traditional set associative caches have problems when used in such applications. If the cache block is too big, a large percentage of the data may not be used before the cache block is replaced. Fetching data that is then evicted unused can consume more DRAM bandwidth and more power than not having any cache. If the cache block is small, the associated address tags can consume a large percentage of the cache size. For example, given a 16-byte (128-bit) block size and an 18-bit address tag, the address tag adds approximately 14% overhead on top of the data storage. In addition, the area and power of the tag compare logic increase as the number of parallel tag compares increases. For example, an associative cache uses one set of N-bit tag compare logic per cache way (e.g., 8 sets of 18-bit tag compare logic in an 8-way set associative cache). Furthermore, the DRAM overhead of processing many small cache block misses is normally greater than that of processing fewer, larger cache block misses because of the large DRAM latency associated with each cache miss.
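The tag overhead figure quoted above can be checked with a short calculation. The following Python sketch is illustrative only: the function name `tag_overhead` and the list of block sizes are assumptions made for this example and do not come from any particular cache design. It reports the tag size both relative to the data bits (the roughly 14% figure) and as a share of combined tag-plus-data storage.

```python
def tag_overhead(block_bytes: int, tag_bits: int):
    """Return (tag bits / data bits, tag bits / (data + tag bits)) for one cache block.

    Hypothetical helper for illustration; it ignores valid/dirty bits and
    any other per-block state a real cache would also store.
    """
    data_bits = block_bytes * 8
    relative_to_data = tag_bits / data_bits
    share_of_total = tag_bits / (data_bits + tag_bits)
    return relative_to_data, share_of_total


if __name__ == "__main__":
    TAG_BITS = 18  # tag width taken from the example in the text
    # Block sizes other than 16 bytes are assumed values, shown for comparison.
    for block_bytes in (16, 32, 64, 128):
        rel, share = tag_overhead(block_bytes, TAG_BITS)
        print(f"{block_bytes:4d}-byte block: "
              f"{rel:.1%} relative to data, {share:.1%} of total storage")
```

For the 16-byte block, the ratio of tag bits to data bits is 18/128, which is the approximately 14% stated above. The ratio halves with each doubling of the block size, which illustrates the trade-off between tag overhead and the risk of fetching data that is never used.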
A caching technique is desired that reduces the DRAM bandwidth utilization to save power consumed in transferring the data.