Data storage and retrieval play a key role in data processing tasks. Minimizing the delay, or latency, associated with memory operations is an important goal in system design. A variety of solutions exist to manage latency, with varying degrees of success. Generally speaking, the shorter the bus between a memory device and its associated memory controller, the lower the latency.
One way to minimize latency involves employing data cache structures on the processor requesting the data. With such a cache structure, data with a high probability of being reused soon after storage may be held in a local on-chip cache to allow quick retrieval. In contrast, data with a lower probability of being reused soon after storage may be stored in an off-chip memory, such as a DRAM array. Data stored in the off-chip memory generally takes several clock cycles to retrieve.
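By way of illustration only, the latency difference between an on-chip cache hit and an off-chip access can be sketched in a few lines of Python. The sketch assumes a least-recently-used (LRU) replacement policy, a read-only workload, and cycle counts of 1 and 20; none of these are specified in the text above, and the names TinyCache and total_cycles are hypothetical.

```python
from collections import OrderedDict

# Illustrative latencies only (assumptions, not figures from the text).
ON_CHIP_HIT_CYCLES = 1
OFF_CHIP_ACCESS_CYCLES = 20


class TinyCache:
    """Hypothetical on-chip cache with a fixed capacity and LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data, least recently used first

    def read(self, address, off_chip):
        """Return (data, cycles spent) for a read of `address`."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return self.lines[address], ON_CHIP_HIT_CYCLES
        data = off_chip[address]  # miss: fetch from off-chip memory
        self.lines[address] = data
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return data, OFF_CHIP_ACCESS_CYCLES


def total_cycles(addresses, capacity, off_chip):
    """Sum the cycles spent reading every address in the trace, in order."""
    cache = TinyCache(capacity)
    return sum(cache.read(a, off_chip)[1] for a in addresses)


# Assumed off-chip contents and access trace, chosen only for the example.
off_chip_memory = {addr: addr * 10 for addr in range(8)}
trace = [0, 1, 0, 1, 2, 0]
print(total_cycles(trace, capacity=2, off_chip=off_chip_memory))
print(total_cycles(trace, capacity=3, off_chip=off_chip_memory))
```

Under these assumed numbers, the same trace costs 82 cycles with a two-entry cache and 63 cycles with a three-entry cache, because the larger cache still holds address 0 when it is read again.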
While conventional on-chip cache structures provide benefits in certain applications, space and cost concerns generally restrict the storage capacity of on-chip caches. Consequently, data cannot be held in a cache for long. It is quickly replaced in the cache by other data and sent out to off-chip memory (e.g., main memory). What is needed is an apparatus and method that combine the low-latency benefits of an on-chip cache memory with the cost and capacity advantages of off-chip memory.