The present invention relates generally to processors and, more particularly, to a data cache prefetch controller in a multi-processing unit system.
A processor may include one or more processing units that execute instructions, a memory that stores data required for executing the instructions, and a data cache that temporarily stores that data in the form of cache lines. Example processing units include an arithmetic logic unit (ALU), a branch unit, and a load store unit. Typically, the processor uses a prefetch engine to predict which memory addresses will be accessed in the future and to prefetch the data at those addresses from the memory for storage in the data cache. Example prefetch engines include a one-block look ahead (OBL) prefetch engine and a stride-based prefetch engine.
A conventional OBL prefetch engine prefetches data at a memory address subsequent to a current memory address when a cache miss has occurred for the current memory address. A cache hit occurs when the processing unit uses a cache line of the data cache to execute an instruction, a cache evict occurs when a cache line is evicted from the data cache, and a cache miss occurs when the processing unit does not find the data needed to execute an instruction in the data cache.
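The OBL behavior described above can be illustrated with a minimal sketch. It assumes a small, fully associative cache with least-recently-used replacement and a 64-byte cache line; the names `OBLCache`, `LINE_SIZE`, and `access` are hypothetical and chosen only for illustration, not taken from any particular processor.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed for illustration)


class OBLCache:
    """Toy fully associative LRU cache with prefetch-on-miss OBL (hypothetical model)."""

    def __init__(self, num_lines=4):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # cache line number -> True, oldest first
        self.hits = 0
        self.misses = 0
        self.evicts = 0

    def _fill(self, line):
        """Bring a line into the cache, evicting the least recently used line if full."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # cache evict
            self.evicts += 1
        self.lines[line] = True

    def access(self, address):
        """Return True on a cache hit, False on a cache miss."""
        line = address // LINE_SIZE
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)
            return True
        self.misses += 1
        self._fill(line)      # fetch the line that missed
        self._fill(line + 1)  # OBL: also prefetch the next sequential line
        return False


cache = OBLCache(num_lines=4)
for addr in (0, 64, 128, 192):  # sequential accesses, one per cache line
    cache.access(addr)
print(cache.hits, cache.misses, cache.evicts)
```

In this sketch, a sequential access pattern benefits from the prefetch because every second line is already present when it is requested, whereas a non-sequential pattern would fill the cache with lines that are evicted before they are used.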
A conventional stride-based prefetch engine includes a reference pattern table (RPT) that stores details of the instructions executed by the processing unit and predicts memory addresses based on patterns in the memory addresses previously accessed by those instructions. The RPT holds information for the most recently used instructions to predict memory access patterns. However, the RPT used by the conventional stride-based prefetch engine is fairly large because it holds details of all the instructions recently executed by the processing unit; it therefore occupies a large area of the processor and consumes a large amount of power.
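The following is a simplified sketch of how an RPT might predict addresses, under stated assumptions: each entry is indexed by the instruction address, holds the last data address and the last observed stride, and a prediction is issued only when the same non-zero stride is observed twice in a row. The names `ReferencePatternTable`, `observe`, and `max_entries` are hypothetical, and practical designs typically use a richer per-entry state machine than the single confirmation flag shown here.

```python
from collections import OrderedDict


class ReferencePatternTable:
    """Simplified stride predictor indexed by instruction address (hypothetical model)."""

    def __init__(self, max_entries=16):
        self.max_entries = max_entries
        # instruction address -> (last data address, last stride), oldest first
        self.entries = OrderedDict()

    def observe(self, pc, address):
        """Record an access by instruction `pc`; return a prefetch address or None."""
        if pc not in self.entries:
            if len(self.entries) >= self.max_entries:
                self.entries.popitem(last=False)  # drop the least recently used entry
            self.entries[pc] = (address, 0)
            return None
        last_address, last_stride = self.entries[pc]
        stride = address - last_address
        self.entries[pc] = (address, stride)
        self.entries.move_to_end(pc)  # mark this instruction as most recently used
        # Predict only when the same non-zero stride has been seen twice in a row.
        if stride != 0 and stride == last_stride:
            return address + stride
        return None


rpt = ReferencePatternTable()
for addr in (100, 108, 116):  # one load instruction walking an array with stride 8
    prediction = rpt.observe(0x400, addr)
print(prediction)
```

Because one entry is kept per recently executed memory instruction, the table grows with the number of instructions tracked, which is the source of the area and power cost noted above.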
Further, a problem common to both the OBL and stride-based prefetch engines is the prefetching of unwanted data, which leads to more cache evicts than cache hits. The prefetching of unwanted data increases the on-chip traffic in the processor and leads to under-utilization of the data cache. A conventional solution to this problem is to use large look-up tables to identify unwanted cache lines (cache data). However, using such tables also increases the area and power overhead of the processor.
Therefore, it would be advantageous to have a prefetch controller that reduces prefetching of unwanted data, reduces on-chip traffic and power consumption, improves speed and performance of the processor, and generally overcomes the above-mentioned limitations of existing prefetch engines.