The present invention relates to a prefetch queue provided for an external cache memory in a processor.
Prefetching is a known technique implemented in processor devices. Prefetching causes data or instructions to be read into the processor before they are called for by the processor's core execution unit ("core"). By having the data available within the processor when the core is ready for it, the core need not wait for the data to be read from slower external memories. Instead, the data is available to the core at the relatively higher data rates of internal buses within the processor. Because prefetching can free a core from having to wait while data requests are fulfilled, prefetching can improve processor performance.
If implemented incorrectly, however, prefetching can impair processor performance. By reading data from external memories into the processor, prefetch operations occupy resources on the external bus. Also, prefetching generally reads data into a memory cache at the core. Due to the limited size of the core cache, prefetching may write data over other data that the processor uses. Further, prefetching may read data into the processor that may never be used. Thus, prefetching is useful only if it improves processor performance more often than it impairs such performance. Instruction streaming, a type of prefetching, occurs when a core causes data to be read sequentially from several adjacent positions in external memory. Instruction streaming suffers from the above disadvantages.
It is known that prefetching may provide significant performance improvements when a processor either executes instructions or manipulates data held in adjacent memory locations. However, no known prefetching scheme adequately distinguishes programs that perform sequential memory reads from those that perform non-sequential memory reads. Further, some programs may perform sequential reads "in parallel." They may read data from sequential memory positions in a first area of memory interspersed with reads from sequential memory positions in a second area of memory. Traditional prefetching techniques do not recognize multiple streams of sequential memory reads as appropriate for prefetching.
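By way of illustration only, the "parallel" sequential reads described above may be sketched in Python as follows. The sketch is not drawn from any particular prior scheme or from the present invention; the 64-byte line size, the base addresses, the four-entry stream limit, and all function names are assumptions chosen for the example. It shows that a detector comparing each read only against the immediately preceding read finds no sequential pattern in an interleaved trace, whereas a detector that tracks an expected next address per stream recognizes both streams.

```python
from itertools import chain

LINE = 64  # assumed cache-line size in bytes (illustrative only)


def interleaved_trace(base_a, base_b, n):
    """Build a read trace that alternates between two memory areas,
    each area advancing by one line per read."""
    return list(chain.from_iterable(
        (base_a + i * LINE, base_b + i * LINE) for i in range(n)))


def naive_sequential_hits(trace):
    """Count reads whose address is exactly one line past the
    immediately preceding read (single-stream view)."""
    return sum(1 for prev, cur in zip(trace, trace[1:])
               if cur == prev + LINE)


def per_stream_sequential_hits(trace, max_streams=4):
    """Count reads that extend any one of a small set of tracked
    streams (hypothetical multi-stream view)."""
    expected = []  # next address expected for each tracked stream
    hits = 0
    for addr in trace:
        if addr in expected:
            hits += 1
            # The stream advanced; expect the following line next.
            expected[expected.index(addr)] = addr + LINE
        else:
            # Unknown address: start tracking a new candidate stream.
            expected.append(addr + LINE)
            if len(expected) > max_streams:
                expected.pop(0)  # discard the oldest candidate
    return hits


# Four reads from each of two areas, interleaved.
trace = interleaved_trace(0x1000, 0x8000, 4)
print(naive_sequential_hits(trace))       # 0
print(per_stream_sequential_hits(trace))  # 6
```

In this trace, the first read in each area merely establishes a candidate stream, so six of the eight reads are recognized as sequential under the per-stream view, while the single-stream view recognizes none.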
Accordingly, there is a need in the art for a prefetch scheme that prefetches only when there exists a pattern demonstrating that performance improvements are to be obtained by prefetching. There is a need in the art for a prefetch scheme that incurs low performance costs for erroneous prefetches. Further, there is a need in the art for a prefetch scheme that detects and observes parallel prefetch operations.