1. Field of the Invention
This invention relates generally to processor-based systems, and, more particularly, to pre-fetching in a processor-based system.
2. Description of the Related Art
Many processing devices utilize caches to reduce the average time required to access information stored in a memory. A cache is a smaller and faster memory that stores copies of instructions and/or data that are expected to be used relatively frequently. For example, central processing units (CPUs) are generally associated with a cache or a hierarchy of cache memory elements. Other processors, such as graphics processing units, can also implement cache systems. Instructions or data that are expected to be used by the CPU are moved from (relatively large and slow) main memory into the cache. When the CPU needs to read or write a location in the main memory, it first checks whether a copy of the desired memory location is included in the cache. If this location is included in the cache (a cache hit), then the CPU can perform the read or write operation on the copy in the cache. If this location is not included in the cache (a cache miss), then the CPU needs to access the information stored in the main memory and, in some cases, the information can be copied from the main memory and added to the cache. Proper configuration and operation of the cache can reduce the average latency of memory accesses to a value below the main memory latency and close to the cache access latency.
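The effect of the hit rate on average latency can be illustrated with the commonly used average memory access time relation (hit time plus miss rate times miss penalty). The following is a minimal sketch only; the function name, the parameter names, and the cycle counts are illustrative assumptions and are not taken from any particular processor.

```python
def average_access_latency(hit_rate, cache_latency, memory_latency):
    """Average latency per access, in the same units as the inputs.

    Assumed model: every access pays the cache lookup latency, and a
    miss additionally pays the main memory latency.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return cache_latency + miss_rate * memory_latency


# Hypothetical numbers: a 4-cycle cache in front of a 200-cycle main memory.
for rate in (0.50, 0.95, 0.99):
    latency = average_access_latency(rate, cache_latency=4, memory_latency=200)
    print(f"hit rate {rate:.2f}: about {latency:.1f} cycles per access")
```

Under these assumed numbers, a 95% hit rate yields an average of roughly 14 cycles, which is far below the main memory latency and approaches the cache latency as the hit rate rises.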
A pre-fetcher can be used to populate the lines in the cache before the information in these lines has been requested. The pre-fetcher can monitor memory requests associated with applications running on the CPU and use the monitored requests to determine or predict that the CPU is likely to access a particular sequence of memory addresses in the main memory. For example, the pre-fetcher may detect sequential memory accesses by the CPU by monitoring a miss address buffer that stores addresses of previous cache misses. The pre-fetcher can then attempt to predict future memory accesses by extrapolating based upon the current and/or previous sequential memory accesses. The pre-fetcher fetches the information from the predicted address locations in the main memory and stores this information in the cache so that the information is available before it is requested by the CPU. Pre-fetchers can keep track of multiple streams and independently pre-fetch data for the different streams.
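The detection and extrapolation described above can be sketched in software as follows. This is a simplified illustration rather than a description of any actual pre-fetcher: the 64-byte line size, the function name `predict_prefetches`, and the `run_length` and `depth` parameters are all assumptions introduced for the example, and the miss address buffer is modeled as a plain list of byte addresses.

```python
LINE_SIZE = 64  # assumed cache line size in bytes


def predict_prefetches(miss_addresses, run_length=3, depth=2, line_size=LINE_SIZE):
    """Return byte addresses to pre-fetch, or [] if no sequential run is seen.

    The last `run_length` entries of the (assumed) miss address buffer are
    examined. If they touch consecutive cache lines, all ascending or all
    descending, the next `depth` lines in that direction are predicted.
    """
    recent = miss_addresses[-run_length:]
    if len(recent) < run_length:
        return []
    lines = [addr // line_size for addr in recent]
    # Collect the distinct line-to-line steps between neighboring misses.
    steps = {b - a for a, b in zip(lines, lines[1:])}
    if steps not in ({1}, {-1}):
        return []  # not a purely sequential run
    step = steps.pop()
    last = lines[-1]
    return [(last + step * i) * line_size for i in range(1, depth + 1)]


# Three misses to consecutive lines predict the following two lines.
print([hex(a) for a in predict_prefetches([0x1000, 0x1040, 0x1080])])
```

A hardware pre-fetcher that tracks multiple streams would keep one such run per stream; this sketch handles a single stream to keep the extrapolation step visible.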
Conventional pre-fetching algorithms can improve the performance of the cache system, particularly when the CPU is executing a series of memory accesses to sequential memory locations. However, conventional pre-fetching algorithms can encounter difficulties in a number of circumstances. For example, there is some latency between the time a pre-fetch request is posted and the time the pre-fetch request issues, e.g., the time the data is fetched into the cache. If a cache miss occurs to a posted pre-fetch memory address before the request issues, the pre-fetcher can fall behind the stream of memory requests. When the pre-fetcher falls behind, it fetches information into the caches after the information has already been requested by the CPU. As another example, in some cases the CPU requests are not perfectly sequential and instead have holes or gaps where the requests skip over some addresses in the address sequence. Conventional pre-fetchers may not recognize the address requests that follow the holes or gaps as being part of the same sequence as the addresses that preceded the holes or gaps. The pre-fetcher may therefore allocate a new stream to the address requests that follow the holes/gaps instead of pre-fetching this information as part of the stream associated with the addresses that preceded the holes/gaps.
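The hole/gap difficulty can be illustrated with a deliberately simplified model in which each stream only accepts the cache line immediately following its most recent miss. The function name `assign_streams`, the strict next-line matching rule, and the use of cache line numbers rather than byte addresses are assumptions made for this sketch; actual conventional pre-fetchers differ in detail.

```python
def assign_streams(miss_lines):
    """Assign each missed cache line number to a stream identifier.

    Assumed rule: a miss joins an existing stream only if it is exactly
    the next line that stream expects; otherwise a new stream is allocated.
    """
    expected = []    # expected[i] is the next line that stream i waits for
    assignment = []  # stream identifier chosen for each miss, in order
    for line in miss_lines:
        if line in expected:
            stream_id = expected.index(line)
        else:
            stream_id = len(expected)  # allocate a new stream
            expected.append(None)
        expected[stream_id] = line + 1
        assignment.append(stream_id)
    return assignment


# Line 13 is skipped, so the misses to lines 14 and 15 land in a new stream.
print(assign_streams([10, 11, 12, 14, 15]))  # prints [0, 0, 0, 1, 1]
```

In this model the single skipped line splits one logical sequence into two streams, which is the behavior the paragraph above identifies as a shortcoming.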