The invention relates generally to processors and, more particularly, to a method and apparatus for content-aware prefetching.
A conventional processor typically operates at a much faster speed than the main memory to which it is coupled. To overcome the inherent latency of main memory, which usually comprises dynamic random access memory (DRAM), a memory hierarchy is employed. The memory hierarchy includes one or more levels of cache, each cache comprising a relatively fast memory device or circuitry configured to hold data recently accessed, or expected to be accessed, by the processor. The purpose of the cache is to ensure that most data needed by a processor is readily available to the processor without accessing the main memory, as accessing main memory is very slow in comparison to the speed of the processor or the speed at which the processor can access a cache.
Typically, a memory hierarchy comprises multiple levels of cache, wherein each level is faster than the next lower level and the level closest to the processor exhibits the highest speed and performance. A cache may be located on the processor itself (i.e., an "on-chip" cache) or a cache may comprise an external memory device (i.e., an "off-chip" cache). For example, a processor may include a high-level on-chip cache, oftentimes referred to as an "L1" cache, wherein the processor is coupled with a lower-level off-chip cache, which is often referred to as an "L2" cache. Alternatively, a processor may include an on-chip L1 cache as well as an on-chip L2 cache. Of course, a memory hierarchy may include any suitable number of caches, each of the caches located on-chip or off-chip.
As noted above, each level of cache may hold data recently accessed by the processor, such recently accessed data being highly likely, due to the principles of temporal and spatial locality, to be needed by the processor again in the near future. However, system performance may be further enhanced, and memory latency reduced, by anticipating the needs of a processor. If data needed by a processor in the near future can be predicted with some degree of accuracy, this data can be fetched in advance, or "prefetched," such that the data is cached and readily available to the processor. Generally, some type of algorithm is utilized to anticipate the needs of a processor, and the value of any prefetching scheme is dependent upon the degree to which these needs can be accurately predicted.
One conventional type of prefetcher is commonly known as a "stride" prefetcher. A stride prefetcher anticipates the needs of a processor by examining the addresses of data requested by the processor (i.e., a "demand load") to determine whether the requested addresses exhibit a regular pattern. If the processor (or an application executing thereon) is stepping through memory using a constant offset from address to address (i.e., a constant stride), the stride prefetcher attempts to recognize this constant stride and prefetch data according to this recognizable pattern. Stride prefetchers do, however, exhibit a significant drawback: a stride prefetcher does not function well when the address pattern of a series of demand loads is irregular (i.e., there is not a constant stride), such as may occur during dynamic memory allocation.
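The stride-detection behavior described above can be illustrated with a minimal sketch. The structure and function names below are hypothetical, and real stride prefetchers are implemented in hardware with per-instruction tracking tables; this sketch only shows the core idea of confirming a repeated offset between successive demand-load addresses and predicting the next address.

```c
#include <stdint.h>

/* Hypothetical sketch of a simple stride detector: it observes the
 * addresses of successive demand loads and, once the same nonzero
 * offset (stride) is seen on two consecutive loads, predicts the
 * next address as a prefetch candidate. */
typedef struct {
    uint64_t last_addr;    /* address of the previous demand load */
    int64_t  last_stride;  /* offset between the last two loads   */
    int      confirmed;    /* same stride seen twice in a row     */
    int      primed;       /* at least one address observed       */
} stride_detector;

/* Feed one demand-load address; returns 1 and sets *prefetch_addr
 * when a constant stride has been confirmed. */
int observe_load(stride_detector *d, uint64_t addr, uint64_t *prefetch_addr)
{
    if (d->primed) {
        int64_t stride = (int64_t)(addr - d->last_addr);
        d->confirmed = (stride == d->last_stride && stride != 0);
        d->last_stride = stride;
    }
    d->last_addr = addr;
    d->primed = 1;
    if (d->confirmed) {
        *prefetch_addr = addr + (uint64_t)d->last_stride;
        return 1;
    }
    return 0;
}
```

For example, after observing loads at 0x1000, 0x1040, and 0x1080, the detector confirms a stride of 0x40 and predicts 0x10C0. An irregular sequence, as produced by dynamic memory allocation, never confirms a stride, which is precisely the drawback noted above.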
Another method of data prefetching utilizes a translation look-aside buffer (TLB), which is a cache for virtual-to-physical address translations. According to this method, the "fill contents" (i.e., the requested data) associated with a demand load are examined and, if an address-sized data value matches an address contained in the TLB, the data value likely corresponds to a "pointer load" (i.e., a demand load in which the requested data is an address pointing to a memory location) and is, therefore, deemed a candidate address. A prefetch request may then be issued for the candidate address. Because the contents of the requested data, as opposed to the addresses thereof, are being examined, this method may be referred to as content-based, or content-aware, prefetching. Such a content-aware prefetching scheme that references the TLB (or, more generally, any external source or index of addresses) has a significant limitation: likely addresses are limited to those cached in the TLB, and this constraint significantly reduces the number of prefetch opportunities. Also, this content-aware prefetching scheme requires a large number of accesses to the TLB; thus, additional ports must be added to the TLB to handle the content prefetcher overhead.
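The TLB-based candidate selection described above can likewise be sketched in simplified form. The page size, TLB representation, and function names here are assumptions for illustration only; the sketch shows the essential step of comparing each address-sized value in the fill contents against the translations cached in the TLB and treating matches as likely pointer loads.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of TLB-based content-aware candidate selection.
 * A 4 KB page and a tiny fully-associative TLB are assumed. */
#define PAGE_SHIFT 12
#define TLB_SIZE   4

/* Returns nonzero if the virtual page of `value` is cached in the TLB,
 * i.e., the value plausibly corresponds to a pointer load. */
static int tlb_hit(const uint64_t *tlb_pages, size_t n, uint64_t value)
{
    for (size_t i = 0; i < n; i++)
        if ((value >> PAGE_SHIFT) == tlb_pages[i])
            return 1;
    return 0;
}

/* Scan the address-sized words of a demand load's fill contents and
 * copy out the candidate addresses for which prefetches may be issued. */
static size_t find_candidates(const uint64_t *fill, size_t nwords,
                              const uint64_t *tlb_pages, size_t ntlb,
                              uint64_t *out)
{
    size_t found = 0;
    for (size_t i = 0; i < nwords; i++)
        if (tlb_hit(tlb_pages, ntlb, fill[i]))
            out[found++] = fill[i];  /* likely pointer; prefetch candidate */
    return found;
}
```

Note that every word of every fill must be checked against the TLB, which illustrates the access-overhead limitation identified above, and that a pointer to a page not currently cached in the TLB is silently missed, which illustrates the reduced number of prefetch opportunities.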