This invention relates to systems, apparatuses and methods employing and implementing cache memories. More specifically, this invention relates to systems, apparatuses and methods of managing prefetching in cache memory.
Cache memories generally comprise part of a memory system; the memory system in turn typically comprises part of a computing system, such as a personal computer or a TV set-top box. The computing system further comprises a processing device. In the computing system, the memory system stores information which the processing device accesses in read and write operations.
Memory systems generally are structured as a hierarchy. Hierarchical memory systems combine technologies, generally in physically distinct levels, so as to balance speed, capacity and expense at each level and to achieve, overall, both acceptable performance and economy. At their lowest level, hierarchical memory systems typically have registers which are integral with the system's processing device, are limited in number, are extremely fast and are disposed physically adjacent to the logic blocks of the processing device (e.g., the arithmetic logic unit); at the same time, the registers are expensive relative to other memory technologies. Hierarchical memory systems also have high level memory: this memory typically includes (i) a main memory, generally comprising volatile memory technology (e.g., random access memory in any of its forms) and (ii) more-permanent storage (e.g., compact disk, floppy, hard, and tape drives).
Interposed between the registers and the high level memory is the cache memory. The cache memory may itself occupy levels, including a first level that is resident as part of the processing device's integrated circuit ("on-chip"), and a second level that is not on-chip but may be inside the processing device's package or otherwise closely coupled to such device. The cache memory generally is implemented, relative to higher levels of memory, using fast technologies. The cache memory's fast technologies typically are buttressed by physically-close coupling to the processing device. These technologies and coupling tend to be relatively expensive on a per-bit basis. However, because the cache memory typically is small in capacity, its overall cost remains acceptable in the computing system.
The cache memory generally is implemented so as to hold the information that the processing device is most likely to seek in the immediate future. In that regard, the processing device first seeks access to information via the cache memory. If the sought information (e.g., data, instructions, or both) is found in the cache memory (a "hit"), the information can be provided to the device at great speed. If, however, the information is not found in the cache memory (a "miss"), the processing device accesses the information via one of the next, higher levels of the memory system. Relative to a cache hit, these next-level accesses typically engender increasingly large delays in the information's availability (the "miss penalty").
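By way of illustration only, the effect of the miss penalty on overall access time can be sketched with the commonly used average-access-time relation, in which every access incurs the hit time and each miss additionally incurs the miss penalty. The function name, the parameter names and the figures below are assumptions chosen for the example and do not describe any particular memory system.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average time per access under a simple two-level model (illustrative).

    Every access pays hit_time; the fraction miss_rate of accesses
    additionally pays miss_penalty.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(average_access_time(1, 0.05, 100))
```

Under these assumed figures, a miss rate of only a few percent accounts for most of the average access time, which is why reducing misses is a principal object of cache design.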
In order to hold in cache memory the information that the processing device is likely to seek in the near-term, it is conventional to engineer the cache memory so as to continually update its contents. The update mechanism duplicates the accessed information, e.g., following a cache miss, from the high level memory into the cache memory. Generally, this update mechanism is implemented to load not only the accessed information, but also the information of neighboring memory addresses. Moreover, the update mechanism typically uses this information to replace other information in the cache memory, the replacement comporting with a selected replacement policy. One such policy is to replace the information which, as of the update, was the least recently used, such information being deemed the least likely to be used in the near-term and, therefore, replaceable.
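The update mechanism and the least-recently-used replacement policy described above may be sketched, for purposes of illustration only, as follows. The sketch assumes a small, fully associative cache whose lines each cover a fixed number of neighboring addresses; the class name `LRUCache`, its parameters and the address sequence are hypothetical and are not drawn from any particular implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement (illustrative only)."""

    def __init__(self, num_lines=4, line_size=4):
        self.num_lines = num_lines   # capacity, in cache lines
        self.line_size = line_size   # addresses per line (the "neighbors")
        self.lines = OrderedDict()   # line number -> True, least recent first
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, update the cache and return False."""
        line = address // self.line_size
        if line in self.lines:
            self.lines.move_to_end(line)     # now the most recently used line
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)   # replace the least recently used line
        self.lines[line] = True              # load the whole line, neighbors included
        return False


cache = LRUCache(num_lines=2, line_size=4)
results = [cache.access(a) for a in (0, 1, 8, 0, 16, 8)]
print(results)  # [False, True, False, True, False, False]
```

In this sequence, the access to address 1 hits because address 1 was loaded as a neighbor of address 0, while the final access to address 8 misses because its line had become the least recently used and was replaced when address 16 was loaded.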
While updating is directed to information likely to be accessed, another approach is to load information into the cache memory that is known to be subject to near-term access by the processing device. To do so, the processing device issues a prefetch instruction to the cache memory. The instruction's issue is in advance of the processing device's need to access the information referenced by the instruction. In addition, the issue generally is responsive to software such as the programmer's coding or the processing device's operating system or compiler.
Although prefetching arrangements have been proposed, their use has been constrained by various limitations. An exemplary such limitation is the difficulty associated with identifying prefetchable information. Moreover, even where prefetchable information can be identified, the associated prefetching arrangements have tended to allow insufficient time periods for proper prefetching of information prior to the processing device's initiation of an access operation for that information. While this timing insufficiency may be addressed by programming the prefetch instruction to issue long in advance of the access, such programming is undesirable as it can introduce collateral problems. One such collateral problem is the potential to waste cache memory resources during the time period between the loading and the eventual use of the prefetched information, which waste can degrade the cache memory's performance. Another collateral problem is the potential removal of the prefetched information, before its use, due to replacement under the action of a replacement policy.
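The two collateral problems of issuing a prefetch long in advance may be illustrated, under stated assumptions, by replaying a sequence of events against a toy cache. The sketch assumes a two-line, fully associative cache with least-recently-used replacement and treats a prefetch as completing instantly; the function `replay`, the event format and the line labels are hypothetical.

```python
from collections import OrderedDict


def replay(events, capacity=2):
    """Replay ("prefetch" | "access", line) events on a toy LRU cache.

    Returns (demand_misses, idle_steps). idle_steps counts, summed over all
    events, the prefetched-but-unused lines resident when each event begins.
    """
    cache = OrderedDict()   # line -> True while prefetched and not yet used
    demand_misses = 0
    idle_steps = 0
    for op, line in events:
        # Cache slots currently held by prefetched lines awaiting first use.
        idle_steps += sum(cache.values())
        if line in cache:
            cache.move_to_end(line)          # most recently used
            if op == "access":
                cache[line] = False          # the prefetched line is now used
            continue
        if op == "access":
            demand_misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)        # LRU may discard a prefetched line
        cache[line] = (op == "prefetch")
    return demand_misses, idle_steps


early = [("prefetch", "X"), ("access", "A"), ("access", "B"), ("access", "X")]
timely = [("access", "A"), ("access", "B"), ("prefetch", "X"), ("access", "X")]
print(replay(early))   # (3, 2)
print(replay(timely))  # (2, 1)
```

In the early case, line X occupies a cache slot without being used and is then replaced before the processing device accesses it, so the prefetch saves nothing; in the timely case, the same prefetch converts the access to X into a hit.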
The timing insufficiency may also be addressed by segmenting prefetching into a series of prefetch instructions. However, using an instruction series also may be undesirable as it too can introduce collateral problems. One collateral problem is the difficulty of achieving optimal temporal spacing between adjacent prefetch instructions: (i) each prefetch instruction should issue so that its prefetch operations do not conflict with operations of the next, adjacent instruction and (ii) adjacent instructions should issue so as to minimize time gaps between the prefetch operations.
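The spacing difficulty may be made concrete with a brief sketch that, for each pair of adjacent prefetch instructions, reports any conflict (the next instruction issuing before the prior prefetch operation has finished) and any time gap (idle time between the end of one prefetch operation and the issue of the next). The sketch assumes, solely for illustration, that every prefetch operation takes the same fixed duration and that times are expressed in arbitrary units; the function `spacing_report` and its inputs are hypothetical.

```python
def spacing_report(issue_times, duration):
    """Report overlap and idle gap for each adjacent pair of prefetches.

    issue_times: ascending issue times of the prefetch instructions.
    duration:    assumed fixed time each prefetch operation takes.
    Returns (overlaps, gaps), one entry per adjacent pair.
    """
    overlaps, gaps = [], []
    for current, following in zip(issue_times, issue_times[1:]):
        # Positive slack is idle time; negative slack is a conflict.
        slack = following - (current + duration)
        overlaps.append(max(0, -slack))
        gaps.append(max(0, slack))
    return overlaps, gaps


print(spacing_report([0, 10, 15, 30], duration=10))  # ([0, 5, 0], [0, 0, 5])
```

Only the first pair in this example is optimally spaced, with neither conflict nor gap; the second pair conflicts for five units and the third pair leaves a five-unit gap.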
Accordingly, it is desirable to provide a cache memory supporting prefetching while overcoming the problems typically associated with such operations.