The invention pertains to the storage of data in a cache, and more particularly, to the reduction of cache pollution. Cache pollution is defined herein as 1) the overwrite of data that is more likely to be fetched from a cache with data that is less likely to be fetched from a cache, and 2) the preservation of data in a cache, which data is unlikely to be reused in the near future.
Note that the word "data" is used herein in two senses. In one sense, it is used to refer to specific data values which are to be added, shifted, or otherwise consumed by a functional unit of a computer. In another sense, "data" is used to generically refer to both specific data values which are consumed, and/or instructions which are executed, by a functional unit of a computer. In the preceding paragraph, the word "data" is used in its generic sense.
Most modern computer systems comprise a number of functional units 104 and a memory hierarchy 102. The functional units, in combination with a portion of the memory hierarchy 106, 108, and control logic for transferring instructions and data between the functional units and memory hierarchy, form a central processing unit (or "processor" 100). See FIG. 1. Functional units may comprise integer processing units, floating-point processing units, branch target adders, instruction fetch units, data fetch units, and so on.
The speed at which the processor can consume instructions and data is largely dependent upon the rate at which instructions and data can be transferred between the functional units and the memory hierarchy. In an attempt to increase these transfer rates, many computer systems employ a hierarchy of memory caches 106, 108.
A cache is simply a small, high-speed buffer memory which is used to temporarily hold those portions of the contents of main memory 110 which it is believed will be consumed in the near future by a processor's functional units. The main purpose of a cache is to shorten the time necessary to perform memory accesses, either for instruction or data fetch. Information stored in cache memory may be accessed in much less time than information located in main memory. Thus, a processor with a cache memory needs to spend far less time waiting for instructions and data to be fetched and/or stored. In a cache hierarchy, lower level caches typically store increasingly smaller subsets of the instructions and data which are stored in main memory and/or higher level caches. However, lower level caches also tend to provide fetched instructions and data to functional units at an increasingly faster rate.
Since instructions and data are retrieved from a cache much more quickly than they are retrieved from main memory, it is desirable to keep caches filled with the instructions and data which functional units are likely to consume next. To achieve this goal, some processors fetch instructions and data speculatively. That is, they will predict the outcomes of conditional instructions (e.g., branch instructions) and fetch instructions and data from target code sections. If the execution of a conditional instruction is predicted to result in a first outcome, a target code section might be synonymous with a sequential code section. If the execution of a conditional instruction is predicted to result in a second outcome, branching to a target code section might require a redirection of program flow so that instructions and data are fetched from a non-sequential code section.
Instructions and data which are retrieved from memory as a result of the predicted program flow described in the preceding paragraph are known as "fetch" data. However, additional instructions and data are sometimes retrieved from memory. These additional instructions and data are known as "prefetch" data. Prefetch data may comprise 1) instructions and data retrieved from an alternate program flow path, 2) instructions and data which an instruction explicitly asks hardware to load into a cache, and 3) instructions and data whose retrieval is triggered by a hint which is encoded in an instruction.
While some caches only store fetch data, other caches store both fetch and prefetch data. When a cache stores prefetch data, it is possible that some of the prefetch data will never be consumed by a functional unit. The storage of unneeded prefetch data in a cache is referred to as "cache pollution" (and is sometimes referred to herein as "prefetch pollution"). Cache pollution also results from the continued storage of fetch data in a cache, long after a current need for the data has passed. This second form of cache pollution is sometimes referred to herein as "fetch pollution".
A number of methods have been devised to reduce cache pollution. One method involves writing new cache data over least recently used cache data. A least recently used (LRU) replacement algorithm therefore requires the tracking of data usage. Although numerous LRU-based algorithms exist, a true LRU algorithm simply ranks the temporal use order of data values stored in a cache. In an n-way, set-associative cache, for example, the data values in each indexed set of data values can be ranked from most to least recently used. When a new data value is written into such a cache, it will typically 1) overwrite the least recently used data value in a set of data values, and 2) be ranked as the most recently used data value in the set. The use rankings of other data values in the set are then downgraded accordingly.
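By way of illustration only, the true LRU ranking described above can be sketched for a single indexed set as follows. The class name `LRUSet`, its method names, and the use of a Python list to hold the ranking are assumptions made for this sketch; an actual cache would track the ranking in hardware.

```python
class LRUSet:
    """One indexed set of an n-way, set-associative cache with true LRU ranking."""

    def __init__(self, ways):
        self.ways = ways
        # Tags ranked from most recently used (index 0) to least recently used (last).
        self.order = []

    def access(self, tag):
        """Look up a tag; on a hit, rank it as most recently used."""
        if tag in self.order:
            self.order.remove(tag)
            self.order.insert(0, tag)
            return True
        return False

    def fill(self, tag):
        """Write a new value: overwrite the LRU entry if the set is full,
        then rank the new value as most recently used."""
        victim = None
        if len(self.order) == self.ways:
            victim = self.order.pop()  # least recently used entry
        self.order.insert(0, tag)      # other rankings are downgraded by one
        return victim


s = LRUSet(ways=4)
for tag in ["A", "B", "C", "D"]:
    s.fill(tag)
s.access("A")          # A becomes most recently used
evicted = s.fill("E")  # the least recently used value is overwritten
```

In this sketch, the access to "A" leaves "B" as the least recently used value, so "B" is the value overwritten by "E".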
If a cache stores both fetch and prefetch data, the use of an LRU-based algorithm to store data in the cache can be problematic. Although the use of an LRU-based algorithm tends to alleviate pollution due to the storage of stale fetch data, the use of such an algorithm can sometimes overpopulate a cache's data entries with prefetch data, and thus increase prefetch cache pollution.
Another method for reducing cache pollution, and a method which alleviates both fetch and prefetch cache pollution, is to implement an LRU-based algorithm for data storage, but to only store fetch data in a cache 202. Such a solution can be implemented by storing fetch and prefetch data retrieved from a higher level memory 208 in a buffer 204, and then performing writes of data from the buffer to the cache. See FIG. 2. Fetch data can be written from the buffer to the cache at any time (e.g., when cache fill port bandwidth so permits). If data is allowed to be fetched from the buffer, thus bypassing the cache, then provisions can be made for upgrading the status of this data to "fetched", and also writing this data into the cache.
To assist in determining which data values should be written from the buffer to the cache, data can be stored in the buffer with a reference status (e.g., a single reference bit). A reference bit can be set to a first value to indicate that a data value stored in the buffer has been fetched, either prior to storage in the buffer or subsequently. Likewise, a reference bit can be set to a second value to indicate that a data value stored in the buffer has only been prefetched. Since a reference bit is used to determine which data values are written into the cache, fetch data values which are written from the buffer to the cache will be referred to herein as "referenced" data values, and all other data values which are written from the buffer to the cache will be referred to as "non-referenced" data values.
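By way of illustration only, a buffer which stores a single reference bit with each data value might be sketched as follows. The class name `FillBuffer`, the `capacity` parameter, and the policy of overwriting the oldest entry when the buffer is full are all assumptions made for this sketch and are not taken from the method described above.

```python
class FillBuffer:
    """Buffer between a higher level memory and a cache, with one reference bit per entry."""

    def __init__(self, capacity=8):
        self.capacity = capacity  # hypothetical size parameter
        # address -> reference bit (True = fetched, False = only prefetched).
        # A dict preserves insertion order, so the first key is the oldest entry.
        self.entries = {}

    def insert(self, addr, fetched):
        """Store a value returned by higher level memory.
        Returns the address overwritten to make room, if any."""
        lost = None
        if addr not in self.entries and len(self.entries) == self.capacity:
            lost = next(iter(self.entries))  # assumed policy: overwrite the oldest entry
            del self.entries[lost]
        # A value already marked as fetched keeps that status.
        self.entries[addr] = self.entries.get(addr, False) or fetched
        return lost

    def fetch(self, addr):
        """A fetch served from the buffer upgrades the entry to referenced."""
        if addr in self.entries:
            self.entries[addr] = True
            return True
        return False

    def drain_referenced(self):
        """Remove and return the referenced entries, i.e. those to be written into the cache."""
        ready = [a for a, ref in self.entries.items() if ref]
        for a in ready:
            del self.entries[a]
        return ready


buf = FillBuffer(capacity=8)
buf.insert(0x100, fetched=True)   # fetched prior to storage in the buffer
buf.insert(0x140, fetched=False)  # only prefetched
buf.insert(0x180, fetched=False)  # only prefetched
buf.fetch(0x140)                  # fetched subsequently, so its bit is set
to_cache = buf.drain_referenced()
remaining = list(buf.entries)
```

In this sketch, the values at 0x100 and 0x140 are written to the cache, while the value at 0x180 remains in the buffer because it has only been prefetched.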
Typically, the buffer which is used in the above method is small (perhaps on the order of eight entries). If the buffer is too large, it becomes similar to the cache, and some of its usefulness and efficiencies are lost. However, the small size of the buffer can also be problematic. If non-polluting prefetches are issued far in advance of fetches, or if many polluting prefetches are issued, the capacity of the buffer can quickly be exceeded, and useful prefetch data can be lost. Thus, the buffer reduces pollution in the cache, but at the risk of losing a greater percentage of prefetch data due to data overwrites. As is known in the art, the re-fetch of a data value from a higher level cache (or main memory) can be costly with respect to both timing and resource usage (e.g., a read port on the higher level cache, and all of the busses and other resources between the higher level cache and a stalled pipeline often need to be used). A need therefore exists for better methods and apparatus for reducing cache pollution, which methods are less likely to result in a loss of prefetch data.
In accordance with the invention, methods and apparatus for reducing cache pollution while attempting to preserve both fetch and prefetch data in a cache are disclosed herein.
By way of example, a first preferred method for reducing cache pollution comprises marking non-referenced data as less recently used when it is written into a cache, and marking referenced data as more recently used when it is written into a cache. Upon subsequent fetch of a data value from the cache, its use status may be updated to more recently used. When new data is written into the cache, the new data is written over data which is marked as less recently used.
In summary, the above-described method dispenses with the LRU convention of always marking new cache data as most recently used. Instead, new fetch data is marked as more recently used (and in most cases will be marked as most recently used). However, new prefetch data is marked as less recently used (and in most cases will be marked as least recently used). In an n-way, set-associative cache, this has the effect of preserving (n−1)/n of the cache's entries for the storage of fetch data, while limiting the storage of prefetch data to 1/n of the cache's entries. Pollution which might result from unneeded prefetch data is therefore limited to 1/n of the cache. In reality, however, pollution from unneeded prefetch data will be significantly less, as many prefetch data values will ultimately be fetched prior to their overwrite with new data, and upon their fetch, their use status can be upgraded to most recently used, thus ensuring their continued maintenance in the cache.
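By way of illustration only, the marking scheme summarized above might be sketched for a single indexed set as follows. The class name `PollutionReducingSet`, the `referenced` parameter, and the choice to mark fetch data as most recently used and prefetch data as least recently used (the "most cases" noted above) are assumptions made for this sketch.

```python
class PollutionReducingSet:
    """One indexed set of an n-way cache in which the use status of new data
    depends on whether the data is referenced (fetch) or non-referenced (prefetch)."""

    def __init__(self, ways):
        self.ways = ways
        # Tags ranked from most recently used (index 0) to least recently used (last).
        self.order = []

    def fetch(self, tag):
        """On a hit, upgrade the value's use status to most recently used."""
        if tag in self.order:
            self.order.remove(tag)
            self.order.insert(0, tag)
            return True
        return False

    def fill(self, tag, referenced):
        """Write new data over the least recently used entry when the set is full."""
        victim = None
        if len(self.order) == self.ways:
            victim = self.order.pop()
        if referenced:
            self.order.insert(0, tag)  # fetch data: marked most recently used
        else:
            self.order.append(tag)     # prefetch data: marked least recently used
        return victim


s = PollutionReducingSet(ways=4)
for tag in ["F1", "F2", "F3"]:
    s.fill(tag, referenced=True)
for tag in ["P1", "P2", "P3"]:
    s.fill(tag, referenced=False)  # each prefetch overwrites the previous prefetch
after_prefetches = list(s.order)
s.fetch("P3")                      # a prefetched value that is fetched is retained
after_fetch = list(s.order)
```

In this sketch with n = 4, the three prefetches compete for a single entry, so the three fetch data values survive. Once "P3" is fetched, it is ranked as most recently used and is no longer the next value to be overwritten.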
Also by way of example, a first preferred embodiment of a pollution reducing cache structure for implementing the above method might comprise a number of data entries, a number of temporal use entries, a means for updating a temporal use entry upon the write of data into a corresponding data entry, and a means for 1) reading at least one temporal use entry from the cache during a write operation, 2) identifying a data entry which the at least one temporal use entry has marked as less recently used, and 3) causing new data to be written into the identified data entry. The means for updating a temporal use entry upon the write of data into a data entry preferably 1) marks non-referenced data as less recently used, and 2) marks referenced data as more recently used. In a preferred embodiment of the invention, a buffer, which in part serves as an interface between a higher level memory and the cache, is used to mark data as referenced or non-referenced. However, data's fetch/prefetch status (or referenced/non-referenced status) may be tracked in a variety of ways, as is known by those skilled in the art.
As previously mentioned, referenced data is data that has been fetched because a functional unit needs the data, and non-referenced data is data that has only been prefetched.
The above-described cache structure requires little additional supporting logic over prior art cache structures, yet serves to further reduce cache pollution while ensuring that needed data is maintained in the cache.
These and other important advantages and objectives of the present invention will be further explained in, or will become apparent from, the accompanying description, drawings and claims.