The invention pertains to methods and apparatus for improving computer system performance by improving cache utilization.
Most computer systems include a processing unit and a memory. The speed at which the processing unit can execute instructions and consume data depends upon the rate at which the instructions and data can be transferred from memory to the processing unit. In an attempt to reduce the time required for the processing unit to obtain instructions and data from main memory, many computer systems include a cache memory which is physically and logically located between the processing unit and main memory.
A cache memory is a small, high-speed buffer memory which is used to temporarily hold those portions of the contents of main memory which it is believed will be used in the near future by the processing unit. The main purpose of a cache is to shorten the time necessary to perform memory accesses, either for data or instruction fetch. Cache memory typically has access times which are several times faster than those of a system's main memory. The use of cache memory can significantly improve system performance by reducing data access time, therefore enabling a processing unit to spend far less time waiting for instructions and data to be fetched and/or stored.
A cache memory comprises many lines of data which have been copied from a system's main memory. Associated with each cache line is a cache tag. A line's tag provides information for mapping the cache line's data to its main memory address. Each time a processing unit requests an instruction or data from memory, an address tag comparison is made to see if a copy of the requested data resides in the cache. If the desired data is not in the cache, the requested data is retrieved from the main memory, stored in the cache, and supplied to the processing unit. Commonly used mapping functions for cache data storage include direct mapping and associative mapping techniques.
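By way of illustration only, the tag comparison and fill-on-miss behavior described above might be modeled in software as follows for a direct-mapped cache. The line size, the number of lines, the way an address is divided into tag, index and offset, and the names split_address and DirectMappedCache are all assumptions made for this sketch; none of them is taken from the description.

```python
LINE_BYTES = 64   # assumed line size in bytes
NUM_LINES = 256   # assumed number of lines in the cache


def split_address(addr):
    """Split a byte address into (tag, index, offset)."""
    offset = addr % LINE_BYTES
    line_number = addr // LINE_BYTES
    index = line_number % NUM_LINES
    tag = line_number // NUM_LINES
    return tag, index, offset


class DirectMappedCache:
    """Toy read-only model: one tag and one line of data per index."""

    def __init__(self, memory):
        self.memory = memory              # main memory, a bytes-like object
        self.tags = [None] * NUM_LINES    # None marks a line that holds no data
        self.data = [None] * NUM_LINES

    def read(self, addr):
        """Return (byte, hit) for the requested address."""
        tag, index, offset = split_address(addr)
        hit = self.tags[index] == tag     # the address tag comparison
        if not hit:
            # Miss: copy the whole line from main memory into the cache.
            base = addr - offset
            self.data[index] = self.memory[base:base + LINE_BYTES]
            self.tags[index] = tag
        return self.data[index][offset], hit
```

In this model, two addresses that share an index but differ in tag evict one another, which is the limitation that associative mapping is intended to relieve.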
In an N-way set associative cache, a single index is used to simultaneously access a plurality of data arrays. A data array may be implemented by one or more random access memory integrated circuits. A set is a collection of all cache lines addressed by a single cache index. The number of data arrays addressed by a single cache index indicates the “way” number of a cache. For example, if in a cache a single cache index is used to access data from two data arrays, the cache is a 2-way set associative cache. Similarly, if in a cache a single cache index is used to access data from four data arrays, the cache is a 4-way set associative cache.
When a multi-way cache access is made, a tag comparison is made for each data array or way. If a tag comparison indicates that the desired line of data resides in a particular data array, the line of data is output from the cache for subsequent use by the processing unit which requested the data.
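A multi-way access of this kind might be sketched as follows. The sketch assumes that each data array is represented as a list of lines addressed by the cache index, and that each line carries a valid flag; the names Line and lookup are hypothetical. Hardware would ordinarily perform the per-way comparisons in parallel, whereas the loop below performs them one after another.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: bytes = b""


def lookup(ways, index, tag):
    """Compare the requested tag against the indexed line of every way.

    ways is a list of data arrays, each a list of Line addressed by index.
    Returns (way_number, data) on a hit, or (None, None) on a miss.
    """
    for way_number, array in enumerate(ways):
        line = array[index]
        if line.valid and line.tag == tag:
            return way_number, line.data
    return None, None
```

For example, with two data arrays the same index selects one line from each, and the line whose tag matches (if any) is the one output to the processing unit.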
Since there are a finite number of lines in a cache, it is frequently necessary to replace the information stored in a cache as new information is needed. Each time a new line of data is retrieved from main memory, a determination must be made as to which line of data in the cache will be replaced. For an N-way set associative cache, this determination involves selecting the way in which the new line of data will be stored. If one or more ways hold invalid data, then it usually makes sense to fill an invalid way with valid data. However, if all of the ways hold valid data, a line of data must be bumped from the cache. In an ideal cache, it would be desirable to 1) retain valid lines of data which are most likely to be used again by a processing unit, 2) retain valid lines which are expensive to move around, and 3) replace valid lines which are inexpensive to move around. Since replacement of cache data is a relatively slow process in comparison to cache data retrieval, it is desirable to make as meaningful a way selection as possible given one's speed, space and budget constraints.
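The preference for filling an invalid way before bumping a valid line might be expressed as in the following sketch. The function name choose_fill_way and the pick_victim callback are hypothetical; the callback stands in for whatever selection method is applied when every way of the indexed set holds valid data.

```python
def choose_fill_way(valid_bits, pick_victim):
    """Choose the way of the indexed set that receives the new line.

    valid_bits holds one boolean per way. pick_victim is a callable that
    returns a way number and is consulted only when every way is valid.
    """
    for way, valid in enumerate(valid_bits):
        if not valid:
            return way        # fill an invalid way; nothing is bumped
    return pick_victim()      # all ways valid: a line must be replaced
```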
There are many algorithms for improving cache utilization via better way selection for replacement data. However, one of three general approaches is commonly used. The first approach is a “random” approach, in which new information is written into a randomly or pseudo-randomly selected way. Another approach is a “round robin” approach, in which the way to be filled with replacement data is simply the previously replaced way plus some constant. A third approach is to replace the “least recently used” way with replacement data. The first two methods yield acceptable results with a minimal performance penalty, little hardware, and little cost. The least recently used method achieves better performance, but with significant and costly hardware.
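The three general approaches might be sketched as follows for a single set. The class names, the step and start parameters, and the use of a software pseudo-random generator are assumptions made for illustration. Note that the least recently used sketch must be told about every access through touch, whereas the other two keep little or no state; this bookkeeping is one way to picture the additional hardware that the least recently used approach requires.

```python
import random


class RandomPolicy:
    """Fill a pseudo-randomly selected way."""

    def __init__(self, num_ways, seed=None):
        self.num_ways = num_ways
        self.rng = random.Random(seed)

    def victim(self):
        return self.rng.randrange(self.num_ways)


class RoundRobinPolicy:
    """Fill the previously replaced way plus a constant, modulo the way count."""

    def __init__(self, num_ways, step=1, start=0):
        self.num_ways = num_ways
        self.step = step
        self.last = start     # way treated as the previously replaced way

    def victim(self):
        self.last = (self.last + self.step) % self.num_ways
        return self.last


class LRUPolicy:
    """Fill the least recently used way; needs an update on every access."""

    def __init__(self, num_ways):
        self.order = list(range(num_ways))   # least recently used first

    def touch(self, way):
        # Record an access by moving the way to the most recently used end.
        self.order.remove(way)
        self.order.append(way)

    def victim(self):
        way = self.order[0]
        self.touch(way)       # the newly filled line counts as just used
        return way
```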
What is needed is a new method for determining which way of an N-way set associative cache should be filled with replacement data upon generation of a cache miss when all of the ways contain valid data. The new method needs to provide better cache utilization than a simple random or round robin approach, but needs to be less costly to implement than a least recently used approach.
In achievement of the foregoing need, the inventor has devised new methods and apparatus for determining which way of an N-way set associative cache should be filled with replacement data upon generation of a cache miss when all of the ways contain valid data. As previously stated, when some way contains invalid data, that way is the preferred target for replacement.
The methods and apparatus generate a first choice for way selection and at least one additional choice for way selection, and then use status information associated with the cache line corresponding to the first choice to determine whether the first choice or one of the at least one additional choices should be used to designate the way which will be filled with replacement data. Status information may comprise any data which is maintained on a cache line by cache line basis, but is preferably data which is maintained for purposes other than way selection. For example, status information might comprise indications as to whether a cache line is shared or private, clean or dirty.
In one embodiment of the invention, a first choice for way selection is generated using any available method. However, the method is preferably one that is simple and cost-effective to implement (e.g., random or round robin). A second choice for way selection is then derived from the first (e.g., the first choice plus some fixed constant). If the status of the way corresponding to the first choice differs from a bias status, the way corresponding to the second choice is designated as the way to be filled with replacement data. Otherwise, the way corresponding to the first choice is designated as the way to be filled with replacement data.
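A minimal sketch of this embodiment follows, assuming that every way of the indexed set holds valid data and that each line's status can be compared directly with the bias status. The function name select_way, the offset parameter (standing in for the fixed constant) and the string status values are hypothetical. As in the description, the status of the second choice is not examined.

```python
def select_way(first_choice, statuses, bias_status, offset=1):
    """Designate the way to be filled with replacement data.

    statuses holds the status of each way in the indexed set.
    bias_status is the status that is favored for replacement.
    """
    num_ways = len(statuses)
    # Derive the second choice from the first by adding a fixed constant.
    second_choice = (first_choice + offset) % num_ways
    if statuses[first_choice] != bias_status:
        return second_choice
    return first_choice
```

For instance, with statuses of dirty, clean, dirty, clean in ways 0 through 3 and a bias status of clean, a first choice of way 0 is passed over in favor of way 1, while a first choice of way 1 is kept.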
The bias status which is used to select between the first and second choices can be programmed to yield different results in different systems. For instance, a computer system with limited memory bus bandwidth relative to its processing unit's capacity to execute instructions and consume data might perform better if the bias status favors replacing clean lines over dirty lines, or shared lines over private lines.
In a system comprising two or more processors which operate on data stored in a shared main memory, where data produced by one processor is consumed by another, it might be beneficial to program the bias status so that dirty lines are favored for replacement. In this manner, data needed by the second processor is more likely to be found in main memory rather than the first processor's private cache, and the second processor can retrieve the data more readily.
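The effect of programming the bias status one way or the other might be explored with a toy model such as the following. It is a sketch under stated assumptions rather than a measurement: the sets are synthetic, each line is assumed dirty with probability one half, the first choice is pseudo-random, and the names pick and dirty_evictions are hypothetical. The model counts how many replaced lines are dirty under a clean bias, under no bias, and under a dirty bias.

```python
import random


def pick(first, dirty, favor_dirty, offset=1):
    """Return the way to replace; favor_dirty is the programmed bias."""
    if dirty[first] != favor_dirty:
        return (first + offset) % len(dirty)
    return first


def dirty_evictions(favor_dirty, trials=10000, num_ways=4, seed=0):
    """Count replaced lines that were dirty; favor_dirty=None disables the bias."""
    rng = random.Random(seed)   # same seed, so every bias sees the same sets
    count = 0
    for _ in range(trials):
        dirty = [rng.random() < 0.5 for _ in range(num_ways)]
        first = rng.randrange(num_ways)
        if favor_dirty is None:
            way = first
        else:
            way = pick(first, dirty, favor_dirty)
        count += dirty[way]
    return count
```

Because the same synthetic sets and first choices are presented under every setting, a clean bias can never replace more dirty lines than the unbiased first choice, and a dirty bias can never replace fewer. The former corresponds to the bandwidth-limited system described above, the latter to the producer and consumer arrangement.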
These and other important advantages and objectives of the present invention will be further explained in, or will become apparent from, the accompanying description, drawings and claims.