Computer systems need increasingly faster and larger memory to accommodate ongoing technological advances. However, faster memory is generally more expensive than slower memory, and larger memory is generally slower than smaller memory. Cache memory is used in computer systems to provide fast, cost-effective memory access. Cache memory is smaller and faster than main memory. While cache memory is expensive due to its relatively fast speed, it is cost-effective because it is smaller than main memory. Most processor requests are found in the cache and are provided at a fast access rate. Only processor requests not found in the cache memory require accesses to main memory at its slower access rate.
If a processor request is found in a cache, a cache hit has occurred. Conversely, if a processor request is not found in a cache, a cache miss has occurred. A primary objective when designing a cache is to improve the cache hit rate, which may be done in part by increasing cache size. The larger the size of a cache, the more likely that processor requests are found there and are accessed at the faster cache access rate.
Unfortunately, larger caches are more costly because they require more space to store the cached information. Along with the cached information, status information needs to be kept in a directory to keep track of which segment of memory is stored in a particular cache location and its state. This may be done on a cache line basis, with one entry for each cache line. A cache in which status information is stored in a directory on a cache line basis is referred to as a non-sectored cache.
Table 1 provides an example of a non-sectored cache. Each directory entry contains one cache line, the address of the cache line, and bits indicating whether or not the cache line is valid and/or modified. There is no relationship between the cache lines with respect to their addresses. Replacement is performed by individual cache line.
TABLE 1
Set Address   Address Tag     Valid Bit   Modified Bit   Data
0             Address Tag 0   Valid       Modified       Data 0
1             Invalid         Invalid     Invalid        Invalid
2             Address Tag 2   Valid       Clean          Data 2
3             Address Tag 3   Valid       Modified       Data 3
4             Address Tag 4   Valid       Clean          Data 4
5             Invalid         Invalid     Invalid        Invalid
6             Address Tag 6   Valid       Clean          Data 6
7             Address Tag 7   Valid       Modified       Data 7
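The directory organization of Table 1 can be sketched in a few lines of Python. This is only an illustrative model, not an implementation taken from this description: the class names `LineEntry` and `NonSectoredCache`, the 64-byte line size, and the direct-mapped address split (one entry per set address, as the table layout suggests) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LineEntry:
    """One directory entry per cache line (hypothetical layout)."""
    tag: Optional[int] = None      # address tag; meaningless while invalid
    valid: bool = False            # valid bit
    modified: bool = False         # modified bit (False means "clean")
    data: Optional[bytes] = None   # the cached line itself


class NonSectoredCache:
    """Direct-mapped model: each set address holds exactly one line."""

    def __init__(self, num_sets: int = 8, line_size: int = 64) -> None:
        self.num_sets = num_sets
        self.line_size = line_size
        self.entries: List[LineEntry] = [LineEntry() for _ in range(num_sets)]

    def split(self, address: int) -> Tuple[int, int]:
        """Return (set address, address tag) for a byte address."""
        line_number = address // self.line_size
        return line_number % self.num_sets, line_number // self.num_sets

    def lookup(self, address: int) -> bool:
        """A hit requires a valid entry whose tag matches."""
        set_address, tag = self.split(address)
        entry = self.entries[set_address]
        return entry.valid and entry.tag == tag

    def fill(self, address: int, data: bytes) -> None:
        """Replace the single line at this set address with new, clean data."""
        set_address, tag = self.split(address)
        self.entries[set_address] = LineEntry(tag, True, False, data)
```

Because every entry carries its own tag, neighbouring entries need not hold neighbouring addresses, which is the flexibility (and the directory cost) discussed for non-sectored caches.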
As mentioned above, each cache line contains a bit indicating whether or not the corresponding cache data is valid. Initially, the valid bit will indicate that the corresponding cache data is invalid, since a cache at initialization merely contains random 1s and 0s. Once data is written for a cache line, the valid bit for that cache line will be set to indicate that the corresponding cache data is valid. However, the valid bit for a cache line may subsequently be set to indicate invalidity once again in certain situations. For example, when two or more caches have a copy of the same cache line and one of these caches modifies its version of the cache line, the other caches must invalidate their versions of the cache line, since those versions are now outdated. To perform such invalidation, the caches with the outdated versions of the cache line may set the valid bit of the affected cache line to indicate that the corresponding cache data is now invalid.
Furthermore, as mentioned above, each cache line contains a bit indicating whether the corresponding cache data has been modified. The various embodiments described herein assume that a “write back” cache write policy is to be used. According to the write back cache write policy, writes initially modify data in the cache only, and thus data in main memory corresponding to modified cache data is modified only once a cache line corresponding to such modified cache data is to be replaced. That is to say, when a cache line corresponding to modified cache data is to be replaced, the modified cache data is “written back” to main memory.
In accordance with the write back policy, when a cache line is to be replaced but the corresponding cache data has not been modified, such cache line may be written over without first writing the corresponding cache data back to main memory. Such cache line may be identified by having its modified bit indicate that it is “clean”. Conversely, when a cache line is to be replaced but the corresponding cache data has been modified, according to the write back policy, the corresponding data of such cache line must be written back to main memory. Such cache line may be identified by having its modified bit indicate that it is “modified”.
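The write back behavior described above can be illustrated with a short Python sketch. It models a single cache line and a dictionary standing in for main memory; the names `Line`, `write_hit`, `invalidate`, and `replace`, and the use of the tag as the main-memory key, are hypothetical simplifications chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Line:
    tag: Optional[int] = None
    valid: bool = False
    modified: bool = False
    data: Optional[int] = None


def write_hit(line: Line, data: int) -> None:
    """A write modifies the cached copy only and marks the line modified."""
    line.data = data
    line.modified = True


def invalidate(line: Line) -> None:
    """Mark the line invalid, e.g. after another cache modified its copy."""
    line.valid = False


def replace(line: Line, new_tag: int, new_data: int,
            main_memory: Dict[int, int]) -> bool:
    """Replace the line; return True if a writeback was needed.

    Only a valid, modified line is written back to main memory.
    A clean or invalid line is simply overwritten.
    """
    wrote_back = line.valid and line.modified
    if wrote_back:
        main_memory[line.tag] = line.data
    line.tag, line.data = new_tag, new_data
    line.valid, line.modified = True, False
    return wrote_back
```

In this model main memory stays stale after `write_hit` until `replace` evicts the modified line, which is the defining property of the write back policy.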
Non-sectored caches are advantageous in that they provide increased flexibility with respect to where a cache line can be placed, and they permit replacement of individual cache lines, which results in the best possible cache hit rate. However, since status information is stored on a cache line basis, the directory of a non-sectored cache may occupy substantial space.
One way to save directory space is to use a sectored cache. A sectored cache is divided into sectors, and each sector is further divided into subsectors. The subsectors of a sector are related by a matching set of address bits. Only the address of the sector must be kept in the directory rather than the address of each subsector, thus reducing the necessary directory space. Status information is kept for each subsector.
Table 2 provides an example of a sectored cache with two sectors. Each sector in this case contains four subsectors. All of the subsectors of a sector share the address pattern held in the sector's address tag and are distinguished from each other by a few address bits that determine their position within the sector. The subsectors are therefore somewhat limited with respect to the positions in which they can be placed. Bits indicating whether or not a subsector is valid and/or modified are stored in the directory for each subsector. As mentioned above, a key advantage of the sectored cache is that only a fraction of the address tags need to be stored in the directory, thus saving considerable space. This is especially important when the directory is physically located apart from the cache data, such as with a processor with an internal directory and an external data cache. Replacements within a sectored cache must be done on a sector basis, meaning that multiple modified subsectors may be replaced during a single replacement. Thus, a single sector replacement may require multiple writebacks of modified subsectors to main memory.
TABLE 2
Set Address   Address Tag     Valid Bit   Modified Bit   Data
0             Address Tag 0   Valid       Modified       Data 0-0
                              Invalid     Invalid        Invalid
                              Valid       Clean          Data 0-2
                              Valid       Clean          Data 0-3
1             Address Tag 1   Valid       Clean          Data 1-0
                              Invalid     Invalid        Invalid
                              Valid       Modified       Data 1-2
                              Valid       Modified       Data 1-3
It should be noted that the valid and modified bits for subsectors in a sectored cache may function in the same way as do the valid and modified bits for cache lines in a non-sectored cache as previously described.
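To make the directory saving concrete, the following Python sketch compares directory sizes for the two organizations. The function names and the 20-bit tag width are assumptions chosen for illustration, and the model counts only one address tag per entry plus one valid bit and one modified bit per line or subsector.

```python
def directory_bits_non_sectored(num_lines: int, tag_bits: int) -> int:
    """Each cache line stores its own tag plus a valid and a modified bit."""
    return num_lines * (tag_bits + 2)


def directory_bits_sectored(num_sectors: int, subsectors_per_sector: int,
                            tag_bits: int) -> int:
    """Each sector stores one tag; each subsector stores only two status bits."""
    return num_sectors * (tag_bits + 2 * subsectors_per_sector)


# Eight lines as in Table 1 versus two sectors of four subsectors as in Table 2,
# with a hypothetical 20-bit address tag in both cases.
non_sectored = directory_bits_non_sectored(num_lines=8, tag_bits=20)
sectored = directory_bits_sectored(num_sectors=2, subsectors_per_sector=4,
                                   tag_bits=20)
print(non_sectored, sectored)  # prints "176 56"
```

Under these assumed widths the sectored directory holds the same number of data blocks in roughly a third of the directory bits, because six of the eight address tags are no longer stored.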
When a new sector needs to be added to the cache, such as in the event of a cache miss, a sector to be replaced must be chosen. A sectored cache replacement algorithm is typically used to determine which sector is to be replaced. Such an algorithm may use historical information kept on each sector, such as how frequently and how recently such sector was used. In accordance with a conventional cache replacement algorithm known as the sectored least recently used (LRU) algorithm, the least recently used sector among a plurality of replaceable sectors is replaced. The LRU cache replacement algorithm generally produces the best hit rates. However, hit rate is not the only factor in good performance.
Another important factor is the utilization of the bus connecting the cache to main memory. High bus utilization may significantly decrease computer system performance, since a bottleneck may result from requests waiting to use the bus. A sectored cache can cause bus utilization that is significant and concentrated in time (i.e., “bursty”), since all modified data for a sector being replaced must be written back to main memory. More specifically, if a sector being replaced has a large amount of modified data, the bus between the cache and main memory must be utilized for a significant and concentrated amount of time in order to write back all of the modified data to main memory. Such a scenario may occur when multiple subsectors of a sector to be replaced have been modified, thus requiring a writeback for each modified subsector. The LRU cache replacement algorithm and its variants do nothing to reduce such “burstiness”.
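The interaction between sectored LRU replacement and writeback bursts can be illustrated with a minimal Python sketch. The class `SectoredLRUCache` is hypothetical, and it assumes for simplicity that any sector may replace any other (the description above only states that the victim is chosen among a plurality of replaceable sectors). Each access returns the number of subsector writebacks it triggers.

```python
from collections import OrderedDict
from typing import List


class SectoredLRUCache:
    """Illustrative sectored cache with plain LRU replacement of whole sectors."""

    def __init__(self, num_sectors: int = 2,
                 subsectors_per_sector: int = 4) -> None:
        self.num_sectors = num_sectors
        self.subsectors_per_sector = subsectors_per_sector
        # sector tag -> list of [valid, modified] per subsector.
        # Insertion order doubles as recency: first item is least recently used.
        self.sectors: "OrderedDict[int, List[List[bool]]]" = OrderedDict()

    def access(self, sector_tag: int, subsector: int,
               is_write: bool = False) -> int:
        """Access one subsector; return the writebacks caused by replacement."""
        writebacks = 0
        if sector_tag in self.sectors:
            self.sectors.move_to_end(sector_tag)  # now most recently used
        else:
            if len(self.sectors) >= self.num_sectors:
                # Evict the least recently used sector as a whole; every
                # valid, modified subsector needs its own writeback.
                _, victim = self.sectors.popitem(last=False)
                writebacks = sum(1 for valid, modified in victim
                                 if valid and modified)
            self.sectors[sector_tag] = [
                [False, False] for _ in range(self.subsectors_per_sector)
            ]
        status = self.sectors[sector_tag][subsector]
        status[0] = True          # subsector is now valid
        if is_write:
            status[1] = True      # write back policy: mark modified only
        return writebacks
```

In this sketch the victim is chosen purely by recency, so a sector with three modified subsectors costs three back-to-back writebacks when it is evicted, while evicting a clean sector costs none. Nothing in the selection step takes that difference into account.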