1. Field of the Invention
This invention relates to caches and, more particularly, to invalidating lines in a cache.
2. Description of the Related Art
Since main system memory is typically designed for density rather than speed, microprocessor designers have added caches to their designs to reduce the microprocessor's need to directly access main memory. A cache is a small memory that is more quickly accessible than the main memory. Computer systems may have a number of different levels of caches. For example, a computer system may have a “level one” (L1) cache and a “level two” (L2) cache. These caches are typically integrated with the microprocessor. Caches are typically constructed of fast memory cells such as static random access memories (SRAMs) which have faster access times than the memories used for the main system memory (typically dynamic random access memories (DRAMs) or synchronous dynamic random access memories (SDRAMs)). The faster SRAMs are not typically used for main system memory because of their low density and high cost.
Many other types of caching are also possible. For example, the main system memory may act as a cache for the system's slower direct access storage devices (e.g., hard disk drives). Other devices, such as hard drives, may also include internal caches to improve their performance.
When a microprocessor needs data from memory, it typically first checks its L1 cache to see if the required data has been cached. If not, the L2 cache is checked. At the same time, the data may be requested from memory, in case there is a miss in the L2 cache. If the L2 cache is storing the data, it provides the data to the microprocessor (typically at a much higher rate and lower latency than the main system memory is capable of), and if the data was requested from memory, that request may be cancelled. If the data is not cached in the L1 or L2 caches (referred to as a “cache miss”), the data is read from main system memory or some type of mass storage device (e.g., a hard disk drive). Relative to accessing the data from the L1 cache, accesses to memory take many more clock cycles. Similarly, if the data is not in the main system memory, accessing the data from a mass storage device takes even more cycles.
Caches typically operate on the principle of locality of reference, which states that the data most recently used (and the data in that locality) is more likely to be accessed than the rest of the data. This principle holds because computer software typically has loops and branches that cause previously executed code to be re-executed. By storing recently accessed instructions and data in a cache, system performance may be increased because the microprocessor need not wait for the instructions and data to be read from main memory.
Microprocessor and computer system architects have taken the locality of reference principle one step further by using techniques such as branch prediction to proactively store instructions and data in the cache before they are actually needed by the microprocessor. In addition, when an instruction or byte of data is read from memory, additional bytes following the instruction or data are read and cached. Once again, the principle of locality of reference dictates that these instruction and data bytes are more likely to be needed by the processor than the other data or instructions at large.
There are several different ways to map the system memory into the cache. One common approach utilizes an n-Way set-associative cache, in which the cache is segmented into sets. Each set contains n cache lines. A cache line is a sequential group of bytes (e.g., 32 or 64). For efficiency purposes, cache memory transactions are typically in cache lines rather than in single bytes. Cacheable locations in main memory may each be assigned to one of the sets of cache lines. As a result, each location may be cached in any one of the n locations within its assigned set. One special case of the n-Way set-associative cache is the direct-mapped cache. In a direct-mapped cache, n=1, and thus each memory location maps to only one location in the cache. Another special case of the n-Way set-associative cache is the fully associative cache. In this case, n=m, where m is the number of lines in the cache (and thus there is only one “set”). In this case, each memory location may map to any of the cache locations.
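The mapping of memory locations to sets can be illustrated with a minimal sketch. The function name `split_address`, the 64-byte line size, and the 128-set geometry are assumptions chosen for illustration only; the text above does not prescribe any particular indexing scheme, and the modulo mapping shown is simply one common choice.

```python
def split_address(addr, line_size=64, num_sets=128):
    """Split a byte address into (tag, set_index, offset).

    line_size and num_sets are illustrative values, not taken from the text.
    """
    offset = addr % line_size          # byte position within the cache line
    line_number = addr // line_size    # which line-sized block of memory
    set_index = line_number % num_sets # the set this block is assigned to
    tag = line_number // num_sets      # identifies the block within its set
    return tag, set_index, offset


# Set-associative example: the block may occupy any of the n lines of one set.
tag, set_index, offset = split_address(0x12345)

# Fully associative special case: a single set, so the tag is the whole
# line number and the set index is always 0.
fa_tag, fa_set, fa_offset = split_address(0x12345, num_sets=1)
```

In a direct-mapped cache the same computation applies with one line per set, so the set index alone selects the only candidate line.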
Two basic performance criteria for caches are hit ratio (i.e., the ratio of the memory accesses that are found in the cache to the total number of memory accesses) and search speed (i.e., how quickly a hit or miss determination can be made). In a direct-mapped cache, search speed is optimized at the cost of hit ratio. This is because it is relatively easy to determine hits/misses (since a memory location only maps to one cache line, only that line needs to be checked) but more difficult to have a high hit ratio since multiple memory locations map to a single cache line. Conversely, fully-associative caches optimize hit ratios while sacrificing search speed. Allowing all memory locations to map to any cache line improves the probability that there will be a hit but greatly increases the complexity of searches since all cache lines must be searched for each memory location. Set-associative caches attempt to compromise between the two by offering more associativity (and thus higher hit ratios) than direct-mapped caches while also offering faster search speeds than fully-associative caches.
Since cache size is limited by a number of factors (including die size, power consumption, and cost), care must be taken when loading information into the cache. One particular area of concern for the designer arises when deciding a policy for overwriting or invalidating existing instructions and data in a cache to make room for new instructions and data. Thus, in set-associative caches where n > 1 (and thus there are choices as to which line to cache a particular memory location), there needs to be some way to choose which of the possible cache lines to fill with new data. A common solution is to track the relative order of access to each cached memory location and then replace the least recently used instructions or data with new instructions or data. This solution is based on the principle that recently accessed cache lines are more likely to be accessed again. Other solutions include random replacement and first-in first-out techniques.
On average, least recently used (LRU) cache replacement algorithms provide better performance than other algorithms. However, in order to determine the LRU cache line in an n-way set-associative cache, conventional approaches require a significant amount of complex hardware, including counters and n-way multiplexers, to implement the LRU algorithm. Additionally, status bits for each cache entry track the usage of each entry. When a new entry is made in the set, the status bits are scanned to determine which of the cache lines is the least recently used or invalid. The least recently used or invalid line is then evicted to make room for the new entry. Drawbacks of a conventional LRU replacement algorithm include the amount of hardware and number of status bits required to implement the algorithm as well as the time and hardware required to scan for invalid entries in the set.
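The LRU policy described above can be modeled in software as a brief sketch. The class name `LRUSet` and its `access` method are hypothetical, and an ordered dictionary stands in for the counters and status bits that a hardware implementation would use; this illustrates only the replacement policy, not the conventional hardware.

```python
from collections import OrderedDict


class LRUSet:
    """Software model of one set of an n-way set-associative cache."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> data, least recently used first

    def access(self, tag, data=None):
        """Return True on a hit. On a miss, fill the line, evicting the
        least recently used line if the set is already full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # now the most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used
        self.lines[tag] = data
        return False


two_way = LRUSet(ways=2)
two_way.access("A")   # miss, fill
two_way.access("B")   # miss, fill
two_way.access("A")   # hit, "B" becomes least recently used
two_way.access("C")   # miss, evicts "B"
```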
In general, it is desirable to improve the performance of cache subsystems. For example, as processor speeds improve, it is desirable to provide cache subsystems that are capable of more quickly supplying more data.
It may often be useful to invalidate a line in a cache. However, a cache line invalidation may depend on many underlying factors. In many circumstances, error checking is performed to determine whether these underlying factors were correct. If these factors are not correct, the invalidation is erroneous and should not be performed. Since error checking may take a significant amount of time to complete, a determination as to whether an invalidation is erroneous or not may not be available when the invalidating request is actually accepted by a cache controller. As a result, invalidation requests may require a cache controller to spend time waiting for error checking to resolve, preventing the cache controller from moving on to other pending tasks. At the same time, it may be rare for a cache line invalidation to be erroneous, and thus the time the cache controller spends waiting for error checking to complete is often wasted.
If a cache controller is configured to speculatively invalidate a cache line, the cache controller may be able to respond to an invalidating request immediately instead of waiting for error checking to complete. In order to handle the rare situation where the invalidation is erroneous and thus should not be performed, the cache controller may also protect the speculatively invalidated cache line from modification until error checking is complete. This way, if the invalidation is later found to be erroneous, the speculative invalidation can be reversed.
Accordingly, various embodiments of methods and systems for speculatively invalidating a line in a cache are disclosed. In one embodiment, a computer system includes a processor, a system memory, a cache controller, a cache, and an error detection unit. The cache is coupled to the processor and includes a plurality of cache line storage locations. The cache controller is coupled to receive a first request that invalidates a first cache line. In response to receiving the first request, the cache controller is configured to speculatively invalidate the first cache line. In order to preserve the first cache line in case the speculative invalidation later needs to be reversed, the cache controller is further configured to prevent modification of the first cache line storage location until the invalidation of the first cache line becomes non-speculative. The error detection unit is configured to perform at least one check corresponding to the first request. For example, the error detection unit may be the cache controller itself, and the checking may involve checking to make sure that an operation that led to the speculative invalidation (e.g., a hit in an exclusive cache in response to a fill request from a higher-level cache) was proper given the state of the first cache line. If the error detection unit performs the check and does not detect any errors, the invalidation of the first cache line becomes non-speculative.
In one embodiment, the cache controller may be configured to speculatively invalidate the first cache line by toggling a validity bit associated with the first cache line. Accordingly, reversing the speculative invalidation may involve re-toggling the validity bit to show that the first cache line is again valid. Also, in some embodiments, the cache controller may be configured to not accept requests that depend on a state of or data in the first cache line until the invalidation of the first cache line becomes non-speculative. This way, these requests may be delayed until the speculative invalidation either becomes non-speculative or is reversed. Generally, until the invalidation of the first cache line becomes non-speculative, the cache controller may be configured to not accept additional requests based on the type of request and whether the request depends on or modifies the speculatively invalidated first cache line. For example, the cache controller may be configured to not accept the additional requests that are: fill requests from a higher-level cache that hit the first cache line, probe or state-change requests for the first cache line, or copy backs from the higher-level cache that select the first cache line for replacement. The cache controller may be configured to not accept additional requests that match part of the tag of the first cache line.
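The behavior described in this embodiment may be illustrated by a simplified software model. All names here (`SpeculativeCacheModel`, `accept`, `resolve`, and so on) are hypothetical, and the model assumes that any request targeting a speculatively invalidated line is simply refused; an actual controller may filter by request type or by a partial tag match, as noted above.

```python
class SpeculativeCacheModel:
    """Toy model: validity bits plus a set of lines whose invalidation
    is still speculative."""

    def __init__(self):
        self.lines = {}        # tag -> {"valid": bool, "data": object}
        self.pending = set()   # tags with an unresolved speculative invalidation

    def accept(self, tag):
        """Return False if a request for this tag must be delayed."""
        return tag not in self.pending

    def fill(self, tag, data):
        """Store a line unless its storage location is currently protected."""
        if not self.accept(tag):
            return False
        self.lines[tag] = {"valid": True, "data": data}
        return True

    def speculatively_invalidate(self, tag):
        """Toggle the validity bit and protect the line from modification."""
        self.lines[tag]["valid"] = False
        self.pending.add(tag)

    def resolve(self, tag, error_detected):
        """Called when error checking completes for the invalidating request."""
        self.pending.discard(tag)
        if error_detected:
            # Reverse the speculative invalidation by re-toggling the bit.
            self.lines[tag]["valid"] = True


cache = SpeculativeCacheModel()
cache.fill(7, "payload")
cache.speculatively_invalidate(7)
blocked = cache.fill(7, "other")      # refused while the line is protected
cache.resolve(7, error_detected=True) # the original data is valid again
```

Because the data is never overwritten while the line is protected, reversing the invalidation requires nothing more than restoring the validity bit.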
In some embodiments, as part of the speculative invalidation, the cache controller may be configured to save the pre-speculative invalidation replacement state (e.g., the state used to select a line for replacement) associated with the first cache line and to update the post-speculative invalidation replacement state of the first cache line as if the first cache line had been invalidated. If the speculative invalidation is later determined to be erroneous, the cache controller may restore the saved pre-speculative invalidation replacement state when reversing the speculative invalidation.
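One way to picture saving and restoring the replacement state is the following sketch. The class `ReplacementStateModel` is hypothetical, the replacement state is assumed to be a simple LRU ordering, and the choice to move an invalidated line to the front of the replacement order is an assumption; the text does not specify how the post-invalidation state is computed.

```python
class ReplacementStateModel:
    """Toy model of one set's replacement state as an LRU ordering."""

    def __init__(self, tags):
        self.lru_order = list(tags)  # index 0 is the next replacement candidate
        self.saved_order = None      # pre-speculative state, if any

    def speculatively_invalidate(self, tag):
        # Save the pre-speculative-invalidation replacement state.
        self.saved_order = list(self.lru_order)
        # Update as if the line had been invalidated (assumed policy:
        # an invalid line becomes the first candidate for replacement).
        self.lru_order.remove(tag)
        self.lru_order.insert(0, tag)

    def commit(self):
        """The invalidation became non-speculative; discard the saved state."""
        self.saved_order = None

    def reverse(self):
        """The invalidation was erroneous; restore the saved state."""
        self.lru_order = self.saved_order
        self.saved_order = None


state = ReplacementStateModel(["A", "B", "C"])
state.speculatively_invalidate("C")
order_while_speculative = list(state.lru_order)
state.reverse()
```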
In another embodiment, a method of speculatively invalidating a line in a cache is disclosed. The method includes accepting a request that results in the line in the cache being invalidated, initiating checks that determine whether the invalidation is erroneous, and speculatively invalidating the line. Speculatively invalidating the line involves indicating that the line is invalid and protecting the line from subsequent modification until the checks have completed. If one of the checks determines that the invalidation is erroneous, the method also includes reversing the speculative invalidation by indicating that the line is valid again.
In another embodiment, a method of speculatively invalidating a first cache line in an exclusive cache is disclosed. The method may include accepting a fill request from a higher-level cache, determining whether the fill request hits in the exclusive cache, initiating checks that determine whether the fill request was initiated erroneously, and, if the fill request hits in the exclusive cache, providing the first cache line from the exclusive cache to the higher-level cache. If the checks have not yet completed when the first cache line is provided to the higher-level cache, the first cache line may be speculatively invalidated. Speculatively invalidating may include indicating that the first cache line is invalid and protecting the first cache line from subsequent modification until the checks complete.
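The flow of this method can be sketched as follows. The function names, the dictionary layout, and the `checks_done` flag are all hypothetical; the sketch assumes the exclusive cache is modeled as a mapping from tag to a line record and that a hit always hands the line to the higher-level cache.

```python
def handle_fill_request(exclusive_cache, tag, checks_done):
    """Serve a fill request from a higher-level cache.

    Returns the line's data on a hit, or None on a miss. On a hit the line
    is invalidated; the invalidation is speculative (the line is protected)
    if the checks on the fill request have not yet completed.
    """
    line = exclusive_cache.get(tag)
    if line is None or not line["valid"]:
        return None                        # miss in the exclusive cache
    line["valid"] = False                  # indicate the line is invalid
    line["protected"] = not checks_done    # protect only while speculative
    return line["data"]


def checks_completed(exclusive_cache, tag, erroneous):
    """Resolve a pending speculative invalidation once the checks finish."""
    line = exclusive_cache[tag]
    if line["protected"] and erroneous:
        line["valid"] = True               # reverse the speculative invalidation
    line["protected"] = False


l2 = {0x40: {"valid": True, "data": "line-0x40", "protected": False}}
supplied = handle_fill_request(l2, 0x40, checks_done=False)
checks_completed(l2, 0x40, erroneous=False)
```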
In yet another embodiment, a cache subsystem is disclosed. The cache subsystem includes a cache and a cache controller. The cache controller is configured to speculatively invalidate a first cache line. If the speculative invalidation is detected to be erroneous, the cache controller may be configured to reverse the speculative invalidation. The cache subsystem may also include a speculative invalidation controller configured to protect the first cache line from modification until the invalidation becomes non-speculative. Detecting whether the speculative invalidation is erroneous may take a certain number of cycles, and thus the speculative invalidation may not become non-speculative until after that certain number of cycles.