1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to performing a speculative cache fill in a computer system.
2. Description of the Related Art
Since main system memory is typically designed for density rather than speed, microprocessor designers have added caches to their designs to reduce the microprocessor's need to directly access main memory. A cache is a small memory that is more quickly accessible than the main memory. A processor may have a number of different levels of caches. For example, a processor may have a "level one" (L1) cache and a "level two" (L2) cache. These caches tend to be integrated on the same substrate as the microprocessor. Caches are typically constructed of fast memory cells such as static random access memories (SRAMs), which have faster access times and higher bandwidth than the memories used for the main system memory (typically dynamic random access memories (DRAMs) or synchronous dynamic random access memories (SDRAMs)). The faster SRAMs are not typically used for main system memory because of their low density and high cost.
Many other types of caches may also be present in computer systems. For example, the main system memory may act as a cache for the system's slower direct access storage devices (e.g., hard disk drives). Other devices, such as hard drives, may also include internal caches. For example, hard drives may cache recently accessed or written data in order to improve their read performance. Generally, having a cache allows a device to retrieve data from the cache more quickly than if the device had to access a larger, slower memory to retrieve the data.
When a microprocessor needs data from memory, it typically first checks its L1 cache to see if the required data has been cached. If the data is not present in the L1 cache, the L2 cache is checked (if the processor has an L2 cache). If the L2 cache is storing the data, it provides the data to the microprocessor (typically at a much higher rate than the main system memory is capable of). If the data is not cached in the L1 or L2 caches (referred to as a "cache miss"), the data is read from main system memory or some type of mass storage device (e.g., a hard disk drive). Relative to accessing the data from the L1 cache, accesses to memory take many more clock cycles. Similarly, if the data is not in the main system memory, accessing the data from a mass storage device takes even more cycles.
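The lookup order described above may be illustrated with a minimal sketch. The function name `read`, the use of dictionaries to stand in for caches and memory, and the policy of filling both caches on a miss are assumptions made for illustration only and are not taken from any particular processor.

```python
def read(address, l1, l2, main_memory):
    """Return (value, source) for an address, checking L1, then L2, then memory.

    l1, l2 and main_memory are plain dicts mapping address -> value; they
    are hypothetical stand-ins for real hardware structures.
    """
    if address in l1:
        return l1[address], "L1"
    if address in l2:
        value = l2[address]
        l1[address] = value  # assumed policy: fill L1 on an L2 hit
        return value, "L2"
    # Miss in both cache levels: read main memory and fill both caches.
    value = main_memory[address]
    l2[address] = value
    l1[address] = value
    return value, "memory"
```

In this sketch, a second read of the same address is served from L1, which models why repeated accesses take fewer cycles than the first.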
One problem that arises due to caching is that, depending on the way in which updated data in the cache is presented to the memory, a copy of a particular line of data in a cache may not be the same as the copy of that line that is currently in system memory. For example, many caches use a write-back policy to update the copy of data in system memory. Write-back systems increase write efficiency because an updated copy of the cache line is not written back to system memory until the line is evicted from the cache. However, from the time the line is updated in the cache until the time the line is written back to system memory, the cache copy may differ from the memory copy (i.e., the memory has a "stale" copy of that data). As a result, accesses to system memory may be controlled so that other devices in the computer system do not access the stale copy of the data in the system memory. Generally, this problem is one of cache coherence, or ensuring that each device in the computer system accesses the correct (i.e., most recently updated) copy of a particular item of data, regardless of which device is requesting the data or where the data is actually stored. In single processor systems, maintaining cache coherency usually involves restricting I/O devices' access to system memory and/or restricting which portions of system memory may be cached.
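The window during which memory holds a stale copy under a write-back policy may be illustrated as follows. The class `WriteBackCache` and its methods are hypothetical names chosen for this sketch; real caches track dirty state per line in hardware.

```python
class WriteBackCache:
    """Toy write-back cache: memory is updated only when a line is evicted."""

    def __init__(self, memory):
        self.memory = memory   # backing store: dict of address -> value
        self.lines = {}        # cached copies: address -> value
        self.dirty = set()     # addresses modified since they were cached

    def write(self, address, value):
        self.lines[address] = value
        self.dirty.add(address)  # memory is deliberately NOT updated here

    def evict(self, address):
        if address in self.dirty:
            # Write the modified line back before dropping it.
            self.memory[address] = self.lines[address]
            self.dirty.discard(address)
        self.lines.pop(address, None)
```

Between the call to `write` and the call to `evict`, any device reading `memory` directly would observe the stale value, which is the situation a coherency mechanism must guard against.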
In multiprocessor systems, maintaining cache coherency may be a significant problem because the different processors may frequently attempt to access the same data. Additionally, it is desirable for all of the processors to be able to cache the data they operate on. Thus, each processor may have its own L1 and/or L2 cache, but the system memory may be shared between multiple processors. In such a system, one processor may update a copy of a particular memory location in its cache. If a write-back cache policy is being used, the system memory's copy of the modified data may no longer be consistent with the updated copy in the first processor's cache. If a second processor reads the unmodified data from the system memory, unaware of the first processor's updated copy, memory corruption may result. In order to prevent this, whenever one processor needs to perform a cache fill, it may check to make sure none of the other processors in the system have a more recent copy of the requested data in their caches.
There are several different methods of detecting whether other processors have copies of a particular item of data in their caches. One method is called snooping. Snooping is typically used in systems where all processors that share memory are also coupled to the same bus. Each processor or cache controller monitors the bus for transactions involving data that is currently in its cache. If such a transaction is detected, the particular unit of data may be evicted from the cache or updated in the cache. Another method of detecting whether other caches have copies of requested data involves a data-requesting processor sending probe commands to every other processor and/or cache controller in the system. In response to receiving a probe, a processor or cache controller may generate a response indicating whether its cache contains a copy of the requested data.
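The probe-based method may be sketched as follows. The three-valued `ProbeResult`, the `CacheController` class, and the helper `probe_others` are assumptions introduced for illustration; the description above requires only that a response indicate whether the cache contains a copy of the requested data.

```python
from enum import Enum


class ProbeResult(Enum):
    NOT_PRESENT = 0  # the cache holds no copy of the requested data
    CLEAN = 1        # the cache holds an unmodified copy
    MODIFIED = 2     # the cache holds a copy newer than memory's


class CacheController:
    def __init__(self):
        self.lines = {}     # address -> value
        self.dirty = set()  # addresses modified since they were cached

    def probe(self, address):
        """Answer a probe for the given address."""
        if address not in self.lines:
            return ProbeResult.NOT_PRESENT
        if address in self.dirty:
            return ProbeResult.MODIFIED
        return ProbeResult.CLEAN


def probe_others(requester, controllers, address):
    """Probe every controller except the requester; return name -> result."""
    return {
        name: ctrl.probe(address)
        for name, ctrl in controllers.items()
        if name != requester
    }
```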
In some systems, the time required to maintain cache coherency (e.g., the time required to send probes and receive responses) may be significant. The total time taken to perform a cache fill may depend on the latency of both the cache coherency mechanism and that of the memory system. As a result, the time spent maintaining cache coherency may significantly affect performance. Accordingly, one drawback of sharing memory between devices that have caches is that cache fill performance may decrease.
Various embodiments of methods and systems for performing a speculative cache fill are disclosed. In one embodiment, a computer system includes several caches that are each coupled to receive data from a shared memory. Each cache is controlled by a respective cache controller. A cache coherency mechanism, which in some embodiments may be part of a chipset, is coupled to the cache controllers and the memory. The cache coherency mechanism is configured to receive a cache fill request. In response to receiving the request, the cache coherency mechanism is configured to send a probe to some of the cache controllers (e.g., all of the cache controllers except for the one controlling the cache that is being filled by the cache fill request). Some time after sending the probe, the cache coherency mechanism is configured to provide a speculative response to the requesting cache. By delaying the speculative response until some time after the probes are sent, the cache coherency mechanism may increase the likelihood that responses to the probes will be received in time to validate the speculative response.
The cache coherency mechanism may be configured to provide the speculative response if at least one of the cache controllers to which a probe was sent has not yet responded to the probe. If one of the cache controllers responds to the probe with an indication that its cache has a modified copy of the data, the cache coherency mechanism may be configured to invalidate the speculative response and provide a non-speculative response after obtaining the most recent copy of the data.
The cache coherency mechanism may be configured to validate the speculative response by providing a validation signal to the first cache's cache controller during the speculative response. If fewer than all of the cache controllers have responded to the probe, the controller may invalidate the speculative response by failing to provide the validation signal.
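The conditions under which the validation signal is provided or withheld may be summarized in a short sketch. The function `validation_signal` and its argument layout are hypothetical; in particular, representing each probe response as a boolean "has a modified copy" flag is an assumption made for brevity.

```python
def validation_signal(probed, responses):
    """Return True to validate the speculative response, False to withhold.

    probed    -- set of names of the cache controllers that were sent a probe
    responses -- dict of controller name -> True if that cache reported a
                 modified copy, else False; only controllers that have
                 responded so far appear as keys
    """
    if not set(probed) <= set(responses):
        return False  # fewer than all probed controllers have responded
    if any(responses[name] for name in probed):
        return False  # some cache holds a modified copy, so memory is stale
    return True
```

Withholding the signal covers both failure cases: a probe response that is still outstanding and a response that reports a modified copy.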
In another embodiment, a computer system includes a first cache controller configured to control a first cache, a second cache controller configured to control a second cache, and a memory coupled to provide data to the first and second caches. Either or both of the cache controllers may be integrated with a respective processor. A cache coherency mechanism may be coupled to the first cache, the second cache, and the memory. In some embodiments, the cache coherency mechanism may be included in a chipset. In alternative embodiments, portions of the cache coherency mechanism may be included in one or more processors.
The cache coherency mechanism may be configured to receive a first request to provide a copy of data from the memory to the first cache. This request may be generated by a processor in response to a cache miss in the first cache. In response to receiving the first request, the cache coherency mechanism may be configured to send a probe to the second cache controller in order to determine whether the second cache contains a copy of the requested data. If a certain amount of time has elapsed since the probe was sent and the second cache controller has not yet provided a response to the probe to the cache coherency mechanism, the cache coherency mechanism may be configured to provide a speculative response to the first request. In some embodiments, this amount of time may be measured in clock cycles (e.g., by a counter).
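The cycle-counting condition for launching a speculative response may be sketched as follows. The class `ProbeTimer`, its method names, and the threshold value used in any example are hypothetical and are intended only to illustrate a counter-based embodiment.

```python
class ProbeTimer:
    """Counts clock cycles elapsed since a probe was sent."""

    def __init__(self, threshold_cycles):
        self.threshold = threshold_cycles  # hypothetical wait, in cycles
        self.cycles = 0
        self.response_received = False

    def tick(self):
        """Advance the counter by one clock cycle."""
        self.cycles += 1

    def record_response(self):
        """Note that the probed cache controller has responded."""
        self.response_received = True

    def should_launch_speculative(self):
        # Speculate only if the wait has elapsed and no response has arrived.
        return self.cycles >= self.threshold and not self.response_received
```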
The cache coherency mechanism may be configured to prioritize non-speculative responses before speculative responses. For example, if the first amount of time has elapsed and a second, non-speculative response is pending, the cache coherency mechanism may be configured to provide the second response to the first cache before providing the speculative response to the first cache. In some embodiments, the first amount of time may define the time at which the speculative response's "launch window" opens. The cache coherency mechanism may be configured to provide the speculative response at any time during this launch window.
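One possible selection rule that gives non-speculative responses priority may be sketched as follows. The function `next_response`, the dictionary keys, and the identifiers in the usage are all hypothetical; the sketch also assumes, for simplicity, that a launch window never closes once it has opened.

```python
def next_response(pending, now):
    """Choose the identifier of the next response to send, or None.

    pending -- list of dicts with keys "id" and "speculative" (bool);
               speculative entries also carry "window_opens", the cycle at
               which their launch window opens
    now     -- the current cycle number
    """
    # Any pending non-speculative response is sent first.
    for response in pending:
        if not response["speculative"]:
            return response["id"]
    # Otherwise send a speculative response whose launch window has opened.
    for response in pending:
        if now >= response["window_opens"]:
            return response["id"]
    return None
```

Because a speculative response may be provided at any time during its launch window, deferring it behind a non-speculative response does not violate the timing described above.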
If the cache coherency mechanism receives the response to the probe from the second cache controller, the cache coherency mechanism may be configured to provide an indication that the first response is non-speculative. If the response to the probe indicates that the memory contains the most recent copy of the requested data (and thus the speculative response is no longer speculative), the cache coherency mechanism may also be configured to provide a non-speculative response to the memory (or to validate a speculative response, if there is one). If the response to the probe indicates that the second cache contains the most recent copy of the requested data, the cache coherency mechanism may be configured to invalidate the speculative response, if one was launched, and to provide a non-speculative response to the cache fill request once the most recent copy of the requested data is obtained.
In another embodiment, a method of performing a cache fill in a shared memory computer system may include the steps of a first device asserting a first cache fill request, sending a probe to a second device in response to the first cache fill request being asserted, and, after a first amount of time has elapsed since the probe was sent, providing a speculative response to the first device.
If a response to the probe is not received before a decision point that occurs while the speculative response is being provided (e.g., a point at which the speculative response should be validated), the speculative response may be invalidated. Similarly, if a probe response is received but the response indicates that the system memory""s copy of the requested data is not the most recent copy, the speculative response may be invalidated. If the response to the probe is received and indicates that the system memory contains the most recent copy of the requested data, the speculative response may be validated.
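The three outcomes described above may be expressed as a single decision function. The name `resolve_speculative`, the use of cycle numbers to model timing, and the string return values are assumptions made for illustration.

```python
def resolve_speculative(response_cycle, decision_cycle, memory_is_latest):
    """Decide the fate of a speculative response at its decision point.

    response_cycle   -- cycle at which the probe response arrived, or None
                        if no response has arrived
    decision_cycle   -- cycle of the decision point during the response
    memory_is_latest -- True if the probe response indicates that system
                        memory holds the most recent copy of the data
    """
    if response_cycle is None or response_cycle >= decision_cycle:
        return "invalidate"  # no response received before the decision point
    if not memory_is_latest:
        return "invalidate"  # another cache holds a more recent copy
    return "validate"
```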