1. Technical Field
The present invention relates generally to cache units contained within a data processing system. More specifically, the present invention relates to improving the coordination of operations between different level caches.
2. Description of the Related Art
Most modern data processing systems make use of caches to increase the rate at which they can process data. (As used herein, the term "data" refers to any type of information that can be stored in the memory of a data processing system. Specifically, data encompasses both program instructions and application data.) Generally, a cache is defined as a relatively small amount of relatively fast, expensive memory which resides between a processor and a relatively large amount of slow, inexpensive memory (main memory). A cache attempts to store those portions of main memory which will be needed by the processor. When the processor needs data from main memory, it will first check to see if that data is in the cache. If the data requested by the processor is in the cache, the cache simply returns that data to the processor. This type of operation allows the processor to avoid having to access main memory. Since accessing the cache is faster than accessing main memory, the rate at which data is processed by the processor is increased.
A cache is comprised of a cache controller and cache RAM. The cache RAM serves as a storage area for cache line data, while the cache controller controls the storage and retrieval of cache line data from the cache RAM. The cache RAM is often divided into "blocks" or "lines," with each line having an associated "tag" and attribute bits. The lines in cache RAM contain the actual data from main memory. The data from main memory that is stored in the cache RAM is referred to as cache line data. The tags specify which portion of main memory is contained in the line. A tag and associated attribute bits are often known as a directory entry, and the area of a cache's RAM which is used to store the directory entries is referred to as an array of directory entries (or a directory array).
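By way of illustration only, the relationship between an address, a tag, and a directory entry can be sketched as follows. The sketch assumes a direct-mapped organization, a 32-byte line, and 256 lines; none of these parameters come from the description above, and the names `DirectoryEntry`, `split_address`, and `is_hit` are hypothetical.

```python
from dataclasses import dataclass

LINE_SIZE = 32    # bytes per cache line (assumed for illustration)
NUM_LINES = 256   # number of lines in the cache RAM (assumed)


@dataclass
class DirectoryEntry:
    tag: int = 0         # identifies which portion of main memory the line holds
    valid: bool = False  # simplified stand-in for the attribute bits


def split_address(addr):
    """Split an address into (tag, line index, byte offset within the line)."""
    offset = addr % LINE_SIZE
    index = (addr // LINE_SIZE) % NUM_LINES
    tag = addr // (LINE_SIZE * NUM_LINES)
    return tag, index, offset


# The directory array holds one entry per line; cache_ram holds the line data.
directory = [DirectoryEntry() for _ in range(NUM_LINES)]
cache_ram = [bytes(LINE_SIZE) for _ in range(NUM_LINES)]


def is_hit(addr):
    """A lookup hits when the indexed entry is valid and its tag matches."""
    tag, index, _ = split_address(addr)
    entry = directory[index]
    return entry.valid and entry.tag == tag
```

In this sketch the directory array and the line data are kept in separate lists only for readability; an actual cache may organize its RAM differently.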
Many modern processors use more than one cache to increase performance. One common arrangement is to have one small cache located on the same silicon die as the microprocessor. A cache that is located on the same silicon die, or otherwise very closely associated with a microprocessor, is often known as an L1 cache. Another cache, known as the L2 cache, can be placed apart from the microprocessor and the L1 cache. The L2 cache resides between the processor and main memory, and functions in a manner similar to that of the L1 cache. The L2 cache is almost always larger than the L1 cache, but the L2 cache cannot provide data to the processor as quickly as the L1 cache.
In operation, if the processor requires data from main memory, it will first check the L1 cache to see if that data is stored there. If the requested data is in the L1 cache, the L1 cache will forward this data to the processor and the processor will continue processing data. If the requested data is not in the L1 cache, the processor will look to the L2 cache for the data. If the requested data is in the L2 cache, the data will be forwarded to the processor. Data from the L2 cache cannot be retrieved as quickly as data from the L1 cache, but retrieving data from the L2 cache is still much faster than retrieving the data from main memory. If the data requested by the processor is not in the L2 cache, the processor will retrieve the data from main memory, and will encounter significant performance penalties. The ability of a cache to quickly forward data to a processor can significantly affect the performance of the data processing system as a whole. Therefore, almost all aspects of a cache's organization, function, and size have been the subject of intense scrutiny.
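The lookup order just described (L1 cache, then L2 cache, then main memory) can be sketched as below. This is a simplified model under stated assumptions: the caches are unbounded dictionaries, and a miss fills both caches on the way back, which is a common policy but is not stated in the description above. The name `make_hierarchy` is hypothetical.

```python
def make_hierarchy(main_memory):
    """Build a read function over two cache levels backed by main_memory."""
    l1, l2 = {}, {}

    def read(addr):
        # 1. Check the L1 cache first.
        if addr in l1:
            return l1[addr], "L1"
        # 2. On an L1 miss, look to the L2 cache.
        if addr in l2:
            l1[addr] = l2[addr]   # assumed fill policy
            return l2[addr], "L2"
        # 3. On an L2 miss, pay the cost of a main memory access.
        value = main_memory[addr]
        l2[addr] = value          # assumed fill policy
        l1[addr] = value
        return value, "memory"

    return read, l1, l2
```

For example, the first read of an address is served from memory, and a repeated read of the same address is then served from the L1 cache.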
Caches which are designed to be used in data processing systems with multiple processors contain additional levels of complexity. Caches operating in multiple processor systems must have the ability to monitor the data being stored and retrieved from main memory by other computing units (the term "computing units" refers to devices which can access main memory or other devices attached to a common system bus). Otherwise, the various computing units within the data processing system may interfere with each other's ability to accurately store and retrieve data from main memory. Caches use the attribute bits associated with each line of a cache to keep the contents of the cache consistent with the data contained in main memory.
Two of the attribute bits contain the "MESI" state of the line in the cache. Depending on the state of these bits, a cache controller can delay another computing unit from accessing main memory in order to update main memory with a new value contained in the cache line. For a more detailed explanation of how the MESI state of a cache line affects various computing operations, see the "Power PC 604 RISC Microprocessor User's Manual," by IBM Corp. and Motorola, Inc., (1994). Another attribute bit is known as the L1 Inclusive bit. When set, the L1 Inclusive bit indicates that a line in the L2 cache may be stored in the L1 cache as well.
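As an illustration, the attribute bits of a directory entry might be packed as shown below. The bit positions and the numeric encoding of the four MESI states are assumptions made for this sketch; the description above specifies only that two bits hold the MESI state and that a further bit is the L1 Inclusive bit.

```python
from enum import IntEnum


class Mesi(IntEnum):
    # The numeric encoding is assumed; only the four state names are standard.
    INVALID = 0
    SHARED = 1
    EXCLUSIVE = 2
    MODIFIED = 3


MESI_MASK = 0b011      # two attribute bits carry the MESI state
L1_INCLUSIVE = 0b100   # one attribute bit is the L1 Inclusive bit


def pack(state, inclusive):
    """Combine a MESI state and the L1 Inclusive flag into attribute bits."""
    return int(state) | (L1_INCLUSIVE if inclusive else 0)


def mesi_state(attrs):
    """Extract the MESI state from the attribute bits."""
    return Mesi(attrs & MESI_MASK)


def is_l1_inclusive(attrs):
    """True when the line may also be stored in the L1 cache."""
    return bool(attrs & L1_INCLUSIVE)
```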
Since the L2 cache serves as an interface to the system bus for the processor and the L1 cache, the L2 cache must know what areas of main memory are contained in the L1 cache and must know when a transaction occurring over the system bus would modify a section of main memory contained in the L1 cache. To accomplish this task, the L2 "snoops" the system bus for transactions which would modify an area of memory contained in its own cache as well as the L1 cache. "Snooping the bus" refers to the L2 cache monitoring the system bus for transactions which might have an effect on the state of a line within the L2 cache or the L1 cache.
When a line in an L2 cache has its L1 Inclusive bit set, many prior art caches process the line in the same manner regardless of whether the MESI state of the line is Invalid or Modified. However, operations in the data processing system can be enhanced by handling these two situations differently.
In addition, prior art caches have implemented inefficient flushing algorithms. As related to caches, flushing refers to writing all of the data that has been modified while in the cache to main memory. Flushing a cache ensures that all computing units which have access to main memory can access the same data at the same location. Also, when a cache is flushed, the MESI state of the lines within the cache is set to Invalid.
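The basic flush operation defined above (write modified data to main memory, then mark every line Invalid) can be sketched as a simple sequential loop. This is only a baseline illustration and is not a pipelined algorithm; the dictionary layout and the single-letter state codes are assumptions of the sketch.

```python
def flush(lines, main_memory):
    """Flush a cache modeled as {addr: {"state": str, "data": value}}.

    Returns the number of lines written back to main memory.
    """
    written = 0
    for addr, line in lines.items():
        if line["state"] == "M":          # only modified data is written back
            main_memory[addr] = line["data"]
            written += 1
        line["state"] = "I"               # every line ends up Invalid
    return written
```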
Another problem associated with prior art caches is their inability to efficiently update their directory array. The inefficiencies concern the writing of updated directory entries to the directory array. Many prior art systems use queues or other FIFO devices to buffer writes to the directory array. However, the switching involved in operating these FIFO devices consumes excessive amounts of power. Also, from a performance point of view, using FIFO devices can create a bottleneck. A bottleneck is created when an entry is first in line to be written to the directory array, and the writing of this entry is delayed because the entry is waiting to receive a result from the system bus. In traditional FIFO systems, other entries behind the entry first in line cannot be written to the directory array, and must wait on the entry which is first in line, even though these other entries are ready to be written to the directory array.
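The bottleneck described above can be reproduced with a small model of a FIFO write buffer. The entry fields `index`, `value`, and `ready` are hypothetical; `ready` stands for "has received its result from the system bus."

```python
from collections import deque


def drain_fifo(queue, directory):
    """Write entries to the directory strictly in arrival order.

    Stops as soon as the entry at the head is not ready, even if later
    entries are. Returns the number of entries written.
    """
    written = 0
    while queue and queue[0]["ready"]:
        entry = queue.popleft()
        directory[entry["index"]] = entry["value"]
        written += 1
    return written


# The head entry is still waiting, so the two ready entries behind it stall.
pending = deque([
    {"index": 0, "value": "a", "ready": False},
    {"index": 1, "value": "b", "ready": True},
    {"index": 2, "value": "c", "ready": True},
])
stalled_directory = {}
stalled_count = drain_fifo(pending, stalled_directory)
```

In this example `stalled_count` is zero and all three entries remain queued, which is the head-of-line blocking the paragraph describes.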
Yet another problem associated with prior art L2 caches is their handling of collisions. A collision occurs when a processor and another computing unit (which could be another processor) try to access the same resource. This resource, typically, is an area of main memory. Since an L2 cache often resides between the processor and the system bus, the L2 cache is called upon to arbitrate between the competing requests of the processor and the other computing unit to access the resource.
Typically, when an L2 cache controller detects a collision, it will send a RETRY signal to its processor. This RETRY signal will cause the processor to abort its attempted access of the shared resource, and will cause the processor to retry its access later. However, there are collision situations where an L2 cache can avoid sending a RETRY to the processor by simply delaying the processor's access of the shared resource for a short time period. This delay is often a much shorter period of time than the period of time it takes for the processor to retry an access.
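The trade-off between retrying and delaying can be made concrete with a toy decision rule. Everything numeric here is hypothetical: the retry penalty of 20 cycles and the idea that the controller knows how many cycles remain until the resource is free are assumptions of the sketch, not details from the description above.

```python
RETRY_PENALTY_CYCLES = 20  # assumed cost of aborting and retrying an access


def resolve_collision(cycles_until_free):
    """Choose between delaying the processor and sending RETRY.

    Returns (action, cycles the processor loses) under the assumed costs.
    """
    if cycles_until_free < RETRY_PENALTY_CYCLES:
        # Holding the access briefly is cheaper than a full retry.
        return ("DELAY", cycles_until_free)
    return ("RETRY", RETRY_PENALTY_CYCLES)
```

Under these assumed numbers, a resource that frees up in 3 cycles leads to a delay, while one that stays busy for 50 cycles leads to a RETRY.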
Therefore, it would be desirable in a data processing system containing multiple computing units to have an L2 cache operable in a first mode of operation where a cache line is in a modified and inclusive state, and a second mode of operation where a cache line is in an invalid and inclusive state. In the first mode of operation, the L2 cache would, upon snooping a request, check an L1 cache to see if it had valid data. In this first mode of operation, if the L1 cache returns valid data to the L2 cache, the L2 cache writes this data to memory. If the L1 cache does not return data to the L2 cache, the L2 cache would write its copy of the data to memory.
In the second mode of operation, the L2 cache again queries the L1 cache for data. If the L1 cache does not return valid data to the L2 cache, the L2 cache does not write its copy of the data to memory. Instead, the L2 cache then knows that valid data exists in memory.
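The two modes of operation can be summarized in a short sketch of snoop handling for a line whose L1 Inclusive bit is set. The function name, its arguments, and the returned strings are hypothetical. One point is an assumption rather than something stated above: when the L2 line is Invalid and the L1 cache does return valid data, the sketch writes that data to memory, as in the first mode.

```python
def handle_snoop(l2_state, l2_data, l1_data, memory, addr):
    """Handle a snooped request for an L1-inclusive line.

    l2_state: "M" (first mode) or "I" (second mode).
    l1_data:  data returned by the L1 cache, or None if it has no valid data.
    """
    if l1_data is not None:
        # The L1 cache holds valid data; write it to memory.
        # (For the "I" case this behavior is assumed, not stated in the text.)
        memory[addr] = l1_data
        return "wrote L1 data"
    if l2_state == "M":
        # First mode: the L1 cache returned nothing, so the L2 copy is written.
        memory[addr] = l2_data
        return "wrote L2 data"
    # Second mode: the L2 copy is invalid, so memory already holds valid data.
    return "memory already valid"
```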
It would also be advantageous to have an L2 cache which implemented an efficient pipelined algorithm for flushing the L2 cache and for back-invalidating the L1 cache.
Also, an L2 cache which uses a priority queue to write directory entries to the directory array would be advantageous.
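One possible reading of such a queue, offered only as a sketch, is that entries which are ready to be written take priority over older entries that are still waiting on the system bus. The entry fields `index`, `value`, and `ready` and the function name `drain_ready_first` are hypothetical, and the actual priority rule may differ.

```python
def drain_ready_first(pending, directory):
    """Write every ready entry, oldest first, and keep the rest queued.

    pending is a list of entries in arrival order; it is updated in place.
    Returns the number of entries written to the directory.
    """
    still_waiting = []
    written = 0
    for entry in pending:
        if entry["ready"]:
            directory[entry["index"]] = entry["value"]
            written += 1
        else:
            still_waiting.append(entry)  # does not block the entries behind it
    pending[:] = still_waiting
    return written
```

With a waiting entry at the front and two ready entries behind it, this sketch writes the two ready entries and leaves only the waiting one queued.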
A final desirable goal is to provide an L2 cache which does not automatically send a RETRY signal to its processor in the event a collision is detected. Such an L2 cache should evaluate the situation and send a RETRY signal only when necessary.