1. Technical Field
The present invention relates generally to data processing and more specifically to cache access mechanisms in data processing systems. Still more particularly, the present invention relates to cache access mechanisms that reduce contention in a multi-sectored cache.
2. Description of the Related Art
A conventional multiprocessor data processing system may comprise a system bus to which are coupled a system memory and a number of processing units, each of which may include a processor and one or more levels of cache memory. Caches are temporary storage facilities utilized to store subsets of the overall memory of a data processing system at varying latencies. At each level of a cache hierarchy, a tradeoff is made between the size and the access latency of the cache at that level. The cache most directly coupled to a processing unit, typically referred to as an “L1” cache, usually has the lowest latency, but at the cost of also being the smallest of the various caches. Conversely, the cache at the lowest level of the hierarchy usually has a larger storage capacity, often one or two orders of magnitude larger than the L1 cache, but at a higher access latency.
It is often the case, though not required, that the cache at a lower level of the cache hierarchy contains a copy of all the data contained in the caches at higher levels of the cache hierarchy. This property is known as “inclusion” and necessarily leads to the condition that a cache at a lower level of the cache hierarchy be at least as large as the cache at the next higher level of the hierarchy in order to allow the lower level cache to include the contents of memory cached at the next higher level. Those skilled in the art are familiar with the notion of constructing a multi-level cache hierarchy that optimizes the access latency and size characteristics of the various cache hierarchy levels according to available implementation technologies, leading to optimal system performance.
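The inclusion property can be stated as a simple subset relation. The following Python sketch is illustrative only: it models each cache as a plain set of block-aligned addresses and assumes a 128-byte block size. The helper names `block_address` and `satisfies_inclusion` are hypothetical.

```python
# Illustrative model only: each cache is a plain set of block-aligned addresses.
BLOCK_SIZE = 128  # assumed block size in bytes


def block_address(addr: int, block_size: int = BLOCK_SIZE) -> int:
    """Return the aligned address of the block containing byte address addr."""
    return addr - (addr % block_size)


def satisfies_inclusion(higher: set, lower: set) -> bool:
    """True if every block held by the higher level cache is also held below."""
    return higher <= lower


l1 = {block_address(a) for a in (0x1000, 0x1010, 0x2080)}
l2 = {block_address(a) for a in (0x1000, 0x2080, 0x3000)}
print(satisfies_inclusion(l1, l2))  # True: 0x1010 lies in block 0x1000
l2.discard(0x2080)                  # evict a block from the lower level only
print(satisfies_inclusion(l1, l2))  # False: block 0x2080 is now only in L1
```

The second check shows why an inclusive hierarchy typically invalidates the higher level copy whenever the lower level cache evicts a block.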
A cache, at a given level of hierarchy, is typically composed of a number of components, often including a cache directory array, a cache data array, and the functional logic units necessary to update and manage the cache. The cache data array portion of a cache is a set of data storage elements utilized to store copies of portions of main memory. The cache data array is divided into a series of so-called “cache blocks”. These cache blocks are storage regions utilized to hold copies of contiguous portions of the main memory within the data processing system. These blocks are typically on the order of 128 bytes in size and are further arranged into groups, known as “sets”, of usually 8 to 16 blocks. The overall cache data array consists of a number of these sets. When placing a portion of memory within the cache, some number of the bits of the address of the block of memory are typically utilized to index into the various cache sets to determine a set within which to place the block of memory. That is to say, each contiguous aligned portion of main memory within the data processing system maps to a particular set. Within the cache set, various allocation policies are utilized to choose the member of the set in which to place the block. In summary, the cache data array is divided into multiple cache sets, each of which contains multiple cache blocks. Any given block in memory is typically allocated to some selected block within a particular set chosen by a mapping function of some of the bits of the block's main memory address.
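The mapping from an address to a set can be sketched as a split of the address into three fields. This Python sketch assumes a power-of-two geometry of 128-byte blocks, 1024 sets, and 8 blocks per set; these parameters and the name `split_address` are hypothetical choices for illustration, not taken from any particular design.

```python
# Illustrative parameters (assumptions, not taken from any specific design).
BLOCK_SIZE = 128  # bytes per cache block
NUM_SETS = 1024   # number of sets
WAYS = 8          # blocks per set (associativity)

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1  # log2(128) = 7
INDEX_BITS = NUM_SETS.bit_length() - 1     # log2(1024) = 10


def split_address(addr: int) -> tuple:
    """Split a main memory address into (tag, set_index, byte_offset)."""
    offset = addr & (BLOCK_SIZE - 1)                    # byte within the block
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)  # set the block maps to
    tag = addr >> (OFFSET_BITS + INDEX_BITS)            # remaining high-order bits
    return tag, set_index, offset


print(split_address(0x12345678))     # (2330, 172, 120)
print(NUM_SETS * WAYS * BLOCK_SIZE)  # 1048576 bytes of data capacity
```

Any two addresses that differ only in the bits above bit 16 land in the same set and compete for its 8 members, which is where the allocation policy comes into play.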
The cache further typically includes a cache directory array. This array consists of bookkeeping information detailing which portions of the overall data processing system memory are currently present within the cache, together with their processing states. Typically, each block within the cache data array also has a corresponding entry within the cache directory array detailing which portion of main memory is present in that cache data block and its processing state. Each directory entry usually consists of a number of fields, possibly including a TAG field, a STATE field, an LRU field, and an INCLUSION field. Further, an ECC field that provides error detection and correction for bit errors in any of the fields in the directory entry is also typically provided.
The TAG field within the directory entry corresponds to the high-order address bits necessary to determine which block of main memory is present within the cache data array entry associated with this directory entry. The TAG field typically represents the majority of the bits within a cache directory entry. The STATE field typically indicates the processing state of the cache line. For example, this field is often used to maintain the cache coherence state of the cache block according to some cache coherence protocol, such as the well-known “MESI” protocol. The LRU field typically contains information about recent accesses to the cache line and is used to guide the cache block replacement policy when cache blocks for new addresses are allocated within the cache set. Finally, the INCLUSION field often indicates whether or not the current cache block is present in a higher level cache. Those skilled in the art will appreciate that the format and contents of the directory entry discussed here are but one possible representative format.
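A directory entry of this kind, and the use of its fields on a lookup, can be modeled roughly as follows. This Python sketch is an assumption-laden illustration. It assumes MESI states, it assumes an LRU encoding in which a larger value means less recently used, and it omits the ECC field entirely. `DirectoryEntry`, `lookup`, and `choose_victim` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    """MESI coherence states held in the STATE field."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


@dataclass
class DirectoryEntry:
    tag: int                      # high-order address bits of the cached block
    state: State = State.INVALID  # coherence state of the block
    lru: int = 0                  # assumed encoding: larger = less recently used
    inclusion: bool = False       # True if a higher level cache also holds the block
    # The ECC field that would protect this entry is omitted from the sketch.


def lookup(cache_set: List[DirectoryEntry], tag: int) -> Optional[int]:
    """Return the way holding a valid copy of the block with this tag, else None."""
    for way, entry in enumerate(cache_set):
        if entry.state is not State.INVALID and entry.tag == tag:
            return way
    return None


def choose_victim(cache_set: List[DirectoryEntry]) -> int:
    """Pick a way to allocate: an invalid way if any, otherwise the LRU way."""
    for way, entry in enumerate(cache_set):
        if entry.state is State.INVALID:
            return way
    return max(range(len(cache_set)), key=lambda w: cache_set[w].lru)


ways = [
    DirectoryEntry(tag=0x91A, state=State.SHARED, lru=3),
    DirectoryEntry(tag=0x3C0, state=State.MODIFIED, lru=1, inclusion=True),
    DirectoryEntry(tag=0x091, state=State.EXCLUSIVE, lru=0),
]
print(lookup(ways, 0x3C0))  # 1 (hit)
print(lookup(ways, 0x555))  # None (miss)
print(choose_victim(ways))  # 0 (all ways valid; way 0 is least recently used)
```

Note that the tag comparison alone is not enough for a hit. The STATE field must also show a valid copy, which is why an invalidated entry with a matching tag still misses.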
In order to allow for larger lower level caches without dramatically increasing cache directory array overhead, a technique known as “sectoring” is often employed. In sectoring, the cache blocks in a lower level cache often consist of a number of different “sectors”. That is to say, in the lower level cache, the cache blocks as described above are further divided into two or more like-sized sub-regions. These sectors are typically equal in size to the cache block size of the cache immediately above the current cache in the cache hierarchy.
Furthermore, each of the sectors can typically be manipulated and managed individually. For example, one sector of a cache block could be present in the lower level cache while the other sector is not present. To support independent processing of the various sectors, the directory entry is usually reformatted to include a STATE field for each individual sector. Importantly, the single TAG field within the cache directory entry, which dominates the size of the cache directory entry, now corresponds to a larger cache block. In other words, a similar number of directory entries, each with an additional STATE field per sector, can support a larger cache in the same cache directory area than would be possible with a non-sectored implementation, which would require an additional TAG field for each sector.
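The area argument can be made concrete with a rough bit count. The Python sketch below compares a 2 MiB, 8-way directory built from 128-byte blocks against one built from 256-byte blocks of two 128-byte sectors. It counts only TAG and STATE bits (LRU, INCLUSION, and ECC are ignored) and assumes 48-bit addresses and 2-bit MESI STATE fields. All of these widths are hypothetical, and `SectoredEntry`, `sector_of`, and `directory_bits` are illustrative names only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


@dataclass
class SectoredEntry:
    tag: int                    # one TAG field covering the whole, larger block
    sector_states: List[State]  # one STATE field per sector

    def sector_present(self, sector: int) -> bool:
        return self.sector_states[sector] is not State.INVALID


def sector_of(addr: int, block_size: int, sectors: int) -> int:
    """Return which sector of its block a byte address falls in."""
    return (addr % block_size) // (block_size // sectors)


def directory_bits(capacity, block_size, sectors, ways, addr_bits=48, state_bits=2):
    """TAG + STATE bits for the whole directory (LRU, INCLUSION, ECC ignored)."""
    entries = capacity // block_size
    sets = entries // ways
    offset_bits = block_size.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return entries * (tag_bits + state_bits * sectors)


# One sector present, the other absent, under a single TAG.
entry = SectoredEntry(tag=0x91A, sector_states=[State.SHARED, State.INVALID])
print(entry.sector_present(sector_of(0x1080, 256, 2)))  # False: sector 1 absent

MIB = 1 << 20
print(directory_bits(2 * MIB, 128, 1, 8))  # 524288 (non-sectored, 128-byte blocks)
print(directory_bits(2 * MIB, 256, 2, 8))  # 278528 (two 128-byte sectors per block)
```

Under these assumptions the tag stays 30 bits wide in both layouts, because halving the number of sets moves one address bit from the set index into the block offset. The sectored directory therefore pays for one extra STATE field per entry but needs only half as many TAG fields.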
Finally, the cache also contains functional logic queues that consist of the functional logic necessary to update the cache, provide data to higher level caches or the processing unit(s), and honor snooped requests from either the system interconnect or lower level caches. These functional queues are typically divided into two classes: Read Queues, which process requests from higher level caches or the processing unit(s), and Snoop Queues, which process requests from the system interconnect or lower level caches. As part of their function, these queues are responsible for updating the cache data and directory arrays. In a sectored cache, a conflict often arises when multiple queues wish to operate on different sectors within the same cache block. Because the STATE fields for all of the sectors reside in a single directory entry, uncontrolled writes to that entry by different queues operating on the cache block's sectors can corrupt the state fields.
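One way such corruption can arise is a lost update. This occurs when two queues each read the whole directory entry, change only the STATE field of their own sector, and write the whole entry back. The Python sketch below replays that interleaving deterministically and compares it with a serialized order. The scenario is hypothetical: a read queue taking sector 0 to Modified while a snoop queue invalidates sector 1. It assumes whole-entry writes, and the function name `run` is illustrative only.

```python
import copy
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


@dataclass
class SectoredEntry:
    tag: int
    sector_states: List[State]


def run(interleaved: bool) -> List[str]:
    """Two queues update different sectors of one entry via read-modify-write."""
    directory = [SectoredEntry(tag=0x91A, sector_states=[State.SHARED, State.SHARED])]

    def read_queue_update(entry):   # hypothetical: read queue gains sector 0
        entry.sector_states[0] = State.MODIFIED

    def snoop_queue_update(entry):  # hypothetical: snoop invalidates sector 1
        entry.sector_states[1] = State.INVALID

    if interleaved:
        # Both queues latch a private copy before either one writes back.
        rq_copy = copy.deepcopy(directory[0])
        sq_copy = copy.deepcopy(directory[0])
        read_queue_update(rq_copy)
        snoop_queue_update(sq_copy)
        directory[0] = rq_copy
        directory[0] = sq_copy  # whole-entry write discards the sector 0 update
    else:
        # One queue at a time: each read-modify-write finishes before the next.
        for update in (read_queue_update, snoop_queue_update):
            working = copy.deepcopy(directory[0])
            update(working)
            directory[0] = working
    return [s.value for s in directory[0].sector_states]


print(run(interleaved=True))   # ['S', 'I']: sector 0's Modified state was lost
print(run(interleaved=False))  # ['M', 'I']: both updates survive
```

Neither queue touched the other's sector, yet the interleaved order leaves sector 0 in a stale state. This is why some form of coordination on the shared entry is needed.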
In order to prevent such contention, data processing systems have often required that only one queue be active within a given cache block. While this constraint helps ensure the correctness of updates to the various state fields within the directory entry, it can lead to lower system performance and/or error conditions. If a system prevents more than one queue from operating simultaneously within a cache block, the opportunity for overlapped processing is lost, increasing overall system latency and decreasing performance. Also, for snoop queues, allowing only one active queue per directory entry causes many operations to be retried at a later time. These retried operations can lead to system livelocks and deadlocks, which should be avoided where possible.
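For illustration only, the conventional one-queue-per-block rule can be modeled as a busy table keyed by block address. Requests that collide with an active queue are refused and must be retried. The Python sketch assumes 256-byte blocks of two 128-byte sectors, and the class name `BlockBusyTable` and the queue names are hypothetical.

```python
class BlockBusyTable:
    """Conventional scheme: at most one active queue per cache block."""

    def __init__(self, block_size: int = 256):
        self.block_size = block_size
        self.owner = {}  # block-aligned address -> name of the active queue

    def _block(self, addr: int) -> int:
        return addr - addr % self.block_size

    def try_dispatch(self, queue: str, addr: int) -> bool:
        """Admit the request, or return False so the requester retries later."""
        block = self._block(addr)
        if block in self.owner:
            return False
        self.owner[block] = queue
        return True

    def release(self, queue: str, addr: int) -> None:
        """Free the block when the owning queue finishes its operation."""
        block = self._block(addr)
        if self.owner.get(block) == queue:
            del self.owner[block]


table = BlockBusyTable(block_size=256)    # two 128-byte sectors per block
print(table.try_dispatch("RQ0", 0x1000))  # True: read queue takes sector 0
print(table.try_dispatch("SQ0", 0x1080))  # False: sector 1 of the same block, retry
print(table.try_dispatch("SQ1", 0x1100))  # True: a different block
table.release("RQ0", 0x1000)
print(table.try_dispatch("SQ0", 0x1080))  # True: the retry succeeds after RQ0 ends
```

The second request is refused even though it targets a sector that the active read queue is not modifying. This shows both the lost opportunity for overlapped processing and the extra snoop retries described above.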
The present invention recognizes that the conventional queue processing of sectored cache directories is subject to a number of inefficiencies and can potentially lead to incorrect or degraded system operation. Therefore, the present invention provides a means by which multiple queues may coordinate overlapping updates to a common directory entry within a sectored cache directory array. These and other benefits are provided by the invention described herein.