1. Technical Field
The present invention relates generally to data processing and more specifically to cache access mechanisms in data processing systems.
2. Description of the Related Art
A conventional multiprocessor data processing system may comprise a system bus coupled to a system memory and to a number of processing units, each of which may include a processor and one or more levels of cache memory. Caches are temporary storage facilities utilized to store subsets of the overall memory of a data processing system at varying latencies. At the various levels of a cache hierarchy, a tradeoff is made between the size and the access latency of the cache at the given hierarchy level. The cache most directly coupled to a processing unit, typically referred to as the level one or “L1” cache, usually has the lowest latency but is the smallest of the various caches. Likewise, the cache at the lowest level of the hierarchy usually has a larger storage capacity, often one or two orders of magnitude larger than the L1 cache, but at a higher access latency.
It is often the case, though not required, that the cache at a lower level of the cache hierarchy contains a copy of all the data contained in the caches at higher levels of the cache hierarchy. This property is known as “inclusion” and necessarily leads to the condition that a cache at a lower level of the cache hierarchy be at least as large as the cache at the next higher level of the hierarchy in order to allow the lower level cache to include the contents of memory cached at the next higher level. Those skilled in the art are familiar with the notion of constructing a multi-level cache hierarchy that optimizes the access latency and size characteristics of the various cache hierarchy levels according to available implementation technologies, leading to optimal system performance.
A cache, at a given level of hierarchy, typically comprises a number of components, often including a cache directory array, a cache data array, and those functional logic units necessary to update and manage the cache. The data array portion of a cache is a set of data storage elements utilized to store copies of portions of main memory. The data array is divided into a series of so-called “cache blocks”. These cache blocks are storage regions utilized to hold copies of contiguous portions of the main memory within the data processing system. These blocks are typically on the order of 128 bytes in size and are further arranged into groups, known as “sets”, of usually 8 to 16 blocks. The overall data array consists of a number of these sets. When placing a portion of memory within the cache, some number of the bits of the address of the block of memory are typically utilized to index into the various cache sets to determine a set within which to place the block of memory. That is to say, each contiguous aligned portion of main memory within the data processing system maps to a particular set. Within the cache set, various allocation policies are utilized to select the member of the set in which to place the block. In summary, the data array is divided into multiple cache sets, each of which contains multiple cache blocks. Any given block in memory is typically allocated to some selected block within a particular set chosen by a mapping function of some of the address bits corresponding to the address of the block in main memory.
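By way of illustration only, the mapping of an address to a cache set may be sketched as follows. The sketch assumes 128-byte cache blocks (the typical figure given above), a hypothetical count of 1024 sets, and a simple mapping function that uses the address bits immediately above the block offset as the set index. The names `split_address`, `BLOCK_SIZE`, and `NUM_SETS` are illustrative and are not drawn from any particular implementation.

```python
# Illustrative parameters; only the 128-byte block size comes from the text.
BLOCK_SIZE = 128   # bytes per cache block
NUM_SETS = 1024    # assumed number of sets (a power of two in this sketch)

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(128)  = 7
INDEX_BITS = NUM_SETS.bit_length() - 1      # log2(1024) = 10


def split_address(addr):
    """Split an address into (tag, set_index, byte_offset).

    Assumes the simplest mapping function: the low bits select a byte
    within the block, the next bits select the set, and the remaining
    high-order bits form the tag.
    """
    offset = addr & (BLOCK_SIZE - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset


tag, set_index, offset = split_address(0x12345678)
print(tag, set_index, offset)
```

Under these assumptions, two addresses that differ by a multiple of `BLOCK_SIZE * NUM_SETS` bytes select the same set and are distinguished only by their tags, which is why an allocation policy is needed to choose among the members of that set.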
The cache further typically includes a cache directory array. This array consists of bookkeeping information detailing which portions of the overall data processing system memory are currently present within the cache, together with their processing states. Typically, each block within the cache data array has a corresponding entry within the cache directory array detailing which portion of main memory is present in that cache data block and its processing state. Each directory entry usually includes a number of fields, possibly including a TAG field, a STATE field, an LRU field, an INCLUSION field, and an ECC field, which provides error correction and detection.
The TAG field within the directory entry corresponds to those high order address bits necessary to determine which block within the main memory is present within the cache data array entry associated with this directory entry. The TAG field typically represents the majority of the bits within a cache directory entry. The STATE field typically indicates the processing state of the cache line. For example, this field is often used to maintain the cache coherence state of the cache block according to some cache coherence protocol such as the well known “MESI” protocol. The LRU field typically contains information about recent accesses to the cache line and is used to guide the cache block replacement policy when cache blocks of new addresses are allocated within the cache set. Finally, the INCLUSION field often indicates whether or not the current cache block is present in a higher level cache. Those skilled in the art will appreciate that the format and contents of the directory entry discussed here are but one representative format possible.
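The directory fields described above may be illustrated with the following minimal sketch of a lookup and a replacement choice within one cache set. It is an assumption-laden example rather than a representation of any particular design: the `DirEntry` layout, the use of a timestamp as the LRU encoding, and the helper names `lookup` and `choose_victim` are all hypothetical, and the ECC field is omitted.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """MESI coherence states."""
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


@dataclass
class DirEntry:
    tag: int = 0              # TAG: high-order address bits of the cached block
    state: State = State.I    # STATE: coherence state of the block
    lru: int = 0              # LRU: assumed encoding, larger = more recently used
    inclusion: bool = False   # INCLUSION: block also held in a higher level cache


def lookup(cache_set, tag, now):
    """Return the way holding a valid entry with this tag, or None on a miss."""
    for way, entry in enumerate(cache_set):
        if entry.state is not State.I and entry.tag == tag:
            entry.lru = now   # record the access for the replacement policy
            return way
    return None


def choose_victim(cache_set):
    """Prefer an invalid way; otherwise pick the least recently used way."""
    for way, entry in enumerate(cache_set):
        if entry.state is State.I:
            return way
    return min(range(len(cache_set)), key=lambda w: cache_set[w].lru)
```

A hit requires both a matching TAG and a valid STATE, and the LRU information is consulted only when a new block must displace an existing member of the set.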
In order to allow for larger lower level caches without dramatically adding to cache directory array overhead, a technique known as “sectoring” is often employed. In sectoring, the cache blocks in a lower level cache often consist of a number of different “sectors”. That is to say, in the lower level cache, the cache blocks as described above are further divided into two or more like-sized sub-regions. These sectors are typically equal in size to the cache block size of the cache immediately above the current cache in the cache hierarchy.
Furthermore, each of the sectors can typically be manipulated and managed individually. For example, one sector of a cache block could be present in the lower level cache and the other sector could be not present. To support independent processing of the various sectors, the directory entry is usually formatted to include STATE fields for each individual sector. Importantly, the single TAG field within the cache directory entry, which dominates the size of the cache directory entry, now corresponds to a larger cache block. In other words, a similar number of directory entries with additional STATE fields per sector can support a larger cache in the same cache directory area than would be possible with a non-sectored implementation that would require an additional TAG field for each sector.
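The directory savings attributed to sectoring can be made concrete with a short calculation. The following sketch assumes a 48-bit real address, 128-byte sectors, 8-way sets, and 3 bits of STATE per sector, and it counts only the TAG and STATE fields (LRU, INCLUSION, and ECC bits are ignored). All of these figures and the function name `directory_bits` are illustrative assumptions, not values taken from a specific design.

```python
def log2_exact(n):
    """Return log2(n) for a positive power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


def directory_bits(capacity, sector_bytes, sectors_per_block, ways,
                   addr_bits=48, state_bits=3):
    """Total TAG + STATE bits needed by the directory of one cache.

    Each directory entry holds one TAG for the whole block and one
    STATE field per sector.
    """
    block_bytes = sector_bytes * sectors_per_block
    entries = capacity // block_bytes
    num_sets = entries // ways
    tag_bits = addr_bits - log2_exact(block_bytes) - log2_exact(num_sets)
    return entries * (tag_bits + sectors_per_block * state_bits)


TWO_MIB = 2 * 1024 * 1024
non_sectored = directory_bits(TWO_MIB, 128, sectors_per_block=1, ways=8)
two_sectors = directory_bits(TWO_MIB, 128, sectors_per_block=2, ways=8)
print(non_sectored, two_sectors)
```

Under these assumptions, the two-sector organization needs roughly 55% of the TAG and STATE storage of the non-sectored organization for the same 2 MiB of data, because half as many TAG fields are required and each entry gains only one additional STATE field.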
Finally, the cache also contains functional logic queues that provide the logic necessary to update the cache, provide data to higher level caches or the processing unit(s), and honor snooped requests from either the system interconnect or lower level caches. These functional queues are typically divided into two classes: read queues, which process requests from higher level caches or the processing unit(s), and snoop queues, which process requests from the system interconnect or lower level caches. As part of their function, these queues are responsible for updating the cache data and directory arrays.
The methods used today to optimize cache behavior include alignment and cache-line padding. Large pages can also be used to provide a uniform distribution in the cache. Each of these three approaches presents problems. Alignment in the cache, while providing object separation (e.g., two blocks separated on two cache lines to avoid conflicts), makes poor use of the available cache resource because it leaves large amounts of space unused. Similar issues exist with cache-line padding. Large pages provide better distribution, because real addresses within the large page sequentially map into congruence class sets. However, multiple large pages cause conflicts in the cache when large page mappings become identical. In addition, an application's access pattern may not be well suited to large pages (e.g., an application may benefit from interleaving objects within the cache).
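Two of the problems noted above can be illustrated numerically. The following sketch assumes 128-byte cache lines, a hypothetical 1024 congruence classes indexed by the real-address bits just above the line offset, and 16 MiB large pages. The function names and all numeric parameters are assumptions chosen for illustration.

```python
LINE = 128                       # assumed cache line size in bytes
NUM_SETS = 1024                  # assumed number of congruence classes
LARGE_PAGE = 16 * 1024 * 1024    # assumed large page size in bytes


def padded_utilization(object_bytes, line=LINE):
    """Fraction of cache space holding useful data when each object is
    aligned and padded to a whole number of cache lines."""
    lines = -(-object_bytes // line)   # ceiling division
    return object_bytes / (lines * line)


def congruence_class(real_addr):
    """Congruence class (set) selected by a real address in this sketch."""
    return (real_addr // LINE) % NUM_SETS


# A 40-byte object padded to its own 128-byte line wastes most of the line.
print(padded_utilization(40))

# The same offset within two different large pages selects the same
# congruence class, so hot data at matching offsets competes for one set.
page_a = 3 * LARGE_PAGE
page_b = 7 * LARGE_PAGE
offset = 0x4A80
print(congruence_class(page_a + offset) == congruence_class(page_b + offset))
```

In this sketch the large page size is a multiple of `LINE * NUM_SETS`, so every large page maps onto the congruence classes in an identical pattern, which is the condition under which multiple large pages conflict.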