1. Technical Field
The present invention relates generally to data processing systems and more specifically to cache mechanisms within data processing systems. Still more particularly, the present invention relates to cache sector allocation within cache slices of a multi-sectored cache.
2. Description of the Related Art
A conventional multiprocessor data processing system may comprise a number of processing units, a system memory, and one or more levels of cache memory coupled between the processing units and the memory. Caches are temporary storage facilities utilized to store subsets of the overall memory of a data processing system at varying latencies. The various caches are configured in a cache hierarchy, defined as levels, relative to the processing units. At the various levels of the cache hierarchy, a tradeoff is made between the size and the access latency of the cache. Those skilled in the art are familiar with the notion of a multi-level cache hierarchy that optimizes the access latency and size characteristics of the various cache hierarchy levels according to available implementation technologies, leading to optimal system performance.
A cache, at a given level of hierarchy, typically comprises a number of components, including a cache directory array, a cache data array, and functional logic units necessary to update and manage the cache. The cache data array portion of a cache is a set of data storage elements utilized to store copies of portions of main memory. The cache data array is divided into a series of so-called “cache blocks”. These cache blocks are storage regions utilized to hold copies of contiguous portions of the main memory within the data processing system. These blocks are typically on the order of 128 bytes in size, and the block size is a power of two.
In the following description, a cache block size of 128 bytes will be assumed. Those familiar with the art will be able to apply the invention to data processing systems with other cache block sizes. Further, portions of memory that are copied into cache blocks are also aligned. In other words, the starting address of a contiguous portion of memory that is mapped into a cache block is an integer multiple of the cache block size.
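The alignment property can be illustrated with a minimal sketch, assuming the 128 byte block size used in this description. The names `BLOCK_SIZE`, `block_base`, and `is_block_aligned` are hypothetical and chosen only for illustration.

```python
BLOCK_SIZE = 128  # bytes; the cache block size assumed in this description


def block_base(addr: int) -> int:
    """Return the starting address of the aligned block containing addr."""
    # BLOCK_SIZE is a power of two, so clearing the low-order bits rounds
    # the address down to an integer multiple of BLOCK_SIZE.
    return addr & ~(BLOCK_SIZE - 1)


def is_block_aligned(addr: int) -> bool:
    """True when addr is an integer multiple of the cache block size."""
    return addr % BLOCK_SIZE == 0
```

Every address within the same 128 byte region yields the same `block_base`, which is the address at which the cached copy begins.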
Typically, the data array portion of a cache is organized as an M×N matrix of cache blocks. Each row of the matrix is referred to as a “congruence class” and the number of congruence classes is typically a power of two. Within a given congruence class, N blocks are provided to hold copies of contiguous portions of main memory. Caches with N blocks in a congruence class are referred to as N-way set associative caches.
Each location in main memory is mapped, by cache blocks, to reside within a particular congruence class within a cache. The low order bits of the main memory address (seven bits for a 128 byte cache line) indicate which byte within a cache line is being accessed and do not affect the mapping of the cache block to a congruence class. The next most significant log2(M) bits of the address are known as the “congruence class address”. These address bits are used to index into the M rows of the cache. A cache block sized and aligned portion of memory may reside in any of the N blocks (entries) within the addressed congruence class. The remaining high order bits within the address are called the “tag” and are used to distinguish between the different blocks of main memory that may be allocated within a congruence class.
With reference now to FIG. 1A, there is shown a depiction of how the bits constituting a main memory address are interpreted to determine where a main memory location may be mapped within a cache for a system with a 64 bit address and a cache with 4096 congruence classes of 128 byte cache lines. The low order seven bits (bits 57 to 63) in field 103 indicate a byte within the cache line corresponding to this address. Since this field addresses bytes within a cache line, it is ignored when determining where the cache block may reside within the cache.
The next twelve bits (bits 45 to 56) in congruence class address field 102 indicate the congruence class within the cache this memory address maps to. The cache block containing this address may reside in any of the N blocks within the addressed congruence class. Finally, the remaining bits of the address (bits 0 to 44) in field 101 are referred to as the “tag” of the memory block.
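The field boundaries of FIG. 1A can be sketched as shifts and masks. This is an illustrative sketch only: the function name `split_address` and the constants are hypothetical. In the bit numbering used above, bit 0 is the most significant bit, so bits 57 to 63 are the seven low-order bits of the integer.

```python
OFFSET_BITS = 7   # byte within a 128 byte cache line (bits 57 to 63)
CLASS_BITS = 12   # 4096 congruence classes (bits 45 to 56)


def split_address(addr: int):
    """Split a 64 bit address into (tag, congruence_class, byte_offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    cclass = (addr >> OFFSET_BITS) & ((1 << CLASS_BITS) - 1)
    # The remaining 45 high-order bits (bits 0 to 44) form the tag.
    tag = addr >> (OFFSET_BITS + CLASS_BITS)
    return tag, cclass, offset
```

Two addresses that differ only in the byte offset field return the same tag and congruence class, reflecting that the offset does not affect where the block may reside.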
In order to record which portions of main memory are present in a cache, a cache includes an M×N entry cache directory array. Each entry within this cache directory array corresponds directly to one entry in the M×N cache data array and indicates which portion of main memory is mapped to the corresponding entry of the cache data array and the state of the cache line at that entry.
With reference now to FIG. 1B, there is shown a depiction of a cache directory entry. The tag field 104 consists of the tag portion of the address of the block of main memory that is mapped to this entry within the cache. State field 105 contains the state of the cache block mapped to this entry. In the depicted embodiment, four bits are used to provide for up to 16 possible cache states. One of these states indicates that the line is “invalid”. In the presence of an invalid state, the value within the tag field for this directory entry is ignored because this entry in the cache is not active (this qualification is necessary because some value is always present in the tag field irrespective of whether the corresponding portion of memory has actually been populated within the cache entry).
To determine if a particular address is present within a cache, the tag portion of that address is compared to the N tag entries (tag field 104) within the congruence class associated with that address, ignoring those entries that are marked as invalid by state field 105. If a valid matching entry is found, the line is present in the cache. When a portion of main memory is installed within a cache block, the directory entry for the block is updated to indicate a non-invalid state and the tag portion of the memory block address is placed within tag field 104. When a block is de-allocated from the cache, state field 105 is set to invalid and the cache data (if necessary for coherency reasons) may be written back to main memory or another cache.
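The lookup, install, and de-allocate steps described above can be sketched as follows. This is a simplified model under stated assumptions rather than a description of any particular hardware: the class names, the encoding of the invalid state as `0`, and the sequential loop over the N entries (hardware would compare them in parallel) are all hypothetical.

```python
from dataclasses import dataclass

INVALID = 0  # hypothetical encoding of the "invalid" state


@dataclass
class DirEntry:
    tag: int = 0          # some value is always present, even when invalid
    state: int = INVALID


class Directory:
    """Toy M x N cache directory array."""

    def __init__(self, num_classes: int, ways: int):
        self.rows = [[DirEntry() for _ in range(ways)]
                     for _ in range(num_classes)]

    def lookup(self, tag: int, cclass: int):
        """Return the matching way within the congruence class, or None."""
        for way, entry in enumerate(self.rows[cclass]):
            # Entries marked invalid are ignored even if the tag matches.
            if entry.state != INVALID and entry.tag == tag:
                return way
        return None

    def install(self, tag: int, cclass: int, way: int, state: int):
        """Record a newly installed block with a non-invalid state."""
        if state == INVALID:
            raise ValueError("installed blocks need a non-invalid state")
        self.rows[cclass][way] = DirEntry(tag, state)

    def deallocate(self, cclass: int, way: int):
        """Mark the entry invalid; the stale tag value is left in place."""
        self.rows[cclass][way].state = INVALID
```

Note that a freshly created directory reports a miss for tag 0 even though every tag field holds 0, because the state field marks those entries as invalid.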
With reference now to FIG. 1C, there is shown a depiction of a cache according to the above description. Cache 110 consists of cache data array 120, cache directory array 130, and cache control logic 126b. Cache data array 120 consists of congruence classes 122 consisting of cache members 124. Cache directory array 130 is organized similarly to cache data array 120 as described above.
The foregoing has described a monolithic cache, in which a single cache directory array, a single cache data array, and a single set of control logic to manage updates together map the entirety of main memory. However, in order to increase parallelism, a cache is often “sliced”. In a sliced cache, each slice contains its own cache data array, cache directory array, and control logic. Typically, in a sliced cache with S slices, each slice is responsible for 1/S of the overall memory. The number of slices is often a power of two, and this will be assumed in what follows.
With reference now to FIG. 2A, there is shown a depiction of a sliced cache 210 consisting of two slices 212a and 212b. A per-slice cache data array 222a or 222b is used to hold those regions of memory mapped to the given cache slice. A per-slice cache directory 230a or 230b is used to track the portions of memory mapped within each cache slice. Finally, per-slice control logic 226a or 226b manages outstanding coherence operations for the given cache slice. By having more than one cache slice, a larger number of outstanding operations can be accommodated than would be possible within a monolithic cache structure like that of FIG. 1C.
Additional addressing means are typically provided to efficiently manage sliced caches such as that shown in FIG. 2A by apportioning the overall system memory space among the cache slices. In particular, half of the overall system memory space is cached by each of the slices in cache 210. With reference now to FIG. 2B, there is shown a depiction of how the bits constituting a main memory address are interpreted to determine where a main memory location may be mapped within cache 210 for a system with a 64 bit address and a cache with 4096 congruence classes of 128 byte cache lines. The low order seven bits (bits 57 to 63) in field 203 indicate a byte within the cache line corresponding to this address. Since this field addresses bytes within a cache line, it is ignored when determining where the cache block may reside within the cache.
The next field, SS field 214, is the slice selector field. This field is used to determine the slice to which a given cache block memory address is allocated. If the SS field has a value of ‘0’, the cache block memory address is allocated to slice 212a. Likewise, if the SS field has a value of ‘1’, the cache block memory address is allocated to slice 212b. This mapping based on the SS field has the effect of causing cache block addresses ending with a hexadecimal value of ‘00’ to be mapped to slice 212a and those cache block addresses ending with a hexadecimal value of ‘80’ to be mapped to slice 212b. For a cache with more than two slices, additional bits would be included in the SS field (two bits in the case of 4 slices) and would divide the system memory into distinct subsets, one mapped to each slice (if the number of slices is not a power of two, a hashing function over several address bits is typically employed to select the slice to which a given cache block address maps). For a given cache slice, congruence class address field 202 and tag field 201 serve the same functions as congruence class field 102 and tag field 101, as described above.
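Slice selection for a power of two number of slices can be sketched as below. The function name `slice_index` is hypothetical, and the sketch assumes that the SS field occupies the address bits immediately above the seven bit byte offset, which is consistent with addresses ending in hexadecimal 00 and 80 mapping to different slices. The hashing used for a non-power of two number of slices is not modeled.

```python
OFFSET_BITS = 7  # byte within a 128 byte cache line


def slice_index(addr: int, num_slices: int = 2) -> int:
    """Return the slice selected by the SS field of addr."""
    if num_slices < 1 or num_slices & (num_slices - 1):
        raise ValueError("this sketch handles power-of-two slice counts only")
    ss_bits = num_slices.bit_length() - 1  # 1 bit for 2 slices, 2 for 4
    return (addr >> OFFSET_BITS) & ((1 << ss_bits) - 1)
```

With two slices, index 0 corresponds to slice 212a and index 1 to slice 212b, so consecutive cache blocks alternate between the slices.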
In the caching structures described above, a single directory entry is utilized for each entry within the cache data array. Cache directory arrays require a significant portion of circuit resources and it is advantageous to reduce this resource requirement where possible. To this end, so-called “sectored caches” are often utilized. In a sectored cache, a single directory entry is employed to track the state of more than one contiguous cache line sized block of memory. In a two-sector cache, a single directory entry will track the state of two contiguous blocks of system memory.
With reference now to FIG. 3A, there is shown a depiction of a directory entry for a two-sector cache. The directory entry consists of a tag field 304 and two sector state fields 305a and 305b corresponding to the coherence state of each of the cache lines associated with the directory entry.
With reference now to FIG. 3B, there is shown a depiction of how the bits constituting a main memory address are interpreted to determine where a main memory location may be mapped within a cache for a system with a 64 bit address and a cache with 4096 congruence classes of 128 byte cache lines utilizing a two sectored cache. The low order seven bits (bits 57 to 63) in field 303 indicate a byte within the cache line corresponding to this address. Since this field addresses bytes within a cache line, it is ignored when determining where the cache block may reside within the cache.
The next bit in the address, T field 308, is used to select between sectors mapped to a given directory entry. By utilizing this low order bit, contiguous cache block regions of system memory are allocated to a given directory entry. In this case, addresses that end with hexadecimal values 00 and 80 are mapped into the two sectors of a given directory entry.
The next twelve bits (bits 44 to 55) in congruence class address field 302 indicate the congruence class within the cache this memory address maps to. The cache block containing this address may reside in any of the N pairs of cache blocks within the addressed congruence class. Finally, the remaining bits of the address (bits 0 to 43) in tag field 301 are referred to as the “tag” of the memory block and identify the unique contiguous cache block pair. Tag field 301 contains one fewer bit than tag field 101 of FIG. 1A because directory entries in a two-sector cache serve to map a two cache block sized region of system memory, instead of a single cache block entry.
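The two-sector address layout of FIG. 3B can be sketched in the same style. The function name `split_sectored` is hypothetical. Bit 0 is the most significant bit in the numbering above, so the T field (bit 56) sits directly above the seven offset bits.

```python
def split_sectored(addr: int):
    """Split a 64 bit address into (tag, congruence_class, sector, offset)."""
    offset = addr & 0x7F           # bits 57 to 63: byte within the line
    sector = (addr >> 7) & 0x1     # bit 56: T field, selects the sector
    cclass = (addr >> 8) & 0xFFF   # bits 44 to 55: congruence class
    tag = addr >> 20               # bits 0 to 43: tag of the block pair
    return tag, cclass, sector, offset
```

Two contiguous blocks whose addresses end in hexadecimal 00 and 80 share a tag and a congruence class and differ only in the sector value, which is why one directory entry can track both.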
A sectored cache provides a means of reducing the size of the directory but at a cost in efficiency of the cache. If both sectors of the various directory entries are not concurrently in use, the effective amount of memory that can be contained in the cache is reduced. In a pathological case where the access pattern touches only every other block in main memory, the effective amount of memory that can be cached is cut in half. In practice, the efficiency losses due to sectoring a cache are considerably less than this. It is often the case that a sectored cache of roughly equivalent area to a non-sectored cache will perform better (especially for larger lower-level caches).
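The capacity effect of the pathological access pattern can be illustrated with a small calculation. This sketch is an assumption-laden simplification: it ignores congruence classes, associativity, and replacement, and only counts how many sectors of the directory entries touched by an access pattern hold useful data. The name `useful_fraction` is hypothetical.

```python
BLOCK_SIZE = 128  # bytes per cache block
SECTORS = 2       # blocks tracked by one directory entry


def useful_fraction(addresses) -> float:
    """Fraction of the sectors in the touched directory entries that are used."""
    blocks = {addr // BLOCK_SIZE for addr in addresses}
    if not blocks:
        raise ValueError("at least one address is required")
    # Each directory entry covers SECTORS contiguous, aligned blocks.
    entries = {block // SECTORS for block in blocks}
    return len(blocks) / (SECTORS * len(entries))
```

Touching every block in a region gives a fraction of 1.0, while touching only every other block gives 0.5, matching the halving of effective capacity described above.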