1. Technical Field
The present invention generally relates to computer systems, and more specifically to a cache and method for accessing a cache in a computer system. In particular, the present invention allows caches with a number of congruence classes other than a power of two, and provides a way to split congruence classes to provide storage for values for which some class selectors would otherwise select a nonexistent class.
2. Description of the Related Art
The basic structure of a conventional computer system includes one or more processing units connected to various input/output devices for the user interface (such as a display monitor, keyboard and graphical pointing device), a permanent memory device (such as a hard disk, or a floppy diskette) for storing the computer's operating system and user programs, and a temporary memory device (such as random access memory or RAM) that is used by the processor(s) in carrying out program instructions. The evolution of computer processor architectures has transitioned from the now widely-accepted reduced instruction set computing (RISC) configurations, to so-called superscalar computer architectures, wherein multiple and concurrently operable execution units within the processor are integrated through a plurality of registers and control mechanisms.
The objective of superscalar architecture is to employ parallelism to maximize or substantially increase the number of program instructions (or “micro-operations”) simultaneously processed by the multiple execution units during each interval of time (processor cycle), while ensuring that the order of instruction execution as defined by the programmer is reflected in the output. For example, the control mechanism must manage dependencies among the data being concurrently processed by the multiple execution units, and the control mechanism must ensure the integrity of data that may be operated on by multiple processes on multiple processors and potentially contained in multiple cache units. It is desirable to satisfy these objectives consistent with the further commercial objectives of increasing processing throughput, minimizing electronic device area and reducing complexity.
Both multiprocessor and uniprocessor systems usually use multi-level cache memories in which each higher level is typically smaller and has a shorter access time. The cache accessed directly by the processor, which in present systems is usually contained within the processor component, is typically the smallest cache.
Both data and instructions are cached, and data and instruction cache entries are typically loaded before they are needed by operation of prefetch units and branch prediction units. Called “streams”, groups of instructions associated with predicted execution paths can be detected and loaded into cache memory before their actual execution. Likewise, data patterns can be predicted by stride detection circuitry and loaded before operations requiring the data are executed.
Cache memories are typically organized in a matrix arrangement. One direction in the matrix corresponds to congruence classes and the other to equivalent sets within each congruence class. The congruence class partitioning divides the use of the cache with respect to information type; typically, a portion of the address field of a memory location is used to partition the distribution of values from memory across the congruence class sets. In this way, it is only necessary to examine the entries within the directory for a given class in order to determine memory conflicts or whether a particular value at a given address is present or absent from the cache.
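The partitioning described above can be illustrated with a minimal sketch. The line size, class count, associativity, and the names `split_address` and `lookup` are assumptions chosen for illustration and do not come from the specification; the sketch only shows that a lookup examines the directory entries of a single congruence class.

```python
# Illustrative sketch only: all sizes and names below are assumed.
LINE_BYTES = 64    # assumed cache line size in bytes
NUM_CLASSES = 8    # assumed power-of-two number of congruence classes
WAYS = 4           # assumed number of sets (ways) per congruence class


def split_address(addr):
    """Split an address into (tag, congruence class selector)."""
    line = addr // LINE_BYTES          # drop the offset within the line
    selector = line % NUM_CLASSES      # low-order line bits pick the class
    tag = line // NUM_CLASSES          # remaining bits identify the line
    return tag, selector


def lookup(directory, addr):
    """Return True if addr is cached; only one class is examined."""
    tag, selector = split_address(addr)
    return tag in directory[selector]


# Directory: one list of tags per congruence class, None meaning empty.
directory = [[None] * WAYS for _ in range(NUM_CLASSES)]

# Install the line holding address 0x1240 in its class.
tag, selector = split_address(0x1240)
directory[selector][0] = tag
```

In this sketch, an address that maps to the same class with a different tag misses, because the tag comparison is confined to that class.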
Due to the level of complexity of modern processors, there is often a need to adjust the device area during the design process. When area constraints are encountered, removal of a portion of the cache memory can free large device areas, since the data storage and logic associated with a cache can be quite large as compared to other functional blocks within the processor. Conversely, when a design is complete, extra area may be available that is insufficient to double the size of a cache but is substantial enough to provide, for example, a 50 percent increase in the size of a cache.
Reducing the associativity of a cache can directly scale the size of the cache, but it has a direct impact on performance, as all of the sets will be reduced in depth. Reducing associativity can also complicate the directory logic and storage. For example, reduction of an 8-way cache to a 7-way cache creates an invalid value in a 3-bit set selector, and use of that value must be avoided by the logic, or a fault may occur.
Reducing or increasing the number of congruence classes presents a more complex problem. The congruence classes must cover all of the information classes in the class set. For example, if a cache has 8 congruence classes associated with three bits out of an address field, one of the classes cannot simply be removed. The number of congruence classes could be reduced to 4, with each of the congruence classes covering two of the information classes covered by each class in the 8 congruence class design, but that would have a large impact on cache performance. Likewise, adding congruence classes to a cache design (other than increasing the number to the next power of two) is similar to reducing the quantity from the next power of two. Selector values for the congruence classes that were not added would have no storage or directory entries associated with them.
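The problem can be made concrete with a short sketch that lists the selector values left without storage when the class count is not a power of two. The function name `unbacked_selectors` is hypothetical, and the sketch assumes that classes are numbered from zero and that the selector uses the minimum number of bits needed to address them.

```python
def unbacked_selectors(num_classes):
    """List selector values that decode to a nonexistent class.

    Assumes classes are numbered 0..num_classes-1 and the selector is
    the smallest bit field able to address every class.
    """
    bits = max(1, (num_classes - 1).bit_length())
    return [s for s in range(1 << bits) if s >= num_classes]


# With 6 classes and a 3-bit selector, values 6 and 7 have no storage.
missing = unbacked_selectors(6)
```

Under these assumptions, a cache with 8 classes leaves no selector unbacked, while a cache with 12 classes and a 4-bit selector leaves four.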
In light of the foregoing, it would be desirable to implement a cache method and a cache architecture in which the number of congruence classes in a cache may be reduced or increased in increments other than a power of two without causing a large increase in cache access latency.
It is therefore one object of the present invention to provide an improved cache memory for a computer system.
It is another object of the present invention to provide a cache memory for a computer system, wherein the number of congruence classes in said cache can be adjusted to sizes other than powers of two.
It is yet another object of the present invention to provide a cache memory for a computer system that has a non-power-of-two congruence class count and introduces no delay while incorporating logic to enable a non-power-of-two size.
The foregoing objects are achieved in a cache and method for accessing a cache, wherein the cache receives an access request containing a congruence class selector. The cache determines that the selected congruence class is a class providing storage for values associated with at least one other congruence class selector, and selects a subset from a plurality of sets associated with said congruence class in conformance with said congruence class selector. The cache may further select the subset by ignoring a remainder of the sets: a decode of the congruence class selector is compared with a fixed value as part of the directory tag entry comparison, causing the tag comparisons for the remainder to be masked. The cache may further incorporate a least-recently-used (LRU) tree whose behavior is selected by the determination that the congruence class is a class providing storage for values associated with at least one other congruence class selector.
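One way to picture the summarized method is the following minimal sketch. It is not the claimed implementation: the class count of 6, the 8-way associativity, the choice to fold an unbacked selector onto the class obtained by clearing its top bit, and the even split of sets between the two selectors sharing a class are all assumptions made for illustration, as are the names `resolve`, `is_split_class`, and `lookup`.

```python
# Illustrative sketch only: sizes, folding rule, and names are assumed.
SELECTOR_BITS = 3
NUM_CLASSES = 6                     # assumed non-power-of-two class count
WAYS = 8                            # assumed sets (ways) per class
HALF = WAYS // 2
FOLD = 1 << (SELECTOR_BITS - 1)     # top selector bit, value 4


def is_split_class(cls):
    """True if cls also stores values for an unbacked selector."""
    return cls < FOLD and cls + FOLD >= NUM_CLASSES


def resolve(selector):
    """Map a selector to (physical class, subset of ways to compare)."""
    if selector >= NUM_CLASSES:
        # No class exists for this selector: borrow the upper half
        # of the class found by clearing the top selector bit.
        return selector - FOLD, range(HALF, WAYS)
    if is_split_class(selector):
        # This class is shared, so its own selector keeps the lower half.
        return selector, range(0, HALF)
    return selector, range(WAYS)    # ordinary class: all ways


def lookup(directory, selector, tag):
    """Compare tags only in the selected subset; other ways are masked."""
    cls, ways = resolve(selector)
    return any(directory[cls][w] == tag for w in ways)


directory = [[None] * WAYS for _ in range(NUM_CLASSES)]
directory[2][5] = 0xAB              # a line stored on behalf of selector 6
```

In this sketch a tag stored in the upper half of class 2 hits for selector 6 but not for selector 2, which mirrors the masking of tag comparisons for the remainder of the sets.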
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.