1. Technical Field of the Invention
This invention generally relates to caches for computer systems, such as set associative caches and direct-mapped caches, and more particularly to providing data coherency in systems including processors having two levels of cache and those having a single level cache.
2. Background Art
Microprocessor development projects are costly in terms of time and money. As such, there is a strong desire to make each of these design efforts versatile enough to cover the entire planned product range. The difficulty is that entry level machines must be inexpensive, even at the expense of performance, while the upper end of the product range must favor performance.
A common way to achieve this versatility is to make the microprocessor core "programmable," that is, able to work with different external environments. Specifically, the core is designed to accommodate multiple levels of cache, with the first level built into the microprocessor core and the second level external to the chip. This external second level cache can be quite expensive, and it is unnecessary for the entry level models. However, it is essential at the high end, where multiple processors, each with its external second level cache, are coupled together and work cooperatively on programs and data.
A cache is a high speed buffer which holds recently used memory data. Because programs exhibit locality of reference, most data accesses can be satisfied from the cache, in which case slower accesses to bulk memory are avoided.
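As a rough illustration of why locality makes a cache effective, the following minimal sketch counts hits in a small direct-mapped cache. The function name `count_hits`, the four-line cache, and the 16-byte line size are assumptions chosen for illustration; they are not taken from the system described here.

```python
def count_hits(addresses, num_lines=4, line_size=16):
    """Simulate a direct-mapped cache and return (hits, misses).

    num_lines and line_size are illustrative assumptions.
    """
    tags = [None] * num_lines          # one stored tag per cache line
    hits = misses = 0
    for addr in addresses:
        block = addr // line_size      # memory block containing the address
        index = block % num_lines      # cache line that the block maps to
        tag = block // num_lines       # distinguishes blocks sharing a line
        if tags[index] == tag:
            hits += 1                  # served from the cache
        else:
            misses += 1                # must go to bulk memory
            tags[index] = tag          # the fetched block replaces the line
    return hits, misses


# 64 consecutive byte addresses: only the first touch of each line misses.
print(count_hits(range(64)))           # (60, 4)
```

Sequential accesses miss once per line and hit thereafter, whereas two addresses that map to the same line and alternate (for example 0 and 64 under these assumed sizes) miss every time.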
A typical shared memory multiprocessor system implements a coherency mechanism for its memory subsystem. This memory subsystem contains one or more levels of cache memory associated with a local processor. These processor/cache subsystems share a bus connection to main memory. A snooping protocol is adopted where certain accesses to memory require that processor caches in the system be searched for the most recent (modified) version of requested data.
In an exemplary high-end system, a two level cache subsystem is implemented in which the level 2 (L2) cache line size is larger than the level 1 (L1) cache line size by some power of 2. Both caches implement writeback policies, and L1 is set-associative. L1 is subdivided into sublines which track which portions of the cache line contain modified data. Snoop requests from the bus are received at L2 and, if appropriate, forwarded on to L1. A snoop request forwarded to L1, however, requires accessing the L1 directory for all of the consecutive L1 cache entries which may contain data associated with the L2 cache line.
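The consecutive L1 entries that a forwarded snoop must examine can be enumerated with simple address arithmetic. The sketch below is illustrative only: the function name `l1_lines_in_l2_line` and the 32-byte and 128-byte line sizes are assumptions, since the text states only that the L2 line is some power of 2 larger than the L1 line.

```python
def l1_lines_in_l2_line(address, l1_line_size=32, l2_line_size=128):
    """Return the base addresses of all L1 lines covered by the L2 line
    that contains `address`. Line sizes are illustrative assumptions."""
    if l2_line_size % l1_line_size != 0:
        raise ValueError("L2 line size must be a multiple of L1 line size")
    # Align the address down to the start of its L2 line.
    l2_base = (address // l2_line_size) * l2_line_size
    lines_per_l2 = l2_line_size // l1_line_size
    return [l2_base + i * l1_line_size for i in range(lines_per_l2)]


# A snoop for address 0x1234 touches four consecutive L1 lines.
print([hex(a) for a in l1_lines_in_l2_line(0x1234)])
# ['0x1200', '0x1220', '0x1240', '0x1260']
```

With these assumed sizes, each snoop forwarded to L1 costs four L1 directory lookups, which is the overhead the paragraph above describes.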
Two fundamental ways to manage the contents of multiple levels of caches exist:
1. Allow unique: the first level and second level caches are allowed to have unique data.
2. Force inclusion: the first level of cache is required to be a subset of the second level of cache.
In a coherent shared memory multiprocessing environment, each time a processor issues a request for memory data, the other processors' caches may need to be searched for copies of that data, depending upon the type of request. Also, in a system with a single processor, memory coherency needs to be maintained with other devices that access memory, such as I/O processors.
Consider a first exemplary system including a plurality of high-level processors, such that private first and second level caches exist for each processor in the multiprocessing system. Assume also that a snooping bus protocol is used for the multiprocessor memory hierarchy and that cache blocks may exist in cache in the modified, exclusive, shared, or invalid state (MESI protocol).
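For reference, the way a cache block's MESI state typically responds to a snooped bus request can be sketched as follows. This is a minimal sketch of the commonly described textbook transitions, not the specific protocol of the exemplary system; the names `State` and `snoop` and the request labels `"read"` and `"read_exclusive"` are assumptions.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def snoop(state, request):
    """Return (next_state, must_supply_data) for a snooped bus request.

    request is "read" (another processor wants a shared copy) or
    "read_exclusive" (another processor intends to modify the block).
    """
    if request not in ("read", "read_exclusive"):
        raise ValueError(f"unknown request: {request}")
    if state is State.INVALID:
        return State.INVALID, False        # nothing cached, nothing to do
    # Only a modified block holds data newer than main memory.
    must_supply = state is State.MODIFIED
    next_state = State.SHARED if request == "read" else State.INVALID
    return next_state, must_supply


print(snoop(State.MODIFIED, "read"))
```

Only a block in the modified state must supply its data, which is why a snoop has to locate the most recent (modified) version wherever it resides.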
1. ALLOW UNIQUE: In the first exemplary system operating in an "allow unique" environment, where the first level and second level caches are allowed to have unique data, data may exist in either or both levels of cache within an individual processor. Maintaining cache coherency therefore requires that both the first and second level caches be searched each time a request from another processor appears on the snooping bus. This causes interference at the first level cache which may adversely affect performance of the executing instruction stream within the processor. Alternatively, it requires an added port on the first level cache to allow interrogation of the cache for snooping requests. In either case, one is faced with pulling data from one or both levels of cache in order to supply it to requests from other processors. Additionally, coordination of the snoop responses between the two levels of cache is required within each processor.
Consider a second exemplary system including one or more entry-level processors, such that private first level caches exist for all processors in the multiprocessing system, and second level caches do not exist for all processors in the system. In an "allow unique" environment the first level cache already has the mechanisms in place to allow snoop requests to be handled.
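The snoop handling of the "allow unique" case with two cache levels can be sketched as below. This is a simplified model under stated assumptions: each cache is represented as a set of line addresses, both levels are treated as using the same line granularity, and the name `snoop_allow_unique` is hypothetical.

```python
def snoop_allow_unique(l1_lines, l2_lines, line_address):
    """Probe both cache levels for a snooped line address.

    Returns (levels_probed, levels_holding_line). Because either level
    may hold data the other lacks, both are probed on every snoop.
    """
    levels = (("L1", l1_lines), ("L2", l2_lines))
    probed = [name for name, _ in levels]
    holding = [name for name, lines in levels if line_address in lines]
    return probed, holding


l1 = {0x100, 0x140}        # 0x100 is unique to L1
l2 = {0x140, 0x200}        # 0x200 is unique to L2
print(snoop_allow_unique(l1, l2, 0x100))   # (['L1', 'L2'], ['L1'])
print(snoop_allow_unique(l1, l2, 0x300))   # (['L1', 'L2'], [])
```

Even a snoop that misses everywhere, such as 0x300 above, still probes L1 in this model, which corresponds to the interference with the instruction stream noted in the text.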
2. FORCE INCLUSION: In a forced inclusion environment, the contents of the first level cache must always be a subset of the second level cache within an individual processor. This allows a system to be built where the second level cache can control the snooping of external bus requests without interference in the first level cache unless the required data actually exist there. This is facilitated by a first level cache status array maintained at the second level cache controller which indicates presence, and possibly state, of the first level cache data. Note that, at a minimum, only presence need be recorded, because the state of the first level data must be less than or equal to that of the encompassing second level cache block. Also, in this exemplary system, the L2 cache controller has primary responsibility for managing cache coherency. It also forms requests for data and other commands based on processor requests which are forwarded to the system bus.
In a forced inclusion environment, for an entry-level microprocessor where the second level cache is not included, controlling cache coherency is more difficult because the mechanisms for this control reside in the L2 cache controller.
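The filtering role that the first level cache status array plays under forced inclusion can be sketched as follows. This is a minimal model under stated assumptions: the class name `InclusiveL2`, its methods, and the use of a single presence flag per L2 line are hypothetical and do not describe an actual controller design.

```python
class InclusiveL2:
    """Toy L2 controller that records, per L2 line, whether L1 holds it."""

    def __init__(self):
        # Maps line address -> True if the line is also present in L1.
        self.l1_present = {}

    def fill(self, line_address, in_l1=False):
        """Install a line in L2, optionally marking it as held by L1."""
        self.l1_present[line_address] = in_l1

    def snoop(self, line_address):
        """Return (hit_in_l2, forward_to_l1) for a snooped line address.

        Inclusion guarantees that an L2 miss implies an L1 miss, so L1
        is disturbed only when the presence flag is set.
        """
        if line_address not in self.l1_present:
            return False, False
        return True, self.l1_present[line_address]


l2 = InclusiveL2()
l2.fill(0x140, in_l1=True)
l2.fill(0x200)
print(l2.snoop(0x140))   # (True, True)
print(l2.snoop(0x200))   # (True, False)
print(l2.snoop(0x300))   # (False, False)
```

In this model every snoop is resolved by the L2 controller, which is also why removing the second level cache removes the coherency mechanism, as noted for the entry-level case.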
It is an object of the invention to achieve a cooperative work environment in multiprocessor configurations including either low-end or high-end microprocessors. It is a further object of the invention to maintain data coherency in such a system.
It is a further object of the invention to provide a microprocessor design which is adaptable to either the low-end or high-end configuration.