The need to maintain "cache coherence" in multiprocessor systems is well known. Maintaining "cache coherence" means, at a minimum, that whenever data is written into a specified location in a shared address space by one processor, the caches for any other processors which store data for the same address location are either invalidated, or updated with the new data.
There are two primary system architectures used for maintaining cache coherence. One, herein called the cache snoop architecture, requires that each data processor's cache include logic for monitoring a shared address bus and various control lines so as to detect when data in shared memory is being overwritten with new data, determining whether its data processor's cache contains an entry for the same memory location, and updating its cache contents and/or the corresponding cache tag when data stored in the cache is invalidated by another processor. Thus, in the cache snoop architecture, every data processor is responsible for maintaining its own cache in a state that is consistent with the state of the other caches.
In a second cache coherence architecture, herein called the memory directory architecture, main memory includes a set of status bits for every block of data that indicate which data processors, if any, have the data block stored in cache. The main memory's status bits may store additional information, such as which processor is considered to be the "owner" of the data block if the cache coherence architecture requires storage of such information.
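The per-block status bits of the memory directory architecture can be pictured as one bitmask per data block. The following is only an illustrative sketch: the `Directory` class, its method names, and the choice of a dictionary keyed by block address are assumptions made for this example and are not taken from any particular directory design.

```python
# Hypothetical sketch of directory status bits: one bitmask per data block,
# where bit p is set when processor p holds the block in its cache.

class Directory:
    def __init__(self):
        self.sharers = {}  # block address -> bitmask of caching processors

    def add_sharer(self, block, proc):
        """Record that processor `proc` has cached `block`."""
        self.sharers[block] = self.sharers.get(block, 0) | (1 << proc)

    def remove_sharer(self, block, proc):
        """Record that processor `proc` no longer caches `block`."""
        self.sharers[block] = self.sharers.get(block, 0) & ~(1 << proc)

    def sharers_of(self, block):
        """Return the list of processors whose status bit is set."""
        mask = self.sharers.get(block, 0)
        return [p for p in range(mask.bit_length()) if (mask >> p) & 1]
```

An "owner" field, where the coherence scheme needs one, could be stored alongside the bitmask in the same per-block entry.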
In the present invention, a System Controller maintains cache coherence among the cache memories for a set of data processors by maintaining a set of duplicate cache tags and using those duplicate cache tags to determine which cache memories store data corresponding to each memory transaction performed by each of the data processors. The System Controller determines when the contents of a cache line in a cache memory must be invalidated, and when a cache line in a cache memory is to be the source of data for a datum requested by another data processor. The duplicate cache tag architecture removes the bus snooping burden from the data processors without incurring the delays associated with the memory directory architecture.
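The role of the duplicate cache tags can be sketched as follows. This is a minimal illustration under stated assumptions, not the System Controller's actual design: the `DuplicateTags` class and its method names are hypothetical, each cache is assumed to hold one block per line index, and the cache state is reduced to "present" or "absent".

```python
# Hypothetical sketch: the controller keeps a copy of every processor's cache
# tags and consults the copies, so the processors need not snoop a bus.

class DuplicateTags:
    def __init__(self, num_processors, num_lines):
        self.num_lines = num_lines
        # tags[p][i] is the block address processor p caches in line i, or None.
        self.tags = [[None] * num_lines for _ in range(num_processors)]

    def record_fill(self, proc, block_addr):
        """Note that processor `proc` now caches `block_addr`."""
        self.tags[proc][block_addr % self.num_lines] = block_addr

    def holders(self, block_addr):
        """Return the processors whose duplicate tag matches `block_addr`."""
        idx = block_addr % self.num_lines
        return [p for p, t in enumerate(self.tags) if t[idx] == block_addr]

    def write(self, writer, block_addr):
        """Processor `writer` writes the block; return the caches to invalidate."""
        idx = block_addr % self.num_lines
        victims = [p for p in self.holders(block_addr) if p != writer]
        for p in victims:
            self.tags[p][idx] = None  # the controller would also signal cache p
        self.record_fill(writer, block_addr)
        return victims
```

The same `holders` lookup could be used to pick a cache that will supply a block requested by another data processor.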
Referring to FIG. 11, there is shown a simplified block diagram of a standard, prior art cache memory device 400 coupled to a data processor 402. The cache memory includes a cache line array 404 and a cache tag array 406. The cache line array 404 includes 2^CS cache lines 410, where CS is the number of address bits needed to uniquely identify a cache line 410. Each cache line 410 stores one data block of a fixed size, such as 64 bytes. The cache tag array 406 includes 2^CS cache tags 412.
Cache logic 414 controls the operation of the cache memory 400. A comparator circuit 416 compares a portion of the address specified in a cache memory request with the address tag stored in a selected cache tag 412, and also checks the cache state value stored in the selected cache tag to determine whether the address tag is valid.
State update logic 418 is a circuit for updating the address and cache state information stored in the cache tag 412 selected by the address specified in a cache memory request. Cache Line Access/Update Logic 420 is a circuit for reading and writing data from and to the cache line 410 selected by the address specified in a cache memory request.
A "cache miss" is defined as the failure to locate a specified data block in a cache memory. When a cache memory request specifies an address for a data block that is not stored in the cache memory, a cache miss is said to have occurred.
A "cache hit" is defined as successfully locating a specified data block in a cache memory. When a cache memory request specifies an address for a data block that is stored in the cache memory, the memory request will be handled by the cache memory and a cache hit is said to have occurred.
Since the present invention involves a system and method for speeding cache memory operations by a clock cycle or two when a cache hit occurs, FIG. 11 does not show the circuitry for responding to cache misses by forwarding the memory access request to another memory device, such as a large cache memory or the system's main memory.
A cache memory access operation in a standard prior art cache memory 400 such as the one shown in FIG. 11 begins as follows, regardless of whether it is a read or a write operation. An address value and a command signal are presented to the cache memory 400 by a data processor 402 or other device. The address value contains a sequence of address bits which for purposes of this explanation are divided into three groups, {A}, {B} and {C}. The full address value presented to the cache memory is written {A B C} where {A} represents the most significant address bits, {B} represents a set of middle address bits and {C} represents a set of least significant address bits.
For example, consider a system using a 41 bit address value PA<40:0>, where PA stands for "physical address," and where each unique address value identifies a distinct byte in the system's main memory. If that system includes a data processor having a cache memory 400 with a 64 byte cache line size, and a 512K byte cache memory size, {A} represents the 22 most significant address bits PA<40:19>, {B} represents the 13 middle address bits PA<18:6> and {C} represents the 6 least significant address bits PA<5:0>.
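The address split in this example can be worked through in a few lines. The sketch below derives the field widths from the stated line size, cache size, and address width rather than hard-coding them; the function name `split_address` and the constant names are hypothetical, and the cache is assumed to hold exactly one line per {B} value.

```python
# Split a 41-bit physical address PA<40:0> into the {A}, {B}, {C} fields
# for a 512K byte cache with 64 byte lines (one line per {B} value assumed).

LINE_SIZE = 64               # bytes per cache line
CACHE_SIZE = 512 * 1024      # bytes in the cache
ADDR_BITS = 41               # width of PA<40:0>

C_BITS = LINE_SIZE.bit_length() - 1                  # offset bits, PA<5:0>
B_BITS = (CACHE_SIZE // LINE_SIZE).bit_length() - 1  # index bits,  PA<18:6>
A_BITS = ADDR_BITS - B_BITS - C_BITS                 # tag bits,    PA<40:19>

def split_address(pa):
    """Return (a, b, c): address tag, cache line index, byte offset."""
    c = pa & (LINE_SIZE - 1)
    b = (pa >> C_BITS) & ((1 << B_BITS) - 1)
    a = pa >> (C_BITS + B_BITS)
    return a, b, c
```

For instance, `split_address((5 << 19) | (3 << 6) | 7)` separates the address into tag 5, line index 3, and byte offset 7.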
The cache memory 400 in this example has 2^13 cache lines 410 and 2^13 corresponding cache tags 412. Each cache tag 412 stores a cache line state value and an address tag. The cache line state value indicates whether the corresponding cache line stores valid data and whether the data block stored in the corresponding cache line 410 is exclusively stored by this cache memory or is shared with at least one other cache memory. The cache state value also indicates whether the corresponding cache line stores modified data that will need to be written back to main memory when the data block stored in the cache line 410 is displaced from the cache memory. The address tag in the cache tag 412 stores the {A} address bits PA<40:19> that uniquely identify, along with the cache tag's position in the cache tag array 406, the data block's home base location in main memory.
In a normal cache memory, each cache memory operation begins with the {B} address bits being used to select and access a corresponding one of 2^B cache tags in the cache tag array, where B is the number of {B} address bits, and to also address the corresponding cache line 410 in the cache line array. A comparator circuit compares the address tag in the accessed cache tag with the {A} address bits and also compares the cache line state value with a predefined "invalid" cache state value. If the address tag matches the {A} address bits and the cache state value is not invalid, then a cache hit has occurred. A cache hit/miss signal is transmitted to the State Update Logic 418 and the Cache Line Access/Update Logic 420 so that those circuits can process the pending cache memory request accordingly.
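The tag lookup and compare described above can be sketched in software. This is a simplified model, not the comparator circuit itself: the `CacheTag` class, the `lookup` function, and the string state names are hypothetical, and the field widths are taken from the 512K byte, 64 byte line example.

```python
from dataclasses import dataclass

INVALID = "invalid"      # hypothetical encoding of the "invalid" state value
C_BITS, B_BITS = 6, 13   # widths of {C} and {B} in the running example

@dataclass
class CacheTag:
    address_tag: int = 0   # the {A} bits of the cached block
    state: str = INVALID   # cache line state value

def lookup(tags, pa):
    """Return (hit, index) for physical address `pa`."""
    index = (pa >> C_BITS) & ((1 << B_BITS) - 1)   # {B} selects the tag
    a_bits = pa >> (C_BITS + B_BITS)               # {A} is compared
    tag = tags[index]
    # A hit needs both a matching address tag and a state other than invalid.
    hit = tag.address_tag == a_bits and tag.state != INVALID
    return hit, index
```

Note that a matching address tag alone is not sufficient: a freshly initialized tag with `address_tag` 0 still misses for address 0 because its state is invalid.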
For instance, for a read request (also known as a load request), if a cache miss occurs the request cannot be serviced and the request will therefore be forwarded to another, larger memory device, while if a cache hit occurs, the read request will be serviced. For a write request, if a cache hit occurs, the write transaction will be serviced by writing data into the selected cache line 410, and the state value stored in the corresponding cache tag 412 will be updated if necessary to indicate that modified data is stored in the cache line 410.
If a cache miss occurs when servicing a write request, and the corresponding cache tag 412 is invalid, then data can be written into the corresponding cache line 410 and a new state value stored in the cache tag. However, if a cache miss occurs for a write request and the state value stored in the corresponding cache tag 412 indicates that the cache line is both valid and modified, then the data block currently stored in the cache line will be displaced and must be written back to main memory.
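The write cases above can be summarized in a short sketch. It is a simplification under several assumptions that are not in the text: each write replaces a whole data block, a valid but unmodified line that misses is simply overwritten, and the names `Line`, `write`, and `write_back` are hypothetical.

```python
INVALID, SHARED, MODIFIED = "invalid", "shared", "modified"  # hypothetical names

class Line:
    """One cache line together with its cache tag."""
    def __init__(self):
        self.tag, self.state, self.data = None, INVALID, None

def write(lines, index, a_bits, data, write_back):
    """Service a write to line `index`; return True on a cache hit.

    `write_back(tag, data)` is called when a modified block is displaced.
    """
    line = lines[index]
    hit = line.state != INVALID and line.tag == a_bits
    if not hit and line.state == MODIFIED:
        # Miss on a valid, modified line: the old block must reach main memory.
        write_back(line.tag, line.data)
    # Hit, or miss on an invalid or unmodified line: write and mark modified.
    line.tag, line.data, line.state = a_bits, data, MODIFIED
    return hit
```

Passing the write-back action as a callback keeps the sketch independent of how main memory is modeled.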
The cache tag lookup and compare steps typically take one or two data processor clock cycles, depending on the specific design of the cache logic 414. When the cache access request to cache memory 400 is initiated by data processor 402, the cache tag lookup and compare steps are necessary. Cache memory access by data processor 402 is considerably faster than main memory access, even with the delays caused by the cache tag lookup and compare steps.
However, when a cache memory access request to cache memory 400 is initiated by another data processor or other device, a number of additional delays are introduced, causing the memory access time to be much closer to that of accessing main memory than to the access time normally associated with accessing cache memory. The present invention is a dual port cache memory controller that reduces the time for accessing a remotely located cache memory by one or two clock cycles by eliminating the cache tag compare step of the cache memory accessing process whenever the remote accessing device guarantees that the data block addressed by the transaction is currently stored in the cache memory. When the remote accessing device cannot make this guarantee, the cache memory controller uses the standard cache memory access methodology. The remote accessing device indicates whether this "guarantee" is being made by setting a special cache access mode flag in each cache memory access request.
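The effect of the cache access mode flag can be illustrated with a small model. This is only a sketch of the idea, not the controller's logic: the function `access`, the `guaranteed` parameter standing in for the mode flag, and the tuple-based tag representation are all hypothetical.

```python
INVALID = "invalid"  # hypothetical encoding of the "invalid" state value

def access(tags, lines, index, a_bits, guaranteed):
    """Read line `index`; return (data, compared).

    `tags[index]` is an (address_tag, state) pair. `guaranteed` models the
    cache access mode flag set by the remote accessing device. `compared`
    reports whether the tag compare step was performed; `data` is None on
    a miss.
    """
    if guaranteed:
        # The requester asserts the block is present: skip the tag compare.
        return lines[index], False
    address_tag, state = tags[index]
    if address_tag == a_bits and state != INVALID:
        return lines[index], True
    return None, True
```

In this model the guaranteed path never inspects the tag, so its correctness depends entirely on the requester's assertion being true.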