This invention relates generally to computer systems and more specifically to the displacement of data elements from cache subsystems in computer systems.
As is known in the art, a multiprocessing computer system includes a plurality of central processing units (CPUs), a main memory and system control logic. Each CPU typically includes a cache for storing data elements that are accessed most frequently by the CPU. Each CPU may also include victim buffers for temporarily storing data which is displaced from its cache. The system control logic provides a communication interconnect for data and commands sent between the CPUs and between the CPUs and main memory. The system control logic often includes a duplicate tag store and an arbitration circuit. The arbitration circuit produces a serial stream of command references which is applied to all CPUs. The duplicate tag store holds status information pertaining to data stored in the caches coupled to each of the CPUs. The duplicate tag store is coupled with the arbitration circuit so that it may operate on the serial stream of commands. It is therefore implemented remotely from the CPUs.
Each CPU may issue a variety of commands to the system control logic depending upon the current cache status of a given data element and the operation the CPU needs to perform on that data element. If a CPU needs to access a copy of a data element that is not already in its cache, it issues a "read-miss" command to the system control logic. That command will retrieve an up-to-date copy of the requested data element and store it in the CPU's cache. The associated status information will indicate that the data is in an unmodified state. If the CPU needs to modify a data element that is not already in its cache, it issues a "read-miss-modify" command to the system control logic. That command will retrieve an up-to-date copy of the requested data element and store it in the CPU's cache. The associated status information for this data element will indicate that the data is in an exclusive, modified state. If the CPU needs to modify a data element that is already in its cache but in a nonexclusive or unmodified state, it issues a "change-to-dirty" command to the system control logic. This will change the state of the data element to the exclusive, modified state by invalidating each copy of the data stored in other CPUs' caches.
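The command selection just described can be sketched as follows. This is a minimal illustration only; the state names and the function `select_command` are hypothetical and do not appear in the description above.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    # Hypothetical per-element cache states, named for illustration only.
    INVALID = "invalid"            # element is not in the cache
    CLEAN = "unmodified"           # cached, unmodified and/or nonexclusive
    DIRTY = "exclusive-modified"   # cached, exclusive and modified


def select_command(state: State, wants_write: bool) -> Optional[str]:
    """Return the command a CPU would issue, or None if the cache can serve the access."""
    if state is State.INVALID:
        # Not cached: fetch a copy, asking for the modified state if writing.
        return "read-miss-modify" if wants_write else "read-miss"
    if wants_write and state is not State.DIRTY:
        # Cached but not exclusive/modified: upgrade in place.
        return "change-to-dirty"
    return None  # a hit that needs no command
```

Under these assumptions, a write to a cached but unmodified element yields "change-to-dirty", while a read of that same element needs no command at all.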
When a CPU issues a "read-miss" or "read-miss-modify" command to the system control logic, the requested data element may displace a previously cached data element from the CPU's cache. This displaced element is referred to as a "victim". If the victim is in a modified state, then it is considered a "most up-to-date" version of the data element in the computer system. More particularly, if a victim is in the exclusive, modified state then it is the only up-to-date copy of the data element in the computer system. Therefore, to maintain proper system operation, all modified victims must be written back to main memory. A modified victim, exclusive or non-exclusive, is referred to as a "dirty-victim".
When a CPU issues a "read-miss" or "read-miss-modify" command that displaces a dirty-victim, the CPU places a copy of the dirty-victim data into a victim buffer and issues both a read-miss or read-miss-modify command and a victim command to the system control logic together. A read-miss command and its associated victim command are referred to as a readmiss/victim command pair. A read-miss-modify command and its associated victim command are referred to as a readmissmod/victim command pair.
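The pairing of a miss command with a victim command can be sketched as follows. The direct-mapped cache organization, the `Line` and `Cpu` types, and the `num_sets` parameter are assumptions made for illustration; the description above does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Line:
    addr: int
    data: int
    dirty: bool


@dataclass
class Cpu:
    # Assumed direct-mapped cache: index -> cached line.
    cache: Dict[int, Line] = field(default_factory=dict)
    victim_buffers: List[Line] = field(default_factory=list)
    num_sets: int = 4

    def miss(self, addr: int, modify: bool) -> Tuple[str, ...]:
        """Return the command(s) issued together for a miss on addr."""
        cmd = "read-miss-modify" if modify else "read-miss"
        displaced = self.cache.pop(addr % self.num_sets, None)
        if displaced is not None and displaced.dirty:
            # A dirty-victim is copied to a victim buffer and a victim
            # command is issued along with the miss command.
            self.victim_buffers.append(displaced)
            return (cmd, "victim")
        # A clean victim is simply dropped; no copy is retained.
        return (cmd,)
```

In this sketch only a dirty displaced line produces a command pair; a clean displaced line leaves no trace in the victim buffers.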
The system control logic receives commands from a plurality of CPUs. The system control logic includes an arbitration circuit through which these commands arbitrate for access to the system's duplicate tag store and main memory resources. The output stage of this arbitration circuit, referred to as the "system serialization point", produces a serial stream of CPU commands which are issued directly to the duplicate tag store and the main memory. For each command in this serial stream, the system control logic performs a duplicate tag store lookup operation. This operation returns the cache status for each CPU, for the specific data element referenced by the command. Specifically, this lookup operation will return status information indicating which CPUs have copies of the referenced data element and which copies are the most up-to-date version of the data element. Therefore, if memory does not have the most up-to-date version of the data in the system, the duplicate tag store will indicate which CPU does. When the system is processing a read-miss command, a read-miss-modify command, a readmiss/victim command pair or readmissmod/victim command pair, it uses this information to determine whether to fetch data from main memory or another CPU. If it must fetch data from another CPU, it does so by issuing a message referred to as a "forwarded-read" to that other CPU. When the system is processing a read-miss-modify or change-to-dirty command, it uses the duplicate tag store information to determine which CPUs need to be issued messages to remove any copies of the referenced data element that are about to become invalid. These messages are referred to as "invalidates". Forwarded-read messages (probe read messages) and invalidate messages (probe invalidate messages) are together referred to as "probe" messages. Probe messages are issued to their target CPUs through a set of "probe queues" in the system control logic.
Each CPU is associated with one probe queue from that set of probe queues.
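The lookup and probe generation described above can be sketched as follows. The representation of the duplicate tag store as one dictionary per CPU, the "clean" and "dirty" labels, and the function names are all assumptions for illustration; a real system may also merge a forwarded-read and an invalidate to the same CPU into a single probe, which this sketch does not attempt.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple, Union

# dup_tags[cpu][addr] is "clean" or "dirty"; a missing entry means "not cached".
DupTags = List[Dict[int, str]]
Probe = Tuple[str, int]


def lookup(dup_tags: DupTags, addr: int) -> Tuple[List[int], Optional[int]]:
    """Return (CPUs holding addr, CPU holding the most up-to-date copy or None)."""
    holders = [cpu for cpu, tags in enumerate(dup_tags) if addr in tags]
    owner = next((cpu for cpu in holders if dup_tags[cpu][addr] == "dirty"), None)
    return holders, owner


def issue_probes(dup_tags: DupTags, probe_queues: List[Deque[Probe]],
                 issuer: int, cmd: str, addr: int) -> Union[int, str, None]:
    """Queue probes for one command; return the fill source, if any."""
    holders, owner = lookup(dup_tags, addr)
    source: Union[int, str, None] = None
    if cmd in ("read-miss", "read-miss-modify"):
        if owner is not None and owner != issuer:
            probe_queues[owner].append(("forwarded-read", addr))
            source = owner          # data comes from another CPU
        else:
            source = "memory"       # memory holds the up-to-date copy
    if cmd in ("read-miss-modify", "change-to-dirty"):
        for cpu in holders:
            if cpu != issuer:
                probe_queues[cpu].append(("invalidate", addr))
    return source
```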
The system control logic also executes a duplicate tag store update operation for each command in the serial stream. For each command, the update operation will modify the state of the duplicate tag store entries for both the CPU that issued the command and any CPUs to which this command caused probe messages to be issued. When the system control logic is processing read-miss or read-miss-modify commands, it updates the duplicate tag store state of the issuing CPU to indicate that the referenced block is now an element of the issuing CPU's cache. When the system control logic is processing a readmiss/victim command pair, it updates the state of the issuing CPU to indicate that the referenced block is now an element of the issuing CPU's cache, and also to indicate that the victim block is no longer a member of the issuing CPU's cache.
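The update operation can be sketched as follows, again assuming a duplicate tag store held as one dictionary per CPU with "clean" and "dirty" labels. The function `update_dup_tags` and its parameters are hypothetical. How the entry of a CPU that received a forwarded-read changes is not stated above, so the sketch leaves that entry untouched.

```python
from typing import Dict, Iterable, List, Optional


def update_dup_tags(dup_tags: List[Dict[int, str]], issuer: int, cmd: str,
                    addr: int, victim_addr: Optional[int] = None,
                    invalidated: Iterable[int] = ()) -> None:
    """Apply one serialized command to the duplicate tag store."""
    # CPUs that were sent an invalidate no longer hold the block.
    for cpu in invalidated:
        dup_tags[cpu].pop(addr, None)
    # For a command pair, the victim block leaves the issuer's cache.
    if victim_addr is not None:
        dup_tags[issuer].pop(victim_addr, None)
    # The referenced block is now an element of the issuer's cache.
    dup_tags[issuer][addr] = "clean" if cmd == "read-miss" else "dirty"
```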
When the arbitration circuit of the system control logic issues a command to the duplicate tag store, it simultaneously issues the same command to the main memory of the computer system. If the command is a read-miss command, a read-miss-modify command, a readmiss/victim command pair or a readmissmod/victim command pair, and the duplicate tag store indicates that the most up-to-date copy of the data element is in memory, then the system control logic will return a copy of the data element from main memory to the requesting CPU via a "fill" message. Similarly, if the duplicate tag store indicates that the most up-to-date copy of the data element is in another CPU's cache, then the system control logic will return a copy of the data element from the other CPU to the requesting CPU via a "fill" message. Fill messages are returned to their issuing CPUs via a set of "fill queues" in the system control logic. Each CPU is associated with one fill queue from that set.
The fill queue and the probe queue associated with a given CPU operate independently of each other and are processed at different rates by the associated CPU. As such, a probe message in a CPU's probe queue may be processed by the CPU after a fill message from the CPU's fill queue, even though that fill message was generated by a command that was issued to the system serialization point later than the command that generated the probe. The processing of this fill message before the probe message is referred to as "bypassing".
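Bypassing can be illustrated with a small sketch in which each message carries the position of its generating command in the serial stream. The message tuples, the sequence numbers, and the policy of draining the fill queue first are assumptions chosen to make the effect visible, not a description of any particular CPU.

```python
from collections import deque
from typing import Deque, List, Tuple

Message = Tuple[str, int]  # (kind, serialization sequence number)


def drain(probe_queue: Deque[Message], fill_queue: Deque[Message]) -> List[Message]:
    """Return messages in processing order for a CPU that drains fills first."""
    order: List[Message] = []
    while fill_queue:
        order.append(fill_queue.popleft())
    while probe_queue:
        order.append(probe_queue.popleft())
    return order


def bypasses(order: List[Message]) -> List[Tuple[int, int]]:
    """List (fill_seq, probe_seq) pairs where a fill was processed before an older probe."""
    found = []
    for i, (kind, seq) in enumerate(order):
        if kind == "fill":
            found += [(seq, s) for k, s in order[i + 1:] if k == "probe" and s < seq]
    return found
```

With a probe generated by command 1 and a fill generated by command 2, `drain` processes the fill first and `bypasses` reports the pair (2, 1).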
In such a computer system it is possible that a first CPU issues a readmiss/victim command pair to the system control logic that victimizes a given data element, concurrently with a command from a second CPU that references that same data element. If the command from the second CPU is issued to the system serialization point before the readmiss/victim command pair from the first CPU, then the command from the second CPU may cause a probe message, targeting the victim data element, to be placed on the probe queue of the first CPU. In order for both the victim command from the first CPU and the probe from the second CPU to be serviced correctly, the victim data buffer associated with the readmiss/victim command pair from the first CPU must retain a copy of the data element until certain conditions occur. Accordingly, the victim data buffer is retained until the victim data has been written back to memory and until every probe that targets the victim data element and that is stored in the first CPU's probe queue when the readmiss/victim command pair is issued to the system serialization point has been able to retrieve a copy of the data to return to the requesting CPU. The system control logic can determine when all relevant probes have been retired by the use of numerous methods known in the art, referred to as "probe searching mechanisms". The multiplicity of probe searching mechanism embodiments includes elaborate comparator structures like those employed in Digital Equipment Corporation's AlphaServer 8000 series of computers.
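The two release conditions for a victim data buffer can be sketched as follows. The `VictimBuffer` type and the use of probe identifiers are hypothetical; the sketch says nothing about how a probe searching mechanism actually finds the relevant probes.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class VictimBuffer:
    addr: int
    written_back: bool = False
    # Assumed: ids of probes that targeted addr and were already queued when
    # the command pair reached the system serialization point.
    pending_probes: Set[int] = field(default_factory=set)

    def retire_probe(self, probe_id: int) -> None:
        """Record that a queued probe has retrieved its copy of the data."""
        self.pending_probes.discard(probe_id)

    def can_release(self) -> bool:
        """True only when both conditions described above hold."""
        return self.written_back and not self.pending_probes
```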
If, on the other hand, both a first and second CPU have unmodified, nonexclusive copies of a first data element in their caches, a problem can arise. Consider that the first CPU issues a change-to-dirty command targeting the first data element. Issuance of the change-to-dirty command will cause the system control logic to place a probe invalidate message on the second CPU's probe queue. If the second CPU issues a read-miss command that targets a second data element (which displaces the first data element) that is issued to the system serialization point after the change-to-dirty command from the first CPU, this will cause a fill message for the second data element to be placed on the second CPU's fill queue. A copy of the displaced data will not be retained since the data need not be written back to memory. The fill message on the second CPU's fill queue may bypass the probe invalidate in the second CPU's probe queue. In such a case, since there is no victim data buffer prohibiting the issuance of further references to either data element, the second CPU may issue a read-miss-modify command that re-fetches the first data element and displaces the second data element. That read-miss-modify command must be issued to the system serialization point subsequent to the change-to-dirty from the first CPU. It will generate a second fill message targeting the first data element on the second CPU's fill queue. This second fill message may also bypass the probe invalidate message on the second CPU's probe queue, creating an exclusive, modified copy of the first data element in the second CPU's cache. If this copy of the first data element is not displaced from the second CPU's cache before the probe invalidate in the second CPU's probe queue is processed by the second CPU, then the invalidate will erroneously invalidate the only up-to-date copy of the first data element. 
The error of erroneously invalidating a data element in the manner described is referred to as a "double-wrap-around invalidation error".
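The sequence of events leading to this error can be replayed with a small sketch of the second CPU's cache. The one-entry cache, the block names "A" and "B", and the `replay` function are assumptions for illustration; the probes that the second CPU's read-miss-modify command would send to the first CPU are not modeled.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str, str]  # (kind, block, state carried by a fill)


def replay(messages: List[Message], cache: Dict[str, str]) -> List[str]:
    """Apply messages to a one-entry cache; return dirty blocks lost to an invalidate."""
    lost = []
    for kind, block, state in messages:
        if kind == "fill":
            cache.clear()          # the incoming block displaces the old one
            cache[block] = state
        elif kind == "invalidate" and block in cache:
            if cache[block] == "dirty":
                lost.append(block)  # the only up-to-date copy is destroyed
            del cache[block]
    return lost


invalidate_a = ("invalidate", "A", "")   # from the first CPU's change-to-dirty
fill_b = ("fill", "B", "clean")          # second CPU's read-miss on B
fill_a = ("fill", "A", "dirty")          # second CPU's read-miss-modify on A

# Serialization order: the invalidate is processed first and nothing is lost.
in_order = replay([invalidate_a, fill_b, fill_a], {"A": "clean"})
# Both fills bypass the invalidate: the modified copy of A is invalidated.
bypassed = replay([fill_b, fill_a, invalidate_a], {"A": "clean"})
```

In this sketch `in_order` is empty, while `bypassed` contains "A", the block whose only up-to-date copy was erroneously invalidated.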
Prior art systems have typically eliminated double-wrap-around invalidation by simply combining a given processor's probe and fill queues. In the example above, this would prevent either fill message issued to the second CPU from bypassing the invalidate message. Because, however, this solution requires all fill messages to wait for the completion of the typically slower-moving probe messages that precede them in the queue, it results in a lower-performing computer system. Another approach has been to implement a set of comparators that compare the target addresses of fill messages against the target addresses of probes on the probe queue as the fill messages are placed on the fill queue. The result of this comparison is used to enforce a temporary ordering between the fill and probe queues. This solution, however, is complex both in terms of the logic gates required to implement it and the verification effort needed to ensure its proper operation.
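The comparator approach can be sketched in software as follows. The probe identifiers, the set of "blockers" attached to each fill, and both function names are assumptions for illustration; actual implementations are hardware comparator arrays whose details are not given above.

```python
from collections import deque
from typing import Deque, Set, Tuple

Probe = Tuple[int, int]        # (probe id, target address)
Fill = Tuple[int, Set[int]]    # (target address, ids of probes it must follow)


def enqueue_fill(fill_queue: Deque[Fill], probe_queue: Deque[Probe], addr: int) -> None:
    """Compare addr against every queued probe and record the matching probes."""
    blockers = {pid for pid, probe_addr in probe_queue if probe_addr == addr}
    fill_queue.append((addr, blockers))


def fill_ready(fill_queue: Deque[Fill], retired_probes: Set[int]) -> bool:
    """The head fill may be delivered once all of its matching probes have retired."""
    if not fill_queue:
        return False
    _, blockers = fill_queue[0]
    return blockers <= retired_probes
```

Under these assumptions a fill whose address matches no queued probe may still bypass the probe queue, while a fill to the same address as a queued invalidate is held until that invalidate has retired.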