This invention relates generally to computer systems and more specifically to the flow control of subsequent references to data elements in computer systems.
As is known in the art, a multiprocessing computer system includes a plurality of central processing units (CPUs), a main memory and system control logic. Each CPU typically includes a cache for storing the data elements that the CPU accesses most frequently. The system control logic provides a communication interconnect for data and commands sent between the CPUs and between the CPUs and main memory. The system control logic often includes a duplicate tag store and an arbitration circuit. The arbitration circuit produces a serial stream of command references, derived from the commands of all CPUs, which is applied to the duplicate tag store and to main memory. The duplicate tag store holds status information pertaining to the data stored in the cache of each CPU. Because the duplicate tag store is coupled to the arbitration circuit so that it can operate on the serial stream of commands, it is implemented remotely from the CPUs.
Each CPU may issue a variety of commands to the system control logic, depending on the current cache status of a given data element and the operation the CPU needs to perform on that data element. If a CPU needs to access a copy of a data element that is not already in its cache, it issues a "readmiss" command to the system control logic. That command retrieves an up-to-date copy of the requested data element and stores it in the CPU's cache, and the associated status information indicates that the data is in an unmodified state. If the CPU needs to modify a data element that is not already in its cache, it issues a "read-miss-modify" command to the system control logic. That command also retrieves an up-to-date copy of the requested data element and stores it in the CPU's cache, but the associated status information for the data element indicates that the data is in an exclusive, modified state. A data element in this exclusive, modified state is considered the most up-to-date copy of that data element in the system.
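The two miss commands, and the cache status each one leaves behind, can be summarized in a short Python sketch. The names `Command`, `CacheState`, `choose_command` and `state_after_fill` are hypothetical illustrations rather than part of any described implementation, and commands for elements that are already cached are deliberately left out because they are not described here.

```python
from enum import Enum
from typing import Optional


class Command(Enum):
    READMISS = "readmiss"
    READ_MISS_MODIFY = "read-miss-modify"


class CacheState(Enum):
    UNMODIFIED = "unmodified"
    # The most up-to-date copy of the data element in the system.
    EXCLUSIVE_MODIFIED = "exclusive-modified"


def choose_command(in_cache: bool, will_modify: bool) -> Optional[Command]:
    """Pick the miss command a CPU issues for a data element (hypothetical helper).

    Returns None when the element is already cached: commands for that case
    are outside the scope of this sketch.
    """
    if in_cache:
        return None
    return Command.READ_MISS_MODIFY if will_modify else Command.READMISS


def state_after_fill(cmd: Command) -> CacheState:
    """Cache status recorded for the requesting CPU once its fill arrives."""
    if cmd is Command.READ_MISS_MODIFY:
        return CacheState.EXCLUSIVE_MODIFIED
    return CacheState.UNMODIFIED
```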
The system control logic receives commands from a plurality of CPUs. These commands arbitrate, through the arbitration circuit in the system control logic, for access to the system's duplicate tag store and main memory resources. The output stage of this arbitration circuit, referred to as the "system serialization point", produces a serial stream of CPU commands that are issued directly to the duplicate tag store and to main memory. For each command in this serial stream, the system control logic performs a duplicate tag store lookup operation, which returns the cache status of the referenced data element for each CPU.
Specifically, the lookup operation returns status information indicating which CPUs have copies of the referenced data element and which of these copies is the most up-to-date version of the data element. Therefore, if main memory does not hold the most up-to-date version of the data, the duplicate tag store indicates which CPU does. When the system control logic processes a readmiss or read-miss-modify command, it uses this information to determine whether to fetch the data from main memory or from another CPU. If it must fetch the data from another CPU, it does so by issuing a message, referred to as a "forwarded-read" or "probe read", to that CPU. Probe messages such as the forwarded-read are issued to their target CPUs through a set of "probe queues" in the system control logic, with one probe queue associated with each CPU.
The system control logic also executes a duplicate tag store update operation for each command in the serial stream. When processing a readmiss or read-miss-modify command, it updates the duplicate tag store state of the issuing CPU to indicate that the referenced data element is now in the issuing CPU's cache. In response to a read-miss-modify command, the system control logic also updates the duplicate tag store to indicate that the copy in the issuing CPU's cache is the most up-to-date copy of the data element in the system.
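The lookup, forwarded-read routing and update steps performed at the system serialization point for a single miss command can be sketched in Python, assuming a dictionary-backed duplicate tag store keyed by address and one deque per CPU as its probe queue. `TagEntry`, `SystemControl` and `process` are hypothetical names; the sketch models only the steps described above and omits invalidations and any other state transitions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Set, Tuple

# (message type, address, requesting CPU)
Probe = Tuple[str, int, int]


@dataclass
class TagEntry:
    holders: Set[int] = field(default_factory=set)  # CPUs caching the element
    owner: Optional[int] = None  # CPU with the most up-to-date copy; None = memory


class SystemControl:
    """Toy serialization point: one duplicate tag store, one probe queue per CPU."""

    def __init__(self, num_cpus: int) -> None:
        self.tags: Dict[int, TagEntry] = {}
        self.probe_queues: List[Deque[Probe]] = [deque() for _ in range(num_cpus)]

    def process(self, cpu: int, cmd: str, addr: int) -> str:
        """Handle one readmiss or read-miss-modify command; return the fill source."""
        entry = self.tags.setdefault(addr, TagEntry())
        # Lookup: memory supplies the data unless a CPU holds the up-to-date copy,
        # in which case a forwarded-read goes to that CPU's probe queue.
        if entry.owner is None:
            source = "memory"
        else:
            source = f"cpu{entry.owner}"
            self.probe_queues[entry.owner].append(("forwarded-read", addr, cpu))
        # Update: the issuing CPU now caches the element...
        entry.holders.add(cpu)
        # ...and after a read-miss-modify it holds the most up-to-date copy.
        if cmd == "read-miss-modify":
            entry.owner = cpu
        return source
```

For example, if CPU 2 owns address `0x40`, a read-miss-modify from CPU 0 is routed to CPU 2 and records CPU 0 as the new owner, so a subsequent readmiss from CPU 1 places a forwarded-read in CPU 0's probe queue rather than CPU 2's.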
When the arbitration circuit of the system control logic issues a command to the duplicate tag store, it simultaneously issues the same command to the main memory of the computer system. If the command is a readmiss or read-miss-modify command and the duplicate tag store indicates that the most up-to-date copy of the data element is in main memory, the system control logic returns a copy of the data element from main memory to the requesting CPU in a "fill" message. Similarly, if the duplicate tag store indicates that the most up-to-date copy of the data element is in another CPU's cache, the system control logic returns a copy of the data element from that CPU to the requesting CPU, also in a fill message.
Readmiss and read-miss-modify commands that are serviced from data stored in another CPU's cache require the issuance and servicing of a probe read message. Because of these added operations, a fill message for a readmiss or read-miss-modify command serviced from another CPU's cache can take longer to return than one for a command serviced from main memory. All fill messages are returned to their issuing CPUs through a set of "fill queues" in the system control logic, with one fill queue associated with each CPU.
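The latency difference between the two fill paths can be illustrated with a small Python function. The cycle counts are invented placeholders chosen only to make the arithmetic concrete; no figures are given here, and real values depend entirely on the implementation. `fill_time` is likewise a hypothetical name.

```python
# Hypothetical latencies in arbitrary cycles (placeholders, not real figures).
MEMORY_LATENCY = 40  # memory read -> fill message ready
PROBE_DELAY = 20     # serialization point -> head of owner's probe queue
PROBE_SERVICE = 15   # owner CPU reads the element out of its cache
RETURN_DELAY = 10    # owner's data -> requester's fill queue


def fill_time(serialized_at: int, from_other_cpu: bool) -> int:
    """Cycle at which a fill message enters the requester's fill queue."""
    if from_other_cpu:
        # Probe read must be queued, serviced by the owner, then returned.
        return serialized_at + PROBE_DELAY + PROBE_SERVICE + RETURN_DELAY
    return serialized_at + MEMORY_LATENCY
```

With these placeholder numbers, a command serialized at cycle 0 and serviced from another CPU's cache has its fill ready at cycle 45, later than a command serialized at cycle 3 and serviced from main memory (cycle 43), which illustrates why fill messages return with variable timing.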
Because fill messages are returned to the issuing CPU with variable timing, and because the fill queue and the probe queue associated with a given CPU operate independently of each other, a probe message can reach the top of a CPU's probe queue before a fill message destined for that same CPU, even when the fill message was generated in response to a command that passed the system serialization point before the command that caused the probe to be generated.
In such a computer system, a first CPU may issue a read-miss-modify command to the system control logic concurrently with a readmiss or read-miss-modify command, issued by a second CPU, that references the same data element, while the most up-to-date copy of that data element resides in a third CPU's cache. Suppose the read-miss-modify command from the first CPU is issued to the system serialization point before the readmiss command from the second CPU. The duplicate tag store lookup for the first CPU's read-miss-modify command causes a first probe read message to be issued to the third CPU, and the system control logic then updates the duplicate tag store to indicate that the first CPU now has the most up-to-date copy of the data in the system. When the arbitration circuit subsequently issues the second CPU's readmiss command, the associated duplicate tag store lookup reflects that update, so a second probe read message is issued to the first CPU. This second probe read message may reach the top of the first CPU's probe queue before the fill message associated with the first probe read message reaches that CPU. Because that fill message carries the data required by the second probe read, the second probe read cannot be serviced when it reaches the top of the queue.
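The race can be replayed as a timeline in Python. The delays are hypothetical placeholders, and CPU 0, CPU 1 and CPU 2 stand for the first, second and third CPUs; P1 and P2 are the first and second probe reads, and F1 is the fill produced by servicing P1. `race_timeline` and `p2_arrives_before_data` are illustrative names only.

```python
from typing import List, Tuple

# Hypothetical delays in arbitrary cycles (assumptions for illustration only).
PROBE_DELAY = 20    # serialization point -> head of target CPU's probe queue
PROBE_SERVICE = 15  # target CPU reads the element out of its cache
FILL_DELAY = 10     # responding CPU -> head of requester's fill queue


def race_timeline(t_rmm: int = 0, t_readmiss: int = 1) -> List[Tuple[int, str]]:
    """Return (cycle, event) pairs for the three-CPU race, in arrival order."""
    # CPU 0's read-miss-modify is serialized at t_rmm; CPU 2 holds the
    # up-to-date copy, so P1 goes to CPU 2 and the data returns to CPU 0 as F1.
    p1_at_cpu2 = t_rmm + PROBE_DELAY
    f1_at_cpu0 = p1_at_cpu2 + PROBE_SERVICE + FILL_DELAY
    # CPU 1's readmiss is serialized later, but the duplicate tag store already
    # names CPU 0 as owner, so P2 goes straight to CPU 0's independent probe queue.
    p2_at_cpu0 = t_readmiss + PROBE_DELAY
    return sorted([
        (p1_at_cpu2, "P1 at CPU2"),
        (f1_at_cpu0, "F1 at CPU0"),
        (p2_at_cpu0, "P2 at CPU0"),
    ])


def p2_arrives_before_data(timeline: List[Tuple[int, str]]) -> bool:
    """True if P2 reaches CPU 0 before the fill carrying the data it needs."""
    times = {name: cycle for cycle, name in timeline}
    return times["P2 at CPU0"] < times["F1 at CPU0"]
```

With the default timing the timeline is `[(20, 'P1 at CPU2'), (21, 'P2 at CPU0'), (45, 'F1 at CPU0')]`, so P2 arrives 24 cycles before the data it needs. If CPU 1's readmiss were serialized at cycle 30 instead, P2 would arrive at cycle 50, after F1; the hazard depends entirely on relative timing.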
Prior art systems have resolved this issue in a variety of ways. Many computer systems, such as the AlphaServer 8000 series of computers manufactured by Digital Equipment Corporation, include arbitration circuits in the system control logic that block the issuance of the second CPU's readmiss command until the fill message for the first CPU has reached the first CPU, or until that fill message is guaranteed to reach the first CPU before the second probe read message reaches the top of the first CPU's probe queue. Implementations of such "arbitration blocking" mechanisms are either complex (e.g., where the mechanism blocks only references to a given data element) or limit system performance (e.g., where the mechanism blocks access to an entire region of memory).
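Per-address arbitration blocking, the finer-grained of the two prior-art variants, can be sketched in Python, assuming the arbiter is told when each fill has been delivered through a hypothetical `fill_delivered` callback. The sketch models only the first blocking condition (hold the command until the earlier fill has reached its CPU), not the timing-guarantee alternative, and all names are illustrative.

```python
from collections import deque
from typing import Deque, Optional, Set, Tuple

# (cpu, command, address)
Request = Tuple[int, str, int]


class BlockingArbiter:
    """Hold back any command whose address still has a fill in flight."""

    def __init__(self) -> None:
        self.pending: Deque[Request] = deque()
        self.fills_in_flight: Set[int] = set()

    def submit(self, req: Request) -> None:
        self.pending.append(req)

    def issue_next(self) -> Optional[Request]:
        """Issue the oldest pending command whose address is not blocked."""
        for i, req in enumerate(self.pending):
            addr = req[2]
            if addr not in self.fills_in_flight:
                # Safe: we return right after mutating, so iteration stops here.
                del self.pending[i]
                self.fills_in_flight.add(addr)
                return req
        return None

    def fill_delivered(self, addr: int) -> None:
        """Hypothetical notification that the fill for addr reached its CPU."""
        self.fills_in_flight.discard(addr)
```

Keying the blocked set on a memory region (for example `addr >> 12`) instead of the full address would give the simpler, coarser variant, at the cost of stalling unrelated references that happen to fall in the same region.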