1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to coherency systems within computer systems.
2. Description of the Related Art
Generally, personal computers (PCs) and other types of computer systems have been designed around a shared bus system for accessing memory. One or more processors and one or more input/output (I/O) devices are coupled to memory through the shared bus. The I/O devices may be coupled to the shared bus through an I/O bridge which manages the transfer of information between the shared bus and the I/O devices, while processors are typically coupled directly to the shared bus or are coupled through a cache hierarchy to the shared bus.
Unfortunately, shared bus systems suffer from several drawbacks. For example, since there are multiple devices attached to the shared bus, the bus is typically operated at a relatively low frequency. The multiple attachments present a high capacitive load to a device driving a signal on the bus, and the multiple attach points present a relatively complicated transmission line model for high frequencies. Accordingly, the frequency remains low, and bandwidth available on the shared bus is similarly relatively low. The low bandwidth presents a barrier to attaching additional devices to the shared bus, as performance may be limited by available bandwidth.
Another disadvantage of the shared bus system is a lack of scalability to larger numbers of devices. As mentioned above, the amount of bandwidth is fixed (and may decrease if adding additional devices reduces the operable frequency of the bus). Once the bandwidth requirements of the devices attached to the bus (either directly or indirectly) exceed the available bandwidth of the bus, devices will frequently be stalled when attempting to access the bus, and overall performance may be decreased.
One or more of the above problems may be addressed using a distributed memory system. A computer system employing a distributed memory system includes multiple nodes. Two or more of the nodes are connected to memory, and the nodes are interconnected using any suitable interconnect. For example, each node may be connected to each other node using dedicated lines. Alternatively, each node may connect to a fixed number of other nodes, and a transaction from a first node to a second node to which the first node is not directly connected may be routed via one or more intermediate nodes. The memory address space is distributed across the memories of the nodes.
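The address assignment and multi-hop routing described above can be illustrated with a minimal sketch. The node count, the contiguous 1 GiB slice per node, the ring topology, and the names `home_node`, `route`, and `LINKS` are all assumptions made for illustration; the text does not prescribe any particular mapping or interconnect.

```python
from collections import deque

# Hypothetical configuration: 4 nodes, each owning a contiguous 1 GiB slice.
NUM_NODES = 4
NODE_MEMORY_BYTES = 1 << 30

# Assumed ring interconnect: each node links to a fixed number (2) of others.
LINKS = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def home_node(address):
    """Return the node whose memory is assumed to hold `address`."""
    node = address // NODE_MEMORY_BYTES
    if not 0 <= node < NUM_NODES:
        raise ValueError("address outside the mapped address space")
    return node


def route(src, dst):
    """Return a shortest path of nodes from src to dst (breadth-first search)."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in LINKS[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    raise ValueError("no route between the given nodes")
```

In this sketch, nodes 0 and 2 are not directly connected, so `route(0, 2)` passes through one intermediate node.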
Nodes may additionally include one or more processors. The processors typically include caches which store cache blocks of data read from the memories. Furthermore, a node may include one or more caches external to the processors. Since the processors and/or nodes may be storing cache blocks accessed by other nodes, a mechanism for maintaining coherency within the nodes is desired.
The problems outlined above are in large part solved by a computer system as described herein. The computer system may include multiple processing nodes, one or more of which may be coupled to separate memories which may form a distributed memory system. The processing nodes may include caches, and the computer system may maintain coherency between the caches and the distributed memory system. Particularly, the computer system may implement a flexible probe command/response routing scheme.
In one embodiment, the scheme employs an indication within the probe command which identifies a receiving node to receive the probe responses. Generally, the probe command is a request to a node to determine if a cache block is stored in that node and an indication of the actions to be taken by that node if the cache block is stored in that node. The probe response indicates that the actions have been taken, and may include a transmission of data if the cache block has been modified by the node. By providing the flexibility to route the probe responses to different receiving nodes depending upon the command sent, the maintenance of coherency may be performed in a relatively efficient manner (e.g. using the fewest packet transmissions between processing nodes) while still ensuring that coherency is maintained.
For example, the scheme may include probe commands indicating that either the target or the source of the transaction should receive the probe responses corresponding to the transaction. Probe commands may specify the source of the transaction as the receiving node for read transactions (such that dirty data is delivered to the source node from the node storing the dirty data). On the other hand, for write transactions (in which data is being updated in memory at the target node of the transaction), the probe commands may specify the target of the transaction as the receiving node. In this manner, the target may determine when to commit the write data to memory and may receive any dirty data to be merged with the write data.
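The selection of the receiving node by transaction type can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Probe` fields, the `TransactionType` enum, and the `make_probe` function are hypothetical names, and an actual probe command would carry additional fields.

```python
from dataclasses import dataclass
from enum import Enum


class TransactionType(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class Probe:
    address: int
    source_node: int     # node that initiated the transaction
    target_node: int     # node whose memory holds the address
    receiving_node: int  # the indication: where probe responses are routed


def make_probe(kind, address, source_node, target_node):
    """Build a probe whose receiving node depends on the transaction type."""
    if kind is TransactionType.READ:
        # Dirty data should be delivered directly to the requester.
        receiving_node = source_node
    elif kind is TransactionType.WRITE:
        # The target merges dirty data and decides when to commit the write.
        receiving_node = target_node
    else:
        raise ValueError(f"unsupported transaction type: {kind!r}")
    return Probe(address, source_node, target_node, receiving_node)
```

For a read issued by node 0 to memory at node 2, the probe names node 0 as the receiving node; for a write between the same nodes, it names node 2.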
Broadly speaking, a computer system is contemplated. The computer system may comprise a first processing node and a second processing node. The first processing node may be configured to initiate a transaction by transmitting a request. Coupled to receive the request from the first processing node, the second processing node may be configured to generate a probe in response to the request. The probe includes an indication which designates a receiving node to receive responses to the probe. Additionally, the second processing node may be configured to generate the indication responsive to a type of the transaction.
A method for maintaining coherency in a computer system is also contemplated. A request from a source node is transmitted to a target node. A probe is generated in the target node responsive to the request. A receiving node is designated for responses to the probe via an indication within the probe. A probe response to the probe is routed to the receiving node.
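The four steps of the method can be traced with a small simulation that records each packet sent. This is a sketch under stated assumptions: the `Packet` record, the `run_transaction` function, the `probed_nodes` argument (which nodes the target probes is left to the caller), and the `dirty_holder` argument are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    kind: str                             # "request", "probe", or "probe_response"
    src: int
    dst: int
    receiving_node: Optional[int] = None  # indication carried by probes
    has_data: bool = False                # response carries modified data


def run_transaction(kind, source, target, probed_nodes, dirty_holder=None):
    """Return the packets exchanged for one transaction, in order."""
    if kind not in ("read", "write"):
        raise ValueError(f"unsupported transaction type: {kind!r}")

    # Step 1: the source node transmits a request to the target node.
    packets = [Packet("request", source, target)]

    # Steps 2 and 3: the target generates probes, each designating the
    # receiving node (source for reads, target for writes).
    receiving_node = source if kind == "read" else target
    for node in probed_nodes:
        packets.append(Packet("probe", target, node, receiving_node))

    # Step 4: each probed node routes its response to the receiving node,
    # including data if it holds a modified copy of the cache block.
    for node in probed_nodes:
        packets.append(
            Packet("probe_response", node, receiving_node,
                   has_data=(node == dirty_holder))
        )
    return packets
```

Calling `run_transaction("write", source=0, target=2, probed_nodes=[0, 1, 3], dirty_holder=1)` yields one request, three probes, and three probe responses, all of the responses addressed to node 2.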