1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to computer systems wherein write operations modifying the contents of memory must be properly ordered with respect to one another in order to maintain memory coherency within the computer systems.
2. Description of the Related Art
Generally, personal computers (PCs) and other types of computer systems have been designed around a shared bus system for accessing memory. One or more processors and one or more input/output (I/O) devices are coupled to memory through the shared bus. The I/O devices may be coupled to the shared bus through an I/O bridge which manages the transfer of information between the shared bus and the I/O devices, while processors are typically coupled directly to the shared bus or are coupled through a cache hierarchy to the shared bus.
Unfortunately, shared bus systems suffer from several drawbacks. For example, the multiple devices attached to the shared bus present a relatively large electrical capacitance to devices driving signals on the bus. In addition, the multiple attach points on the shared bus produce signal reflections at high signal frequencies which reduce signal integrity. As a result, signal frequencies on the bus are generally kept relatively low in order to maintain signal integrity at an acceptable level. The low signal frequencies reduce signal bandwidth, limiting the performance of devices attached to the bus.
Lack of scalability to larger numbers of devices is another disadvantage of shared bus systems. As mentioned above, the available bus bandwidth is substantially fixed (and may decrease if adding devices causes a reduction in signal frequencies upon the bus). Once the bandwidth requirements of the devices attached to the bus (either directly or indirectly) exceed the available bandwidth of the bus, devices will frequently be stalled when attempting to access the bus. Overall performance of the computer system including the shared bus will most likely be reduced.
On the other hand, distributed memory systems lack many of the above disadvantages. A computer system with a distributed memory system includes multiple nodes, two or more of which are coupled to different memories. The nodes are coupled to one another using any suitable interconnect. For example, each node may be coupled to each other node using dedicated lines. Alternatively, each node may connect to a fixed number of other nodes, and transactions may be routed from a first node to a second node to which the first node is not directly connected via one or more intermediate nodes. A memory address space of the computer system is assigned across the memories in each node.
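The assignment of a memory address space across the memories of several nodes can be illustrated with a minimal sketch. The four-node layout, the 1 GB region size, and the names `ADDRESS_MAP` and `home_node` are assumptions chosen for illustration only; the text above does not specify any particular mapping.

```python
from bisect import bisect_right

# Hypothetical address map: (base address, owning node id), sorted by base.
# Region sizes and node ids are illustrative assumptions.
ADDRESS_MAP = [
    (0x0000_0000, 0),
    (0x4000_0000, 1),
    (0x8000_0000, 2),
    (0xC000_0000, 3),
]
ADDRESS_SPACE_END = 0x1_0000_0000  # first address past the mapped space


def home_node(address: int) -> int:
    """Return the id of the node whose memory holds `address`."""
    if not 0 <= address < ADDRESS_SPACE_END:
        raise ValueError(f"address {address:#x} is outside the address space")
    bases = [base for base, _ in ADDRESS_MAP]
    # Find the last region whose base is less than or equal to the address.
    index = bisect_right(bases, address) - 1
    return ADDRESS_MAP[index][1]
```

Under these assumptions, a node that needs to access address `0x4000_0000` would direct the transaction to node 1, since that node's memory holds the second region.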
In general, a "node" is a device which is capable of participating in transactions upon the interconnect. For example, the interconnect may be packet based, and the node may be configured to receive and transmit packets. Generally speaking, a "packet" is a communication between two nodes: an initiating or "source" node which transmits the packet and a destination or "target" node which receives the packet. When a packet reaches the target node, the target node accepts the information conveyed by the packet and processes the information internally. Alternatively, a node located on a communication path between the source and target nodes may relay the packet from the source node to the target node.
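The relaying of a packet from a source node to a target node through intermediate nodes can be sketched as follows. The ring topology in `LINKS`, the `Packet` fields, and the shortest-path routing choice are all assumptions made for illustration; the text above allows any suitable interconnect and does not prescribe a routing algorithm.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Packet:
    source: int
    target: int
    payload: str
    hops: list = field(default_factory=list)  # nodes visited, kept for illustration


# Hypothetical topology: four nodes in a ring, each linked to two neighbors.
LINKS = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def next_hop(current: int, target: int) -> int:
    """Return the neighbor of `current` that starts a shortest path to `target`."""
    queue = deque((neighbor, neighbor) for neighbor in LINKS[current])
    seen = {current}
    while queue:
        node, first_step = queue.popleft()
        if node == target:
            return first_step
        if node in seen:
            continue
        seen.add(node)
        queue.extend((n, first_step) for n in LINKS[node] if n not in seen)
    raise ValueError(f"no route from node {current} to node {target}")


def deliver(packet: Packet) -> Packet:
    """Relay the packet node by node until the target node accepts it."""
    current = packet.source
    packet.hops.append(current)
    while current != packet.target:
        current = next_hop(current, packet.target)  # intermediate node relays
        packet.hops.append(current)
    return packet
```

In this sketch, a packet from node 0 to node 2 is relayed by node 1, because nodes 0 and 2 are not directly connected in the assumed ring.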
Distributed memory systems present design challenges which differ from the challenges in shared bus systems. For example, shared bus systems regulate the initiation of transactions through bus arbitration. Accordingly, a fair arbitration algorithm allows each bus participant the opportunity to initiate transactions. The order of transactions on the bus may represent the order that transactions are performed (e.g. for coherency purposes). On the other hand, in distributed systems, nodes may initiate transactions concurrently and use the interconnect to transmit the transactions to other nodes. These transactions may have logical conflicts between them (e.g. memory coherency conflicts for transactions involving the same address).
It would thus be desirable to have a system and method for properly ordering write operations within a computer system. Such a system and method would help to maintain memory coherency within computer systems having distributed memory systems.
A computer system is presented implementing a system and method for properly ordering write operations. The system and method may aid in maintaining memory coherency within the computer system. The computer system includes multiple interconnected processing nodes. One or more of the processing nodes includes a central processing unit (CPU) and/or a cache memory, and one or more of the processing nodes includes a memory controller coupled to a memory. The CPU/cache generates a write command to store data within the memory. The memory controller receives the write command and responds to the write command by issuing a target done response to the CPU/cache after the memory controller: (i) properly orders the write command within the memory controller with respect to other commands pending within the memory controller, and (ii) determines that a coherency state with respect to the write command has been established within the computer system.
The CPU may execute instructions of a predefined instruction set, and may generate the write command in response to instruction execution. The memory controller receives the write command, and may respond to the write command by properly ordering the write command within the memory controller with respect to other commands pending within the memory controller. One or more of the processing nodes may include a cache, and the memory controller may determine the coherency state has been established within the processing nodes including a cache by: (i) sending a probe request to each processing node including a cache, and (ii) receiving a probe response from each processing node including a cache. After properly ordering the write command within the memory controller and receiving the probe response from each processing node including a cache, the coherency state with respect to the write command has been established within the computer system, and the memory controller may then issue the target done response to the CPU. The CPU may thus be informed that the write command has reached a point of coherency within the computer system.
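The sequence described above (order the write command, probe each caching node, collect the probe responses, then issue the target done response) can be sketched in simplified form. The class names, the tuple message formats, the first-in-first-out ordering, and the choice to invalidate a cached copy on a probe are assumptions for illustration; the summary does not specify what a node does when it receives a probe request.

```python
from dataclasses import dataclass, field


@dataclass
class CacheNode:
    node_id: int
    lines: dict = field(default_factory=dict)  # address -> cached data

    def probe(self, address):
        """Handle a probe request and return a probe response.

        Assumption: the node invalidates any cached copy of the address.
        """
        self.lines.pop(address, None)
        return ("probe_response", self.node_id, address)


@dataclass
class MemoryController:
    caching_nodes: list
    memory: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # commands in arrival order

    def receive_write(self, source_id, address, data):
        # (i) Order the write behind commands already pending (FIFO assumed).
        self.pending.append(("write", source_id, address, data))

    def service_next(self):
        """Service the oldest pending write and return its target done response."""
        _kind, source_id, address, data = self.pending.pop(0)
        # (ii) Send a probe request to every node that includes a cache and
        # wait for one probe response from each of them.
        responses = [node.probe(address) for node in self.caching_nodes]
        assert len(responses) == len(self.caching_nodes)
        self.memory[address] = data
        # Only now is the source told the write reached the point of coherency.
        return ("target_done", source_id, address)
```

Under these assumptions, two writes to the same address are committed in the order the memory controller received them, and each source receives its target done response only after every caching node has responded to the probe for that write.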
The processing node including the CPU and the processing node including the memory controller may be different processing nodes. In this case, the processing nodes of the computer system may route the write command from the processing node including the CPU to the processing node including the memory controller.
At least two of the processing nodes may include a memory controller coupled to a different memory, and a different portion of a memory address space of the computer system may be associated with each memory controller and memory coupled thereto. In this case, the computer system has a distributed memory system.
A cache within a processing node may generate a victim block command to store data to the memory. The memory controller may receive the victim block command and respond to the victim block command by issuing a target done response to the cache. The cache may store the victim block in, for example, a buffer, and may maintain coherency for the victim block during the pendency of the victim block command. The target done response from the memory controller may signal the cache that the cache may stop maintaining coherency for the victim block.
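The role of the victim buffer can be illustrated with a minimal sketch. The `Cache` class, its method names, and the way a probe is answered from the buffer are assumptions for illustration; the text states only that the cache maintains coherency for the victim block until the target done response arrives.

```python
class Cache:
    def __init__(self):
        self.lines = {}          # address -> data currently cached
        self.victim_buffer = {}  # address -> evicted data awaiting target done

    def evict(self, address):
        """Move a block to the victim buffer and build a victim block command."""
        data = self.lines.pop(address)
        self.victim_buffer[address] = data
        return ("victim_block", address, data)

    def probe(self, address):
        """Answer a probe from the cache lines or, failing that, the victim buffer.

        Returns None when the cache holds no copy of the address.
        """
        if address in self.lines:
            return self.lines[address]
        return self.victim_buffer.get(address)

    def target_done(self, address):
        """On target done, stop maintaining coherency for the victim block."""
        self.victim_buffer.pop(address, None)
```

In this sketch, a probe that arrives while the victim block command is still pending is answered from the buffer, so the evicted data remains visible until the memory controller issues the target done response.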
Before issuing the target done response to the cache, the memory controller may: (i) properly order the write command within the memory controller with respect to other commands pending within the memory controller, and (ii) determine that a coherency state with respect to the write command has been established within the computer system. With respect to the coherency state, the computer system may operate such that data coherency is maintained within the computer system. The cache may be separate from, or reside within, a CPU. At least two of the processing nodes may include a cache, and the computer system may implement a cache protocol which maintains coherency with respect to data stored within each cache.
In a first method for properly ordering memory operations within the computer system, the CPU issues a write command to store data within a memory of the computer system. A memory controller coupled to the memory receives the write command, determines that a coherency state with respect to the write command has been established within the computer system, and responds to the write command and the coherency state by issuing a target done response to the CPU. The determining step may include: (i) properly ordering the write command within the memory controller with respect to other commands pending within the memory controller, (ii) sending a probe request to each processing node including a cache, and (iii) receiving a probe response from each processing node including a cache.
In a second method for properly ordering memory operations within the computer system, a cache within a processing node issues a victim block command in order to store data within a memory of the computer system. A memory controller coupled to the memory receives the victim block command and responds to the victim block command by issuing a target done response to the cache. Before issuing the target done response to the cache, the memory controller may: (i) properly order the write command within the memory controller with respect to other commands pending within the memory controller, and (ii) determine that a coherency state with respect to the write command has been established within the computer system. With respect to the coherency state, the computer system may operate such that data coherency is maintained within the computer system. As stated above, the cache may be separate from, or reside within, a CPU.