1. Technical Field
Embodiments of the present invention generally relate to systems and methods of accessing memory locations. More particularly, embodiments relate to systems and methods of performing locked reads in which peer-to-peer non-posted input/output (I/O) transactions are supported.
2. Discussion
In modern computer systems, the ability to read, modify and write data in an uninterrupted sequence is often required in order to guarantee exclusive access to a memory location that is shared among multiple nodes (or agents) in the system. For example, such “atomic” accesses are often used in applications that require “locked” read/write sequences for security purposes. Another example of the need for atomic accesses involves “hot-plugging”, where a component such as a microprocessor is added to the system while the system is in operation. In such a case, certain uninterrupted configuration transactions must be made to multiple agents in order to ensure proper operation of the added component. The hot-plug lock flow therefore differs from the normal lock flow, which involves accesses to a single agent. While conventional approaches to performing atomic accesses in multi-node systems have been effective in certain circumstances, considerable room for improvement remains.
FIG. 1 shows a conventional multi-node system 10 having a network interconnect 16, a plurality of processor nodes 12 (12a-12c) and a plurality of input/output (I/O) nodes with I/O hubs 14 (14a, 14b) coupled to the network interconnect 16. The I/O hubs 14 can be selected from the 870 chipset family available from Intel® in Santa Clara, Calif. The processor node 12a is coupled to a dynamic random access memory (DRAM) 18 and the I/O hubs 14 are coupled to I/O devices 20 (20a-20n), where the I/O devices have memory mapped I/O (MMIO) space 22 (22a-22n). Whenever a processor node 12 or I/O device 20 requires locked access to the DRAM 18 or the MMIO space 22, a locked read request is sent to a central lock arbiter 24. The locked read request is essentially a read request with a lock attribute to indicate that exclusive access is desired. If multiple locked read requests are received by the arbiter 24, the arbiter selects a winning request and subsequently retries losing requests.
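The arbitration step performed by the central lock arbiter 24 can be illustrated with a minimal Python sketch. It assumes a first-come, first-served selection policy, which is not specified above, and it assumes that a request arriving while the lock is already held is simply handed back to be retried. The class and method names are hypothetical and do not describe any actual chipset interface.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class LockedReadRequest:
    """A read request carrying a lock attribute (hypothetical model)."""
    initiator: str   # e.g. "processor 12a" or "I/O device 20b"
    target: str      # e.g. "DRAM 18" or "MMIO 22b"
    address: int


class CentralLockArbiter:
    """Hypothetical model of arbiter 24: one winner at a time, losers are retried."""

    def __init__(self):
        self.pending = deque()   # locked read requests received but not yet arbitrated
        self.owner = None        # request currently holding the lock, if any

    def submit(self, req):
        self.pending.append(req)

    def arbitrate(self):
        """Return (winner, retried); the winner is None if no grant is made."""
        if self.owner is not None or not self.pending:
            # Lock already held (or nothing pending): every pending request must retry.
            retried = list(self.pending)
            self.pending.clear()
            return None, retried
        winner = self.pending.popleft()   # assumed FIFO policy
        retried = list(self.pending)      # all losing requests are retried
        self.pending.clear()
        self.owner = winner
        return winner, retried

    def release(self):
        """Called once the atomic access has completed."""
        self.owner = None
```

For example, if processor 12a and I/O device 20b submit locked reads in that order, the first call to `arbitrate()` grants processor 12a and returns the request from device 20b in the retry list. That request is granted only after `release()` is called.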
To implement the locked read, the arbiter 24 “freezes” and flushes the outstanding transactions on the nodes in the system 10, and then permits only the initiator of the read request and the target of the read to proceed until the atomic access is completed. The term “freeze” is used herein to refer to the process of halting a boundary (or port) of a network component so that a given transaction does not cross the boundary and remains where it is. Specifically, the arbiter 24 broadcasts a processor flush command to the processor nodes 12, and each processor node 12 halts its front side bus (FSB) in response to the processor flush command. The processor nodes 12 then flush all outstanding transactions to the network interconnect 16. After the transactions have been flushed from a particular processor node, that node sends a flush completion message to the arbiter 24. The arbiter 24 waits for all completions before proceeding to the next operation of the locked read.
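The processor flush phase amounts to a broadcast followed by a barrier on flush completions. The minimal Python sketch below models it under two simplifying assumptions: the broadcast is delivered sequentially, and the interconnect is an ordinary list. `ProcessorNode`, `handle_flush` and `processor_flush_phase` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorNode:
    name: str
    outstanding: list = field(default_factory=list)  # transactions queued at the node
    fsb_halted: bool = False

    def handle_flush(self, interconnect):
        """Freeze the FSB, push outstanding transactions out, then acknowledge."""
        self.fsb_halted = True                   # freeze: nothing new crosses the FSB
        interconnect.extend(self.outstanding)    # flush queued transactions to the fabric
        self.outstanding.clear()
        return self.name                         # flush completion identifying the sender


def processor_flush_phase(nodes, interconnect):
    """Arbiter side: broadcast the flush, then require a completion from every node."""
    expected = {node.name for node in nodes}
    received = set()
    for node in nodes:                           # broadcast, modeled sequentially here
        received.add(node.handle_flush(interconnect))
    if received != expected:                     # never advance on a partial set
        raise RuntimeError(f"missing flush completions: {expected - received}")
    return received
```

The arbiter advances only when the set of received completions matches the set of nodes it flushed. That same barrier is what stalls when a node can never report completion.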
When all of the processor node completions have been received, the arbiter 24 broadcasts an I/O hub (IOH) flush command 26 to the I/O hubs 14. The I/O hubs 14 have inbound ordering queues (IOQs), which are halted in response to the flush command 26. The IOQs are used to enforce I/O transaction ordering rules, which are described in more detail below. The I/O hubs 14 then flush all outstanding transactions from their outgoing request buffers (ORBs), which hold information regarding requests that are pending in the network, to the network interconnect 16, and each I/O hub sends a flush completion message to the arbiter 24.
Although such an approach can be suitable for some purposes, a number of difficulties remain. A particular difficulty relates to transaction posting. Posting enables a device to proceed with its next operation while the posted transaction is still making its way through the network interconnect 16 to its ultimate destination. The use of unordered interconnects, however, can result in multiple paths for data traveling from a source to a destination. Because some transactions, such as read requests, depend heavily on the order in which they are processed, certain read requests are designated as “non-posted” in order to ensure that they are not passed by a transaction that should be processed after the read request. The non-posted designation, used in conjunction with well-documented producer/consumer rules, keeps the system from functioning in an unintended manner.
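The difference between posted and non-posted transactions can be seen in a minimal Python sketch of an outgoing request buffer. It assumes that the ORB tracks only requests that still await a completion, identified by an integer tag. A posted transaction is forgotten as soon as it is issued, while a non-posted one stays pending until its completion returns. The `Transaction` and `OutgoingRequestBuffer` types are hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Transaction:
    kind: str        # e.g. "write" or "read"
    posted: bool     # posted transactions never receive an explicit completion
    tag: int = 0     # assigned by the ORB for non-posted transactions


class OutgoingRequestBuffer:
    """Hypothetical ORB: holds only requests still pending in the network."""

    def __init__(self):
        self._tags = itertools.count(1)
        self.pending = {}

    def issue(self, txn, interconnect):
        interconnect.append(txn)
        if txn.posted:
            return None                  # issuer may proceed immediately
        txn.tag = next(self._tags)
        self.pending[txn.tag] = txn      # must wait for an explicit completion
        return txn.tag

    def complete(self, tag):
        """Retire a non-posted request when its completion message arrives."""
        self.pending.pop(tag)

    def drained(self):
        """True once no non-posted request is awaiting completion."""
        return not self.pending
```

Under this model, issuing a posted write leaves the ORB drained, whereas a non-posted read keeps `drained()` false until `complete()` is called with its tag.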
Unlike posted transactions, non-posted transactions have an explicit completion message that is returned from the destination to the source. FIG. 2 illustrates that, in the case of a non-posted transaction between I/O hubs, the completion message can be trapped in the IOQ of the destination I/O hub and ultimately cause a deadlock. Specifically, a first I/O hub 14a halts an IOQ 28 and a second I/O hub 14b halts an IOQ 30 in response to an IOH flush command. The first I/O hub 14a flushes the non-posted transaction “P2P RD A” from its ORB to the second I/O hub 14b by way of the network interconnect 16. The target I/O device 20c receives the non-posted transaction and returns a completion message “P2P Rd Comp A” to the second I/O hub 14b. Because the second I/O hub 14b has halted the IOQ 30, however, the completion message is trapped in the IOQ 30 and is never received by the first I/O hub 14a. Accordingly, the first I/O hub 14a is unable to return the flush completion message that the arbiter 24 (FIG. 1) needs in order to proceed with the locked read. Because of this potential for deadlock, conventional architectures do not permit peer-to-peer non-posted I/O transactions.
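The deadlock of FIG. 2 can be reproduced in a minimal Python sketch. It makes several simplifying assumptions. Inbound traffic from a local I/O device must pass through the hub's IOQ. A hub can report flush completion only after its ORB has no pending non-posted requests. The read “P2P RD A” is assumed to have already reached device 20c. `IOHub` and `simulate_p2p_read_during_flush` are hypothetical names.

```python
from collections import deque


class IOHub:
    """Hypothetical I/O hub with an inbound ordering queue (IOQ) and ORB state."""

    def __init__(self, name):
        self.name = name
        self.ioq = deque()
        self.ioq_halted = False
        self.orb_pending = set()   # tags of non-posted requests awaiting completion

    def forward_inbound(self, msg, fabric):
        """Inbound traffic from a local device passes through the IOQ."""
        self.ioq.append(msg)
        while self.ioq and not self.ioq_halted:
            fabric.append(self.ioq.popleft())

    def flush_complete(self):
        """Flush completion can be reported only once the ORB has drained."""
        return not self.orb_pending


def simulate_p2p_read_during_flush(halt_ioqs=True):
    fabric = deque()
    ioh_a, ioh_b = IOHub("14a"), IOHub("14b")
    ioh_a.orb_pending.add("A")                    # "P2P RD A" outstanding at hub 14a
    ioh_a.ioq_halted = ioh_b.ioq_halted = halt_ioqs  # effect of the IOH flush command
    # Device 20c services the read and returns "P2P Rd Comp A" through hub 14b.
    ioh_b.forward_inbound(("P2P Rd Comp", "A"), fabric)
    # Deliver whatever actually reached the interconnect to the requester.
    while fabric:
        _, tag = fabric.popleft()
        ioh_a.orb_pending.discard(tag)
    return ioh_a.flush_complete(), list(ioh_b.ioq)
```

With the IOQs halted, the function returns `(False, [("P2P Rd Comp", "A")])`. Hub 14a cannot report flush completion, and the completion sits in IOQ 30. With `halt_ioqs=False`, it returns `(True, [])`, which shows that halting the destination IOQ is, in this model, the only thing that strands the completion.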