The use of split-response buses and bus-like interconnects is a recent trend in computer design. In some computer architectures, several processors may be coupled to the same split-response bus. In multi-processor architectures, however, several local buses are often employed with a plurality of processors attached to each local bus. The local buses are in turn connected together by bridges that transfer data from one bus to the other and vice versa. The bridges vary significantly in complexity, and can provide for the complete reformatting of data passing through the bridge or do little more than temporarily buffer the data. Moreover, it is common for the processors to share memories which may be attached to any one or several of the buses in such a multi-processor environment.
Because of the configuration just described, data transfers between nodes of the system, i.e., from memory to the processors, from processor to processor, or from bridge to processor, are a common event. A majority of the bus bandwidth is utilized servicing transfer requests via read and write transactions. In multiprocessor environments, reads and writes are used to transfer data between memory and caches, or between processor caches. Input and output (I/O) devices also use reads and writes to access memory. An I/O device can also use a write transaction to interrupt the processor when an I/O operation completes.
Any device connected to the bus that can transmit and receive read and write transactions is referred to as a node. In split-response bus multiprocessor systems, read and write transactions sent over the bus between nodes are split into request and response subactions. Request and response subactions transmitted between local systems must pass through a bridge.
The bridge has the behavior of an intermediate agent as these subactions are transferred from a requesting node to a responding node. On the first bus, the bridge consumes packets generated by the requesting node, queuing them for retransmission on the second bus. On the second bus, the bridge is the requester, generating subaction packets for consumption by the responder node.
Each local bus in the multiprocessor computer system may be connected to the split-response bus using an adapter. Adapters for local buses, such as for a peripheral component interconnect (PCI) bus, are similar in operation to bridges, but have other special requirements. An example of such special requirements is that the PCI specification normally assumes that PCI adapters will process write requests as posted writes.
A posted write is a request that is performed by pretending that the write has completed when the write request has been queued in the adapter. More specifically, when a posted write request is received by the adapter, the adapter queues the request and attempts to forward the request to the responding node on the local bus. Even if the request cannot be immediately forwarded, the adapter generates a response subaction having a completed status and transmits the response back to the requesting node. The completed status indicates that the write has completed, even though in many cases it has not.
For posted writes, the adapter enforces ordering constraints on incoming requests to guarantee that other nodes cannot determine that the write has not yet completed. By properly delaying the progress of (potentially) dependent request and response subactions within its queue, the adapter allows the posted write to complete before the queued-but-incomplete nature of the posted write can be observed by other nodes. For example, assume that after transmitting a posted write and receiving a response in return, the requesting node initiates a read to determine if the write has taken effect. The adapter would queue the new read request, but enforce ordering of the queued requests so that by the time the read request is serviced, the write request has completed. This is an example of the posted-write/request ordering constraint: a following read or write request is not allowed to bypass a previously posted write transaction.
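The posted-write/request ordering constraint can be illustrated with a minimal sketch. The class name `PostedWriteAdapter`, its methods, and the dictionary standing in for the responding node are all hypothetical names chosen for illustration; the sketch assumes a single first-in, first-out queue and does not model any particular adapter.

```python
from collections import deque


class PostedWriteAdapter:
    """Hypothetical adapter model: writes are acknowledged when queued,
    and all queued requests are serviced strictly in arrival order."""

    def __init__(self):
        self.memory = {}      # stands in for the responding node (assumption)
        self.queue = deque()  # pending requests, oldest first

    def write(self, addr, value):
        # Posted write: report completion as soon as the request is queued,
        # even though the responding node has not yet been updated.
        self.queue.append(("write", addr, value))
        return "COMPLETED"

    def read(self, addr):
        # Ordering constraint: the read is queued behind earlier posted
        # writes and cannot bypass them.
        self.queue.append(("read", addr, None))
        return self._drain()

    def _drain(self):
        # Service every queued request in order; return the last read result.
        result = None
        while self.queue:
            kind, addr, value = self.queue.popleft()
            if kind == "write":
                self.memory[addr] = value
            else:
                result = self.memory.get(addr)
        return result


adapter = PostedWriteAdapter()
status = adapter.write(0x10, 42)   # acknowledged, but not yet performed
value = adapter.read(0x10)         # serviced only after the queued write
```

Because the read is placed behind the write in the same queue, the requester observes the written value and cannot tell that the write was still pending when its completed status was returned.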
As another example, assume a DMA-capable I/O device is on busB, a processor and memory are on busA, and that busA and busB are connected by a bridge. To transfer data into the memory on busA, the I/O device issues a posted write request from busB-to-busA, and a done bit is set on busB when the write completes. To read the I/O device's done bit, the processor on busA initiates a busA-to-busB read request. Delaying the response for the read request until the busB-to-busA posted write completes is an example of a posted-write/read response ordering constraint.
Thus, with a posted write, there is no way of determining through normal bus connection paths whether the posted write request has actually been performed.
Handling posted write requests is one instance in which adapters delay the progress of transactions. Adapters, as well as bridges, must also delay the progress of subactions when their queues become filled with pending subactions. There are generally two ways of delaying the progress of other subactions: a busy/retry protocol and a reject/resend protocol. The busy/retry protocol is typically employed by both bridges and adapters, while the reject/resend protocol is used only by adapters.
The busy/retry protocol is normally used by a consumer node (including adapters and bridges) to delay the acceptance of new request and response subactions once the consumer node's queues have been temporarily filled. When a bridge forwards a subaction packet from a producer to a responder node, the bridge becomes the producer and the responder becomes the consumer. If the consumer node's queues have been temporarily filled, the consumer node returns a busy indication (also called an acknowledge or ack) to the producer, which in this case is the bridge. After observing the busy indication, the producer retries by retransmitting the subaction packet (repeatedly if necessary). The producer stops retrying when a done indication (as opposed to busy) is returned by the responder.
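The busy/retry exchange described above can be sketched as follows. The names `Consumer`, `accept`, `service_one`, and `send_with_retry` are hypothetical, as is the retry limit; the sketch assumes the consumer frees one queue entry between successive retries, which is a simplification made only to keep the example short.

```python
from collections import deque


class Consumer:
    """Hypothetical consumer node with a bounded subaction queue."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def accept(self, packet):
        # Return a busy indication when the queue is full, done otherwise.
        if len(self.queue) >= self.capacity:
            return "BUSY"
        self.queue.append(packet)
        return "DONE"

    def service_one(self):
        # Process the oldest queued subaction, freeing one queue entry.
        if self.queue:
            return self.queue.popleft()
        return None


def send_with_retry(consumer, packet, max_attempts=10):
    """Retransmit the packet until a done indication is returned.

    Returns the number of transmissions used. The packet stays with the
    producer (for example, in a bridge queue) while it is being retried.
    """
    for attempt in range(1, max_attempts + 1):
        if consumer.accept(packet) == "DONE":
            return attempt
        # Assumption: the consumer makes progress between retries.
        consumer.service_one()
    raise RuntimeError("retry limit reached without a done indication")


consumer = Consumer(capacity=1)
first = send_with_retry(consumer, "subaction-1")   # accepted immediately
second = send_with_retry(consumer, "subaction-2")  # busy once, then accepted
```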
Adapters sometimes use the reject/resend protocol to delay processing of request subactions when maintaining the illusion of posted-write completions. In this case, a request subaction is temporarily accepted into a request queue of an adapter. To make space for additional requests, the adapter behaves as a surrogate responder: the request is converted into a response that is returned to the requester. The response contains a CONFLICT indication that causes the requester to resend the request subaction, in the hope that the rejection condition will eventually be resolved.
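A minimal sketch of the reject/resend exchange follows. The names `Adapter`, `handle`, `complete_posted_write`, and `issue_with_resend`, the dictionary layout of requests and responses, and the COMPLETED status string are all hypothetical; only the CONFLICT indication comes from the description above. The sketch assumes that one pending posted write completes between successive resends.

```python
class Adapter:
    """Hypothetical adapter that rejects requests while posted writes
    are still pending in its queue."""

    def __init__(self, pending_posted_writes=0):
        self.pending_posted_writes = pending_posted_writes

    def handle(self, request):
        # Surrogate responder: the request is converted into a response
        # addressed back to the original requester.
        if self.pending_posted_writes > 0:
            return {"to": request["from"], "status": "CONFLICT"}
        return {"to": request["from"], "status": "COMPLETED"}

    def complete_posted_write(self):
        # One queued posted write reaches the responding node.
        if self.pending_posted_writes > 0:
            self.pending_posted_writes -= 1


def issue_with_resend(adapter, request, max_sends=5):
    """Send the request again from the original requester each time a
    CONFLICT response is returned. Returns (number of sends, response)."""
    for sends in range(1, max_sends + 1):
        response = adapter.handle(request)
        if response["status"] != "CONFLICT":
            return sends, response
        # Assumption: the rejection condition eases between resends.
        adapter.complete_posted_write()
    raise RuntimeError("request was rejected on every send")


adapter = Adapter(pending_posted_writes=2)
sends, response = issue_with_resend(adapter, {"from": "nodeA", "op": "read"})
```

Unlike the busy/retry case, every rejected attempt here starts again at the original requester, which is the source of the extra cost.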
Using the busy/retry protocol when transmitting data between bridges is more efficient than using the reject/resend protocol because the subactions can remain in bridge queues while being retried. The reject/resend protocol is less efficient because a new request has to be resent from the original requester in reaction to a rejection, even though the request may have passed through multiple bridges before the conflict condition was detected.
Standard busy/retry techniques exist to ensure forward progress, in that the oldest of the retried subactions is eventually accepted. In some environments, however, the busy/retry protocols can deadlock because the adapter/bridge queues are interdependent. For example, assume the queues in nodes A, B, and C are full. Assume further that node A sends a subaction from its queue towards node B, node B sends a subaction from its queue towards node C, and node C, in turn, sends a subaction from its queue towards node A. Note that the nodes cannot receive incoming subactions until a space is freed in their respective queues. Normally, a space is made when a subaction is sent from the node's queue and received by another node. In the example above, however, a space cannot be made in the node queues because the subactions sent from each queue can never be received by the intended node. This condition is known as a queue dependency.
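The three-node example can be checked with a small sketch. The function name `can_make_progress` and the representation of each queue as a list of destination node names are assumptions made for illustration; the sketch considers only the subaction at the head of each queue and assumes a fixed queue capacity shared by all nodes.

```python
def can_make_progress(queues, capacity):
    """Return True if at least one head-of-queue subaction can be accepted.

    queues maps each node name to a list of destination node names, one per
    queued subaction, oldest first. A subaction can be accepted only if its
    destination's queue has a free entry.
    """
    return any(
        queue and len(queues[queue[0]]) < capacity
        for queue in queues.values()
    )


# Every queue is full (capacity 1) and the destinations form a cycle:
# A -> B -> C -> A. No subaction can be accepted anywhere.
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
deadlocked = not can_make_progress(cyclic, capacity=1)

# If node C has a free entry, B's subaction can move and the cycle unwinds.
relieved = {"A": ["B"], "B": ["C"], "C": []}
progress = can_make_progress(relieved, capacity=1)
```

In the cyclic case, retrying cannot help, because each retry is answered with a busy indication by a node that is itself waiting on another node in the cycle.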
Similar queue dependencies can cause deadlocks between the posted-write queues of PCI adapters and the request/response queues of split-response buses, unless some of the dependent subactions in the adapter queues are rejected. For that reason, the reject/resend protocols (rather than the busy/retry protocols) are used to delay the completion of (potentially) dependent subactions when posted writes have been queued in PCI adapters.
Although the reject/resend techniques are supported in several standards, there are no assurances of forward progress defined for the reject/resend protocols equivalent to those defined for the busy/retry protocol. Starvation could occur if subactions sent by a particular node were continually rejected by a responding node in favor of other nodes. Moreover, resolving queue conflicts by rejecting all incoming subactions would result in livelock, a situation in which the queues are immediately refilled with resent requests. Both starvation and livelock prevent forward progress of subactions over the buses in a multiprocessor computer system.
Accordingly, what is needed is a method and system for avoiding starvation and deadlocks in a split-response computer system containing bridges and adapters. The present invention addresses such needs.