1. Field of the Invention
This invention relates generally to memory access and, more particularly, to a destination controlled remote direct memory access ("DMA") engine.
2. Description of the Related Art
One important component in a computer's performance is the efficiency with which it accesses memory. Most, if not all, instructions that a computer executes require the computer's processor either to write data to or to read data from memory. Thus, the more efficiently the computer accesses data in memory, the better the computer's overall performance. Because both operations limit the computer's performance, it is important that a computer both read from and write to memory efficiently, and gains in performance can consequently be obtained by improving the efficiency of either reading or writing.
FIG. 1 illustrates a particular computer's prior art memory and input/output ("I/O") subsystem 10. The subsystem 10 is constructed and operates in accord with an industry standard known as the Peripheral Component Interconnect ("PCI") specification. The subsystem 10 includes a memory 12 that receives and transmits data over a host bus 14. To facilitate data transfer during I/O operations, the subsystem 10 includes a host/PCI bridge 16 between the host bus 14 and a PCI bus 18. The PCI bus 18 provides a communications mechanism permitting a variety of peripheral components (not shown) to conduct their business without slowing operations on the host bus 14.
The peripheral components in the subsystem 10 are I/O devices, such as a monitor, a keyboard, a mouse, or a printer, that interface with the PCI bus 18 through I/O adapters 20. As used hereafter, the term "I/O adapter" shall mean either an I/O device or an interface to an I/O device. As shown in FIG. 1, there are several I/O adapters 20, each of which must transact its business on the PCI bus 18, but only one can do so at a time. Between transactions, the individual I/O adapters 20 arbitrate among themselves and with the host/PCI bridge 16 to determine which will control the PCI bus 18 for the next transaction. Once an individual I/O adapter 20 wins the arbitration and controls the PCI bus 18, it can access the memory 12 through the host/PCI bridge 16 over the PCI bus 18 and the host bus 14.
To write data to an I/O adapter 20, an initiating device (not shown), such as a processor, puts the data on the host bus 14, over which the data are written to a write buffer 24 of the host/PCI bridge 16. The host/PCI bridge 16 then arbitrates for control of the PCI bus 18 and, upon receiving control, writes the data to the I/O adapter 20. The host/PCI bridge 16 then relinquishes control of the PCI bus 18.
To read data from the memory 12, an individual I/O adapter 20 wins control of the PCI bus 18 and then issues a read transaction on it. The host/PCI bridge 16 receives the read transaction. Upon receiving the read transaction, the host/PCI bridge 16 signals the I/O adapter 20 to retry at a later time, reserves a read buffer 22 for use in the read transaction, and queues a memory access request to fetch the data from the memory 12 over the host bus 14. The I/O adapter 20 then relinquishes control of the PCI bus 18. When the host/PCI bridge 16 receives the data, it writes the data into the reserved read buffer 22. The I/O adapter 20, in the meantime, periodically retries getting the data from the host/PCI bridge 16, each retry requiring the I/O adapter 20 to win control of the PCI bus 18. Eventually, the host/PCI bridge 16 has the data in its read buffer 22. The I/O adapter 20 then receives the data from the host/PCI bridge 16, whereupon the host/PCI bridge 16 releases the reserved read buffer 22 and the I/O adapter 20 relinquishes control of the PCI bus 18.
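The difference between the two paths can be made concrete by counting PCI bus arbitrations. The following is a minimal sketch under simplifying assumptions that are not part of the described subsystem: the adapter retries exactly once per fixed interval, memory latency is measured in those intervals, and the names `write_arbitrations`, `read_arbitrations`, and `fetch_latency` are hypothetical.

```python
def write_arbitrations() -> int:
    """PCI bus arbitrations for one write to an I/O adapter (toy model).

    The host/PCI bridge holds the data in its write buffer, wins the PCI bus
    once, delivers the data, and releases the bus.
    """
    return 1


def read_arbitrations(fetch_latency: int) -> tuple[int, int]:
    """PCI bus arbitrations for one delayed read by an I/O adapter (toy model).

    fetch_latency: assumed number of retry intervals the bridge needs to fetch
    the data from memory over the host bus.
    Returns (arbitrations, intervals the reserved read buffer is held).
    """
    # First attempt: the bridge signals retry, reserves a read buffer,
    # and queues the memory access request.
    arbitrations = 1
    held = 0
    while True:
        held += 1          # one retry interval passes with the buffer reserved
        arbitrations += 1  # the adapter must win the PCI bus again to retry
        if held >= fetch_latency:
            # The data is now in the read buffer, so this retry completes.
            return arbitrations, held


print(write_arbitrations())  # 1
print(read_arbitrations(3))  # (4, 3)
```

Under these assumptions a write always costs one arbitration, while a read costs at least two and one more for every interval of memory latency, with a read buffer reserved the whole time.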
Thus, there are at least two technological problems with the structure and operation of the subsystem 10 in FIG. 1. First, reads by the I/O adapters 20 use the resources of the subsystem 10 far less efficiently than writes do. Second, the design does not scale well when I/O adapters 20 and PCI buses 18 and 28 are added to expand the I/O subsystem.
More particularly, for the read transaction, a read buffer 22 must be reserved for the entire read transaction. Also, there are many more arbitrations for control of the PCI bus 18 for reads than there are for writes. This disparity is compounded for a read by an I/O adapter 26 by the necessity to operate over the PCI bus 28 and through the PCI/PCI bridge 32. When the number of I/O adapters 20 and 26 performing reads exceeds the number of available read buffers 22 in the bridges 16 and 32, additional latency is incurred before the bridges 16 and 32 can even forward the read requests to the host bus 14. When additional PCI buses 28 and PCI/PCI bridges 32 are added to expand the I/O subsystem, latencies accumulate because each bridge 32 must reserve a read buffer 22 from its parent bridge 16, competing with all other bridges and I/O adapters on the PCI/PCI bridge 32's primary bus. For a single read to complete, a read buffer 22 in each of the bridges 16 and 32 is consumed and, when a read buffer 22 is not available, the transaction stalls. Since each bridge 16 and 32 has a limited number of read buffers 22, the subsystem 10 does not scale well.
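The buffer-exhaustion effect can be illustrated with a toy model of the two-level bridge hierarchy. This is a simplified sketch, not the behavior of any particular bridge: the buffer counts (four in bridge 16, two in bridge 32) and the `Bridge` and `forward_read` names are hypothetical, and each read is assumed to reserve a buffer in every bridge it passes through, stalling at the first bridge with none free while keeping the buffers it already holds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bridge:
    name: str
    read_buffers: int                  # free read buffers (illustrative count)
    parent: Optional["Bridge"] = None  # next bridge toward the host bus


def forward_read(bridge: Bridge) -> Optional[str]:
    """Forward one read request from an adapter toward the host bus.

    Each bridge on the path reserves a read buffer before passing the request
    upstream. Returns None if the request reaches the host bus, or the name of
    the bridge where it stalls for lack of a free buffer. Buffers reserved
    lower in the hierarchy stay held while the request is stalled.
    """
    node = bridge
    while node is not None:
        if node.read_buffers == 0:
            return node.name       # stalled: no free buffer at this bridge
        node.read_buffers -= 1     # reserve a buffer for the whole read
        node = node.parent
    return None                    # request forwarded onto the host bus


host_pci = Bridge("host/PCI bridge 16", read_buffers=4)
pci_pci = Bridge("PCI/PCI bridge 32", read_buffers=2, parent=host_pci)

# Three adapters 20 on PCI bus 18 read through bridge 16 only.
results = [forward_read(host_pci) for _ in range(3)]
# Three adapters 26 on PCI bus 28 must also pass through bridge 32.
results += [forward_read(pci_pci) for _ in range(3)]
print(results)
# [None, None, None, None, 'host/PCI bridge 16', 'PCI/PCI bridge 32']
```

With these counts, the fifth read stalls at bridge 16 while holding the last buffer in bridge 32, so the sixth read stalls at bridge 32 without ever competing for bridge 16 directly. Adding another level of bridges would add another point at which a read can stall.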
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.