1. Field of the Invention
This invention relates generally to methods and apparatus for transferring commands and associated data blocks. In particular, the present invention relates to methods and apparatus for efficiently transferring commands and their associated data between various devices in a network or in a server architecture.
2. Description of the Related Art
The latency incurred when transferring data can greatly diminish the performance of networks and server architectures, since both the sending and the receiving input/output (I/O) devices are usually unable to engage in other operations until the data transfer is complete. This latency is longer and harder to manage in networks and server architectures than in other computer systems because there is so much competition for network and server resources, including system memory, processor(s) and multiple I/O devices. The problem is especially pronounced where a large number of data blocks are frequently transferred between the processor, memory and several different I/O devices, and/or where the data blocks are of widely different sizes. Indeed, inefficiency in transferring data blocks may have a larger effect on overall performance than the speed or other performance characteristics of the individual elements in the network or server architecture. It may also be that the buses and/or I/O adapter cards connecting the I/O devices to the processor are the bottleneck, in which case the performance of these I/O subsystem components needs to be improved.
Conventional servers typically have multiple adapter cards, each of which usually supports multiple I/O devices. A server may have a significant number of I/O devices configured in a load/store configuration such as shown in FIG. 1. Even though the processor may perform optimally, the performance of the server is still less than optimum because the I/O devices in the server may be of radically different types, store different kinds of data and/or vary from each other in the addressing sequence by which the data blocks containing the data are written and read out. For example, a pre-recorded CD-ROM may store large contiguous blocks of image data, and the readout of such image data by an optical disk drive may consist of several smaller sequential reads. Another I/O device may store heavily fragmented user data, and the readout from such a device rarely consists of large blocks of data.
More particularly, in the example of FIG. 1, there is shown a processor P, system memory SM, I/O adapter card A, hard disk HD, I/O adapter card B, a network interface card NIC, I/O adapter card C and a CD-ROM drive CD, all connected along an input/output bus, for example, a Peripheral Component Interconnect (PCI) synchronous bus as described in “PCI Local Bus Specification, Revision 2.1” set forth by the PCI Special Interest Group (SIG) on Jun. 1, 1995. The PCI architecture provides the most common method currently used to extend computer systems with add-on arrangements (e.g., expansion cards) that provide new disk memory storage capabilities.
In this load/store configuration, taking a write command as an example, suppose the processor P wishes to write a block of data to the hard disk HD. First, as shown in FIG. 2, the processor P stores the command and its associated data in a block A within the system memory SM. The processor P then transfers a command to a register on the PCI I/O adapter card A via a path over the system bus, PCI bus bridge, and PCI bus. This tells the I/O adapter card A that a new command has been issued. The I/O adapter card A must decipher that command and then read the system memory SM to obtain the address of the write command. It must also read a pointer, which is a value representing the address within the system memory SM where the data associated with the command can be found. (The pointer may be virtual or physical, and the location of the data is not necessarily contiguous with the location of the command. Indeed, the data may be split, requiring a Scatter/Gather List (SGL) to describe the locations of the data.) The I/O adapter card A then goes to the address in the system memory SM indicated by the pointer. The data in block A is read from the system memory SM back to the I/O adapter card A, which requires several more fetches. The data is then written from the I/O adapter card A to the hard disk HD. Even if the processor P sets aside known areas for commands in the system memory SM so that the I/O adapter card A always knows the address of the command, the I/O adapter card A would still need to read the write command to learn where the data is located and then perform the fetches to obtain the data.
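The write sequence described above can be illustrated with a minimal software sketch. It is only a model under stated assumptions, not a representation of any actual PCI adapter: the class SystemMemory, the addresses CMD_SLOT_ADDR, CMD_ADDR and SGL_ADDR, and the two functions are all hypothetical names chosen for illustration. The sketch counts how many separate reads of the system memory SM the I/O adapter card A performs before it holds the data to be written.

```python
from dataclasses import dataclass, field


@dataclass
class SystemMemory:
    """Toy model of system memory SM that counts adapter reads."""
    cells: dict = field(default_factory=dict)
    reads: int = 0

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        self.reads += 1          # each read stands for one fetch over the bus
        return self.cells[addr]


# Hypothetical addresses, chosen only for this illustration.
CMD_SLOT_ADDR = 0x010   # known location holding the address of the command
CMD_ADDR = 0x100        # where the write command itself is stored
SGL_ADDR = 0x200        # where the Scatter/Gather List is stored


def processor_issue_write(mem, fragments):
    """Processor P: place data, SGL and command in SM, then ring the register."""
    sgl = []
    for i, fragment in enumerate(fragments):
        addr = 0x1000 + 0x100 * i        # fragments need not be contiguous
        mem.write(addr, fragment)
        sgl.append(addr)
    mem.write(SGL_ADDR, sgl)
    mem.write(CMD_ADDR, {"op": "write", "pointer": SGL_ADDR})
    mem.write(CMD_SLOT_ADDR, CMD_ADDR)
    return "new-command"                 # value written to the adapter register


def adapter_handle_command(mem, register_value):
    """Adapter card A: fetch command address, command, SGL, then the data."""
    if register_value != "new-command":
        raise ValueError("unexpected register value")
    cmd_addr = mem.read(CMD_SLOT_ADDR)   # fetch: address of the command
    cmd = mem.read(cmd_addr)             # fetch: command containing the pointer
    sgl = mem.read(cmd["pointer"])       # fetch: list of data locations
    data = b"".join(mem.read(a) for a in sgl)   # one fetch per fragment
    return cmd["op"], data


mem = SystemMemory()
register_value = processor_issue_write(mem, [b"abc", b"def", b"gh"])
op, data = adapter_handle_command(mem, register_value)
print(op, data, mem.reads)
```

In this model, three data fragments cost the adapter six reads of system memory in total: three to locate the data and three to fetch it. The exact count depends entirely on the assumptions above; the point is only that the number of fetches grows with the indirection and fragmentation the text describes.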
A similar procedure occurs when the processor P reads a block of data from the hard disk HD. In that case, the I/O adapter card A stores the block of data in a block B within the system memory SM and then passes an indication to the processor P that the read from the hard disk HD has finished, whereupon the processor P can access the block B within the system memory SM to obtain the data. Such a conventional procedure (illustrated generally in FIG. 3) of sending a command with a pointer (step 1), waiting for and receiving a request for data (step 2), and subsequently sending the data in response to the request (step 3) has substantial inherent latencies and delays. The procedure is very inefficient and slows down the entire system, since many processor cycles pass before the data transfer is completed.
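The read path described above can be sketched in the same spirit. This is again a toy model under assumptions: the address BLOCK_B, the dictionary-based memory and disk, and the shape of the completion indication are hypothetical and are not taken from FIG. 3 or from any real adapter interface.

```python
BLOCK_B = 0x2000   # hypothetical address of block B within system memory SM


def adapter_read_from_disk(system_memory, disk, block_number):
    """Adapter card A: copy the disk block into block B, then signal completion."""
    system_memory[BLOCK_B] = disk[block_number]
    return {"status": "done", "buffer": BLOCK_B}   # indication passed to P


def processor_read(system_memory, disk, block_number):
    """Processor P: issue the read, wait for the indication, then access block B."""
    completion = adapter_read_from_disk(system_memory, disk, block_number)
    if completion["status"] != "done":
        raise RuntimeError("read did not complete")
    # Block B is accessed only after the completion indication has arrived.
    return system_memory[completion["buffer"]]


disk = {7: b"payload"}      # toy hard disk HD holding one block
system_memory = {}          # toy system memory SM
result = processor_read(system_memory, disk, 7)
print(result)
```

The function call here returns immediately, so the sketch shows only the ordering of the steps; in the configuration described, the processor P would have to wait for the indication before touching block B.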