1. Field of the Invention
The present invention relates to a device for connecting a computer system to a computer network, and more particularly to a method and an apparatus for reducing bus overhead in communications between a computer system and a network interface device through which the computer system communicates with a high speed packet-switched network.
2. Related Art
The advent of computer networking has given rise to devices that connect computer systems to packet-switched data networks. These devices (known as network interface controllers, or NICs) typically include interfaces to both the computer system and the packet-switched data network, as well as a buffer memory for buffering packets of data in transit between the computer system and the packet-switched data network. The interface to the computer system typically connects to a bus within the computer system, such as a PCI bus, through which data is transferred between the computer system memory and the NIC. As computer networks and NICs greatly increase in performance, communications across this bus can become an impediment to achieving high performance in communications between the computer system and the packet-switched data network.
Three methods can be used to communicate between a computer system and a device such as a NIC. (1) Programmed I/O (PIO) operates by including explicit I/O commands in the application programs executed by the computer system. PIO can be implemented with a simple hardware and operating system design. However, it places a tremendous burden on the application program to explicitly manage communications between the computer system and the NIC. (2) Shared memory can be used to facilitate communications between the NIC and the computer system. In a shared memory system, the NIC and the computer system communicate by writing to and reading from a shared memory that exists in both the address space of the computer system and the address space of the NIC. This again leads to a simple hardware and operating system implementation, and a clean interface between the computer system and the NIC. However, it again places a burden on the application program to explicitly manage communications between the computer system and the NIC. (3) Finally, direct memory access (DMA) can be used to transfer data between the NIC and the memory of the computer system. DMA operates by allowing the NIC to perform bus operations to directly access the memory of the computer system in order to transfer data between the computer system and the NIC. A DMA system requires considerable complexity in hardware and operating system design. However, it relieves the application program of the burden of explicitly managing communications between the computer system and the NIC.
DMA transfers between computer systems and NICs are commonly accomplished using the scatter-gather technique. In scatter-gather, a bus master device in the NIC is first instructed to obtain a command block from the memory of a host computer system. At a minimum, the command block contains a list of physical addresses for blocks within the host system memory that are to be copied to the DMA device. The command block also contains a count of the number of fragments in the command block and the overall length of the data contained in the fragments pointed to by the command block. The DMA device parses the command block, extracting the address of each fragment, and transfers the fragments from the host memory to the DMA device. This process is repeated for each fragment listed in the command block until all of the data described by the command block is copied to the DMA device.
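The scatter-gather sequence described above can be sketched as a small model. This is only an illustration: the `CommandBlock` class, the `gather` function, and the per-fragment length field are assumptions introduced here, since the text states only that a command block holds fragment addresses, a fragment count, and the overall data length.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CommandBlock:
    """Hypothetical command block layout (not taken from the source text)."""
    # Each entry is (physical address, fragment length in bytes).
    # The per-fragment length is an assumption made for this sketch.
    fragments: List[Tuple[int, int]]

    @property
    def fragment_count(self) -> int:
        # Count of fragments listed in the command block.
        return len(self.fragments)

    @property
    def total_length(self) -> int:
        # Overall length of the data pointed to by the command block.
        return sum(length for _, length in self.fragments)


def gather(host_memory: bytes, block: CommandBlock) -> bytes:
    """Copy every fragment named in the command block, in order,
    from host memory into one contiguous device-side buffer."""
    device_buffer = bytearray()
    for address, length in block.fragments:
        device_buffer += host_memory[address:address + length]
    return bytes(device_buffer)
```

In this model, host memory is a flat byte string indexed by physical address, so a block with fragments `(10, 4)` and `(100, 2)` yields six bytes copied in two separate transfers.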
A significant performance bottleneck in using the scatter-gather technique for transferring data to a high speed network is the translation from virtual to physical addresses. Peripheral devices, such as a NIC, cannot use virtual memory addresses to effect the transfers, because the hardware to implement the virtual-to-physical address translation is typically located inside the CPU. This means that conversion between virtual and physical addresses must take place before transfers between a computer system and a NIC can take place. This conversion can take a great deal of time and consume a significant amount of the computer system's processing power. When data is passed to a device driver for transmission to the NIC, the driver first performs a virtual-to-physical address conversion for each buffer fragment passed down to it from the application layers above. It is possible for each buffer fragment to straddle physical pages of the memory system. Thus, more than one physical address may correspond to each virtual address converted. Consequently, several virtual-to-physical address conversions may be required for each buffer of data that is transferred from the computer system to the NIC. This can be very time-consuming because each virtual-to-physical address translation can take from tens to hundreds of CPU cycles to accomplish.
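The effect of a buffer fragment straddling physical pages can be illustrated with a short calculation. The function name `pages_spanned` and the 4096-byte default page size are assumptions chosen for the example; the text does not specify a page size.

```python
def pages_spanned(virtual_address: int, length: int, page_size: int = 4096) -> int:
    """Return how many memory pages a buffer touches.

    Each page touched needs its own virtual-to-physical translation,
    so this is also the number of translations for the buffer.
    The 4096-byte page size is an assumed value, not from the source.
    """
    if length <= 0:
        return 0
    first_page = virtual_address // page_size
    last_page = (virtual_address + length - 1) // page_size
    return last_page - first_page + 1
```

Under these assumptions, a 1500-byte fragment that starts at a page boundary needs one translation, while the same fragment starting 256 bytes before the end of a page needs two.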
Another significant performance impediment associated with the scatter-gather technique is its command block nature. Peripheral devices such as NICs typically connect to computer systems through a peripheral interconnect bus, such as the PCI bus. In order to transfer data to or from the computer system, devices connected to the bus contend for control of the bus. Once a device is granted control of the bus, it drives bus signal lines to transfer data to or from the computer system. The performance impediment stems from the number of times a NIC must contend for the peripheral interconnect bus when transferring data using the scatter-gather technique. Under ideal circumstances for scatter-gather, bus contention to transfer data between a NIC and an attached computer system will occur three times per buffer transferred: first, when the computer system informs the NIC that a buffer is available for its use; second, when the NIC reads the command block describing the buffer; and third, when the NIC transfers data to or from the buffer. In typical scenarios, at least two buffer fragments will be described in each command block. Because each fragment is transferred separately, there will be at least four contentions instead of three. These additional contentions create opportunities for other devices to obtain control of the bus and thus delay transfers initiated by the NIC.
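The contention count above follows a simple pattern, sketched here under the assumption that each fragment costs exactly one additional bus acquisition. The function name `bus_contentions` is hypothetical, and the model is one reading of the text rather than a measured result.

```python
def bus_contentions(fragments: int) -> int:
    """Count bus contentions for one scatter-gather buffer transfer.

    Assumed model: one contention to notify the NIC, one to read the
    command block, and one per fragment to move the data itself.
    """
    if fragments < 1:
        raise ValueError("a buffer has at least one fragment")
    return 2 + fragments
```

A single-fragment buffer gives the ideal three contentions, and a two-fragment buffer gives four, which matches the counts stated in the text.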
What is needed is a method for performing DMA between a computer system and a NIC that is free from the overhead of performing virtual-to-physical address translations and that minimizes the number of bus transactions required to initiate the DMA transfer process.
The present invention provides a method and an apparatus for transferring data between a computer system and a NIC without performing virtual-to-physical address translations at run time. The computer system allocates blocks of memory during system initialization for storing data in transit between the computer system and the NIC, and the physical addresses of these blocks of memory are stored in a table on the NIC. Consequently, address conversion is performed only once, when the memory is allocated. When a request to transfer data to the NIC is received from the upper layers, the device driver copies the data from the upper layers into the next available memory block. The device driver then formats a command and passes it to the NIC for processing. Data transfer commands are communicated to the NIC through a packet descriptor command (PDC), which is a 32-bit value subdivided into fields that completely describe the data transfer operation. The PDC contains a small ordinal value that indexes a table in the NIC, which includes a set of physical addresses of buffers preallocated by the computer system in the computer system memory. These buffers are used for storing data in transit to the NIC. The PDC also contains the length of the buffer to be copied to or from the NIC. The present invention also allows for multiple packets to be formatted into buffers and then subsequently transferred to the NIC in a single I/O operation.
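A packet descriptor command of this kind can be sketched as a packed 32-bit integer. The field widths and positions below (a 16-bit length in the low bits, an 8-bit buffer index above it, and the top 8 bits left unused) are assumptions made for illustration only; the text states that the PDC is 32 bits wide and carries a buffer index and a length, but does not give the layout. The names `pack_pdc` and `unpack_pdc` are likewise hypothetical.

```python
from typing import Tuple

# Assumed field widths; the source does not specify the PDC layout.
LENGTH_BITS = 16   # bits 0-15: length of the data in the buffer
INDEX_BITS = 8     # bits 16-23: ordinal index into the NIC's address table
LENGTH_MASK = (1 << LENGTH_BITS) - 1
INDEX_MASK = (1 << INDEX_BITS) - 1


def pack_pdc(buffer_index: int, length: int) -> int:
    """Build a 32-bit PDC from a buffer index and a length."""
    if not 0 <= buffer_index <= INDEX_MASK:
        raise ValueError("buffer index does not fit in the index field")
    if not 0 <= length <= LENGTH_MASK:
        raise ValueError("length does not fit in the length field")
    return (buffer_index << LENGTH_BITS) | length


def unpack_pdc(pdc: int) -> Tuple[int, int]:
    """Recover (buffer_index, length) from a PDC, as the NIC would."""
    length = pdc & LENGTH_MASK
    buffer_index = (pdc >> LENGTH_BITS) & INDEX_MASK
    return buffer_index, length
```

Because the whole command fits in one 32-bit word, the driver in this sketch could hand it to the NIC with a single register write, and the NIC would use the index to look up the physical address it already holds instead of fetching a command block.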
The present invention provides a number of advantages. First, virtual-to-physical address translation is avoided at run time. Second, the formatting of a packet descriptor list is greatly simplified. Third, the amount of control data transferred to the NIC by the computer system is greatly reduced. Finally, multiple packets can be transferred to the NIC in a single I/O operation, thereby making more efficient use of bandwidth on the interconnect bus.
The present invention incurs additional overhead because the processor must move data from the application program into the data buffers in the computer system's memory before this data is transferred to the NIC. At first glance, this double copy operation appears to incur a great amount of additional processor overhead. However, this additional overhead is considerably smaller than the overhead involved in performing virtual-to-physical address translations. Each translation requires many tens (if not hundreds) of CPU cycles, and many such translations may be required for a single transfer operation. Consequently, the present invention provides a significant performance advantage for small data transfers, which represent a significant percentage of all data transfers. Hundreds of bytes can be moved to the preallocated buffer in the time it takes to perform just one virtual-to-physical address translation. Moreover, as microprocessors move to 64-bit and 128-bit architectures, their capacity to move data per clock will increase, thereby further widening the performance advantage of the present invention over conventional scatter-gather DMA.
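The tradeoff between copying and translating can be explored with a rough cycle-count model. Everything here is a simplifying assumption: the function names are hypothetical, the copy is modeled as a fixed number of bytes per cycle, and every translation is assumed to cost the same number of cycles. The parameter values used in the note below are illustrative, not measurements.

```python
def copy_cycles(length_bytes: int, bytes_per_cycle: int) -> int:
    """Cycles to copy a buffer, rounding up to whole cycles (assumed model)."""
    return (length_bytes + bytes_per_cycle - 1) // bytes_per_cycle


def translation_cycles(translations: int, cycles_per_translation: int) -> int:
    """Cycles spent on virtual-to-physical translations (assumed model)."""
    return translations * cycles_per_translation


def copy_is_cheaper(length_bytes: int, bytes_per_cycle: int,
                    translations: int, cycles_per_translation: int) -> bool:
    """True when copying into a preallocated buffer costs fewer cycles
    than translating the addresses of the original fragments."""
    return (copy_cycles(length_bytes, bytes_per_cycle)
            < translation_cycles(translations, cycles_per_translation))
```

With illustrative inputs of 4 bytes per cycle and 100 cycles per translation, a 256-byte copy (64 cycles) beats a single translation, while a 1500-byte copy (375 cycles) does not beat two translations. Doubling the copy width to 8 bytes per cycle brings the 1500-byte copy down to 188 cycles, which again favors copying, in line with the point about wider architectures.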
Furthermore, CPU utilization may not be the primary bottleneck. In systems that move large amounts of data, bus utilization may be the largest bottleneck. Hence, favoring bus utilization at the expense of CPU utilization is often a desirable tradeoff.
Thus, the present invention can be characterized as an apparatus for facilitating communications between a computer system, including a memory and a bus, and a packet-switched network, comprising: a bus interface coupled to the bus, for communicating across the bus; a transmit buffer, for storing data to be transmitted on the packet-switched network; a transmit data path, coupled to the bus interface and the transmit buffer, for transferring data from the bus interface to the transmit buffer; a receive buffer, for storing data received from the packet-switched network; a receive data path, coupled to the bus interface and the receive buffer, for transferring data from the receive buffer to the bus interface; a buffer address table, coupled to the bus interface, for storing at least one address of at least one buffer in the memory of the computer system, the at least one buffer being preallocated by the computer system and used to store data in transit between the computer system and one of the transmit buffer and the receive buffer; and a controller coupled to the transmit buffer, the receive buffer and the buffer address table, for controlling the transfer of data from the computer system to the transmit buffer, and from the receive buffer to the computer system.
According to an aspect of the present invention, the apparatus includes: a transmit command queue coupled to the bus interface and the controller, for storing transmit commands from the computer system; and a transmit execution queue, coupled to the bus interface, the transmit command queue and the controller, for storing and processing commands from the transmit command queue, and command blocks from the computer system which are referenced by commands from the transmit command queue.
According to another aspect of the present invention, the apparatus includes a receive command queue coupled to the bus interface and the controller, for storing receive commands from the computer system; and a receive execution queue, coupled to the bus interface, the receive command queue and the controller, for storing and processing commands from the receive command queue and command blocks from the computer system that are referenced by commands from the receive command queue.
According to another aspect of the present invention, the controller includes a mechanism to transfer a plurality of packets in a single operation between the at least one buffer preallocated by the computer system and the transmit buffer.
According to another aspect of the present invention, the controller includes a mechanism to transfer a plurality of packets in a single operation between the receive buffer and the at least one buffer preallocated by the computer system.
The present invention can also be characterized as a method for transferring data between a computer system and a network interface device, the network interface device being coupled to a packet-switched network, and the computer system including a memory and a communication channel, the communication channel being coupled to the network interface device, the method comprising: receiving at the network interface device at least one address of a preallocated buffer in the memory; storing in the network interface device the at least one address of the preallocated buffer; receiving a command from the computer system through the communication channel, the command indicating that a transfer between the network interface device and the computer system is to take place; retrieving an address from the at least one address of a preallocated buffer stored in the network interface device; using the address to transfer data from the preallocated buffer in the memory to the network interface device if the command is a transmit command; and using the address to transfer data from the network interface device to the preallocated buffer in the memory if the command is a receive command.
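The method steps above can be sketched as a small software model. This is an illustration only: the class `NicModel`, its method names, and the use of a `bytearray` to stand in for host memory are assumptions introduced here, and the model ignores bus arbitration and the network side entirely.

```python
class NicModel:
    """Hypothetical model of a network interface device that keeps a
    table of preallocated host buffer addresses."""

    def __init__(self, host_memory: bytearray) -> None:
        self.host_memory = host_memory      # stands in for DMA access to host memory
        self.buffer_table = []              # addresses stored in the device
        self.transmit_buffer = bytearray()  # data waiting to go onto the network
        self.receive_buffer = bytearray()   # data that arrived from the network

    def store_buffer_address(self, address: int) -> int:
        """Receive and store one preallocated buffer address.
        Returns the table index the host later uses in its commands."""
        self.buffer_table.append(address)
        return len(self.buffer_table) - 1

    def execute(self, command: str, index: int, length: int) -> None:
        """Handle a command by looking up the stored address and moving data."""
        address = self.buffer_table[index]  # retrieve the stored address
        if command == "transmit":
            # Host buffer -> device transmit buffer.
            self.transmit_buffer += self.host_memory[address:address + length]
        elif command == "receive":
            # Device receive buffer -> host buffer.
            data = self.receive_buffer[:length]
            del self.receive_buffer[:length]
            self.host_memory[address:address + len(data)] = data
        else:
            raise ValueError("unknown command: " + command)
```

In use, the host would call `store_buffer_address` once per buffer at initialization and afterwards issue only `execute` commands that carry an index and a length, so no address crosses the bus at transfer time in this model.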