The invention relates to device drivers in general. More particularly, the invention relates to a method and apparatus for managing the transfer of data from memory to an input/output (I/O) device using smart coalescing.
Local area networks (LANs) are attractive to many small to mid-size companies due to their performance and cost efficiency. A LAN typically comprises a number of personal computers (PCs) connected by some sort of transmission medium such as fiber optic cable. Each PC is equipped with a Network Interface Card (NIC). The NIC manages the flow of information between the PC and the network using, among other things, a media access control (MAC) protocol. Recently, a new MAC protocol was introduced that substantially increases data transfer speeds, which is defined in the Institute of Electrical and Electronics Engineers (IEEE) standard 802.3z titled "Supplement to Information Technology--Local and Metropolitan Area Networks--Part 3: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications--Media Access Control Parameters, Physical Layers, Repeater and Management Parameters for 1,000 Mb/s Operation," Jun. 20, 1996 ("Gigabit Ethernet").
Gigabit Ethernet is a packet-based protocol. This means that information that is to be transferred from one PC to another PC is broken down into packets which are communicated over the transmission medium under the control of the respective PC's NIC. A typical packet may contain several fragments, such as one for the Data Link Layer, one for the Network Layer, one for the Transport Layer, one for the payload, and so forth. For example, a Transmission Control Protocol/Internet Protocol (TCP/IP) packet over Ethernet will generally have an Ethernet header fragment of 14 bytes, an IP header fragment of 20 bytes, a TCP header fragment of 20 bytes, and one or two data payload fragments of 1-1460 bytes. Each fragment is stored somewhere in memory prior to transmission by the NIC.
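The fragment layout described above can be illustrated with a minimal C sketch. The `struct fragment` type and the `packet_length` function are hypothetical names introduced here for illustration only; they are not taken from any particular driver interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one packet fragment held in host memory. */
struct fragment {
    const void *addr;  /* where the fragment is stored */
    size_t      len;   /* fragment length in bytes */
};

/* Sum the lengths of all fragments that make up one packet. */
size_t packet_length(const struct fragment *frags, size_t count)
{
    size_t total = 0;

    assert(frags != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        total += frags[i].len;
    return total;
}
```

For a packet with a 14-byte Ethernet header, a 20-byte IP header, a 20-byte TCP header, and a single 1460-byte payload fragment, the total under this sketch is 1514 bytes spread over four separate memory locations.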
Prior to transmitting a packet the NIC must read each packet fragment from its respective location in memory. The NIC relies upon a number of PC sub-systems to accomplish this process, namely the memory sub-system and the peripheral component interconnect (PCI) sub-system. The PC sub-systems are coordinated by a device driver supporting the NIC. The method the device driver employs to manage the PC sub-systems directly impacts the speed at which the NIC can read the packet fragments and transmit the packet to the network. Consequently, a substantial need exists for optimizing the device driver to improve network transfer speeds.
Optimization of the NIC device driver is particularly important for Gigabit Ethernet networks. Gigabit Ethernet operates at speeds of 1000 Megabits per second (Mbps). In full duplex at 100% of wire speed the throughput of a Gigabit Ethernet NIC is about 250 megabytes per second (MB/s). This is significantly faster than the potential data transfer speeds of the PCI sub-system used in conventional PCs, which typically have 32 bit PCI slots operating at 33 Megahertz (MHz). Therefore, maximizing the bandwidth of the PCI sub-system is crucial to achieving high throughput for a Gigabit Ethernet system.
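The comparison above can be checked with simple arithmetic. The following C sketch uses hypothetical helper names and computes only theoretical peak figures; it ignores bus arbitration, protocol overhead, and the fractional part of the nominal PCI clock, so the bus figure is an upper bound rather than an achievable rate.

```c
#include <assert.h>

/* Convert a line rate in megabits per second to megabytes per second. */
unsigned long mbit_to_mbyte(unsigned long mbit_per_s)
{
    return mbit_per_s / 8;
}

/* Full duplex: the link carries the line rate in each direction at once. */
unsigned long full_duplex_mbyte(unsigned long mbit_per_s)
{
    return mbit_to_mbyte(2 * mbit_per_s);
}

/* Theoretical peak of a parallel bus in MB/s:
 * bus width in bytes multiplied by the clock rate in MHz. */
unsigned long bus_peak_mbyte(unsigned bus_bits, unsigned clock_mhz)
{
    assert(bus_bits % 8 == 0);
    return (unsigned long)(bus_bits / 8) * clock_mhz;
}
```

Under these assumptions, `full_duplex_mbyte(1000)` gives 250 MB/s for the NIC, while `bus_peak_mbyte(32, 33)` gives 132 MB/s for the PCI slot, so the bus rather than the network is the limiting resource.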
There are currently two general methods that attempt to maximize the bandwidth of the PCI sub-system. In both methods, the network operating system (NOS) sends a first list to the NIC device driver. The first list contains a location (e.g., memory address) for each fragment stored in host memory. The NIC device driver then generates a second list using information contained in the first list and sends the second list to the NIC. The NIC then reads each fragment from memory via direct memory access (DMA) transfers in accordance with the second list.
One difference between the two methods is the number of DMA transfers required for the NIC to read each packet fragment. In the first method, the driver receives the first list and copies each fragment to a buffer, which is typically referred to as a coalesce buffer. The driver stores the memory location for the coalesce buffer in the second list and sends the second list to the NIC. The NIC then retrieves the contents of the buffer using a single DMA transfer and transmits the data. In the second method, the driver receives the first list and generates a second list corresponding to the first list without any memory-to-memory copies. Since the second list is made up completely of NOS-owned memory, the driver must "lock down" each fragment so that the fragment data is not moved in physical memory by the NOS before or during the DMA. The NIC then retrieves each fragment from its memory location using a separate DMA transfer for each fragment. Each fragment is unlocked once it has been read by the NIC.
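The two conventional methods can be contrasted in a short C sketch. All names here (`struct fragment`, `struct dma_entry`, `build_coalesced`, `build_scatter`, and the `lock` callback) are hypothetical and stand in for whatever the NOS and NIC actually define; in particular, the callback is only a placeholder for the NOS lock-down service, and the caller is assumed to supply a second list with room for one entry per fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct fragment  { const void *addr; size_t len; };  /* entry in the first list  */
struct dma_entry { const void *addr; size_t len; };  /* entry in the second list */

/* First method: copy every fragment into one coalesce buffer so the NIC
 * needs a single DMA transfer. Returns the number of second-list entries
 * written (1), or 0 if the fragments do not fit in the buffer. */
size_t build_coalesced(const struct fragment *frags, size_t count,
                       unsigned char *buf, size_t buf_size,
                       struct dma_entry *out)
{
    size_t used = 0;

    assert(frags != NULL && buf != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        if (frags[i].len > buf_size - used)
            return 0;                       /* buffer too small */
        memcpy(buf + used, frags[i].addr, frags[i].len);
        used += frags[i].len;
    }
    out[0].addr = buf;
    out[0].len  = used;
    return 1;
}

/* Second method: no copies. Each fragment is locked down (via the
 * hypothetical callback, if one is given) and gets its own DMA entry. */
size_t build_scatter(const struct fragment *frags, size_t count,
                     struct dma_entry *out,
                     void (*lock)(const void *addr, size_t len))
{
    assert(frags != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        if (lock != NULL)
            lock(frags[i].addr, frags[i].len);
        out[i].addr = frags[i].addr;
        out[i].len  = frags[i].len;
    }
    return count;
}
```

In this sketch the first method always yields one second-list entry at the cost of one `memcpy` per fragment, while the second method yields one entry and one lock-down per fragment with no copying.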
Certain advantages and disadvantages are associated with each method. The first method uses a single DMA transfer and therefore minimizes the associated latency. The first method, however, heavily burdens the memory sub-system since a memory-to-memory copy must be made for each fragment. Further, copying a larger fragment may take longer than simply retrieving it with its own DMA transfer. In addition, retrieval of the fragments cannot begin until copying is complete. With respect to the second method, the burden on the memory sub-system is alleviated, but multiple DMA transfers are necessary, which shifts the burden to the PCI sub-system and increases the overall DMA latency time. This becomes particularly problematic for smaller fragments since each DMA has an associated overhead latency for bus arbitration regardless of the fragment size. Further, the second method must lock down each fragment prior to transfer, which further delays the fragment retrieval process.
In view of the foregoing, it can be appreciated that a substantial need exists for a NIC device driver that solves the above-discussed problems.
One embodiment of the invention comprises a method and apparatus for managing data transfers from memory to an input/output device where the data is stored in memory as data fragments. A first list of memory locations for the fragments is received. A sub-set of fragments for copying to at least one of a first and second buffer is selected based on fragment size. A request to copy the selected sub-set of fragments to the at least one first and second buffer is sent. A request to lock down any unselected fragments is sent. A second list of memory locations for the fragments is created. The second list comprises memory locations for the at least one first and second buffer and locked down fragments.
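The size-based selection summarized above might be sketched in C as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: it uses a single coalesce buffer where the text allows a first and a second buffer, it merges adjacent small fragments into one second-list entry, and it falls back to lock-down when the buffer is full. The names `build_smart`, `threshold`, and `lock_fn` are hypothetical, and the caller is assumed to supply a second list with room for one entry per fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct fragment  { const void *addr; size_t len; };  /* entry in the first list  */
struct dma_entry { const void *addr; size_t len; };  /* entry in the second list */

/* Hypothetical stand-in for the NOS lock-down request. */
typedef void (*lock_fn)(const void *addr, size_t len);

/* Build the second list: fragments no larger than `threshold` are copied
 * into the coalesce buffer, all others are locked down and left in place.
 * Fragment order is preserved. Returns the number of entries written. */
size_t build_smart(const struct fragment *frags, size_t count,
                   size_t threshold,
                   unsigned char *buf, size_t buf_size,
                   lock_fn lock, struct dma_entry *out)
{
    size_t used = 0;   /* bytes of the coalesce buffer consumed so far */
    size_t n = 0;      /* entries written to the second list */
    int run_open = 0;  /* nonzero while the last entry is a coalesced run */

    assert(frags != NULL && buf != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        size_t len = frags[i].len;

        if (len <= threshold && len <= buf_size - used) {
            /* Small fragment: copy it next to the previous small one. */
            memcpy(buf + used, frags[i].addr, len);
            if (run_open) {
                out[n - 1].len += len;      /* extend the current run */
            } else {
                out[n].addr = buf + used;   /* start a new run */
                out[n].len  = len;
                n++;
                run_open = 1;
            }
            used += len;
        } else {
            /* Large fragment (or buffer full): lock down, no copy. */
            if (lock != NULL)
                lock(frags[i].addr, len);
            out[n].addr = frags[i].addr;
            out[n].len  = len;
            n++;
            run_open = 0;
        }
    }
    return n;
}
```

With an assumed threshold of 256 bytes, a packet made of 14-byte, 20-byte, and 20-byte headers followed by a 1460-byte payload produces two second-list entries in this sketch: one 54-byte entry pointing into the coalesce buffer and one entry pointing at the locked-down payload.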