BACKGROUND
SUMMARY
BRIEF DESCRIPTION OF THE FIGURES
DETAILED DESCRIPTION
Introduction
One Embodiment of a High Performance Network Interface Circuit
An Illustrative Packet
One Embodiment of a Header Parser
Dynamic Header Parsing Instructions in One Embodiment of the Invention
One Embodiment of a Flow Database
One Embodiment of a Flow Database Manager
One Embodiment of a Load Distributor
One Embodiment of a Packet Queue
One Embodiment of a Control Queue
One Embodiment of a DMA Engine
Methods of Transferring a Packet Into a Memory Buffer by a DMA Engine
A Method of Transferring a Packet with Operation Code 0
A Method of Transferring a Packet with Operation Code 1
A Method of Transferring a Packet with Operation Code 2
A Method of Transferring a Packet with Operation Code 3
A Method of Transferring a Packet with Operation Code 4
A Method of Transferring a Packet with Operation Code 5
A Method of Transferring a Packet with Operation Code 6 or 7
One Embodiment of a Dynamic Packet Batching Module
Early Random Packet Discard in One Embodiment of the Invention
CLAIMS
This invention relates to the fields of computer systems and computer networks. In particular, the present invention relates to a Network Interface Circuit (NIC) for processing communication packets exchanged between a computer network and a host computer system.
The interface between a computer and a network is often a bottleneck for communications passing between the computer and the network. While computer performance (e.g., processor speed) has increased exponentially over the years and computer network transmission speeds have undergone similar increases, inefficiencies in the way network interface circuits handle communications have become more and more evident. With each incremental increase in computer or network speed, it becomes ever more apparent that the interface between the computer and the network cannot keep pace. These inefficiencies involve several basic problems in the way communications between a network and a computer are handled.
Today's most popular forms of networks tend to be packet-based. These types of networks, including the Internet and many local area networks, transmit information in the form of packets. Each packet is separately created and transmitted by an originating endstation and is separately received and processed by a destination endstation. In addition, each packet may, in a bus topology network for example, be received and processed by numerous stations located between the originating and destination endstations.
One basic problem with packet networks is that each packet must be processed through multiple protocols or protocol levels (known collectively as a "protocol stack") on both the origination and destination endstations. When data transmitted between stations is longer than a certain minimal length, the data is divided into multiple portions, and each portion is carried by a separate packet. The amount of data that a packet can carry is generally limited by the network that conveys the packet and is often expressed as a maximum transfer unit (MTU). The original aggregation of data is sometimes known as a "datagram," and each packet carrying part of a single datagram is processed very similarly to the other packets of the datagram.
Communication packets are generally processed as follows. In the origination endstation, each separate data portion of a datagram is processed through a protocol stack. During this processing multiple protocol headers (e.g., TCP, IP, Ethernet) are added to the data portion to form a packet that can be transmitted across the network. The packet is received by a network interface circuit, which transfers the packet to the destination endstation or a host computer that serves the destination endstation. In the destination endstation, the packet is processed through the protocol stack in the direction opposite to that used in the origination endstation. During this processing the protocol headers are removed in the reverse of the order in which they were applied. The data portion is thus recovered and can be made available to a user, an application program, etc.
Several related packets (e.g., packets carrying data from one datagram) thus undergo substantially the same process in a serial manner (i.e., one packet at a time). The more data that must be transmitted, the more packets must be sent, with each one being separately handled and processed through the protocol stack in each direction. Naturally, the more packets that must be processed, the greater the demand placed upon an endstation's processor. The number of packets that must be processed is affected by factors other than just the amount of data being sent in a datagram. For example, as the amount of data that can be encapsulated in a packet increases, fewer packets need to be sent. As stated above, however, a packet may have a maximum allowable size, depending on the type of network in use (e.g., the maximum transfer unit for standard Ethernet traffic is approximately 1,500 bytes). The speed of the network also affects the number of packets that a NIC may handle in a given period of time. For example, a gigabit Ethernet network operating at peak capacity may require a NIC to receive approximately 1.48 million packets per second. Thus, the number of packets to be processed through a protocol stack may place a significant burden upon a computer's processor. The situation is exacerbated by the need to process each packet separately even though each one will be processed in a substantially similar manner.
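Illustratively, the figure of approximately 1.48 million packets per second can be reproduced from standard Ethernet framing, as in the sketch below. The framing constants are those of standard Ethernet; the function names and the 1,460-byte payload (a 1,500-byte MTU less 20-byte IP and TCP headers without options) are assumptions made for illustration only.

```python
# Standard Ethernet framing overheads, in bytes.
MIN_FRAME = 64        # minimum frame size, including the frame check sequence
PREAMBLE = 8          # preamble plus start-of-frame delimiter
INTERFRAME_GAP = 12   # mandatory idle period between frames, expressed in bytes


def peak_packets_per_second(link_bits_per_second: int) -> float:
    """Worst-case arrival rate: back-to-back minimum-size frames."""
    bits_per_packet = (MIN_FRAME + PREAMBLE + INTERFRAME_GAP) * 8
    return link_bits_per_second / bits_per_packet


def packets_for_datagram(datagram_bytes: int, payload_per_packet: int) -> int:
    """Number of packets needed to carry a datagram (ceiling division)."""
    return -(-datagram_bytes // payload_per_packet)


if __name__ == "__main__":
    print(round(peak_packets_per_second(10**9)))   # gigabit Ethernet
    print(packets_for_datagram(64 * 1024, 1460))   # assumed 1,460-byte payload
```

Under these assumptions a gigabit link delivers roughly 1,488,095 minimum-size packets per second, and a 64-kilobyte datagram requires 45 packets, each of which is processed separately through the protocol stack.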
A problem related to the disjoint processing of packets is the manner in which data is moved between "user space" (e.g., an application program's data storage) and "system space" (e.g., system memory) during data transmission and receipt. Presently, data is simply copied from one area of memory assigned to a user or application program into another area of memory dedicated to the processor's use. Because each portion of a datagram that is transmitted in a packet may be copied separately (e.g., one byte at a time), a nontrivial amount of processor time is required, and frequent transfers can consume a large amount of the memory bus's bandwidth. Illustratively, each byte of data in a packet received from the network may be read from the system space and written to the user space in a separate copy operation, and vice versa for data transmitted over the network. Although system space generally provides a protected memory area (e.g., protected from manipulation by user programs), the copy operation does nothing of value when seen from the point of view of a network interface circuit. Instead, it risks over-burdening the host processor and retarding its ability to rapidly accept additional network traffic from the NIC. Copying each packet's data separately can therefore be very inefficient, particularly in a high-speed network environment.
In addition to the inefficient transfer of data (e.g., one packet's data at a time), the processing of headers from packets received from a network is also inefficient. Each packet carrying part of a single datagram generally has the same protocol headers (e.g., Ethernet, IP and TCP), although there may be some variation in the values within the packets' headers for a particular protocol. Each packet, however, is individually processed through the same protocol stack, thus requiring multiple repetitions of identical operations for related packets. Successively processing unrelated packets through different protocol stacks will likely be much less efficient than progressively processing a number of related packets through one protocol stack at a time.
Another basic problem concerning the interaction between present network interface circuits and host computer systems is that the combination often fails to capitalize on the increased processor resources that are available in multi-processor computer systems. In other words, present attempts to distribute the processing of network packets (e.g., through a protocol stack) among a number of processors in an efficient manner are generally ineffective. In particular, the performance of present NICs does not come close to the linear performance gains one might expect from the availability of multiple processors. In some multi-processor systems, little improvement in the processing of network traffic is realized from the use of more than 4-6 processors, for example.
In addition, the rate at which packets are transferred from a network interface circuit to a host computer or other communication device may fail to keep pace with the rate of packet arrival at the network interface. One element or another of the host computer (e.g., a memory bus, a processor) may be over-burdened or otherwise unable to accept packets with sufficient alacrity. In this event one or more packets may be dropped or discarded. Dropping packets may cause a network entity to re-transmit some traffic and, if too many packets are dropped, a network connection may require re-initialization. Further, dropping one packet or type of packet instead of another may make a significant difference in overall network traffic. If, for example, a control packet is dropped, the corresponding network connection may be severely affected, yet dropping it may do little to alleviate the packet saturation of the network interface circuit because a control packet is typically small. Therefore, unless the dropping of packets is performed in a manner that distributes the effect among many network connections or that makes allowance for certain types of packets, network traffic may be degraded more than necessary.
Thus, present NICs fail to provide adequate performance to interconnect today's high-end computer systems and high-speed networks. In addition, a network interface circuit that cannot make allowance for an over-burdened host computer may degrade the computer's performance.
In one embodiment of the invention a system and method are provided for transferring a packet, received from a network, to a host computer. In this embodiment, data from multiple packets in a communication flow may be re-assembled in one memory area, thus allowing efficient transfer to a user or program's memory space.
A communication device such as a high performance network interface of a host computer receives a packet from a network. From information in a header portion of the packet a flow key is generated to identify a communication flow that includes the packet. Illustratively, the flow key identifies the source and destination entities exchanging the packet or a connection between the entities. In one embodiment of the invention the flow key and, possibly, other information concerning the flow, is stored in a flow database. Thus, in this embodiment a flow may be identified by its flow key or a flow number that serves as an index of the flow within the database.
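Illustratively, the relationship between a flow key, a flow number, and a flow database may be sketched as follows. The choice of key fields (addresses and ports), the class names, and the first-free-slot allocation are assumptions made for illustration; the text above specifies only that the key identifies the communicating entities or their connection and that the flow number indexes the flow within the database.

```python
from typing import Dict, List, NamedTuple, Optional


class FlowKey(NamedTuple):
    """Hypothetical flow key built from header fields of a packet."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int


class FlowDatabase:
    """Fixed-capacity table; a flow number is the index of a flow's entry."""

    def __init__(self, capacity: int) -> None:
        self._slots: List[Optional[FlowKey]] = [None] * capacity
        self._index: Dict[FlowKey, int] = {}

    def lookup(self, key: FlowKey) -> Optional[int]:
        """Return the flow number for a known flow, or None."""
        return self._index.get(key)

    def add(self, key: FlowKey) -> Optional[int]:
        """Return the existing flow number, or allocate the first free slot."""
        existing = self._index.get(key)
        if existing is not None:
            return existing
        for number, slot in enumerate(self._slots):
            if slot is None:
                self._slots[number] = key
                self._index[key] = number
                return number
        return None  # database full; a replacement policy is outside this sketch
```

In this sketch a packet of a known flow resolves to the same flow number on every lookup, so later stages may refer to the flow by a small index rather than by the full key.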
The packet is stored in a packet memory, such as a queue, to await transfer to the host computer by a transfer engine. The packet's flow number or flow key, and possibly other information concerning the packet, is stored in another memory. In particular, an operation code is stored that informs the transfer engine whether the packet's data should be re-assembled with other data from the packet's flow. An operation code may also inform the transfer engine how to process the packet and the type of host memory area or buffer in which to store the packet for the host computer. The operation code may be generated or assigned by a module that maintains the flow database.
The network interface may be configured to re-assemble only packets that are formatted in accordance with one or more of a set of pre-selected protocols. For example, where the network from which the packet is received is the Internet, the network interface may be configured for packets adhering to the Internet Protocol and the Transmission Control Protocol. A header parser module of the network interface may examine a header portion of a packet to determine if it is compatible with (e.g., reflects) the pre-selected protocols.
In one embodiment of the invention different types of host memory areas or buffers are used to store different types of packets. A re-assembly buffer may be used to re-assemble data from multiple packets of a single communication flow. Collecting multiple data portions in a single buffer, illustratively of memory page size, allows efficient transfer of the data to a destination application or user. Other than re-assembly buffers, header/small buffers may be used to store header portions of re-assembled packets as well as entire packets that are less than a predetermined size (e.g., 256 bytes). Thus, a packet eligible for re-assembly may be stored across two buffers: its data in a re-assembly buffer and its header in a header buffer. Also, one or more non-re-assembly buffers may be used to store packets that are not being re-assembled.
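Illustratively, the choice among the buffer types described above may be sketched as follows. The function name, the boolean `reassemble` input (standing in for the decision conveyed by an operation code), and the precedence given to re-assembly over the small-packet test are assumptions made for illustration; only the three buffer types and the example 256-byte threshold come from the text.

```python
from enum import Enum
from typing import List

SMALL_PACKET_LIMIT = 256  # example threshold from the text, in bytes


class BufferType(Enum):
    REASSEMBLY = "re-assembly"
    HEADER_SMALL = "header/small"
    NON_REASSEMBLY = "non-re-assembly"


def choose_buffers(packet_len: int, reassemble: bool) -> List[BufferType]:
    """Return the buffer type(s) in which a packet would be stored."""
    if reassemble:
        # Header and data are split across two buffers.
        return [BufferType.HEADER_SMALL, BufferType.REASSEMBLY]
    if packet_len < SMALL_PACKET_LIMIT:
        # An entire small packet fits in a header/small buffer.
        return [BufferType.HEADER_SMALL]
    return [BufferType.NON_REASSEMBLY]
```

For example, under these assumptions a 1,500-byte packet of a re-assembled flow occupies a header/small buffer and a re-assembly buffer, whereas a 100-byte packet that is not re-assembled occupies a single header/small buffer.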
The transfer engine detects a packet stored in the packet memory and fetches information concerning the packet, such as its operation code and flow number. Based on the operation code, the transfer engine stores the packet in an appropriate host memory buffer.
In one embodiment of the invention, the network interface learns of available host memory buffers by retrieving buffer identifiers from descriptors in a free descriptor ring. In this embodiment, a free descriptor references (e.g., contains a pointer to) one empty buffer and may be identified by its index within the free descriptor ring or a position within a separate data structure that contains references to multiple free descriptors. Thus, when a host memory buffer is needed the transfer engine may identify or retrieve a descriptor and follow its reference to locate the buffer.
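Illustratively, a free descriptor ring of the kind described above may be modeled as a circular array in which each entry references one empty buffer. The class and method names, the use of integer buffer addresses, and the head/tail bookkeeping are assumptions made for illustration rather than details of the described embodiment.

```python
from typing import List, Optional, Tuple


class FreeDescriptorRing:
    """Circular ring of descriptors, each referencing one empty host buffer."""

    def __init__(self, size: int) -> None:
        self._ring: List[Optional[int]] = [None] * size
        self._head = 0   # next descriptor the transfer engine will consume
        self._tail = 0   # next slot the host will fill
        self._count = 0  # descriptors currently available

    def post(self, buffer_address: int) -> bool:
        """Host side: publish an empty buffer; returns False if the ring is full."""
        if self._count == len(self._ring):
            return False
        self._ring[self._tail] = buffer_address
        self._tail = (self._tail + 1) % len(self._ring)
        self._count += 1
        return True

    def take(self) -> Optional[Tuple[int, int]]:
        """Transfer engine side: return (descriptor index, buffer address)."""
        if self._count == 0:
            return None
        index = self._head
        address = self._ring[index]
        self._head = (index + 1) % len(self._ring)
        self._count -= 1
        return index, address
```

The descriptor index returned by `take` corresponds to the notion, stated above, that a free descriptor may be identified by its index within the ring.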
When a packet is stored in a buffer, a completion descriptor in a completion descriptor ring is configured to convey relevant information concerning the packet to the host computer or to software operating on the host computer. This information may include a mechanism, such as an index or reference within a table or other data structure, for identifying the buffer(s) in which the packet was stored. The host computer can then locate the buffer(s), and the packet, by examining the provided index. The completion descriptor may, in particular, store one or both of a header index, to identify the buffer that contains a header portion of the packet, and a data index, to identify the buffer that contains a data portion of the packet. When the entire packet is stored in one buffer, only one index need be provided. The transfer engine may also provide the host computer with an offset in a buffer at which a packet's header and/or data portion begins. Other information, such as the size of a packet's header or data portion, an index identifying a buffer that stores a second portion of a packet's data, and various flags may also be stored in a completion descriptor.
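Illustratively, the information carried by a completion descriptor, and its use by the host computer to locate a packet, may be sketched as follows. The field names, the dictionary used as a buffer table, and the helper `locate_packet` are hypothetical; the text above identifies only the kinds of information (indexes, offsets, sizes, flags) that may be stored.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CompletionDescriptor:
    header_index: Optional[int] = None     # buffer holding the header portion
    header_offset: int = 0                 # where the header begins in that buffer
    header_size: int = 0
    data_index: Optional[int] = None       # buffer holding the data portion
    data_offset: int = 0
    data_size: int = 0
    next_data_index: Optional[int] = None  # buffer holding a second data portion
    flags: int = 0


def locate_packet(desc: CompletionDescriptor,
                  buffers: Dict[int, bytes]) -> Tuple[bytes, bytes]:
    """Host side: follow the indexes to recover (header, data)."""
    header = b""
    data = b""
    if desc.header_index is not None:
        buf = buffers[desc.header_index]
        header = buf[desc.header_offset:desc.header_offset + desc.header_size]
    if desc.data_index is not None:
        buf = buffers[desc.data_index]
        data = buf[desc.data_offset:desc.data_offset + desc.data_size]
    return header, data
```

When an entire packet resides in one buffer, only one of the two indexes is populated in this sketch and the other portion is returned empty.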
The transfer engine may, in a present embodiment, notify the host computer through a completion descriptor of the pending transfer of another packet in the same communication flow as a packet just transferred. Upon such notification, the host computer may delay processing the transferred packet (e.g., through its protocol stack) in the interest of efficiency, so that multiple flow packets may be processed collectively. The network interface may include a packet batching module to determine when another packet in the same flow as a transferred packet is available.
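Illustratively, the manner in which a host computer might defer processing until all pending packets of a flow have arrived may be sketched as follows. The tuple layout (flow number, packet, more-pending indication) and the function name are assumptions made for illustration; a practical host would likely also bound the delay, which this sketch omits.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def batches(completions: Iterable[Tuple[int, str, bool]]
            ) -> Iterator[Tuple[int, List[str]]]:
    """Group packets by flow, releasing a flow's batch once no more are pending."""
    held: Dict[int, List[str]] = defaultdict(list)
    for flow, packet, more_pending in completions:
        held[flow].append(packet)
        if not more_pending:
            # No further packet of this flow is awaiting transfer:
            # hand the whole batch to the protocol stack at once.
            yield flow, held.pop(flow)
```

In this sketch three packets of one flow, the first two marked as having a successor pending, are delivered to the protocol stack as a single batch rather than as three separate traversals.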
In one embodiment of the invention, the transfer engine releases a completion descriptor, or returns ownership of it to the host computer, by changing a specified field in the descriptor. Illustratively, if some value other than a predetermined value (e.g., zero) is stored in the field, the host computer recognizes that the transfer engine no longer needs the descriptor. Alternatively, the transfer engine may alert the host computer in some other manner.
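Illustratively, the release of a completion descriptor by means of a specified field may be sketched as follows. The field and function names are hypothetical, and writing the ownership field only after the other fields are populated is an assumption reflecting common practice rather than a requirement stated above.

```python
NOT_RELEASED = 0  # predetermined value (zero, per the example in the text)


class Descriptor:
    """Minimal completion descriptor with a hypothetical ownership field."""

    def __init__(self) -> None:
        self.data_index = None
        self.ownership = NOT_RELEASED


def release_to_host(desc: Descriptor, data_index: int, marker: int = 1) -> None:
    """Transfer engine side: fill the descriptor, then hand it to the host."""
    if marker == NOT_RELEASED:
        raise ValueError("marker must differ from the predetermined value")
    desc.data_index = data_index  # populate the content fields first
    desc.ownership = marker       # assumption: ownership field is written last


def host_owns(desc: Descriptor) -> bool:
    """Host side: any value other than the predetermined one signals release."""
    return desc.ownership != NOT_RELEASED
```

The host in this sketch simply polls the field; it treats the descriptor as its own as soon as any value other than zero appears.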
In an alternative embodiment of the invention, one or more optimizations may be employed. In one optimization, pad bytes are inserted into a buffer (e.g., a header/small buffer or a non-re-assembly buffer) before a header portion of a packet in order to align the layer three header with a sixteen-byte boundary for efficient memory transfer. In another optimization, free and/or completion descriptors may be of such sizes that multiple descriptors may be transferred between the network interface and the host computer in one stage. For example, where a cache line of a host computer is sixty-four bytes wide, a free descriptor may be sixteen bytes and a completion descriptor may be thirty-two bytes in size.
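Illustratively, the arithmetic behind both optimizations may be sketched as follows. The function names are hypothetical, and the sketch assumes that the buffer itself begins on an aligned address and that the layer two header length is known (e.g., 14 bytes for an untagged Ethernet header).

```python
def pad_bytes_for_alignment(layer2_header_len: int, alignment: int = 16) -> int:
    """Pad bytes to insert before the packet so the layer three header
    starts on a multiple of `alignment` within the buffer."""
    return (-layer2_header_len) % alignment


def descriptors_per_cache_line(cache_line_bytes: int, descriptor_bytes: int) -> int:
    """How many whole descriptors fit in one cache-line-sized transfer."""
    return cache_line_bytes // descriptor_bytes


if __name__ == "__main__":
    pad = pad_bytes_for_alignment(14)
    print(pad, pad + 14)                       # pad bytes, layer three offset
    print(descriptors_per_cache_line(64, 16))  # free descriptors per line
    print(descriptors_per_cache_line(64, 32))  # completion descriptors per line
```

With a 14-byte Ethernet header, two pad bytes place the layer three header at offset sixteen; with the example sizes given above, four free descriptors or two completion descriptors occupy one sixty-four byte cache line.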