1. Field of Invention
The present invention relates generally to the field of packet-based data communications and, more particularly, to a network processor that includes a data buffer shared among the user interface, network interface, and processor. The invention also includes protocol-aware logic that monitors the data being written into the shared buffer.
2. Description of Related Art
Internet and Intranet traffic typically consists of four different types of packets: ARP, ICMP, UDP, and TCP (see FIG. 1). When sent over an Ethernet link, these packets are embedded inside an Ethernet packet. All these packets have an Ethernet header 100, one or more protocol headers 121, and a data payload 106. The most common protocol is the Internet Protocol (IP), which has two sub-protocols: transmission control protocol (TCP) and user datagram protocol (UDP). An IP packet 122 consists of an IP header 102, a TCP header 104 (or a UDP header 110 or an ICMP header 112), and a data payload 106. TCP is used for breaking up the data to be sent into datagrams, resending any datagrams that are lost, and reassembling the datagrams in the correct order at the other end. UDP is a much simpler protocol that also breaks up the data into datagrams but does not have all the functionality of TCP. The underlying protocol of both UDP and TCP is IP, which is used for the actual routing of datagrams. The Internet control message protocol (ICMP) is sometimes used for diagnostics and communication between two nodes on the network. These messages are not used to pass user data.
The basic format of a network packet 120 that is an IP packet 122 sent under the Ethernet transport protocol over any Ethernet link is as follows: Ethernet header 100, IP header 102, a TCP header 104 (or a UDP header 110 or an ICMP header 112), data payload 106, and the Ethernet checksum 108. The protocol headers 121 for IP packets include the IP header 102 and the TCP header 104, UDP header 110, or ICMP header 112. Other network packets based on different combinations of protocols would appear differently but would repeat the general pattern of a series of two or more nested protocols with the transport protocol as the top (outer) layer.
For example, if the transport protocol is something other than Ethernet, then the Ethernet header 100 and the Ethernet checksum 108 would be replaced with the header and checksum (if any) for that particular transport protocol. Typically, for Ethernet and other transport protocols, the transport header is generated in the network stack (described below) and the transport checksum is generated in the network interface (described below).
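By way of illustration only, the nesting described above can be sketched in Python as a routine that sorts an Ethernet packet into the four packet types of FIG. 1. The function name classify_frame and the constant names are hypothetical and are not part of the disclosure. The field offsets and numeric codes are those of standard Ethernet II framing and IPv4 (EtherType at byte offset 12 of the Ethernet header, protocol number at byte offset 9 of the IP header).

```python
import struct

ETH_HEADER_LEN = 14      # destination MAC (6), source MAC (6), EtherType (2)
IP_MIN_HEADER_LEN = 20   # IPv4 header without options
ETHERTYPE_IP = 0x0800
ETHERTYPE_ARP = 0x0806
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def classify_frame(frame: bytes) -> str:
    """Return 'ARP', 'ICMP', 'TCP', 'UDP', or 'OTHER' for an Ethernet frame."""
    if len(frame) < ETH_HEADER_LEN:
        return "OTHER"
    # The EtherType field names the protocol nested inside the Ethernet header.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_ARP:
        return "ARP"
    if ethertype != ETHERTYPE_IP:
        return "OTHER"
    if len(frame) < ETH_HEADER_LEN + IP_MIN_HEADER_LEN:
        return "OTHER"
    # Byte 9 of the IP header names the protocol nested inside the IP header.
    protocol = frame[ETH_HEADER_LEN + 9]
    return IP_PROTOCOLS.get(protocol, "OTHER")
```

The sketch inspects only the headers; the data payload and the trailing Ethernet checksum are never read.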
To understand the ARP packet (shown as the fourth packet in FIG. 1), it is useful to introduce some of the basic functions the Internet Protocol uses to communicate.
Every device on the Internet or an Intranet that uses Ethernet to communicate has a unique MAC identifier that is programmed into the device by the manufacturer of the device. When two points (“nodes”) want to communicate, they must do so by an underlying protocol. Most frequently, this will be Internet Protocol or IP. IP also has addresses, which identify one or more nodes and are used for relaying traffic between two or more nodes. IP addresses come in two varieties, public and private. Public IP addresses are unique addresses that are registered with the Network Information Center, www.internic.net. Private IP addresses are reserved IP address ranges that can never be registered, but can be used behind protective networking devices called routers. The router will have a unique public IP address that it uses to talk on the Internet with other devices.
The IP protocol allows nodes to communicate using the IP address of each node. Not all IP addresses are unique, yet all MAC addresses are unique. Thus, when two devices need to communicate, they must take their unique addresses (MAC) and bind them to the addresses used to route data on the Internet/Intranet, which are not always unique. Address resolution protocol (ARP) is used to bind the MAC address to the IP address. ARP is not an IP packet; thus, the packet consists of the Ethernet header 100, ARP header 114, and the Ethernet checksum 108. Once the IP address is bound to the MAC address through ARP messages, TCP or UDP communications can take place. Typically, a node maintains a list of IP addresses bound to MAC addresses. ARP messages are sent if the node does not have a MAC address for the IP address being contacted, or when the communication to the stored IP/MAC address fails.
Referring to FIG. 2, a network node typically includes a network interface 600, which has a network buffer 605, a DMA engine 610, a processor executing a network stack (or more simply the “network stack”) 630, at least one application 635, processor memory 625, and a user interface 620, which has a user buffer 615.
This system can be broken into three major modules: the network interface 600, the processor running a network stack 630 and an application 635, and the user interface 620. To allow all these modules to operate independently of each other, local buffers have been added (605, 625, and 615). When data from a first module is required in a second module, the DMA engine 610 performs a fast copy from one module to the other. Typically, there is one bus on which all three modules reside; thus, the DMA engine 610 can only service one copy request at a time, as the copy ties up the shared bus. (The shared bus is not shown in FIG. 2 in order to avoid undue clutter in that figure.)
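By way of illustration only, the list of IP addresses bound to MAC addresses can be sketched in Python as a small lookup table. The names ArpCache, resolve, and send_arp_request are hypothetical; send_arp_request stands in for whatever mechanism actually transmits an ARP message and returns the reply, which the description above does not specify.

```python
from typing import Callable, Dict, Optional


class ArpCache:
    """Table of IP address to MAC address bindings kept by a node."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def lookup(self, ip: str) -> Optional[str]:
        return self._table.get(ip)

    def learn(self, ip: str, mac: str) -> None:
        self._table[ip] = mac

    def forget(self, ip: str) -> None:
        # Called when communication to a stored IP/MAC binding fails.
        self._table.pop(ip, None)


def resolve(cache: ArpCache, ip: str,
            send_arp_request: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the MAC bound to ip, sending an ARP message only on a miss."""
    mac = cache.lookup(ip)
    if mac is None:
        mac = send_arp_request(ip)  # hypothetical transport hook
        if mac is not None:
            cache.learn(ip, mac)
    return mac
```

In this sketch an ARP message is sent only when no binding is stored; calling forget after a failed transmission forces the next resolve to send a fresh ARP message.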
Prior Art Processing of Packets from Network Interface
Having reference to FIG. 2 and FIG. 3, these are the steps associated with receiving a network packet 120, from the network interface 600. In order to promote the description of the process to one of skill in the art, the process is shown by a combination of interaction steps 601 to 699 on FIG. 2 and process steps 700 to 760 on FIG. 3.
STEP 601/700—As the network packet 120 is being received from the network, the network interface 600 writes the entire packet into the network buffer 605.
STEP 606/705—Once the entire network packet 120 has been received, the network interface 600 informs the network stack 630.
STEP 611/710—The network stack 630 configures the DMA engine 610 to copy the entire network packet 120 from the network buffer 605 into the processor's memory 625.
STEP 616/621/715—The DMA engine 610 reads the network packet 120 from the network buffer 605, and writes the network packet 120 into the processor's memory 625.
STEP 626/717—The DMA engine 610 informs the network stack 630 when the copy is completed.
STEP 631/720—The network stack 630 reads the network packet 120 in the processor's memory 625 and determines which protocols are used, and if the packet is valid.
STEP 636/725—The network stack 630 must process portions of the protocol headers 121, and update the socket management data structure (not shown) stored in the processor's memory 625. (A socket is a connection between two network devices on a specific port.)
BRANCH 631/730—The network stack 630 then checks the network packet 120 stored in the processor's memory 625 to determine if the network stack 630 is the final destination, or if the data payload 106 is bound for the user's interface 620.
STEP 636/735—If the network packet 120 is to be consumed by the network stack 630, then the network packet 120 will be processed by the network stack 630, and the packet buffer will be released from the processor's memory 625.
STEP 696/737—If the network packet 120 is bound for the user interface 620, the data payload 106 is passed to the application 635.
STEP 646/740—The application 635 configures the DMA engine 610 to copy the data payload 106 of the network packet 120 from the processor's memory 625 into the user buffer 615.
STEP 651/656/745—The DMA engine 610 reads the data payload 106 from the processor's memory 625, and writes it into the user buffer 615.
STEP 661/747—The DMA engine 610 informs the application 635 when the copy is completed.
STEP 666/750—The application 635 informs the user interface 620 of a valid data payload 106 in the user buffer 615.
STEP 671/755—The user's circuitry then reads the data payload 106 in the user buffer 615 through the user interface 620.
STEP 675/760—The packet buffer is released in the user buffer 615.
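By way of illustration only, the copies performed in the receive steps above can be tallied with a simplified Python model. The names DmaEngine and receive_prior_art are hypothetical. The model assumes that the packet is supplied as protocol headers followed by the data payload, with the trailing Ethernet checksum already removed, and it omits the notifications, validity checks, and socket bookkeeping.

```python
class DmaEngine:
    """Counts the copies and bytes moved over the single shared bus."""

    def __init__(self) -> None:
        self.copies = 0
        self.bytes_moved = 0

    def copy(self, data: bytes) -> bytes:
        self.copies += 1
        self.bytes_moved += len(data)
        return bytes(data)


def receive_prior_art(packet: bytes, header_len: int, dma: DmaEngine) -> bytes:
    """Move one received packet through the three prior art buffers."""
    network_buffer = packet                      # STEP 601/700
    processor_memory = dma.copy(network_buffer)  # STEPS 611/710 to 626/717
    payload = processor_memory[header_len:]      # STEPS 631/720 to 696/737
    user_buffer = dma.copy(payload)              # STEPS 646/740 to 661/747
    return user_buffer                           # read out in STEP 671/755
```

Under these assumptions every payload byte crosses the shared bus twice and is held in three separate buffers before the user's circuitry reads it.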
Prior Art Processing of Payloads from User Interface
Having reference to FIG. 2 and FIG. 4, these are the steps associated with the prior art method of receiving a data payload 106, from the user interface 620. In order to promote the description of the process to one of skill in the art, the process is shown by a combination of interaction steps 601 to 699 on FIG. 2 and process steps 800 to 855 on FIG. 4.
STEP 675/800—The user's circuit starts by writing a data payload 106 into the user buffer 615 through the user interface 620.
STEP 681/805—When the entire data payload 106 has been written, the user interface 620 informs the application 635 that the data payload 106 for a network packet 120 is ready for transmission.
STEP 646/810—The application 635 configures the DMA engine 610 to copy the data payload 106 from the user buffer 615 into the processor's memory 625.
STEP 686/621/812—The DMA engine 610 copies the entire data payload 106 from the user buffer 615 into the processor's memory 625.
STEP 661/815—The DMA engine 610 informs the application 635 that the copy has been completed.
STEP 697/820—The application 635 passes a pointer to the data payload 106 stored in the processor's memory 625 to the network stack 630, and informs the network stack 630 which socket to use to send the network packet 120 on the network.
STEP 636/825—The network stack 630 builds the protocol headers 121. When there is a series of nested protocols below the transport layer, the network stack 630 builds all of the lower layer protocol headers. The protocol headers 121 are based on the socket identified by the application 635 in STEP 697/820. Thus, a socket using IP would get an appropriate IP header.
STEP 636/830—The network stack 630 calculates the checksums for each protocol below the transport layer, and writes the values into the protocol headers 121. (For purposes of this application, CRC values (cyclical redundancy check values) are a type of checksum.) (The transport layer checksum is calculated by the network interface 600.)
STEP 611/835—The network stack 630 then configures the DMA engine 610 to copy the network packet 120 from the processor's memory 625 to the network buffer 605.
STEP 651/696/837—The DMA engine 610 reads the network packet 120 from the processor's memory 625 and writes it into the network buffer 605.
STEP 626/840—The DMA engine 610 informs the network stack 630 when the copy has been completed.
STEP 698/845—The network stack 630 informs the network interface 600 that a network packet 120 is ready for transmission.
STEP 699/850—The network interface 600 reads the network packet 120 from the network buffer 605 and sends it across the network after adding the transport layer header and checksum, in this case the Ethernet header 100 and the Ethernet checksum 108.
STEP 601/855—The network interface 600 releases the packet buffer from the network buffer 605.
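The transmit steps above do not state which checksum algorithm is applied in STEP 636/830. For the IP, TCP, UDP, and ICMP headers, the standard algorithm is the 16-bit ones' complement sum described in RFC 1071. By way of illustration only, that algorithm can be sketched in Python as follows; the function name internet_checksum is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

Because the sum must visit every byte it covers, a software implementation of this step reads the data a further time after it has already been copied into the processor's memory 625. A receiver can verify a block by running the same routine over the data with the checksum appended; a result of zero indicates no detected error.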
As illustrated by FIGS. 2 to 4 and the associated descriptions, the prior art suffers from several drawbacks. First, large amounts of memory are required. The network interface 600, processor 630, and user interface 620 must all contain buffers, which increases system cost and complexity. Another drawback is that the network packet 120 must be copied between the network interface 600, processor 630, and user interface 620, which consumes time that could be used for packet processing. As network data rates increase, these copies affect system bandwidth. Typically, the network buffer 605, user buffer 615, and processor memory 625 have only one bus (not shown) to transfer data between each of the buffers. As packet transfer rates increase, this bus becomes a limiting element in bandwidth. Finally, many protocol-processing tasks are better suited for a hardware implementation that allows for parallel processing, as opposed to the current sequential methods used in a pure software implementation.
While the prior art has suggested the use of a state machine in order to address previously recognized shortcomings with the prior art, this solution does not address the issue of future protocol support. Since a state machine solution calls for the protocol processing to be handled purely in hardware, the addition of new protocols would require a new device to be built. Replacing those devices already in use becomes very difficult and expensive. Another problem with the state machine solution is special user software cannot be performed in the device. Therefore, another processor must be attached to the state machine device in order to run a user's application. Finally, the state machine solution becomes very complex and expensive for certain protocols. For example, a web server is very complex, and must be configurable to handle many different user web pages. The state machines to handle these pages would become far too large and complex to be commercially viable.
While the prior art has taught that offloading the steps of checksum generation and verification from the processor can be beneficial, this offloading by itself is not sufficient to provide a highly efficient method for handling incoming and outgoing packets. This prior art solution does not reduce the number of buffers required, nor does it remove the need to copy the packet between each of these buffers. Likewise, this prior art solution does not allow for the protocol header to be processed in parallel with the data payload reception.
While the prior art has taught filtering for established socket connections that allows the network interface to pass the protocol headers to the network stack and the data payload to the user application, this filtering occurs only after the protocol headers and data payload are completely received in the network buffer. Both the protocol headers and data payload must still be copied into the processor's memory. The data payload must then be copied into the user interface. This prior art solution is adapted for interfacing with a user application residing on a personal computer. This solution could also be applied to an embedded environment in which the application is replaced with a user buffer and user interface. In this case, three separate buffers are still required: the network buffer, processor memory, and user buffer. Data copies are also still required to move the packet portions between the buffers. This prior art solution also has shortcomings for packet transmissions from the user interface to the network interface. The protocol checksums are based on both the data payload and the protocol headers. Since these two parts are separated in the prior art, the transmit protocol checksum generation becomes difficult.
It is an object of the present invention to provide an improved method for handling the receipt of incoming packets in order to improve the efficiency of handling incoming packets.
It is another object of the present invention to provide an improved method for handling the outgoing packets in order to improve the efficiency of handling outgoing packets.
It is a further object of the present invention to develop a method that uses a shared buffer such that data may be shared among the network interface, the user interface, or the processor without the need to be copied for each subsystem to use the data.
It is yet another object of the present invention to use “protocol aware logic” in conjunction with the write path to the shared buffer to offload a portion of the processor workload.
It is yet another object of the present invention to develop a more efficient method for processing packets by processing solely the header portion of packets and not the data payload.
It is yet another object of the present invention to develop a method that increases throughput through use of parallel processing and the avoidance of memory copies of packets.
It is yet another object of the present invention to allow certain packets to be passed from the network receiver to the user interface without the use of the processor.
These and other advantages of the present invention are apparent from the drawings and the detailed description that follows.