1. The Field of the Invention
This invention relates generally to network data routing. More specifically, the present invention concentrates on improving two aspects of data routing. The first improvement is utilizing a data packet steering mechanism that parses elements of the packet header in parallel and in advance of the body of the packet, thereby increasing efficiency and reducing latency and processor overhead. The second and related improvement is a FIFO-based packet memory management system that provides greater flexibility and control over network packet transmission.
2. The State of the Art
The state of the art in high speed data access on computer networks has in large part been driven by exponential growth in the Internet and e-commerce. Furthermore, as computers become more powerful, applications are always being developed which take advantage of any increase in computer performance. Often, these applications utilize networks, both local and global.
It is becoming increasingly important to keep pace with the increased demands for network services by the general public. This can be accomplished by removing the bottlenecks that inhibit data transfer across computer networks because the thirst for increased bandwidth is ever present. Internet users are becoming ubiquitous as home users and businesses tap into the resources of the information superhighway. Electronic mail, which is fast becoming the preferred method of communication in business as well as in the private sector, and new business models, such as the Virtual Office, rely on computer networks for their very existence. In essence, the demand for computer networking connectivity and bandwidth is large, and growing larger all the time.
In an effort to keep up with increasing network connectivity and bandwidth demands, makers of networking hardware and software, as well as the Information Services (IS) managers that operate computer networks, are continually looking for ways to improve network connectivity and bandwidth while reducing network traffic latency.
Increasingly, computer networks are being called upon to carry time-critical telecommunications and video data streams. Guaranteed bandwidth to residential communications ports that carry voice, video and data has increased from tens of kilobits/second to Megabits/second levels. Commercial communications bandwidth has increased to several Megabits/second guaranteed bandwidth per port. However, the infrastructure that enables Wide and Local Area Networks to operate is comprised of installed network gear that is running industry standard network protocols that are not well-suited for the performance demands of time-critical, latency-intolerant network traffic such as voice and video. The reason for this is that the traditional approach to providing connectivity and bandwidth in today's computer networks is based on packet-switched protocols.
FIG. 1 is an illustration of how a time-critical application is typically integrated into the transmit path for network traffic in a traditional, packet-switched computer network environment. Within an operating system 110, executing on a packet-switched network host computer, a time-critical application 113 takes analog data 111, such as a voice or video data stream that has been digitized by an Analog-to-Digital (A/D) converter 114, and places it in an application data buffer 115 in system memory. The time-critical application 113 competes with other network applications 112 for the network protocol stack 116 and other system resources to thereby establish a network connection and to process the data stream into packets.
The packets from the time-critical application 113 are disposed in system packet data buffers 118 along with packets from a number of other data streams and pointed to by a linked list of packet descriptors 117. A direct memory access (DMA) engine 119 located on a network interface card (NIC) 123 follows the linked list of packet descriptors 117 in order to find and move the packet data from the appropriate packet buffer in the system packet data buffers 118 to a packet first-in-first-out (FIFO) buffer 122 on the network interface card 123. The packet data is then moved sequentially, in the order in which it was received, to the Media Access Control (MAC) interface 120. The MAC interface 120 translates the digital packet data into network signals 121 to be transmitted on the network physical interface. The receive path is essentially the transmit path shown in FIG. 1, but operated in the reverse direction.
In order for a real-time conferencing application to be perceived as good, it must have less than 200 ms of latency (time from first analog capture to final display), less than 20 ms of jitter (the relative time difference between individual packet delivery), and sufficient bandwidth to maintain frame rate and resolution in real-time.
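The latency and jitter targets above can be expressed as a simple check. The following is a minimal sketch only: the function name and timestamp lists are hypothetical, and jitter is assumed here to mean the largest change in end-to-end delay between consecutive packets, which is one reading of the definition given above.

```python
def meets_realtime_targets(capture_ms, display_ms,
                           max_latency_ms=200, max_jitter_ms=20):
    """Check per-packet timestamps against the latency and jitter targets.

    capture_ms[i] is when packet i was captured and display_ms[i] is when
    it was displayed, both in milliseconds. Jitter is assumed to be the
    largest change in delay between consecutive packets.
    """
    if not capture_ms or len(capture_ms) != len(display_ms):
        raise ValueError("need matching, non-empty timestamp lists")
    latencies = [d - c for c, d in zip(capture_ms, display_ms)]
    jitter = max(
        (abs(b - a) for a, b in zip(latencies, latencies[1:])),
        default=0,
    )
    return max(latencies) < max_latency_ms and jitter < max_jitter_ms


# Delays of 100, 105 and 100 ms: within both targets.
steady = meets_realtime_targets([0, 20, 40], [100, 125, 140])
# Delays of 100, 130 and 100 ms: the 30 ms swing exceeds the jitter target.
bursty = meets_realtime_targets([0, 20, 40], [100, 150, 140])
```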
While the prior art architecture shown in FIG. 1 is used with some success in legacy networks for providing the bandwidth and latency requirements of timing-insensitive computer data traffic, several aspects of this architecture make it ill-suited for the low latency and strict timing requirements of video or even voice data.
There are several obstacles to meeting the requirements of real-time network traffic. In particular, the serial nature of the packet FIFO 122 on the NIC 123 gives no priority to time-critical packets. The system also has non-deterministic latency, thus introducing jitter.
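The effect of a strictly serial FIFO on a time-critical packet can be illustrated with a small model. This is a sketch under stated assumptions, not a description of any particular NIC: the packet names and sizes are hypothetical, all packets are assumed to be queued at time zero, and the drain rate of 125 bytes per microsecond corresponds to an assumed 1 Gbit/s link.

```python
from collections import deque


def fifo_departures(packets, bytes_per_us=125):
    """Return {name: departure time in microseconds} for a strict FIFO.

    packets is a list of (name, size_bytes) in arrival order. The FIFO
    drains them in that order regardless of how urgent each one is.
    """
    queue = deque(packets)
    now = 0.0
    done = {}
    while queue:
        name, size = queue.popleft()   # always the oldest packet
        now += size / bytes_per_us     # time to serialize this packet
        done[name] = now
    return done


# A small voice packet queued behind three full-size bulk packets.
behind = fifo_departures(
    [("bulk-1", 1500), ("bulk-2", 1500), ("bulk-3", 1500), ("voice", 200)]
)
# The same voice packet when nothing is queued ahead of it.
alone = fifo_departures([("voice", 200)])
```

In this model the voice packet waits for every byte queued ahead of it, so its delay depends entirely on what other traffic happened to arrive first.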
What is needed is a means for enabling the NIC to have more flexibility and control in how it schedules packets for transmission, based on Quality of Service (QOS) parameters.
There is another issue involved here as well. Data on networks travels in packets, or bundles of data, where the packet is generally of variable length. Packetization of data enables the isolation of different protocols so that data can be transmitted and received through disparate types of networks without regard to the content of the information being transmitted.
In data networks, it is necessary to direct, or steer, data packets from node to node within a network or within a network traffic device such as a router. As packet traffic arrives at a node, key decisions have to be made about whether to accept or reject the packet, where the packet must be routed for further processing, and so forth.
The node device that accepts incoming data is the MAC. It connects the network router or switch to the network cable or fiber and converts the packet traffic into useable data for the hardware and software within the network node, or controller.
In prior art store and forward network controllers, a packet is fully received by a MAC and stored in a buffer where it can then be accessed by a processor or moved to other memory accessed by a processor. The processor examines the packet to parse or extract the information necessary to route the packet. This is generally done because the cyclic redundancy error checking (CRC) bytes are at the very end of the packet. Table 1 is provided to illustrate how data is stored in a typical Ethernet IP packet that arrives at a MAC port.
TABLE 1
Field                              Size               Contents
Preamble                           8 bytes            Clock bits & SFD
Ethernet Header                    6 bytes            Destination Address (DA)
Ethernet Header                    6 bytes            Source Address (SA)
Ethernet Header                    2 bytes            -
Data (payload): IP Header          20 bytes           IP Routing, ID, Type, Other Data
Data (payload): Additional Data    Up to 1480 bytes   TCP Headers & Higher Level Data
Frame Check Sequence               4 bytes            CRC
The preamble contains 62 bits of alternating 1's and 0's, used by the receiver to acquire and synchronize with the incoming signal. The final 2 bits, known as the start of frame delimiter (SFD), are consecutive 1 bits that are used by the hardware to align the bytes.
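The role of the SFD in byte alignment can be sketched as follows. This is an illustrative model only: the bit stream is represented as a Python string of "0" and "1" characters in arrival order, and the function name is hypothetical. Because the preamble alternates, two consecutive 1 bits cannot occur before the SFD, so the first such pair marks the end of the preamble.

```python
PREAMBLE_BITS = "10" * 31   # 62 alternating bits
SFD_BITS = "11"             # start of frame delimiter


def find_frame_start(bitstream: str) -> int:
    """Return the index of the first bit after the SFD, or -1 if absent."""
    pos = bitstream.find(SFD_BITS)
    return -1 if pos < 0 else pos + len(SFD_BITS)


# A preamble, the SFD, then the first byte of the frame proper.
stream = PREAMBLE_BITS + SFD_BITS + "01000001"
start = find_frame_start(stream)
first_byte_bits = stream[start:start + 8]

# A receiver that locks on partway through the preamble still finds the SFD.
late_start = find_frame_start(stream[5:])
```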
The header information in a generic Ethernet packet, the source address (SA), destination address (DA), and protocol or length information, is contained in the first 14 bytes of the packet following the preamble, regardless of the length of the payload or higher level data carried by the packet.
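Because these fields sit at fixed offsets, they can be extracted without knowing anything about the rest of the packet. The following is a minimal sketch, assuming the preamble has already been stripped; the function name and the sample addresses are hypothetical.

```python
import struct

ETH_HEADER_LEN = 14   # DA (6 bytes) + SA (6 bytes) + type/length (2 bytes)


def parse_ethernet_header(frame: bytes):
    """Split the first 14 bytes of a frame into (DA, SA, type/length)."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    # "!" selects network (big-endian) byte order; 6s6sH is 6+6+2 bytes.
    da, sa, type_or_length = struct.unpack("!6s6sH", frame[:ETH_HEADER_LEN])
    return da, sa, type_or_length


# Broadcast DA, a made-up SA, type 0x0800 (IP), then 46 bytes of padding.
frame = bytes.fromhex("ffffffffffff" "020000000001" "0800") + bytes(46)
da, sa, type_or_length = parse_ethernet_header(frame)
```

Only the first 14 bytes are read, so the same call works for any payload length.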
The data or payload bytes contain whatever higher level data is being passed across the network. This data is generally independent of all processing manipulation at this level. Certain types of packets, however, contain useful routing information at the beginning of the payload. The inventors believe that it may be fruitful to examine the payload further. For example, in all TCP/IP packets, important routing information is contained in the first 50 bytes of the payload region of the packet.
By convention, routing information for all normally used network protocols is contained in the first 50 bytes. Therefore, it is generally recognized that the first 64 bytes of an Ethernet packet (the 14-byte header plus the first 50 bytes of payload) will contain all the critical information necessary to route the packet. The data portion of a generic Ethernet packet may not exceed 1500 bytes in length. In the TCP/IP packet shown, the first 20 bytes comprise the IP header, shown separately. Additionally, 20 bytes of TCP header information are contained in the Data field. This leaves up to 1460 bytes for other data. Finally, the Frame Check Sequence (FCS) contains 4 bytes of CRC error checking data to help ensure the packet is not corrupt or malformed.
FIG. 2 shows that the decisions that need to be made in the parsing of the packet header can be represented as a tree. In this figure, each circle represents a decision outcome 10. Within each decision circle, the selected field identifier 12 is shown, and below that, the position of the bytes 14 that represent that field within the packet.
As packet header information becomes available, the first field examined is the 6-byte destination address (DA) 16. From this, decisions must be made about where the packet must go. In other words, there are multiple outcomes 18, 20, 22 possible that are based on the data in the DA field 16. For instance, it must be determined whether this is the current device's address, a broadcast address 18, or the address of some other destination 20.
Assuming the packet is intended to remain and be processed further, the next processing step examines the 6-byte source address (SA) 22 to determine if, for instance, this packet is from a port that the current device is accepting data from. Again, several outcomes 24, 26 are possible.
The next field, bytes 13 and 14, identifies the protocol type 26. Again, the tree may branch in many different ways, whether the packet is IP 28, IPX 30, AppleTalk or some other network protocol 32. Each of these, again, will have multiple branching possibilities 34, 36, 38. In the example, the packet is an IP protocol packet 34.
The next decision is to determine what kind of IP packet it is. The options include TCP 34, UDP, ARP 113, etc. This data is found in byte 9 of the IP Header. Following FIG. 2 to the bottom, this TCP-style (determined by bytes 35 to 64) IP packet contains a variable-length URL 40, or world wide web address.
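The walk down the decision tree described above can be sketched in software. This is an illustration under stated assumptions, not the figure itself: the function name and sample addresses are hypothetical, offsets are zero-based (so "bytes 13 and 14" become the slice 12:14), the IP header is assumed to carry no options, and only the TCP and UDP protocol numbers are recognized at the last step.

```python
# Standard type-field values for the protocols named in the text.
ETHERTYPES = {0x0800: "IP", 0x8137: "IPX", 0x809B: "AppleTalk"}
# Standard IP protocol numbers.
IP_PROTOCOLS = {6: "TCP", 17: "UDP"}
BROADCAST = b"\xff" * 6


def classify(frame: bytes, my_addr: bytes):
    """Return the path taken through the header decision tree as a list."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    path = []
    da = frame[0:6]
    if da == BROADCAST:
        path.append("DA:broadcast")
    elif da == my_addr:
        path.append("DA:this-device")
    else:
        path.append("DA:other")          # not for this device; stop here
        return path
    path.append("SA:" + frame[6:12].hex())
    protocol = ETHERTYPES.get(int.from_bytes(frame[12:14], "big"), "other")
    path.append("type:" + protocol)
    if protocol == "IP":
        if len(frame) < 34:
            raise ValueError("frame too short for an IP header")
        ip_header = frame[14:34]
        # Byte 9 of the IP header identifies the next-level protocol.
        path.append("ip:" + IP_PROTOCOLS.get(ip_header[9], "other"))
    return path


me = bytes.fromhex("020000000001")
ip_header = bytearray(20)
ip_header[9] = 6                         # TCP
frame = me + bytes.fromhex("020000000002") + b"\x08\x00" + bytes(ip_header)
path = classify(frame, me)
```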
FIG. 3 is a block diagram showing the manner in which this packet processing is normally implemented in hardware. The packet arrives 52 in an elastic buffer 54, a part of the MAC 50, which serves as a clock matching device, collecting bytes of the incoming packet and matching the bit rate of the incoming packet to the speed of the hardware. Sometimes, the DA is checked at this point and a decision is made whether to continue accepting the packet. The packet is then moved to a larger buffer 56, which can either be system memory 58 or a FIFO buffer 56 connected to system memory. Once the entire packet is in system memory, it is parsed by the processor and steering information is extracted.
Once the packet in system memory 58 has been parsed, it is routed 64 to the appropriate output port. The routing mechanism 62 may or may not be memory based. If it is not memory-based, the system memory 58 must be large enough to hold outgoing packets and accommodate latencies and any blockages that occur in the output data flow.
In a processor-based system, the CPU 60 is responsible for parsing the packet headers to derive routing information. This is normally done by comparing the incoming data to a series of known outcomes, or criteria. If there is a match, a jump occurs to enable the appropriate routing. If not, a jump to another comparison process occurs, and then another, until a match is found or it is determined that there is an error.
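The compare-and-jump sequence described above can be sketched as a loop over an ordered list of known outcomes. This is a simplified model: the handler names are hypothetical labels, and the compare counter is added only to make the cost of each lookup visible.

```python
def sequential_match(value, outcomes):
    """Compare value against each known outcome in order.

    outcomes is a list of (known_value, handler_name) pairs. Returns
    (handler_name, compares) on a match, or (None, compares) when every
    comparison fails, which corresponds to the error case.
    """
    compares = 0
    for known, handler in outcomes:
        compares += 1
        if value == known:
            return handler, compares   # "jump" to the matching route
    return None, compares


# Protocol-type outcomes, tested in a fixed order.
protocol_outcomes = [
    (0x0800, "route_ip"),
    (0x8137, "route_ipx"),
    (0x809B, "route_appletalk"),
]

first_hit = sequential_match(0x0800, protocol_outcomes)
last_hit = sequential_match(0x809B, protocol_outcomes)
no_hit = sequential_match(0x1234, protocol_outcomes)
```

The number of compares grows with the position of the match in the list, which is why the order of testing matters.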
FIG. 2 outlines the flow of one example in the decision tree. There is a subset of destination addresses that the processor is programmed to act upon 18, 20, 22. For example, there may be 10 DA's that this particular device needs to know about. Thus we may only achieve a 10:1 ratio of “first compare” successes. Usually, designers will apply weightings to the multitude of compares that must be performed. Statistical analysis is employed to prioritize the order of testing. This prioritization is based on the likelihood of a given outcome. In other words, the most commonly expected outcomes, and thus the most common data paths, will be tested first.
In this example, three possible outcomes are shown 18, 20, 22 and a successful SA compare 22 forms the basis of the next compare. We assume that all source addresses are valid and acceptable, a 1:1 hit ratio.
The next comparison to be made originates with the protocol field 24. It may have a 3:1 or better hit ratio. Since IP is the most common protocol encountered, it is tested first, followed by IPX 26, then AppleTalk and others 28. Since this is a 16-bit field, there are actually 65,536 possibilities. Designers typically optimize for the top 32 outcomes based on statistical analysis of the expected traffic, yielding a fairly reliable worst-case 32:1 ratio.
Assume that there is a match on TCP. The next comparison examines the TCP header 30 and asks what kind of TCP data is being carried, in this case a URL 36, or address of a world wide web site. This is where the tree tends to spread widely. At the URL level, for example, there may be another 256 authorized outcomes, a worst-case hit ratio of 256:1 where there may not be significant statistical weighting to optimize performance.
The sequential nature of this traditional parsing mechanism is inefficient. If this process is managed by a best-case high-speed RISC processor, for instance, one that can do a compare and branch in a single clock cycle, it may take hundreds or even thousands of cycles to branch all the way down this decision tree when a worst-case packet arrives.
A worst-case packet would be one where each decision point results in the least-likely outcome, in other words the last one tested. After testing all of the pre-determined outcomes, the packet is passed off to the processor for further consideration. If a stream of worst-case packets arrives, the processor is likely to fall behind. This risk is reduced, but not eliminated, by statistical analysis of the anticipated traffic and weighting of the various possible outcomes.
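The trade-off between weighting and worst-case behavior can be sketched numerically. The per-level outcome counts below are the figures quoted in this section (10 destination addresses, a 1:1 source address check, 32 protocol outcomes, 256 URL outcomes); the three-outcome traffic mix and the function name are hypothetical and serve only to show how test ordering changes the average but not the worst case.

```python
def expected_compares(probabilities):
    """Mean number of compares for a sequential test order.

    probabilities[i] is the chance that the i-th outcome tested is the
    match, so a match at position i costs i + 1 compares.
    """
    return sum((i + 1) * p for i, p in enumerate(probabilities))


# Outcomes tested per level when every level ends on its last compare.
WORST_CASE_LEVELS = {"DA": 10, "SA": 1, "protocol": 32, "URL": 256}
worst_case_total = sum(WORST_CASE_LEVELS.values())

# Hypothetical traffic mix over three outcomes.
mix = [0.7, 0.2, 0.1]
most_likely_first = expected_compares(sorted(mix, reverse=True))
least_likely_first = expected_compares(sorted(mix))
worst_case_for_mix = len(mix)   # unchanged by the ordering
```

With these figures a worst-case packet costs 299 compares for the levels listed, at one compare per cycle in the best-case processor described above, and ordering by likelihood lowers only the average.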
The burgeoning number of diverse applications running over today's data networks complicates this scenario. Digital video, voice over IP (VOIP), and other converging uses of data networks sharply increase the complexity and sheer number of decisions that must be made to manage network traffic. As new and different kinds of data traffic compete for routing and QOS resources, prioritization becomes more difficult.
What is needed is a way to traverse a very complex decision tree quickly and efficiently, without incurring long delays that may be encountered while a buffer receives the rest of a packet. The method should also not require extremely fast sequential processing in order to keep up with minimum length packets.
As payload size, and thus packet size, decreases, the overhead associated with that payload increases. The worst-case scenario in an Ethernet environment is a continuous stream of minimum-size packets. A minimum size packet would consist of the 64-byte header, a preamble of 8 bytes, a CRC of 4 bytes, and an interframe gap of 12 bytes, for a total of 88 bytes. To achieve wire-speed performance, this small packet must be fully processed in time for the hardware to receive the next packet. In a Gigabit environment, this must be within a time window approximately 700 nanoseconds long.
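The time window follows directly from the byte counts given above. A minimal sketch of the arithmetic is shown below; the function names are hypothetical, and the 200 MHz processor clock used in the second calculation is an assumed figure chosen only for illustration.

```python
def frame_time_ns(header=64, preamble=8, crc=4, gap=12,
                  bits_per_second=1_000_000_000):
    """Return (total bytes, nanoseconds on the wire) for one packet slot."""
    total_bytes = header + preamble + crc + gap
    nanoseconds = total_bytes * 8 * 1_000_000_000 // bits_per_second
    return total_bytes, nanoseconds


def cycles_available(window_ns, clock_hz):
    """Whole processor cycles that fit in a window of window_ns."""
    return window_ns * clock_hz // 1_000_000_000


total_bytes, window_ns = frame_time_ns()           # Gigabit link
cycles = cycles_available(window_ns, 200_000_000)  # assumed 200 MHz clock
```

At 1 Gbit/s each bit takes one nanosecond, so the 88 bytes occupy 704 ns, which is the approximately 700 ns window stated above.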
Accordingly, what is needed is a system whereby critical routing decisions can be made without the need to store the entire packet prior to processing.