1. Field of the Invention
This invention generally relates to the processing of network communications and, more particularly, to a system and method for using hardware circuitry to perform Ethernet packet forwarding functions.
2. Description of the Related Art
As noted in Wikipedia, in packet routing, the control plane is the part of the router architecture that is concerned with drawing the network map, or the information in a (possibly augmented) routing table that defines what to do with incoming packets. Control plane functions, such as participating in routing protocols, run in the architectural control element. In most cases, the routing table contains a list of destination addresses and the outgoing interface(s) associated with them. Control plane logic also can define certain packets to be discarded, as well as preferential treatment of certain packets for which a high quality of service is defined by such mechanisms as differentiated services.
Depending on the specific router implementation, there may be a separate forwarding information base that is populated (i.e., loaded) by the control plane, but used by the forwarding plane to look up packets, at very high speed, and decide how to handle them.
A major function of the control plane is deciding which routes go into the main routing table. “Main” refers to the table that holds the unicast routes that are active. Multicast routing may require an additional routing table for multicast routes. Several routing protocols, e.g., open-shortest-path-first (OSPF) and border gateway protocol (BGP), maintain internal databases of candidate routes which are promoted when a route fails or when a routing policy is changed.
Several different information sources may provide information about a route to a given destination, but the router must select the “best” route to install into the routing table. In some cases, there may be multiple routes of equal “quality”, and the router may install all of them and load-share across them.
There are three general sources of routing information: Information on the status of directly connected hardware and software-defined interfaces; manually configured static routes; and, information from (dynamic) routing protocols.
Routers forward traffic that enters on an input interface and leaves on an output interface, subject to filtering and other local rules. While routers usually forward from one physical interface (e.g., Ethernet, serial) to another physical interface, it is also possible to define multiple logical interfaces on a physical interface. A physical Ethernet interface, for example, can have logical interfaces in several virtual Local Area Networks (LANs) defined by IEEE 802.1Q VLAN headers.
When an interface has an address configured in a subnet, such as 192.0.2.1 in the 192.0.2.0/24 (i.e., subnet mask 255.255.255.0) subnet, and that interface is considered “up” by the router, the router thus has a directly connected route to 192.0.2.0/24. If a routing protocol offered another router's route to that same subnet, the routing table installation software will normally ignore the dynamic route and prefer the directly connected route.
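As a minimal sketch of how a directly connected route follows from an interface address, the Python fragment below derives the connected subnet for each interface that is “up”. The interface names, the second address, and the helper name connected_routes are hypothetical and are used only for illustration.

```python
import ipaddress

def connected_routes(interfaces):
    """Map each connected subnet to its interface name, skipping interfaces that are down."""
    routes = {}
    for name, (address, is_up) in interfaces.items():
        if is_up:
            # ip_interface("192.0.2.1/24").network yields 192.0.2.0/24
            routes[ipaddress.ip_interface(address).network] = name
    return routes

# Hypothetical interface configuration: eth0 is up, eth1 is down.
interfaces = {
    "eth0": ("192.0.2.1/24", True),
    "eth1": ("198.51.100.1/24", False),
}
routes = connected_routes(interfaces)
print(routes)
```

Only the subnet of the interface that is up appears in the result, which mirrors the condition that the interface be considered “up” before the connected route exists.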
There also may be software-only interfaces on the router, which it treats as if they were locally connected. For example, most implementations have a “null” software-defined interface. Packets having this interface as a next hop will be discarded, which can be a very efficient way to filter traffic. Routers usually can route traffic faster than they can examine it and compare it to filters, so, if the criterion for discarding is the packet's destination address, “blackholing” the traffic will be more efficient than explicit filters.
Other software defined interfaces that are treated as directly connected, as long as they are active, are interfaces associated with tunneling protocols such as generic routing encapsulation (GRE) or Multi-Protocol Label Switching (MPLS).
Router configuration rules may contain static routes. A static route minimally has a destination address, a prefix length or subnet mask, and a definition where to send packets for the route. That definition can refer to a local interface on the router, or a next-hop address that could be on the far end of a subnet to which the router is connected. The next-hop address could also be on a subnet that is directly connected, and, before the router can determine if the static route is usable, it must do a recursive lookup of the next hop address in the local routing table. If the next-hop address is reachable, the static route is usable, but if the next-hop is unreachable, the route is ignored.
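The recursive next-hop check described above can be sketched as follows. This is an illustrative model only: the dictionary layout of a route, the depth limit, and the function names resolve and static_route_usable are assumptions, not part of any particular router implementation.

```python
import ipaddress

def resolve(address, table, depth=0, max_depth=8):
    """Recursively resolve an address to an outgoing interface name, or None."""
    if depth > max_depth:
        return None  # guard against routes that point at each other
    addr = ipaddress.ip_address(address)
    matches = [r for r in table if addr in r["prefix"]]
    if not matches:
        return None
    best = max(matches, key=lambda r: r["prefix"].prefixlen)
    if best.get("interface") is not None:
        return best["interface"]  # reached a directly connected route
    return resolve(best["next_hop"], table, depth + 1, max_depth)

def static_route_usable(static_route, table):
    """A static route is usable only if its next hop is reachable."""
    return resolve(static_route["next_hop"], table) is not None

net = ipaddress.ip_network
table = [{"prefix": net("192.0.2.0/24"), "interface": "eth0"}]
good = {"prefix": net("203.0.113.0/24"), "next_hop": "192.0.2.254"}
bad = {"prefix": net("198.51.100.0/24"), "next_hop": "10.9.9.9"}
print(static_route_usable(good, table), static_route_usable(bad, table))
```

In this sketch the second static route is ignored because no entry in the table covers its next hop.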
Static routes also may have preference factors used to select the best static route to the same destination. One application is called a floating static route, where the static route is less preferred than a route from any routing protocol. The static route, which might use a dialup link or other slow medium, activates only when the dynamic routing protocol(s) cannot provide a route to the destination.
Static routes that are more preferred than any dynamic route also can be very useful, especially when using traffic engineering principles to make certain traffic go over a specific path with an engineered quality of service.
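One way to picture preference factors is as a number attached to each candidate route, with the lowest number winning. The sketch below assumes that convention; the specific preference values, the route sources, and the name best_route are hypothetical.

```python
def best_route(candidates):
    """Return the candidate with the lowest preference value, or None if there are none."""
    if not candidates:
        return None
    return min(candidates, key=lambda route: route["preference"])

# Hypothetical preference values; lower means more preferred.
dynamic = {"source": "ospf", "via": "10.0.0.2", "preference": 110}
floating_static = {"source": "static", "via": "dialup0", "preference": 250}
engineered_static = {"source": "static", "via": "10.0.1.2", "preference": 1}

print(best_route([dynamic, floating_static])["via"])      # dynamic route is present
print(best_route([floating_static])["via"])               # dynamic route withdrawn
print(best_route([dynamic, engineered_static])["via"])    # preferred static route
```

The floating static route is selected only once the dynamic route is no longer among the candidates, while the engineered static route overrides the dynamic route whenever it is present.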
See forwarding plane explanation below for more detail, but each implementation has its own means of updating the forwarding information base (FIB) with new routes installed in the routing table. If the FIB is in one-to-one correspondence with the routing information base (RIB), the new route is installed in the FIB after it is in the RIB. If the FIB is smaller than the RIB, and the FIB uses a hash table or other data structure that does not easily update, the existing FIB might be invalidated and replaced with a new one computed from the updated RIB.
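The two update styles described above can be contrasted in a short sketch. Plain dictionaries stand in for the RIB and FIB, and the keep predicate that decides which routes fit in the smaller FIB is a hypothetical placeholder, since the selection policy is implementation specific.

```python
def install_route(rib, fib, prefix, next_hop):
    """One-to-one case: the route goes into the RIB first, then into the FIB."""
    rib[prefix] = next_hop
    fib[prefix] = next_hop

def rebuild_fib(rib, keep):
    """Smaller-FIB case: compute a fresh FIB from the updated RIB."""
    return {prefix: next_hop for prefix, next_hop in rib.items() if keep(prefix)}

rib, fib = {}, {}
install_route(rib, fib, "192.0.2.0/24", "eth0")
install_route(rib, fib, "198.51.100.0/24", "eth1")

# A new route arrives; the old FIB is left untouched and a replacement is computed.
rib["203.0.113.0/24"] = "eth2"
old_fib = fib
fib = rebuild_fib(rib, keep=lambda prefix: not prefix.startswith("198."))
print(old_fib)
print(fib)
```

Building the replacement as a new object means lookups can continue against the old FIB until the swap takes place.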
In routing, the forwarding plane, sometimes called the data plane, defines the part of the router architecture that decides what to do with packets arriving on an inbound interface. Most commonly, it refers to a table in which the router looks up the destination address of the incoming packet and retrieves the information necessary to determine the path from the receiving element, through the internal forwarding fabric of the router, and to the proper outgoing interface(s). The IP Multimedia Subsystem architecture uses the term transport plane to describe a function roughly equivalent to the routing control plane.
The table also might specify that the packet is discarded. In some cases, the router will return an Internet Control Message Protocol (ICMP) “destination unreachable” or other appropriate code. Some security policies, however, dictate that the router should be programmed to drop the packet silently. By dropping filtered packets silently, a potential attacker does not become aware of a target that is being protected.
The incoming forwarding element will also decrement the time-to-live (TTL) field of the packet, and, if the new value is zero, discard the packet. While the Internet Protocol (IP) specification indicates that an Internet Control Message Protocol (ICMP) “TTL exceeded” message should be sent to the originator of the packet (i.e., the node with the source address in the packet), routers may be programmed to drop the packet silently.
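The TTL handling can be sketched as a small function. The packet is modeled as a dictionary and the silent flag is a hypothetical configuration switch; ICMP type 11 is the “time exceeded” message.

```python
def decrement_ttl(packet, silent=False):
    """Return (packet_to_forward, icmp_reply); exactly one may be non-None."""
    new_ttl = packet["ttl"] - 1
    if new_ttl <= 0:
        # Packet is discarded; optionally notify the originator.
        reply = None if silent else {"icmp_type": 11, "dst": packet["src"]}
        return None, reply
    return dict(packet, ttl=new_ttl), None

forwarded, _ = decrement_ttl({"src": "192.0.2.10", "dst": "203.0.113.5", "ttl": 64})
dropped, reply = decrement_ttl({"src": "192.0.2.10", "dst": "203.0.113.5", "ttl": 1})
print(forwarded["ttl"], dropped, reply)
```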
Depending on the specific router implementation, the table in which the destination address is looked up could be the routing table (also known as the routing information base), or a separate forwarding information base that is populated (i.e., loaded) by the routing control plane, but used by the forwarding plane to look up packets, at very high speed, and decide how to handle them. Before or after examining the destination, other tables may be consulted to make decisions to drop the packet based on other characteristics, such as the source address, the IP protocol identifier field, or Transmission Control Protocol (TCP) or User Datagram Protocol (UDP) port number.
Forwarding plane functions run in the forwarding element. High-performance routers often have multiple distributed forwarding elements, so that the router increases performance with parallel processing.
The outgoing interface will encapsulate the packet in the appropriate data link protocol. Depending on the router software and its configuration, functions, usually implemented at the outgoing interface, may set various packet fields, such as the DSCP field used by differentiated services.
In general, the passage from the input interface directly to an output interface, through the fabric with minimum modification at the output interface, is called the fast path of the router. If the packet needs significant processing, such as segmentation or encryption, it may go onto a slower path, which is sometimes called the services plane of the router. Service planes can make forwarding or processing decisions based on higher-layer information, such as a Web URL contained in the packet payload.
Vendors design router products for specific markets. Design of routers intended for home use, perhaps supporting several PCs and VoIP telephony, is driven by keeping the cost as low as possible. In such a router, there is no separate forwarding fabric, and there is only one active forwarding path: into the main processor and out of the main processor.
Routers for more demanding applications accept greater cost and complexity to get higher throughput in their forwarding planes. Several design factors affect router forwarding performance: data link layer processing and extracting the packet; decoding the packet header; looking up the destination address in the packet header; analyzing other fields in the packet; sending the packet through the “fabric” interconnecting the ingress and egress interfaces; and, processing and data link encapsulation at the egress interface.
Routers may have one or more processors. In a uniprocessor design, these performance parameters are affected not just by the processor speed, but by competition for the processor. Higher-performance routers invariably have multiple processing elements, which may be general-purpose processor chips or specialized application-specific integrated circuits (ASIC).
Very high performance products have multiple processing elements on each interface card. In such designs, the main processor does not participate in forwarding, but only in control plane and management processing.
In the Internet Engineering Task Force, two working groups in the Operations and Management Area deal with aspects of performance. The IP Performance Metrics (IPPM) group focuses on operational measurement of services. Performance measurements on single routers, or narrowly defined systems of routers, are the province of the Benchmarking Methodology Working Group (BMWG).
RFC 2544 is the key BMWG document. A classic RFC 2544 benchmark uses half the router's (i.e., the device under test (DUT)) ports for input of a defined load, and measures the time at which the outputs appear at the output ports.
Originally, all destinations were looked up in the RIB. Perhaps the first step in speeding routers was to have a separate RIB and FIB in main memory, with the FIB, typically with fewer entries than the RIB, being organized for fast destination lookup. In contrast, the RIB was optimized for efficient updating by routing protocols.
Early uniprocessing routers usually organized the FIB as a hash table, while the RIB might be a linked list. Depending on the implementation, the FIB might have fewer entries than the RIB, or the same number.
When routers started to have separate forwarding processors, these processors usually had far less memory than the main processor, such that the forwarding processor could hold only the most frequently used routes. On the early Cisco AGS+ and 7000, for example, the forwarding processor cache could hold approximately 1000 route entries. In an enterprise, this would often work quite well, because there were fewer than 1000 server or other popular destination subnets. Such a cache, however, was far too small for general Internet routing. Different router designs behaved in different ways when a destination was not in the cache.
A cache miss condition might result in the packet being sent back to the main processor, to be looked up in a slow path that had access to the full routing table. Depending on the router design, a cache miss might cause an update to the fast hardware cache or the fast cache in main memory. In some designs, it was most efficient to invalidate the fast cache for a cache miss, send the packet that caused the cache miss through the main processor, and then repopulate the cache with a new table that included the destination that caused the miss. This approach is similar to an operating system with virtual memory, which keeps the most recently used information in physical memory.
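The populate-on-miss variant can be modeled in software as a small least-recently-used cache in front of the full routing table. The sketch below is an illustration under that assumption; the class name RouteCache, the next-hop labels, and the eviction policy are hypothetical, and the invalidate-and-rebuild variant is not shown.

```python
from collections import OrderedDict
import ipaddress

class RouteCache:
    """Small destination cache backed by a slow path with the full routing table."""

    def __init__(self, full_table, capacity):
        self.full_table = {ipaddress.ip_network(p): nh for p, nh in full_table.items()}
        self.capacity = capacity
        self.cache = OrderedDict()  # destination address -> next hop
        self.hits = 0
        self.misses = 0

    def slow_path(self, dst):
        """Longest-prefix match over the full table (the main-processor path)."""
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.full_table if addr in net]
        if not matches:
            return None
        return self.full_table[max(matches, key=lambda net: net.prefixlen)]

    def lookup(self, dst):
        if dst in self.cache:
            self.hits += 1
            self.cache.move_to_end(dst)  # mark as most recently used
            return self.cache[dst]
        self.misses += 1
        next_hop = self.slow_path(dst)
        self.cache[dst] = next_hop
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return next_hop

rc = RouteCache({"10.0.0.0/8": "A", "10.1.0.0/16": "B", "0.0.0.0/0": "C"}, capacity=2)
results = [rc.lookup(d) for d in ("10.1.2.3", "10.1.2.3", "10.2.0.1", "8.8.8.8")]
print(results, rc.hits, rc.misses)
```

With a capacity of two, the fourth lookup evicts the oldest destination, which is the behavior that made such caches too small for general Internet routing.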
As memory costs went down and performance needs went up, FIBs emerged that had the same number of route entries as in the RIB, but arranged for fast lookup rather than fast update. Whenever a RIB entry changed, the router changed the corresponding FIB entry.
High-performance FIBs achieve their speed with implementation-specific combinations of specialized algorithms and hardware. Various search algorithms have been used for FIB lookup. While well-known general-purpose data structures were first used, such as hash tables, specialized algorithms, optimized for IP addresses, emerged. They include: binary tree; radix tree; four-way trie; and, Patricia tree.
A multicore CPU architecture is commonly used to implement high-performance networking systems. These platforms facilitate the use of a software architecture in which the high-performance packet processing is performed within a fast path environment on dedicated cores, in order to maximize system throughput. A run-to-completion model minimizes OS overhead and latency.
Various forms of fast RAM and, eventually, basic content addressable memory (CAM) were used to speed lookup. CAM, while useful in layer 2 switches that needed to look up a relatively small number of fixed-length MAC addresses, had limited utility with IP addresses having variable-length routing prefixes (see Classless Inter-Domain Routing). Ternary CAM (TCAM), while expensive, lends itself to variable-length prefix lookups.
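The reason a ternary memory suits variable-length prefixes can be shown with a software model in which each entry holds a value and a mask, and the masked-out bits act as “don't care” positions. This is only a sketch: real hardware compares all entries in parallel, whereas the loop below checks them in priority order, and the function names and route labels are hypothetical.

```python
import ipaddress

def build_tcam(routes):
    """Build (value, mask, result) entries ordered so the longest prefix matches first."""
    entries = []
    for prefix, result in routes.items():
        net = ipaddress.ip_network(prefix)
        entries.append((net.prefixlen, int(net.network_address), int(net.netmask), result))
    entries.sort(key=lambda e: e[0], reverse=True)
    return [(value, mask, result) for _, value, mask, result in entries]

def tcam_lookup(tcam, address):
    """Return the result of the first entry whose unmasked bits equal the key."""
    key = int(ipaddress.ip_address(address))
    for value, mask, result in tcam:
        if (key & mask) == value:
            return result
    return None

tcam = build_tcam({"0.0.0.0/0": "default", "10.0.0.0/8": "A", "10.1.0.0/16": "B"})
print(tcam_lookup(tcam, "10.1.2.3"), tcam_lookup(tcam, "10.9.9.9"), tcam_lookup(tcam, "8.8.8.8"))
```

A binary CAM corresponds to the case where every mask is all ones, which is why it fits fixed-length MAC addresses but not variable-length prefixes.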
One of the challenges of forwarder lookup design is to minimize the amount of specialized memory needed, and, increasingly, to minimize the power consumed by memory.
A next step in speeding routers was to have a specialized forwarding processor separate from the main processor. There was still a single path, but forwarding no longer had to compete with control in a single processor. The fast routing processor typically had a small FIB, with hardware memory (e.g., static random access memory (SRAM)) faster and more expensive than the FIB in main memory. Main memory was generally dynamic random access memory (DRAM).
Next, routers began to have multiple forwarding elements that communicated through a high-speed shared bus or through a shared memory. Eventually, the shared resource became a bottleneck, with the limit of shared bus speed being roughly 2 million packets per second (Mpps). Crossbar fabrics broke through this bottleneck.
As forwarding bandwidth increased, even with the elimination of cache miss overhead, the shared paths limited throughput. While a router might have 16 forwarding engines, if there was a single bus, only one packet transfer at a time was possible. There were some special cases where a forwarding engine might find that the output interface was one of the logical or physical interfaces present on the forwarder card, such that the packet flow was totally inside the forwarder. It was often easier, however, even in this special case, to send the packet out the bus and receive it from the bus.
While some designs experimented with multiple shared buses, the eventual approach was to adapt the crossbar switch model from telephone switches, in which every forwarding engine had a hardware path to every other forwarding engine. With a small number of forwarding engines, crossbar forwarding fabrics are practical and efficient for high-performance routing. There are multistage designs for crossbar systems, such as Clos networks.
A radix or Patricia (Practical Algorithm To Retrieve Information Coded in Alphanumeric) trie is a space-optimized data structure where each node with only one child is merged with its child. The result is that every internal node has at least two children. Unlike in regular tries, edges can be labeled with sequences of characters as well as single characters. This makes them much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes.
It supports the following main operations, all of which are O(k), where k is the maximum length of all strings in the set:
Lookup: Determines if a string is in the set. This operation is identical to tries except that some edges consume multiple characters;
Insert: Add a string to the trie. The trie is searched until no further progress can be made. At this point, either a new outgoing edge is added, labeled with all remaining characters in the input string, or, if there is already an outgoing edge sharing a prefix with the remaining input string, that edge is split into two edges (the first labeled with the common prefix) and the search proceeds. This splitting step ensures that no node has more children than there are possible string characters;
Delete: Delete a string from the trie. First, the corresponding leaf is deleted. Then, if its parent only has one child remaining, the parent is deleted and the two incident edges are merged;
Find predecessor: Locates the largest string less than a given string, by lexicographic order;
Find successor: Locates the smallest string greater than a given string, by lexicographic order.
Radix trees are useful for constructing associative arrays with keys that can be expressed as strings. They find particular application in the area of IP routing, where the ability to contain large ranges of values with a few exceptions is particularly suited to the hierarchical organization of IP addresses. They are also used for inverted indexes of text documents in information retrieval.
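The lookup and insert operations on a radix trie can be sketched as follows. This is a minimal illustration on character strings; the class and function names are hypothetical, and delete, find predecessor, and find successor are omitted for brevity.

```python
class RadixNode:
    def __init__(self):
        self.edges = {}        # first character of label -> (label, child node)
        self.terminal = False  # True if a stored string ends at this node

def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def insert(root, key):
    node = root
    while key:
        edge = node.edges.get(key[0])
        if edge is None:
            # No further progress: add a new edge labeled with all remaining characters.
            leaf = RadixNode()
            leaf.terminal = True
            node.edges[key[0]] = (key, leaf)
            return
        label, child = edge
        shared = common_prefix_len(label, key)
        if shared < len(label):
            # Split the edge; the first part is labeled with the common prefix.
            middle = RadixNode()
            middle.edges[label[shared]] = (label[shared:], child)
            node.edges[key[0]] = (label[:shared], middle)
            child = middle
        node, key = child, key[shared:]
    node.terminal = True

def lookup(root, key):
    node = root
    while key:
        edge = node.edges.get(key[0])
        if edge is None or not key.startswith(edge[0]):
            return False
        key, node = key[len(edge[0]):], edge[1]  # an edge may consume several characters
    return node.terminal

root = RadixNode()
for word in ("test", "team", "tester"):
    insert(root, word)
print(lookup(root, "team"), lookup(root, "te"), lookup(root, "tester"))
```

After the three inserts, the root has a single edge labeled “te”, which illustrates how strings that share long prefixes are stored compactly.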
In computer science, an AVL (Adelson-Velskii and Landis) tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.
The balance factor of a node is the height of its left subtree minus the height of its right subtree (sometimes opposite) and a node with balance factor 1, 0, or −1 is considered balanced. A node with any other balance factor is considered unbalanced and requires rebalancing the tree. The balance factor is either stored directly at each node or computed from the heights of the subtrees.
Basic operations of an AVL tree involve carrying out the same actions as would be carried out on an unbalanced binary search tree, but modifications are preceded or followed by one or more operations called tree rotations, which help to restore the height balance of the subtrees.
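The balance factor and a single rotation can be illustrated with a short sketch. It uses the left-minus-right convention and recomputes heights on demand rather than storing them at each node; the names are hypothetical, and a complete AVL insert or delete is not shown.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    """Height of a subtree; an empty subtree has height 0."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    """Height of the left subtree minus height of the right subtree."""
    return height(node.left) - height(node.right)

def rotate_right(node):
    """Right rotation: the left child becomes the new subtree root."""
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot

# A left-leaning chain 3 -> 2 -> 1 is unbalanced at its root.
root = Node(3, left=Node(2, left=Node(1)))
before = balance_factor(root)
root = rotate_right(root)
after = balance_factor(root)
print(before, after, root.key)
```

The rotation preserves the binary search tree ordering while restoring a balance factor within the permitted range.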
Routing is the process of selecting paths in a network along which to send network traffic. Routing is performed for many kinds of networks, including the telephone network (Circuit switching), electronic data networks (such as the Internet), and transportation networks. In packet switching networks, routing directs packet forwarding, the transit of logically addressed packets from their source toward their ultimate destination through intermediate nodes, typically hardware devices called routers, bridges, gateways, firewalls, or switches. General-purpose computers can also forward packets and perform routing, though they are not specialized hardware and may suffer from limited performance. The routing process usually directs forwarding on the basis of routing tables which maintain a record of the routes to various network destinations. Thus, constructing routing tables, which are held in the router's memory, is very important for efficient routing. Most routing algorithms use only one network path at a time, but multipath routing techniques enable the use of multiple alternative paths.
A routing table, or Routing Information Base (RIB), is a data table stored in a router or a networked computer that lists the routes to particular network destinations, and in some cases, metrics (distances) associated with those routes. The routing table contains information about the topology of the network immediately around it. The construction of routing tables is the primary goal of routing protocols. Static routes are entries made in a routing table by non-automatic means and which are fixed rather than being the result of some network topology ‘discovery’ procedure.
A routing table utilizes the same idea that one does when using a map in package delivery. Whenever a node needs to send data to another node on a network, it must know where to send it, first. If the node cannot directly connect to the destination node, it has to send it via other nodes along a proper route to the destination node. Most nodes do not try to figure out which route(s) might work; instead, a node will send an IP packet to a gateway in the LAN, which then decides how to route the “package” of data to the correct destination. Each gateway needs to keep track of which way to deliver various packages of data, and for this it uses a routing table. A routing table is a database which keeps track of paths, like a map, and allows the gateway to provide this information to the node requesting the information.
The routing table consists of at least three information fields:
1. the network id: i.e. the destination network id.
2. cost: i.e. the cost or metric of the path through which the packet is to be sent.
3. next hop: The next hop, or gateway, is the address of the next station to which the packet is to be sent on the way to its final destination.
Depending on the application and implementation, it can also contain additional values that refine path selection:
1. quality of service associated with the route. For example, the U flag indicates that an IP route is up.
2. links to filtering criteria/access lists associated with the route.
3. interface: such as eth0 for the first Ethernet card, eth1 for the second Ethernet card, etc.
Shown below is an example of what the table above could look like on an average computer connected to the Internet via a home router:
Network Destination  Netmask          Gateway        Interface      Metric
0.0.0.0              0.0.0.0          192.168.0.1    192.168.0.100  10
127.0.0.0            255.0.0.0        127.0.0.1      127.0.0.1      1
192.168.0.0          255.255.255.0    192.168.0.100  192.168.0.100  10
192.168.0.100        255.255.255.255  127.0.0.1      127.0.0.1      10
192.168.0.255        255.255.255.255  192.168.0.100  192.168.0.100  10
The columns Network Destination and Netmask together describe the Network id as mentioned earlier. For example, destination 192.168.0.0 and netmask 255.255.255.0 can be written as network id 192.168.0.0/24. The Gateway column contains the same information as the Next hop, i.e., it points to the gateway through which the network can be reached. The interface indicates what locally available interface is responsible for reaching the gateway. In this example, gateway 192.168.0.1 (the Internet router) can be reached through the local network card with address 192.168.0.100.
Finally, the Metric indicates the associated cost of using the indicated route. This is useful for determining the efficiency of a certain route from two points in a network. In this example, it is more efficient to communicate with the computer itself through the use of address 127.0.0.1 (called “localhost”) than it would be through 192.168.0.100 (the IP address of the local network card).
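A lookup against the example table can be sketched as a longest-prefix match, with the lower metric used as a tie-breaker among equally specific routes. The tie-breaking rule and the function names are assumptions made for illustration; the rows are the five entries of the example table.

```python
import ipaddress

# (network destination, netmask, gateway, interface, metric)
TABLE = [
    ("0.0.0.0", "0.0.0.0", "192.168.0.1", "192.168.0.100", 10),
    ("127.0.0.0", "255.0.0.0", "127.0.0.1", "127.0.0.1", 1),
    ("192.168.0.0", "255.255.255.0", "192.168.0.100", "192.168.0.100", 10),
    ("192.168.0.100", "255.255.255.255", "127.0.0.1", "127.0.0.1", 10),
    ("192.168.0.255", "255.255.255.255", "192.168.0.100", "192.168.0.100", 10),
]

def prefix_length(netmask):
    """Count the one bits in a dotted-decimal netmask."""
    return bin(int(ipaddress.IPv4Address(netmask))).count("1")

def lookup(destination):
    """Return (gateway, interface) for the most specific matching row, or None."""
    addr = ipaddress.ip_address(destination)
    best_key, best_row = None, None
    for dest, mask, gateway, interface, metric in TABLE:
        length = prefix_length(mask)
        network = ipaddress.ip_network(f"{dest}/{length}")
        if addr in network:
            key = (length, -metric)  # longer prefix first, then lower metric
            if best_key is None or key > best_key:
                best_key, best_row = key, (gateway, interface)
    return best_row

print(lookup("8.8.8.8"))        # falls through to the default route
print(lookup("192.168.0.7"))    # matches the local /24
print(lookup("192.168.0.100"))  # matches the host route via localhost
```

The third lookup shows the point made above: traffic addressed to the computer's own address is delivered through 127.0.0.1 rather than through the network card.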
Routing tables are generally not used directly for packet forwarding in modern router architectures; instead, they are used to generate the information for a smaller forwarding table. A forwarding table contains only the routes which are chosen by the routing algorithm as preferred routes for packet forwarding. It is often in a compressed or pre-compiled format that is optimized for hardware storage and lookup.
Internet Protocol version 4 (IPv4) is the fourth revision in the development of the Internet Protocol (IP) and the first version of the protocol to be widely deployed. IPv4 is a connectionless protocol for use on packet-switched Link Layer networks (e.g., Ethernet). It operates on a best effort delivery model, in that it does not guarantee delivery, nor does it assure proper sequencing or avoidance of duplicate delivery. These aspects, including data integrity, are addressed by an upper layer transport protocol such as the Transmission Control Protocol (TCP).
One current problem is the achievement of high data rates in IPv4 packet forwarding in medium-scale (enterprise-level) routers, where the entire IPv4 packet forwarding is performed by a processor enabled using Ethernet driver packet forwarding software, such as might be found in a Linux operating system (OS). The acceleration of IPv4 forwarding can be done using dedicated network processors, but this adds system cost and complexity that are not acceptable in medium-scale routers.
In IPv4 packet processing, there are two major operations involved: routing and forwarding. The entire IPv4 packet forwarding task can be divided into three subtasks: Input processing; Forwarding; and, Output processing.
Input processing: In this task, memory is allocated for the packet and the packet is received in the system. Then the packet is examined for its validity. If the packet is not valid, it is dropped, packet memory is freed, and statistics are updated.
Forwarding: In this task, the packet's header is used for lookup into the routing table. If the packet is sinking (terminating in the system), it is simply delivered to a higher level for further packet processing. But if the packet is not sinking in the system, then the routing table is used for lookup. If a valid route is found, the packet is prepared for output processing with the appropriate interface information. If the route is not found, the packet is dropped, packet memory is freed, and an Internet Control Message Protocol (ICMP) route not found packet is sent to the source.
Output processing: In this task, some packet modifications are done. For example, the appropriate MAC addresses in the L2 header are modified. In the case of IPv4, the header is modified by computing a new time to live (TTL). Then, the packet is sent out on the appropriate interface. After the packet is sent out of the physical interface, packet memory is de-allocated.
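The three subtasks can be sketched as three functions chained by a dispatcher. This is a simplified model: packets are dictionaries, memory allocation and freeing are not modeled, TTL expiry is not handled, and the validity checks, field names, and route layout are hypothetical.

```python
import ipaddress

def input_processing(packet, stats):
    """Validate the received packet; count and drop it if it is invalid."""
    if packet.get("version") != 4 or not packet.get("checksum_ok", False):
        stats["dropped"] += 1
        return None
    return packet

def forwarding(packet, local_addresses, routes):
    """Classify the packet as 'local', 'forward' (with a route), or 'no_route'."""
    if packet["dst"] in local_addresses:
        return "local", None  # packet is sinking in the system
    dst = ipaddress.ip_address(packet["dst"])
    matches = [(net, route) for net, route in routes.items() if dst in net]
    if not matches:
        return "no_route", None
    return "forward", max(matches, key=lambda m: m[0].prefixlen)[1]

def output_processing(packet, route):
    """Rewrite the L2 addresses and TTL, then hand the packet to the interface."""
    out = dict(packet, ttl=packet["ttl"] - 1,
               src_mac=route["interface_mac"], dst_mac=route["next_hop_mac"])
    return route["interface"], out

def handle(packet, local_addresses, routes, stats):
    packet = input_processing(packet, stats)
    if packet is None:
        return "drop", None
    verdict, route = forwarding(packet, local_addresses, routes)
    if verdict == "local":
        return "deliver_up", packet
    if verdict == "no_route":
        stats["dropped"] += 1
        return "icmp_to_source", packet["src"]
    return output_processing(packet, route)

routes = {ipaddress.ip_network("203.0.113.0/24"):
          {"interface": "eth1", "interface_mac": "02:00:00:00:00:01",
           "next_hop_mac": "02:00:00:00:00:02"}}
stats = {"dropped": 0}
pkt = {"version": 4, "checksum_ok": True, "src": "192.0.2.10", "dst": "203.0.113.5", "ttl": 64}
print(handle(pkt, {"192.0.2.1"}, routes, stats))
```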
Routing is the control path, while forwarding is the data path. As noted above, routing is used to create routes between nodes in the system. There are numerous algorithms available and implemented in software for routing, including but not limited to border gateway protocol (BGP) and open-shortest-path-first (OSPF), to name a few. The routes can also be created statically and permanently using the ‘route’ utility in Ethernet driver software. The majority of routing tasks are done in software, even in today's systems. Operating systems like Linux have an IP forwarding and routing infrastructure, and routing daemons keep track of adding/removing routes from the routing table.
The critical and latency-sensitive task is forwarding, which is called the data path. Once the routes have been established, the packets need to be forwarded to appropriate interfaces using these routes. When the packet ingresses into the system, it is handed to the software-driven IP forwarding stack. The IP forwarding stack looks up the routes and forwards the packet accordingly. These tasks are typically performed in hundreds of clock cycles. There is inherent latency involved in fetching each instruction and its related data in software. The data rate is significantly lower when the number of packets per second increases.
Alternatively, routing tasks can be performed by dedicated hardware devices like network processors, enabled with a narrowly focused microcode, to forward the packets. However, this approach means that there are two different processors in the system. The control plane central processing unit (CPU) controls the routes and is enabled by a Linux-like OS to implement all the control plane functionalities. The second, the network processor, just manages the run-to-completion forwarding threads. It increases the system cost significantly to have two separate processors. Apart from that, it is also very difficult to program network processors, as they have their own microcode architecture and do not run code written in a C-like high-level language. It is also not easy to synchronize control plane routing tables with data path forwarding, as they are two separate entities.
It would be advantageous if packet routing could be implemented to rely less upon the control plane microprocessor, without the use of an additional microprocessor to control the forwarding threads.