In data communication networks (Internet, ATM, Ethernet, token ring, and the like) address translation is an integral component of the system. In particular, there is a requirement of doing source and destination address lookups.
Using the Internet as a basis for discussion, three main factors contribute to the speed of traffic over the Internet: link speeds, router data throughput, and packet forwarding rates. Readily available solutions exist for the first two factors. For example, fiber-optics can provide faster links and switching technology can be used to move packets through a router at gigabit speeds. The present invention deals with the third factor, packet forwarding.
The most fundamental operation in any Internet Protocol (IP) routing product is the routing table search process. A packet is received with a specific destination address (DA), identified by a unique 32-bit field (in the current IP Version 4 implementation). The router must search a forwarding table using the IP DA as its key, and determine which entry in the table represents the best route for the packet to take in its journey across the network to its destination.
The search is made complex by the fact that entries in the table have variable lengths, and also that many entries may represent valid routes to the same destination. Unlike a simple search that seeks to find an exact match within a table, the routing table search algorithm must select the most specific route from a number of entries, i.e. the route that represents the longest network prefix for the given DA.
Specifically, a forwarding database in an IP router consists of a number of variable length network addresses. The length of the network address is given by a prefix length. When an IP router receives a packet, it must compute which of the network addresses in its database has the longest match when compared to the destination address in the packet. The packet is then forwarded to the output link associated with that prefix.
For example, a forwarding database may have the following network addresses (NA): NA1=0101, NA2=0101101 and NA3=010110101011. In this example the destination address is 16-bits long, the corresponding prefix mask values defines the network portion of the destination address: P1=1111000000000000, P2=1111111000000000 and P3=1111111111110000. An address whose first 12 bits are 010101101011 has longest matching prefix P1. An address whose first 12 bits are 010110101101 has longest matching prefix P2.
The route search operation is a very time-consuming operation, and typically defines the upper bound on the router's ability to forward packets. Many high speed routers augment the full-table search with a fast path process that performs an exact-match search in a small address cache, but must still default to the full-table search (the "slow path") for addresses not found in the cache.
Routing problems are escalating in view of data links now operating routinely at 600 Mbits/s (OC-12) that can generate over 1 million packets-per-second requiring routing. Further, new protocols, such as RSVP, require route selection based not only on destination address, but potentially also protocol number, source address, destination port and source port.
In addition, virtual networks, quality of service (QoS) guarantees, and layer four snooping (i.e., deriving routing criteria from layer 4, TCP, UDP etc. fields) will tend to increase the number of bits needed to be considered. Further, IP version 6 will increase the size of the address filed from 32 bits to 128 bits, with network prefixes up to 64 bits in length.
A popular algorithm used in routers is based on the Patricia Tree (Practical Algorithm to Retrieve Information Coded in Alphanumeric). The forwarding table that associates each prefix entry with an exit port and next-hop address is stored in a binary radix tree (BRT) form. The BRT is organized in a series of nodes, each of which contains a route of varying length, and each of which has two branches to subsequent nodes in the tree. At the ends of the branches are leaves, which either represent full 32-bit host routes (for devices attached directly to the router) or host-specific routes available to a particular subnet.
For example, a traditional implementation of Patricia trees for routing lookup purposes uses 24 bytes for leaves and internal nodes. With 40,000 entries (in a typical routing table), the resulting tree structure alone is almost 2 Mbytes, and in a perfectly balanced tree 15 or 16 nodes must be traversed to find a routing entry. In some cases, due to the longest matching prefix rule, additional nodes need to be traversed to find the proper routing information as it is not guaranteed that the initial search will find the proper leaf.
The Patricia Tree approach does not scale well to gigabit speeds. In particular, the data structure is too large and a worst-case lookup involves many memory accesses, taking far more time than is available to support gigabit rates.
Hashing is an alternative approach to searching that is used in some LAN switches and has been applied to routing table searches. In contrast to the Patricia Tree, hashing operates strictly on an exact-match basis, and assumes the number of IP DAs that the system must handle at any one time is limited to a few thousand. A hash function is a sort of compression algorithm that is used to condense each 32-bit host route (i.e., exact DA) in the table to a smaller-sized entry (8-10 bits typically).
Each host route maps to a unique location in a hash table, each entry in the hash table (termed a slot) corresponds to one or more host routes. The compression of hashing makes the table small enough to quickly search using simple hardware-based exact matching techniques.
When a packet is received, an equivalent hash value is computed quickly from its DA, and compared directly against all entries in the hash table. When a match is found, an exact match of the DA can be searched in the list of addresses for the hash slot. If the address is not found in either search, there is no match, and the packet must be sent to a CPU for software processing--its address must also be used to compute a new entry for the hash table.
The host-route hashing algorithm improves the average-case behaviour for small routing tables over the Patricia Tree. However, there continues to be a number of problems associated with host-route hashing. Since only host routes (i.e., full 32-bit addresses) are cached, the number of entries in the table grows linearly with the number of hosts on the network. This does not take advantage of the hierarchical address structure of IP, forcing tables to grow as rapidly as new users arrive on the Internet.
Another key disadvantage of storing host routes instead of network prefixes is the time required to update the routing table. Even a single network-level route change may require many host routes to be flushed from the table. This problem can be particularly severe for a very large hash table or in a backbone router, where multiple entries change after a routing update.
A multi-stage lookup method using a compact routing table is described in Small Forwarding Tables for Fast Routing Lookups, Degermark et al., http://www.cdt.luth.se/net/publications.html.
For the purpose of describing the small forwarding table data structure a representation of a binary tree 10 that spans the entire IP address space (Ipv4) is shown in FIG. 1. The height of the tree 10 is 32 bits, and the number of leaves is 2.sup.32, one for each possible IP address. The prefix of a routing table entry defines a path in the tree ending in a node. All IP addresses (leaves) in the subtree rooted at that node should be routed according to that routing entry. In this manner each routing table entry defines a range of IP addresses with identical routing information.
The forwarding table is a representation of the binary tree 10 spanned by all routing entries. This is called a prefix tree. The Degermark et al. data structure requires that the prefix tree be complete (i.e., that each node in the tree has either two or no children). A set of routing entries partitions the IP address into sets of IP addresses.
The binary tree 10 of FIG. 1 illustrates the three levels (16-8-8) of the data structure proposed by Degermark et al. In particular, level one of the data structure covers the prefix tree down to depth 16, level two covers depths 17 to 24, and level three depths 25-32. Wherever a part of the prefix tree extends below level 16, a level two section describes that part of the tree. Similarly, sections at level three describe parts of the prefix tree that are deeper than 24. The result of searching one level of the data structure is either an index into the next-hop table or an index into an array of sections for the next level.
The Degermark et al. forwarding tables can be built during a single pass over all routing entries in a routing table. However, there is no guarantee that the resulting forwarding tables are memory optimized since the source data set (i.e. the routing entries) vary widely over different routers and over different times within the same router.