An essential task of a packet router is to find the outgoing interface for each incoming packet by querying a routing table. In a typical Internet backbone router, the routing table includes more than 30,000 entries, and in the near future the number of entries is expected to increase by an order of magnitude. Even though the system may run at several gigabits per second, the overall throughput is restricted by slow look-up speed, so routing table look-up appears to be a performance bottleneck. Accordingly, it would be extremely advantageous to make IP routing table look-up operate at the speed of the backplane, which translates into more than 4 million look-ups per second for a 2 Gbps backplane.
An IP routing table comprises a set of routes, each of which determines the outgoing interface for a set of IP destination addresses. Each such set is represented by an IP address and a subnet mask. In IP version 4, both the address and the subnet mask are 32-bit numbers. If the Kth bit of the subnet mask is 1, the Kth bit of the corresponding IP address is significant; otherwise it is not. For example, if the IP address 12345678 and the subnet mask FFFFF000 (both in hexadecimal format) define a route, the set of addresses between 12345000 and 12345FFF belongs to this route. Each subnet mask always consists of contiguous ones starting from the most significant bit. These ones identify the bits of the IP address known as the "prefix," which defines the route. In the early days of the Internet, IPv4 unicast addressing had only three different prefix lengths: 8, 16, and 24 bits. Classless Inter-Domain Routing (CIDR) was introduced to allow prefixes of any length (see G. R. Wright and W. R. Stevens, "TCP/IP Illustrated, Volume 2," Addison-Wesley Publishing Company, 1995).
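The relationship between an address, its subnet mask, and the range of addresses covered can be illustrated with a short sketch. The function names route_range and prefix_length are hypothetical and chosen only for this illustration; the values are those of the example above.

```python
def route_range(address, mask):
    """Return the lowest and highest 32-bit addresses covered by a route."""
    low = address & mask                  # clear the bits the mask marks as not significant
    high = low | (~mask & 0xFFFFFFFF)     # set those same bits to one
    return low, high


def prefix_length(mask):
    """Count the ones in a subnet mask (assumed contiguous from the top bit)."""
    return bin(mask & 0xFFFFFFFF).count("1")


low, high = route_range(0x12345678, 0xFFFFF000)
print(f"{low:08X} {high:08X} /{prefix_length(0xFFFFF000)}")
# prints "12345000 12345FFF /20"
```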
Upon receiving an IP packet, the router consults its routing table to determine the route (or outgoing interface) based on the prefix bits of the packet's destination address, which may match multiple routes. The following procedure is used to determine the route for this packet:
a) If there is a route that is a perfect match, this route is taken. A route is a perfect match if the IP address defining this route is identical to the IP destination address.
b) If there are some matches but no perfect match, the route with the longest match is taken. That is, the route with the largest number of 1's in its subnet mask.
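Rules (a) and (b) can be sketched as a simple linear scan over the routes. This is only an illustration of the selection rule, not of any of the look-up structures discussed in this section. It assumes that a perfect match corresponds to a route whose subnet mask is all ones, so that rule (a) becomes a special case of rule (b). The function name longest_match and the interface labels are hypothetical.

```python
def longest_match(routes, destination):
    """Pick the interface of the matching route with the most ones in its mask.

    routes is a list of (address, mask, interface) tuples.
    Returns None when no route matches the destination.
    """
    best_interface = None
    best_length = -1
    for address, mask, interface in routes:
        # A route matches when the significant bits agree.
        if (destination & mask) == (address & mask):
            length = bin(mask).count("1")
            if length > best_length:
                best_interface, best_length = interface, length
    return best_interface


routes = [
    (0x12000000, 0xFF000000, "if0"),
    (0x12345000, 0xFFFFF000, "if1"),
    (0x12345678, 0xFFFFFFFF, "if2"),   # assumed host route: the "perfect match" case
]
print(longest_match(routes, 0x12345678))  # prints "if2"
print(longest_match(routes, 0x12345001))  # prints "if1"
```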
Several approaches have been proposed to support IP routing table look-up. As described in Wright & Stevens, supra, a special form of radix tree called a Patricia tree (Practical Algorithm To Retrieve Information Coded in Alphanumeric) is used to represent the IP routing tables in Net/3, FreeBSD, and many existing high-end routers. The Patricia tree search algorithm supports any prefix length and can be used to search for the longest matched prefix, but it may require backtracking during the search of an address, where backtracking means that some nodes in the tree are visited more than once. The Patricia tree search algorithm may require about h^2/2 iterations per look-up in the worst case, where h is the height of the tree and can be as large as the address length (32 for an IPv4 address).
A content addressable memory (CAM) which has multiple entries of IP addresses can be employed to cache the results of routing table look-up. Given an IP address, a CAM can search all of its entries in parallel. If there is a match after the search, a corresponding pointer to the matched address is obtained and is used to determine the output interface and some other information. Although a CAM is efficient in terms of its search speed, it is expensive and small in capacity as compared to a regular RAM. Moreover, since a CAM can only search on an entire address, it does not naturally support look-up with arbitrary prefix length. Without the assistance of other methods, a CAM supports only a small number of entries.
Another approach to speeding IP routing table look-up, which I call Hashed Radix Tree, is described in my co-pending patent application Ser. No. 09/003767 filed Jan. 7, 1998, in which the search, addition, and deletion times of a route are a function of the address length but are independent of the tree size. A typical look-up operation of the Hashed Radix Tree is described in FIG. 2. Each IP destination address is represented as a 32-bit number and is divided into two parts, the first K bits and the remaining (32-K) bits. The first K bits of the IP address are treated as an index to a RAM. Instead of putting all nodes into a single binary tree, the first K bits of the IP address are hashed to a smaller tree in order to reduce the number of iterations in the look-up. The output of the RAM is a pointer to the root node of a tree, denoted as the node labeled with "Bit K+1" in the figure. Each node in this tree has the same K-bit prefix as the first K bits of the IP address. If the (K+1)th bit of the IP address is 0, the look-up proceeds to the left child of the root node; otherwise, the look-up proceeds to the right child of the root node. Similarly, in the next iteration, if the (K+2)th bit of the IP address is 0, the look-up proceeds to the left child of the node labeled with "Bit K+2"; otherwise, the look-up proceeds to the right child of the node labeled with "Bit K+2." The look-up process continues in like fashion by examining bits (K+3), . . . , 32 to determine whether to take the left or right child in each iteration. The look-up process stops when it finds a perfect match or reaches a node without any child node.
If a perfect match is found (as indicated in FIG. 2 at the node labeled with "Bit K+3"), a pointer is obtained to a separate memory to determine the output interface corresponding to this IP destination address. During the look-up process, several matches with various subnet masks may be found. The process always keeps the best match that it has found so far. When the look-up process reaches a node without any child node, the best match is used as if it were the perfect match, and the output interface corresponding to this IP address can be determined. If there is no match at all during the look-up process, the look-up decides that there is no route entry for this IP address. Hardware implementation of the Hashed Radix Tree look-up is easy because (a) only about (33-K) iterations per look-up are required in the worst case; (b) both subnet mask look-up and maximum matching are supported; and (c) search time is independent of the number of route entries, so a large number of routes can be supported.
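The Hashed Radix Tree look-up described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the implementation of the co-pending application: K is fixed at 16 for the example, a dictionary stands in for the RAM indexed by the first K bits, only prefixes of at least K bits are handled, and the names Node, HashedRadixTree, add, and lookup are hypothetical. The walk stops as soon as the child selected by the next address bit is absent.

```python
K = 16  # assumed number of index bits; chosen only for this example


class Node:
    def __init__(self):
        self.child = [None, None]  # child[0] is the left child, child[1] the right
        self.interface = None      # set when a route's prefix ends at this node


class HashedRadixTree:
    def __init__(self):
        self.roots = {}  # stands in for the RAM indexed by the first K bits

    def add(self, address, prefix_len, interface):
        # This sketch does not handle prefixes shorter than K bits.
        assert K <= prefix_len <= 32
        node = self.roots.setdefault(address >> (32 - K), Node())
        for bit_pos in range(K, prefix_len):       # 0-based bit positions from the top
            bit = (address >> (31 - bit_pos)) & 1
            if node.child[bit] is None:
                node.child[bit] = Node()
            node = node.child[bit]
        node.interface = interface

    def lookup(self, destination):
        node = self.roots.get(destination >> (32 - K))
        best = None
        bit_pos = K
        while node is not None:
            if node.interface is not None:
                best = node.interface              # keep the best match found so far
            if bit_pos == 32:
                break                              # all 32 bits examined
            bit = (destination >> (31 - bit_pos)) & 1
            node = node.child[bit]                 # 0 goes left, 1 goes right
            bit_pos += 1
        return best


tree = HashedRadixTree()
tree.add(0x12340000, 16, "if0")
tree.add(0x12345000, 20, "if1")
tree.add(0x12345678, 32, "if2")
print(tree.lookup(0x12345001))  # prints "if1"
```

In this sketch a look-up visits at most 33-K nodes, and that bound depends only on K and the address length, not on how many routes have been added.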
Another approach which can support a prefix of arbitrary length is disclosed by Degermark et al. in an article entitled "Small Forwarding Tables for Fast Routing Look-ups," Proceedings of ACM SIGCOMM'97 Conference, 27(4):3-14, Cannes, France, Sep. 14-18, 1997. This is a hierarchical encoding scheme to compress a routing table. The address space is partitioned into three levels: one level deals with the first 16 bits, another with bits 17 to 24, and the last level is concerned with the last 8 bits. Each level uses a separate encoding scheme to compress the tree structure. The Degermark scheme focuses on the number of memory accesses required during look-up and on the size of the data structure. In particular, their algorithm requires about 150-160 Kbytes to represent a table with 40,000 routes. Using a general-purpose processor, a typical search requires eight references to memory and accesses a total of 14 bytes. Since the table is reasonably small, it can fit into the data cache of a high-end processor, and so search time is very short. However, insertion and deletion of route entries in the compacted table may require changing the entire table, and so incremental changes to the table are not feasible.
Still another approach is disclosed by Waldvogel et al. in an article entitled "Scalable High Speed IP Routing Look-ups," Proceedings of ACM SIGCOMM'97 Conference, 27(4):25-35, Cannes, France, Sep. 14-18, 1997, which uses binary search on hash tables organized by prefix length and which scales very well as address and routing table sizes increase. In particular, it requires a worst-case time of log2(address bits) hash look-ups, independent of the table size. Instead of searching from the longest prefixes or the shortest prefixes, the Waldvogel scheme starts the search at the prefixes of median length, say M. If there is a match at the median length, a subsequent search is performed at the prefixes of the median length between M and the longest prefix length; otherwise, it is performed at the prefixes of the median length between the shortest prefix length and M. The search then continues on different prefix lengths as if traversing a binary tree. The number of prefix lengths to be considered is reduced by half after each hash look-up. If proper hashing functions are employed and the hash functions can be computed efficiently, it is easy to show that the worst-case time complexity stated above is achieved when the organization of these prefix lengths is balanced. As pointed out by the authors, insertion and deletion of route entries in the hash tables (while maintaining the proposed search time) may require that the major part of the tables be changed, so additional work is required to support incremental changes to the tables. Since hashing tends to waste some memory if a non-perfect hashing function is employed, the Waldvogel approach requires more memory than the other approaches described above. The cost of computing the hashing function is also an important issue to consider.
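The binary search over prefix lengths can be sketched as follows. The sketch relies on two details that come from the cited Waldvogel article rather than from the description above: "marker" entries placed at shorter lengths so that the search is steered toward longer prefixes, and a precomputed best matching route stored with each marker so that no backtracking is needed. Python dictionaries stand in for the hash tables, and the names build_tables and lookup are hypothetical.

```python
def build_tables(routes):
    """routes maps (prefix_bits, length) -> interface, where prefix_bits
    holds the top `length` bits of the route address."""
    lengths = sorted({length for _, length in routes})
    tables = {length: {} for length in lengths}

    def best_real_match(bits, length):
        # Longest real route that is a prefix of the given bit string.
        for shorter in sorted((x for x in lengths if x <= length), reverse=True):
            hit = routes.get((bits >> (length - shorter), shorter))
            if hit is not None:
                return hit
        return None

    for bits, length in routes:
        # Replay the binary search for this length, leaving a marker at
        # every shorter length that the search would probe on the way.
        lo, hi = 0, len(lengths) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            probe = lengths[mid]
            if probe == length:
                break
            if probe < length:
                marker = bits >> (length - probe)
                tables[probe].setdefault(marker, best_real_match(marker, probe))
                lo = mid + 1
            else:
                hi = mid - 1
    for (bits, length), interface in routes.items():
        tables[length][bits] = interface       # real entries take precedence
    return lengths, tables


def lookup(lengths, tables, destination):
    best = None
    lo, hi = 0, len(lengths) - 1
    while lo <= hi:
        mid = (lo + hi) // 2                   # median of the remaining lengths
        probe = lengths[mid]
        key = destination >> (32 - probe)
        if key in tables[probe]:
            if tables[probe][key] is not None:
                best = tables[probe][key]      # remember the best match so far
            lo = mid + 1                       # match: try longer prefixes
        else:
            hi = mid - 1                       # no match: try shorter prefixes
    return best


routes = {(0x12, 8): "if0", (0x9999, 16): "if1", (0x123456, 24): "if2"}
lengths, tables = build_tables(routes)
print(lookup(lengths, tables, 0x12345678))  # prints "if2"
print(lookup(lengths, tables, 0x12340000))  # prints "if0"
```

In the second look-up, the probe at length 16 hits the marker left by the 24-bit route. The marker carries "if0" as its best match, so the failed probe at length 24 does not force a return to length 8.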
Given a set of prefixes, many existing search algorithms use some tree structure to represent these prefixes. A tree representation of prefixes enables efficient search, insertion, and deletion of any prefix in the set. Both Patricia trees and Hashed Radix trees use a tree structure in which a prefix is not always expanded, and in which prefix information is held either in leaf nodes (in a Patricia tree) or in all nodes (in a Hashed Radix tree). These approaches therefore require additional memory accesses to obtain the prefix values of routes from those nodes.