In packet networks, information is transferred through the network from a source computer to a destination computer using packets called datagrams. The source and destination computers are called hosts. The network is an interconnection of hosts and routers. Typically routers have many network interfaces or ports connecting to other routers and hosts. The routers have input ports for receiving incoming packets and output ports for transmitting outgoing packets. The packets include data from the source computer and a destination address. The routers route the packets to a host or to another router based on the destination address and information stored in a routing table.
In the Internet protocol (IP), a route is either an indirect route or a direct route. When a route is an indirect route, the next destination is another router. A routing table entry indicates the next router's IP address and related information, such as the network interface connecting to the next router. When a route is a direct route, the next destination is the destination host. In this case, the routing table entry indicates the network interface to which the destination host is connected.
A hop is a direct interconnection between two routers, two hosts, or a router and a host. An indirect route has more than one hop to a host, while a direct route has one hop to the host. A next hop is the router or host at the distant end of the hop. A next hop's IP address is the IP address of the router or host at the distant end of the hop.
In one routing table, the information in a route entry includes at least the following: a destination IP address, a prefix length, a next hop's IP address and address port information. The IP address has thirty-two bits. The prefix length specifies the number of leading bits of the IP address defining a network portion of the address. The remaining bits define a host portion of the address. The network portion of the address is often referred to as the IP network address. The entire IP address is usually referred to as the IP host address. For example, using standard Internet dotted decimal notation, 172.16.10.20/24 would indicate an IP prefix length of 24 bits, a network address of 172.16.10.0, and an IP host address of 172.16.10.20.
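The split between network portion and host portion can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function name `split_prefix` is hypothetical and is introduced only for illustration.

```python
import ipaddress


def split_prefix(cidr):
    """Return (prefix_length, network_address, host_address) for 'a.b.c.d/len'."""
    host_text, length_text = cidr.split("/")
    prefix_length = int(length_text)
    host = int(ipaddress.IPv4Address(host_text))
    # Mask with prefix_length leading one bits out of thirty-two.
    mask = (0xFFFFFFFF << (32 - prefix_length)) & 0xFFFFFFFF
    network = host & mask
    return prefix_length, str(ipaddress.IPv4Address(network)), host_text


print(split_prefix("172.16.10.20/24"))
```

For the example above, the mask keeps the leading twenty-four bits, so the network address is 172.16.10.0 and the host address remains 172.16.10.20.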
IP routing is based on either the IP network address or the IP host address. Routes specified with IP network addresses are called network routes. Routes specified with IP host addresses are called host routes. IP routers handle both network and host routes.
When a router receives a packet with a destination address for a host that is not connected to that router, the router routes the packet to another router. Each router has a routing table defining routes or ports to use to route the packet. The routing table stores routing table entries. Each routing table entry includes at least a destination IP address, the prefix length of that destination IP address, the next hop's IP address for that destination, and the network interface (port) to be used for sending a packet to the next router or host. When a routing table entry is a direct route, the next hop's IP address is typically stored as 0.0.0.0. When the route is a host route, the prefix length is set equal to thirty-two.
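As an illustration only, a routing table entry with the four fields listed above might be modeled as follows. The class name `RouteEntry`, its field names, and the interface labels `if0` and `if1` are assumptions made for this sketch and do not come from any particular router implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteEntry:
    destination: str    # destination IP address
    prefix_length: int  # number of leading network bits
    next_hop: str       # next hop's IP address, 0.0.0.0 for a direct route
    interface: str      # network interface (port) used to send the packet

    @property
    def is_direct(self):
        # A direct route typically stores 0.0.0.0 as the next hop.
        return self.next_hop == "0.0.0.0"

    @property
    def is_host_route(self):
        # A host route uses all thirty-two bits as the prefix.
        return self.prefix_length == 32


direct_net = RouteEntry("172.16.10.0", 24, "0.0.0.0", "if0")
indirect_host = RouteEntry("192.168.5.9", 32, "172.16.10.1", "if1")
print(direct_net.is_direct, indirect_host.is_host_route)
```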
When searching for a route in the routing table, the router uses the destination IP address of each packet as a search key. Although all packets include a destination IP host address, no packets include the prefix length information. Therefore, routers need to determine which portion of the IP host address includes the IP network address for network routes.
To determine a route, one prior art routing table architecture uses a hash table. In hash-based routing tables, two tables and one special route entry are typically used. The first table, rt_host, is used for host routes and stores IP host addresses and output ports. The second table, rt_net, is used for network routes and stores IP network addresses and their route information. The special route entry specifies a default route. When a packet is being routed, the router searches the first table, rt_host, for host routes, if any. The router performs the search by comparing the destination address to the IP host addresses in the routing table. When no IP host address in the first table matches the destination address, the first table does not specify the host route and the search fails. When the search of the first table fails to find a host route, the router searches the second table, rt_net, to determine a network route, if any, using the destination address and the IP network addresses stored in the second table. When no IP network address in the second table matches the destination address, the second table does not specify the network route and the search fails. When the search of the second table fails to find a network route, the router uses the default route, if specified.
The first and second tables, rt_host and rt_net, respectively, are usually implemented as hash tables. For the first table, rt_host, routers use the entire destination IP host address in the incoming packet as a hash key to determine a starting pointer to a linked list in the first table. A linear search is performed through the linked list to determine whether the destination IP host address matches any entry in the linked list. If so, this matching entry, which has the host route, is returned.
For the second table, rt_net, routers use a set of leading bits of the destination IP host address in the incoming packet as a hash key to determine a starting pointer to a linked list in the second table. The set of leading bits of the destination IP host address is the destination IP network address. Routers determine the prefix length from the traditional IP address class information. The router uses the prefix length to determine the number of leading bits of the destination IP network address to apply as the hash table key. A linear search is then performed through the linked list to determine whether the destination IP network address matches any entry in the linked list. If so, this matching entry, which contains the network route, is returned.
In the second table, rt_net, the linked list is pre-sorted by IP prefix length in descending order. When the second table, rt_net, is searched, the first match will select the longest match of the network portion of the destination address.
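The search order described above (rt_host, then rt_net, then the default route) can be sketched as follows. This is a simplified model and not the code of any actual router: Python lists stand in for the linked lists, and the bucket count, the hash function `bucket_index`, and the class and method names are all assumptions. Addresses outside classes A, B, and C are treated as having a 24-bit prefix purely to keep the sketch short.

```python
import ipaddress

NUM_BUCKETS = 64  # assumed table size, for illustration only


def ip(text):
    return int(ipaddress.IPv4Address(text))


def mask(prefix_len):
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def classful_prefix_length(addr):
    top = addr >> 24
    if top < 128:
        return 8   # class A
    if top < 192:
        return 16  # class B
    return 24      # class C (classes D and E are lumped in here, a simplification)


def bucket_index(key):
    # Hypothetical hash: XOR-fold the four bytes of the key.
    return (key ^ (key >> 8) ^ (key >> 16) ^ (key >> 24)) % NUM_BUCKETS


class HashRoutingTable:
    def __init__(self):
        self.rt_host = [[] for _ in range(NUM_BUCKETS)]
        self.rt_net = [[] for _ in range(NUM_BUCKETS)]
        self.default_route = None

    def add_host_route(self, address, route):
        self.rt_host[bucket_index(address)].append((address, route))

    def add_net_route(self, network, prefix_len, route):
        key = network & mask(classful_prefix_length(network))
        chain = self.rt_net[bucket_index(key)]
        chain.append((network, prefix_len, route))
        # Keep the chain sorted by prefix length, longest first.
        chain.sort(key=lambda entry: entry[1], reverse=True)

    def lookup(self, dest):
        # 1. Host routes: hash on the entire destination address.
        for address, route in self.rt_host[bucket_index(dest)]:
            if address == dest:
                return route
        # 2. Network routes: hash on the classful network address.
        key = dest & mask(classful_prefix_length(dest))
        for network, prefix_len, route in self.rt_net[bucket_index(key)]:
            if (dest & mask(prefix_len)) == network:
                return route  # first match is the longest match
        # 3. Fall back to the default route, if one is set.
        return self.default_route


table = HashRoutingTable()
table.add_host_route(ip("10.1.1.5"), "host route")
table.add_net_route(ip("10.0.0.0"), 8, "net /8")
table.add_net_route(ip("10.1.1.0"), 24, "net /24")
table.default_route = "default"
print(table.lookup(ip("10.1.1.9")))
```

Both linear scans walk a chain whose length depends on how many entries share a bucket, so the lookup time in this model grows with the chain length.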
The hash-based routing methods are slow because a linear search is performed through the linked list in the hash table. The amount of time to search for a route is a function of the number of entries in the linked list. Therefore, route lookup cannot be done in a predetermined, fixed amount of time. In other words, searches have no fixed upper bound on the amount of time to perform the search.
Another routing table that uses multiple levels of arrays (i.e., a Multi-Array Routing Table (MART)) has a low and deterministic search cost. The search cost of a multi-array routing table is typically two to four routing table memory accesses for Internet protocol version four (IPv4). One advantage of the multi-array routing table is that the search function is less complex to implement in hardware. In addition, because the multi-array routing table search cost is deterministic, the multi-array routing table search hardware may be pipelined. However, the traditional multi-array routing table has a disadvantage: a highly expensive route update.
In a multi-array routing table described by Pankaj Gupta, Steven Lin, and Nick McKeown in “Routing Lookups in Hardware at Memory Access Speeds,” Proc. Infocom, April 1998, adding a single route incurs, in the worst case, 32 million (M) routing table memory accesses (16 M reads and 16 M writes). Although the route update frequency of this multi-array routing table is low, an average of 1.04 updates per second with a maximum of 291 updates per second, a phenomenon known as “route flap” in the Internet core routers is not considered. Route flap causes the entire set of border gateway protocol (BGP) routes to be deleted and added. As of June 2000, the number of BGP routes in the core Internet routers exceeds 52,000.
Consequently, more than 52,000 routes may be deleted and added in a single update even though the average route update frequency is low. Therefore, the route update cost should be kept low.
FIG. 1 is a diagram of a traditional multi-array routing table 30 having three levels of arrays. The IPv4 destination address 32 has thirty-two bits and is used as the search key into the multi-array routing table 30. A level 0 array 34 has 65,536 (i.e., 64K) elements 36 and is indexed by the most significant 16 bits of the IPv4 address. A level 1 array 38 has 256 elements and is indexed by bits 8-15 of the destination address. A level 2 array 40 has 256 elements and is indexed by the least significant eight bits of the destination address. Each thirty-two bit IP address can be mapped to one element 36 of the level 0, level 1 or level 2 arrays, 34, 38, 40, respectively.
When a route is added to the multi-array routing table 30 of FIG. 1, all of the array elements corresponding to the destination IP prefix of the route are configured to point to the added route. A destination IP prefix has up to thirty-two bits and is represented by the following format: AA.BB.CC.DD/prefix length, in which each of AA, BB, CC and DD is represented by eight bits, and the prefix length follows the slash “/.” For example, in FIG. 1, the tables have been updated with a pointer to route A in accordance with the destination IP prefix of 10.1.1.128/25. Because the specified prefix length of twenty-five exceeds the number of prefix bits covered by the level 0 array 34 and the level 1 array 38, the pointer to route A is stored in the level 2 array 40. An index or address to the level 0 array 34 is determined by applying the following relationship to the first sixteen bits of the destination address, “10.1”: 2,561 = 256 × 10 + 1.
A pointer 42 to the level 1 array 38 is stored at element 2,561. The next eight bits of the destination address, “1,” are used to generate the index into the level 1 array 38. In other words, the pointer 42 to the level 1 array is used as a base address and is added to the next eight bits of the destination address to determine the index 43 into the level 1 array 38. In this example, a pointer 44 to the level 2 array 40 is stored at address 1 in the level 1 array 38. The pointer 44 to the level 2 array 40 is likewise added to the last eight bits of the destination address to generate an index into the level 2 array 40. Because the specified prefix length is equal to twenty-five, all elements associated with the first twenty-five bits of the destination address are updated with the pointer to route A. The level 0 and level 1 arrays, 34 and 38, respectively, are associated with the first twenty-four bits of the destination address. In this example, the last portion of the prefix, “128,” is specified, and the “128” in combination with the prefix length of twenty-five corresponds to “1xxx xxxx” in binary, in which the x designates that the state of the bit is unknown. Therefore, the “1” in the twenty-fifth bit is associated with a range of addresses, 128 to 255. In the level 2 array 40, the elements from addresses 128 to 255 correspond to the prefix 10.1.1.128/25 and have pointers to route A.
In an example of a search, when the search key is equal to 10.1.1.130, the level 0 array 34 and level 1 array 38 will be accessed as described above to determine the pointer 44 to the level 2 array 40. The index 45 to the level 2 array 40 will be generated as described above, and the pointer to route A at address 130 in the level 2 array 40 will be returned. The multi-array routing table 30 of FIG. 1 always finishes a lookup with three or fewer memory accesses of the routing table.
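The lookup just described can be modeled in a few lines of Python. This is a sketch under simplifying assumptions: nested Python lists stand in for the base-pointer-plus-offset addressing of FIG. 1, an element is either empty (`None`), a route, or a reference to the next-level array, and the function names are hypothetical.

```python
import ipaddress


def ip(text):
    return int(ipaddress.IPv4Address(text))


def level_indices(dest):
    # Most significant 16 bits, next 8 bits, least significant 8 bits.
    return dest >> 16, (dest >> 8) & 0xFF, dest & 0xFF


def lookup(level0, dest):
    """Return (element found, number of routing table memory accesses)."""
    i0, i1, i2 = level_indices(dest)
    accesses = 1
    element = level0[i0]
    if isinstance(element, list):      # element points to a level 1 array
        accesses += 1
        element = element[i1]
        if isinstance(element, list):  # element points to a level 2 array
            accesses += 1
            element = element[i2]
    return element, accesses


# Rebuild the state of FIG. 1 after route A (10.1.1.128/25) has been added.
level2 = [None] * 256
for k in range(128, 256):
    level2[k] = "route A"
level1 = [None] * 256
level1[1] = level2
level0 = [None] * 65536
level0[256 * 10 + 1] = level1

print(level_indices(ip("10.1.1.130")))
print(lookup(level0, ip("10.1.1.130")))
```

In this model the search key 10.1.1.130 maps to indices 2,561, 1, and 130, and the lookup reaches route A on the third access; a key whose level 0 element is empty finishes after a single access.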
Assume that a new route, referred to as route B, whose destination IP prefix is equal to 10/8, is to be inserted into the multi-array routing table 30. To determine the associated addresses in the level 0 array 34, the destination IP prefix of 10/8 is represented as “0000 1010 xxxx xxxx xxxx xxxx xxxx xxxx” in binary. Therefore, the prefix 10/8 is associated with a range of addresses, 2,560-2,815, in the level 0 array 34. The contents of the elements of the range of addresses and any arrays pointed to by those elements need to be examined and updated appropriately with the new route information. Pseudo code for adding the new route, route B, is shown below:
Pseudo-Code for adding a route to the multi-array routing table of FIG. 1
For i = 2,560 (10.0) to 2,815 (10.255)    /* Set range of addresses in the level 0 table to be updated */
    If level-0[i] is connected to a level 1 array then
        level-1[] = connected array
        For j = 0 to 255    /* Access all elements of the level 1 table */
            If level-1[j] is connected to a level 2 array then
                level-2[] = connected array
                For k = 0 to 255    /* Access all elements of the level 2 table */
                    If level-2[k] is empty or
                       level-2[k]'s prefix length < 8 then
                        level-2[k] = B
            Else if level-1[j] is empty or
                    level-1[j]'s prefix length < 8 then
                level-1[j] = B
    Else if level-0[i] is empty or
            level-0[i]'s prefix length < 8 then
        level-0[i] = B
The pseudo code compares the prefix lengths of the existing and new routes before updating an element so that route pointers associated with the longest matching prefix length are stored in the routing table.
Adding a route to the routing table using the pseudo code above is expensive. In the worst case, 16 M (256×256×256) routing table memory reads and 16 M routing table memory writes are performed to add route B to the multi-array routing table 30.
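To make the cost concrete, the pseudo code above can be turned into a small runnable model that counts memory accesses. The sketch below is an illustration under assumptions, not the implementation of FIG. 1: nested Python lists stand in for pointers to arrays, the `Route` class and the function names are hypothetical, only prefix lengths of sixteen or fewer are handled, and reads of level 0 and level 1 elements are counted along with level 2 reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    prefix_len: int


def should_replace(element, new_route):
    # Overwrite empty elements and routes with a shorter prefix only.
    return element is None or element.prefix_len < new_route.prefix_len


def add_route(level0, prefix, route):
    """Add a route with prefix_len <= 16; return (reads, writes)."""
    first = prefix >> 16
    last = first + (1 << (16 - route.prefix_len)) - 1
    reads = writes = 0
    for i in range(first, last + 1):
        reads += 1
        if isinstance(level0[i], list):          # connected level 1 array
            level1 = level0[i]
            for j in range(256):
                reads += 1
                if isinstance(level1[j], list):  # connected level 2 array
                    level2 = level1[j]
                    for k in range(256):
                        reads += 1
                        if should_replace(level2[k], route):
                            level2[k] = route
                            writes += 1
                elif should_replace(level1[j], route):
                    level1[j] = route
                    writes += 1
        elif should_replace(level0[i], route):
            level0[i] = route
            writes += 1
    return reads, writes


# State of FIG. 1: route A (10.1.1.128/25) occupies level 2 elements 128-255.
route_a = Route("A", 25)
route_b = Route("B", 8)
level2 = [None] * 128 + [route_a] * 128
level1 = [None] * 256
level1[1] = level2
level0 = [None] * 65536
level0[2561] = level1

reads, writes = add_route(level0, 10 << 24, route_b)
print(reads, writes)
print(256 * 256 * 256)  # level 2 elements visited in the worst case
```

With only one level 1 array and one level 2 array present, adding route B touches a few hundred elements. The worst case arises when every level 0 element in the range 2,560-2,815 is connected to a level 1 array and every level 1 element is connected to a level 2 array, so that 256 × 256 × 256 = 16,777,216 level 2 elements must be read and possibly written.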
For an example of route deletion, assume now that route A is to be removed from the multi-array routing table 30. The contents of elements 128 to 255 of the level 2 array 40 are replaced with the new longest-matching route after route A is removed, which is route B. One technique for finding the new longest-matching route is to backtrack among the arrays and array elements, reading the contents of each element and comparing the contents of a most recently read element to a current longest-matching route to determine whether the most recently read element specifies the longest-matching route. Therefore, deleting route A requires numerous memory accesses and is expensive.
The paper of Gupta et al. teaches that 99.93% of the prefix lengths of the MAE-EAST routing table data are less than twenty-four and assumes that the multi-array routing table 30 does not require a large number of deep level arrays. However, the MAE-EAST routing table data includes only BGP routes. In practice, Internet Service Provider (ISP) routers have both BGP and Interior Gateway Protocol (IGP) routes in their routing tables, and the prefix length of most IGP routes is longer than twenty-four. The number of IGP routes in an ISP's router is typically not disclosed because the size of the ISP's network and the number of its customers can be estimated from the number of IGP routes. Despite this lack of IGP data, large ISPs likely have more than 1,000 IGP routes, and therefore, the multi-array routing table 30 of FIG. 1 would have many deep level arrays. Increasing the number of deep level arrays increases the route update cost of the traditional multi-array routing table exponentially. Therefore, an apparatus and method that reduces the cost of updating a multi-array routing table is needed.