The present invention is related to the field of data networks, and more particularly to the routing of data packets from a source node to a destination node within a network.
One primary function of data networks is the routing of data packets or frames from a source network node to one or more destination network nodes. When a network device receives a packet or frame, the device examines the packet or frame in order to determine how the packet or frame is to be forwarded. Similar forwarding decisions are made as necessary at multiple intermediate network devices until the packet or frame is received at a desired destination node. This type of operation is in contrast to networks employing switching techniques, in which routes are pre-established as "circuits" and each network device simply forwards each received packet on its associated circuit. One example of a routed network is the Internet, which employs a protocol known as the Internet Protocol (IP) for routing data packets through the Internet.
There is a growing demand for Internet and other data network services. As a result, there is an increasing volume of routed data traffic such as IP traffic being carried on high-bandwidth data channels, such as the well-known T1 and T3 signals used to carry data and digitized voice in the public telephone system. Along with this increase in routed traffic is an increased demand for high-throughput routers that can make forwarding decisions at very high rates.
To accomplish the task of routing data packets through a network from a source node to a destination node, data networks commonly employ a distributed routing procedure. Network routers maintain routing tables to carry out the routing function. When a packet arrives at a router, an address contained within the packet (for example the destination address) is used to retrieve an entry from the routing table that indicates the next hop, or next node, along a desired route to the destination node. The router then forwards the packet to the indicated next hop node. The process is repeated at successive router nodes until the packet arrives at the desired destination node.
The routing tables in the routers are maintained according to any of a variety of distributed routing protocols. For example, one well-known routing protocol is known as OSPF, which is an acronym for "Open Shortest Path First". The routers collect information about the activation and deactivation of network links among neighboring nodes, and the information is communicated among the routers according to the routing protocol. Routes are created, updated, and deleted as needed according to network conditions. All of the pertinent routing-related information is contained collectively within the routing tables maintained at the routers.
A routing table entry includes a 2-part mapping between an address such as a destination address and an associated next hop address. It is common for the destination address portion to include a subnet mask value indicating that some of the address bits are to be matched precisely and others need not be. An example of an entry in an Internet Protocol (IP) routing table is the following:
128.4.0.0/16 100.0.0.0
This entry uses the known convention of representing a 32-bit IP address as a string of four bytes (most significant to least significant) separated by decimal points, where the value of each byte is given as a decimal equivalent. This entry indicates that any packet having a destination address whose 16 most significant bits are equal to 128.4 (10000000 00000100 binary) should be routed to the network node having IP address 100.0.0.0 (01100100 00000000 00000000 00000000 binary). An example of a matching destination address is 128.4.10.9; an example of a non-matching address is 128.120.0.0.
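The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the mask comparison; the function name `prefix_matches` and its parameters are hypothetical and do not come from the entry format itself.

```python
import ipaddress


def prefix_matches(prefix_addr: str, prefix_len: int, destination: str) -> bool:
    """Return True if the top prefix_len bits of destination equal those of prefix_addr."""
    # Build a 32-bit mask with prefix_len leading ones (e.g. /16 -> 0xFFFF0000).
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    prefix_value = int(ipaddress.IPv4Address(prefix_addr))
    dest_value = int(ipaddress.IPv4Address(destination))
    # Only the masked (most significant) bits must match precisely.
    return (dest_value & mask) == (prefix_value & mask)


if __name__ == "__main__":
    print(prefix_matches("128.4.0.0", 16, "128.4.10.9"))
    print(prefix_matches("128.4.0.0", 16, "128.120.0.0"))
```

With the entry 128.4.0.0/16, the first call compares only the upper 16 bits and so accepts 128.4.10.9, while the second rejects 128.120.0.0 because its second byte differs.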
The example above illustrates the concept of aggregation of IP addresses for routing purposes. All IP addresses whose upper 16 bits are equal to 128.4 are routed to the same next hop node. Since IP addresses are 32-bit values, there are 2^(32-16) = 2^16 = 64K such addresses. These addresses are said to be aggregated in the routing table. It will be appreciated that shorter subnet masks correspond to greater aggregation, while longer subnet masks correspond to less aggregation. In addition, this format for a routing entry can also be used for route summarization, a technique similar to aggregation that is used by routing protocols.
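The relationship between mask length and degree of aggregation can be checked with a short calculation. The helper name `aggregated_address_count` is a hypothetical one chosen for this sketch.

```python
def aggregated_address_count(prefix_len: int, address_bits: int = 32) -> int:
    """Number of distinct addresses covered by a prefix of the given length."""
    if not 0 <= prefix_len <= address_bits:
        raise ValueError("prefix length out of range")
    # Each unmatched low-order bit doubles the number of covered addresses.
    return 2 ** (address_bits - prefix_len)


if __name__ == "__main__":
    for length in (8, 16, 24, 32):
        print(length, aggregated_address_count(length))
```

A 16-bit mask covers 65,536 (64K) addresses, an 8-bit mask covers 16,777,216, and a full 32-bit mask covers exactly one address, which is the sense in which shorter masks aggregate more.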
The mapping from the set of all possible destination addresses to the set of all possible next hops can be represented as a binary tree, in which each bit of the destination address dictates which branch is taken at a corresponding level in the search for the next hop. For an n-bit address, a tree of height n is required. A fully populated tree has 2^n distinct leaves at the end of 2^n distinct search paths, where each leaf corresponds to a next hop value. However, a tree representing a set of routing entries typically contains far fewer leaves. The number of leaves required is influenced by the number of entries in the routing table, and also the degree to which network addresses are aggregated. If the network address space is divided into a relatively large number of sub-spaces each of which is assigned a different route, more leaves are needed than when the network address space is divided into a smaller number of sub-spaces having distinct routes. Most networks exhibit substantial address aggregation, so that even in large networks the mapping tree used for routing at a given node tends to be "sparse", i.e. not very fully populated. For example, the routing entry given above corresponds to a single leaf at level 16 of the tree, and it covers the range of 64K addresses from 128.4.0.0 through 128.4.255.255.
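A minimal sketch of such a binary tree follows, assuming one node per address bit and assuming that when several stored prefixes match, the longest one wins (the usual IP convention, not something stated above). The class and method names, and the second route used in the usage note, are hypothetical.

```python
import ipaddress
from typing import Optional


class Node:
    """One tree node: two branches (bit 0, bit 1) and an optional next hop."""
    __slots__ = ("children", "next_hop")

    def __init__(self) -> None:
        self.children = [None, None]
        self.next_hop: Optional[str] = None


class BinaryTrie:
    def __init__(self) -> None:
        self.root = Node()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        value = int(net.network_address)
        node = self.root
        # Walk one level per prefix bit, most significant bit first.
        for level in range(net.prefixlen):
            bit = (value >> (31 - level)) & 1
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.next_hop = next_hop  # leaf sits at depth == prefix length

    def lookup(self, address: str) -> Optional[str]:
        value = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.next_hop
        for level in range(32):
            bit = (value >> (31 - level)) & 1
            node = node.children[bit]
            if node is None:
                break  # sparse tree: no deeper entries on this path
            if node.next_hop is not None:
                best = node.next_hop  # remember the longest match so far
        return best


if __name__ == "__main__":
    trie = BinaryTrie()
    trie.insert("128.4.0.0/16", "100.0.0.0")
    print(trie.lookup("128.4.10.9"))
    print(trie.lookup("128.120.0.0"))
```

The single /16 entry creates only 16 nodes along one path, which illustrates the sparseness noted above. Adding a hypothetical more specific route such as 128.4.10.0/24 would extend that one path by eight levels rather than populate a new region of the tree.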
The simplest way conceptually to look up a next hop address is to use a conventional random-access memory having a binary address input and a data storage location associated with each unique address value. A next hop value is stored at the storage location corresponding to each address. The next hop is looked up in the memory by simply retrieving the value stored at the memory location indicated by the address included in a received packet. When a group of addresses are aggregated, such as in the above example, the next hop value used by the aggregation would be replicated at each aggregated address in the memory. Thus in the foregoing example the entry 100.0.0.0 would appear at locations 128.4.0.0 through 128.4.255.255 of such a memory.
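The replication of an aggregated next hop across a directly indexed memory can be illustrated as follows. To keep the sketch runnable it assumes a toy 16-bit address space rather than 32-bit IP addresses; the constant `ADDRESS_BITS`, the function names, and the sample route are all hypothetical.

```python
ADDRESS_BITS = 16  # toy assumption; real IP addresses are 32 bits wide


def build_direct_table(routes):
    """routes: iterable of (prefix_value, prefix_len, next_hop) tuples."""
    table = [None] * (1 << ADDRESS_BITS)  # one storage location per address
    # Install shorter prefixes first so more specific ones overwrite them.
    for prefix_value, prefix_len, next_hop in sorted(routes, key=lambda r: r[1]):
        span = 1 << (ADDRESS_BITS - prefix_len)   # addresses covered
        start = prefix_value & ~(span - 1)        # clear the unmatched low bits
        for address in range(start, start + span):
            table[address] = next_hop             # replicate at every address
    return table


def lookup(table, address):
    """A lookup is a single indexed read, with no searching."""
    return table[address]


if __name__ == "__main__":
    # An 8-bit prefix 0x80 in the toy space covers 0x8000 through 0x80FF.
    table = build_direct_table([(0x8000, 8, "100.0.0.0")])
    print(lookup(table, 0x800A))
    print(sum(1 for hop in table if hop == "100.0.0.0"))
```

In this toy space one route occupies 256 identical table entries. The same scheme applied to a /16 route over 32-bit addresses would write 65,536 copies of the next hop.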
While conceptually simple, such an approach is not practically feasible for typical network address spaces. The amount of memory required based on typical network address lengths is prohibitively large. For example, approximately 4 billion (2^32) memory locations are required to fully decode 32-bit IP addresses. Also, this approach is inefficient when the tree is even modestly sparse, because the same next hop value is replicated across every address in an aggregation. For these reasons, network routers have generally employed alternative means of storing and retrieving the tree elements.
Many contemporary routers employ what is referred to as a Patricia tree representation of the mapping from destination addresses to next hops. During a search, a Patricia tree is traversed in binary fashion in the direction from most significant to least significant address bits. The Patricia tree structure achieves significantly greater storage efficiency than the simplistic approach described above. However, worst-case searches can potentially require 32 memory references. Thus the performance of a router using a Patricia tree is undesirably sensitive to network topology and address assignments.
The logical partitioning and layout of functional components within the router also affect router performance. A common configuration for a contemporary router is a collection of line cards interconnected by a switching fabric. Each line card has one or more ports each attached to a corresponding physical network medium. When a packet arrives at a line card port, a forwarding engine on the line card determines which port the packet should be forwarded to, and then forwards the packet to the corresponding line card through the switch fabric. The receiving line card then transmits the packet onto the appropriate network segment. The forwarding engine may be implemented using a general-purpose microprocessor executing special-purpose forwarding software, or may alternatively be implemented using special-purpose hardware. A software approach is favored when the speed of lookups is secondary to other considerations, such as ease of revision. A hardware approach is favored when the speed of lookups is paramount, for example on line cards used with very high-speed networks.
It is known to maintain the routing information within a centralized component such as a system controller within a router of the foregoing type, and for each forwarding engine to consult the system controller in order to obtain a route for each received packet. This approach has the advantage that only a single copy of the routing information is maintained within the router, so that the information can be updated readily and the most up-to-date information is automatically used for route determination. However, the system controller in such routers rapidly becomes a bottleneck, especially in light of the recent tremendous growth in the volume of network traffic.
To reduce the effect of a limited-capacity system controller on router performance, it has become more common for routing information to be distributed in multiple readily accessible locations in a router. In one approach a forwarding table is employed on the line cards to map the destination address of each received packet to the identity of the port to which the packet should be forwarded. The forwarding table contains a subset of the information from the routing table. The system controller updates the forwarding tables on the various line cards as changes to the routing table occur. The use of distributed forwarding tables increases parallelism in the router. Also, if the forwarding tables are small enough they can be placed into relatively fast-access storage on the line cards, which further enhances performance.
In some routers the forwarding tables are cached copies of one or more sections of the routing table. This technique exploits address locality appearing in the network traffic. Most of the next hop lookups are done on the line card when the hit rate in the cache is high. However, there are circumstances in which the hit rate in the cache cannot be maintained at an adequately high level. If the cache is too small relative to the number of different addresses received by the line card over a given interval, the cache may begin to thrash. When thrashing occurs, entries are repeatedly swapped out of the cache prematurely, substantially decreasing the hit rate. Each lookup that misses in the cache incurs delay while the needed entry is fetched from the system controller. As a result, overall performance of the router is degraded.
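The thrashing behavior can be reproduced with a small simulation. The sketch assumes a least-recently-used replacement policy, which the description above does not specify, and models the system controller as a hypothetical callback; all names are illustrative.

```python
from collections import OrderedDict


class LruRouteCache:
    """Line-card cache of next hops with LRU replacement (an assumed policy)."""

    def __init__(self, capacity, fetch_from_controller):
        self.capacity = capacity
        self.fetch = fetch_from_controller  # stands in for the slow path
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, destination):
        if destination in self.entries:
            self.entries.move_to_end(destination)  # mark as most recently used
            self.hits += 1
            return self.entries[destination]
        self.misses += 1
        next_hop = self.fetch(destination)         # miss: consult the controller
        self.entries[destination] = next_hop
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)       # evict least recently used
        return next_hop


def hit_rate(capacity, destinations, rounds):
    """Replay the same destination sequence repeatedly and report the hit rate."""
    cache = LruRouteCache(capacity, lambda dest: "hop-for-" + dest)
    for _ in range(rounds):
        for dest in destinations:
            cache.lookup(dest)
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    traffic = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"]
    print(hit_rate(5, traffic, 100))  # cache holds the whole working set
    print(hit_rate(4, traffic, 100))  # cache is one entry too small
```

When the cache holds all five destinations, only the first pass misses. With room for four, each entry is evicted just before it is needed again under this cyclic traffic pattern, so every lookup misses and must be served by the controller.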
In a technique described by Degermark et al. in a paper entitled "Small Forwarding Tables for Fast Routing Lookups", small forwarding tables that contain all the necessary routing information are used in the line cards. A microprocessor on each line card executes a lookup algorithm using the data stored in the corresponding forwarding table. The technique uses a 3-level prefix tree representation of the mapping from destination network addresses to next hop addresses, and the inherent sparseness of the prefix tree is exploited to achieve considerable storage efficiency. Level 1 of the prefix tree is associated with bits <31:16> of the IP address from packets arriving at the router. Levels 2 and 3 of the prefix tree are associated with bits <15:8> and <7:0> of the IP address respectively.
In the technique of Degermark et al., routing entries that aggregate addresses having up to 16 of their most significant bits in common have corresponding entries in the level 1 tree, and require no space in either the level 2 or level 3 trees. Routing entries that aggregate addresses having between 17 and 24 of their most significant bits in common require space in both the level 1 and the level 2 trees. For these routing entries, the level 1 tree contains node entries that point to chunks in the level 2 tree that contain the corresponding leaves. For routing entries that aggregate addresses having between 25 and 32 most significant bits in common, the chunks in the level 2 tree contain node entries that point to chunks in the level 3 tree that contain the leaf entries. The levels are searched in order as deep as necessary using the respective bits of the IP address to retrieve the desired next hop value.
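The order in which the three levels are searched can be sketched as follows. This is not the compressed encoding of Degermark et al.: it uses plain, fully expanded Python lists at each level and illustrates only the 16/8/8 bit split and the leaf-or-node distinction. All names, and the two more specific routes in the usage example, are hypothetical.

```python
import ipaddress

LEAF, NODE = "leaf", "node"


def _install(table, value, prefix_len, next_hop, shift, width):
    index = (value >> shift) & ((1 << width) - 1)
    level_end = 32 - shift                  # 16, 24 or 32 bits consumed so far
    if prefix_len <= level_end:
        # Prefix ends at this level: expand it over every slot it covers.
        span = 1 << (level_end - prefix_len)
        for slot in range(index, index + span):
            table[slot] = (LEAF, next_hop)
        return
    kind, payload = table[index]
    if kind == LEAF:
        # Push the existing leaf down as the default of a new 256-entry chunk.
        payload = [(LEAF, payload)] * 256
        table[index] = (NODE, payload)
    _install(payload, value, prefix_len, next_hop, shift - 8, 8)


def build(routes):
    """routes: iterable of (prefix, next_hop) pairs such as ('128.4.0.0/16', ...)."""
    level1 = [(LEAF, None)] * 65536         # indexed by address bits <31:16>
    nets = [(ipaddress.ip_network(p), hop) for p, hop in routes]
    # Shorter prefixes first, so slots being expanded never hold a NODE yet.
    for net, hop in sorted(nets, key=lambda item: item[0].prefixlen):
        _install(level1, int(net.network_address), net.prefixlen, hop, 16, 16)
    return level1


def lookup(level1, address):
    value = int(ipaddress.IPv4Address(address))
    kind, payload = level1[value >> 16]     # level 1: bits <31:16>
    for shift in (8, 0):                    # level 2: <15:8>, level 3: <7:0>
        if kind == LEAF:
            break                           # search only as deep as necessary
        kind, payload = payload[(value >> shift) & 0xFF]
    return payload


if __name__ == "__main__":
    table = build([
        ("128.4.0.0/16", "100.0.0.0"),
        ("128.4.10.0/24", "100.0.0.1"),     # hypothetical level 2 route
        ("128.4.10.128/25", "100.0.0.2"),   # hypothetical level 3 route
    ])
    for addr in ("128.4.99.1", "128.4.10.9", "128.4.10.200"):
        print(addr, lookup(table, addr))
```

In the usage example, 128.4.99.1 is resolved at level 2 by the default inherited from the /16 entry, 128.4.10.9 descends to level 3 and finds the /24 default, and 128.4.10.200 is resolved at level 3 by the /25 entry. Because every level here is fully expanded, the sketch does not show the storage savings that the paper obtains from its compressed mapping.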
The technique shown in the Degermark et al. paper achieves considerable storage efficiency, so that small but complete forwarding tables can be stored on each line card. At each level of the prefix tree, storage is used only to store the required leaf and node information; little or no storage is left empty as a result of tree sparseness. A multi-level mapping structure within each level maps aggregated addresses to a single leaf or node entry used by all members of the aggregation. Thus for an exemplary routing entry such as (128.4.0.0/16 -> 100.0.0.0), the Degermark forwarding table would contain a single leaf, and each address in the range from 128.4.0.0 through 128.4.255.255 would be mapped to the location of the single leaf.
While the technique shown in the Degermark et al. paper achieves considerable storage efficiency, it does so at the cost of complexity, notably in the multi-level mapping used at each level to extract the desired node or leaf based on the corresponding bits of the IP address. It would be desirable, however, for next hop lookups to be performed in a manner better suited to high-performance hardware implementation. Also, the Degermark et al. paper does not address performance issues that may arise from the manner of creating and maintaining the various data structures during dynamic network operation when routes are being added, deleted, or changed. A practical router must have an efficient means of re-generating the forwarding tables as necessary to keep up with changes in the routing topology as dictated by the routing protocol being followed.