A computer network is a geographically distributed collection of interconnected communication links used to transport data between nodes, such as computers. Many types of computer networks are available, with the types ranging from local area networks (LANs) to wide area networks (WANs). The nodes typically communicate by exchanging discrete packets or messages of data according to pre-defined protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Computer networks may be further interconnected by an intermediate node, such as a router, to extend the effective “size” of each network. Since management of a large system of interconnected computer networks can prove burdensome, smaller groups of computer networks may be maintained as routing domains or autonomous systems. The networks within an autonomous system are typically coupled together by conventional “intradomain” routers. Yet it still may be desirable to increase the number of nodes capable of exchanging data; in this case, interdomain routers executing interdomain routing protocols are used to interconnect nodes of the various autonomous systems.
An example of an interdomain routing protocol is the Border Gateway Protocol version 4 (BGP), which performs routing between autonomous systems by exchanging routing (reachability) information among neighboring interdomain routers of the systems. An adjacency is a relationship formed between selected neighboring (peer) routers for the purpose of exchanging routing information messages and abstracting the network topology. Before transmitting such messages, however, the peers cooperate to establish a logical “peer” connection (session) between the routers. BGP establishes reliable connections/sessions using a reliable/sequenced transport protocol, such as the Transmission Control Protocol (TCP).
The reachability information exchanged by BGP peers typically includes destination address prefixes, i.e., the portions of destination addresses used by the routing protocol to render routing (“next hop”) decisions. Examples of such destination addresses include Internet Protocol (IP) version 4 (IPv4) and version 6 (IPv6) addresses. A prefix implies a combination of an IP address and a mask that cooperate to describe an area of the network that a peer can reach. Each prefix may have a number of associated paths; each path is announced to a BGP router by one or more of its peers. The BGP routing protocol standard is well known and described in detail in Request For Comments (RFC) 1771, by Y. Rekhter and T. Li (1995), Internet Draft <draft-ietf-idr-bgp4-22.txt> titled, A Border Gateway Protocol 4 (BGP-4) by Y. Rekhter and T. Li (April 2003) and Interconnections, Bridges and Routers, by R. Perlman, published by Addison Wesley Publishing Company, at pages 323–329 (1992), all disclosures of which are hereby incorporated by reference.
The interdomain routers configured to execute an implementation of the BGP protocol, referred to herein as BGP routers, perform various routing functions, including transmitting and receiving routing messages and rendering routing decisions based on routing metrics. Each BGP router maintains a routing table that lists all feasible paths to a particular network. Periodic refreshing of the routing table is generally not performed; however, BGP peer routers residing in the autonomous systems exchange routing information under certain circumstances. For example, when a BGP router initially connects to the network, the peer routers exchange the entire contents of their routing tables. Thereafter when changes occur to those contents, the routers exchange only those portions of their routing tables that change in order to update their peers' tables. These update messages are thus incremental update messages sent in response to changes to the contents of the routing tables and announce only a best path to a particular network.
Broadly stated, a BGP router generates routing update messages for an adjacency, also known as a peer router by “walking-through” the routing table and applying appropriate routing policies. A routing policy is information that enables a BGP router to rank routes according to filtering and preference (i.e., the “best path”). Routing updates provided by the update messages allows BGP routers of the autonomous systems to construct a consistent view of the network topology. The update messages are typically sent using a reliable transport, such as TCP, to ensure reliable delivery. TCP is a transport protocol implemented by a transport layer of the IP architecture; the term TCP/IP is commonly used to denote this architecture. The TCP/IP architecture is well known and described in Computer Networks, 3rd Edition, by Andrew S. Tanenbaum, published by Prentice-Hall (1996).
A common implementation of the BGP protocol is embodied as a single process executing on a single processor, e.g., a central processing unit (CPU), of the BGP router, while another known implementation provides multiple instances of the BGP process running on a single CPU. In this latter implementation, each BGP instance has its own routing table and chooses its own best path for a given prefix; each instance may further have a different best path for the same prefix. From the perspective of the protocol, each BGP instance is a separate router; yet, each router instance shares the same resources, e.g., the single CPU. Both BGP implementations store and process update messages received from their peer routers, and create and process update messages for transmission (announcement) to those peers. However, the amount of processing time (i.e., band-width) available on the single CPU is finite which, in turn, results in limitations on the number of routes the BGP implementations can handle and limitations on the number of peers/adjacencies that the BGP implementations can support.
Examples of factors that limit the number of adjacencies and routes that a BGP implementation can support include the physical amount of memory in the BGP router. The amount of memory the BGP router can support is important because secondary storage, such as disks, cannot be efficiently used to store update messages given the substantial read/write latencies involved with accessing the disks. Moreover, each adjacency maintained by the router has a certain minimum CPU cost associated therewith. Examples of such cost include sending “KeepAlive” messages at predetermined intervals, processing received update messages, and deciding whether to send update messages to peers whenever a change is made to the routing table.
Each implementation of the BGP protocol requires that a BGP router receive and store, for each routable prefix, a number of paths received from its peers (or generated locally). These paths are then compared using a best path selection algorithm that describes a relationship between paths, i.e., the algorithm specifies how the paths should be compared, along with the order in which they are compared, to select the best path. The best path is then used by the router to forward packets and is announced to the router's peers. Yet as the number of peers increases and the number of prefixes announced (advertised) by each peer increases, the amount of memory needed to store all received paths (i.e., a “load”) also increases. It is thus desirable to disperse this received path load among, e.g., various processing elements or nodes of a multi-node system, such that no single node stores all of the paths, thereby limiting the maximum memory needed on a single node.
In general, any relation that is transitive can be dispersed. In this context, transitive means, for example, that if A is less than B and B is less than C, then A is less than C. Using the transitive relation, the best path from each of a number of distinct subsets of paths can be selected and then those selected paths can be compared to select the overall best path. However, the best path selection algorithm (as specified in the BGP standard) is not transitive, i.e., if path A is better than path B and path B is better than path C, it is not necessarily the case that path A is better than path C. In the cases where this non-transitivity can occur, the BGP standard defines the order in which the paths must be compared to make the selection of the overall best path deterministic.