1. Field of the Invention
The present invention relates to using multicasts to distribute information to multiple nodes in a network; and in particular to using multicast subsets to improve performance during distribution of routing information among adjacent nodes.
2. Description of the Related Art
Networks of general purpose computer systems and specialized devices connected by external communication links are well known and widely used in commerce. The networks often include one or more network devices that facilitate the passage of information between the computer systems and devices. A network node is a network device or computer or specialized device connected by the communication links. An end node is a node that is configured to originate or terminate communications over the network. An intermediate network node facilitates the passage of data between end nodes.
Communications between nodes are typically effected by exchanging discrete packets of data. Information is exchanged within data packets (also called messages herein) according to one or more of many well known, new or still developing protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other based on information sent over the communication links. Each packet typically comprises (1) header information associated with a particular protocol, and (2) payload information that follows the header information and contains information that may be processed independently of that particular protocol. The header includes information such as the source of the packet, its destination, the length of the payload, and other properties used by the protocol. Often, the data in the payload for the particular protocol includes a header and payload for a different protocol associated with a different layer of detail for information exchange. For many protocols, the destination of a packet can include data that indicates a unique identifier for a particular destination node, such as a network address, and the packet is termed a unicast packet; or the destination can include a special code that indicates the packet is directed to any recipient node, and the packet is termed a “multicast” packet. Such a special code is called the multicast destination code.
The headers included in a packet traversing multiple heterogeneous networks, such as the Internet, typically include a physical (layer 1) header, a data-link (layer 2) header, an internetwork (layer 3) header and a transport (layer 4) header, as defined by the Open Systems Interconnection (OSI) Reference Model. The OSI Reference Model is generally described in more detail in Section 1.1 of the reference book entitled Interconnections Second Edition, by Radia Perlman, published September 1999, which is hereby incorporated by reference as though fully set forth herein.
The internetwork header provides information defining the source and destination addresses of a logical path through the network. Notably, the path may span multiple physical links. The internetwork header may be formatted according to the Internet Protocol (IP), which specifies IP addresses of both a source and destination node at the end points of the logical path. Thus, the packet may “hop” from node to node along its logical path until it reaches the end node assigned to the destination IP address stored in the packet's internetwork header.
Routers and switches are intermediate network nodes that determine which communication link or links to employ to support the progress of data packets through the network. A network node that determines which links to employ based on information in the internetwork header (layer 3) is called a router.
Some protocols pass protocol-related information among two or more network nodes in special control packets that are communicated separately and which include a payload of information used by the protocol itself rather than a payload of data to be communicated for another application. These control packets and the processes at network nodes that utilize the control packets are said to be in another dimension, a “control plane,” distinct from the “data plane” dimension that includes the data packets with payloads for other applications at the end nodes.
A routing protocol only exchanges control plane messages used for routing data packets sent in a different routed protocol (e.g., IP). A portion of a network under the network administration of a single authority, such as an enterprise or Internet service provider (ISP), is called a domain or an autonomous system (AS). To reduce the consumption of network resources and improve scalability, some routing protocols send only summarized routing information. Routing information for an AS is summarized at its boundaries with one or more other ASs at intermediate network nodes called border gateway nodes or border gateway (BG) routers. Routing information shared within the borders of one AS is exchanged using an interior gateway protocol (IGP). Example IGPs include the link state protocols such as the intermediate system to intermediate system (IS-IS) protocol and the open shortest path first (OSPF) protocol. Another IGP, developed by Cisco Systems of San Jose, Calif. for use in its routers, is the Enhanced Interior Gateway Routing Protocol (EIGRP). Some of the link-state protocols divide an autonomous system into multiple areas, flood all data for a unified routing database within an area, but send only summarized information between areas. Some IGPs, like EIGRP, send only summary information from each intermediate network node in the autonomous system.
EIGRP currently uses reliable multicast to transport routing information between a sending network node and all its adjacent neighbor nodes (sometimes called neighbors or peers) over one or more interfaces on the sending node. This reliable multicast system relies on the sending router sending a single multicast data packet, and waiting for some specified period of time called a multicast flow time (learned dynamically through network operation), for the neighbors that have received the routing information to acknowledge receipt of the information with an acknowledgement (ACK) data packet. Because receipt of the multicast data packet is acknowledged by the recipients with an ACK data packet, the multicast is called a reliable multicast.
If a neighbor does not acknowledge the receipt of this information within the multicast flow time, the neighbors that have replied are placed in a special state, called the conditional receive state, so they may continue to receive routing information through multicasts. Other routers are informed to ignore the additional multicasts.
That is, instead of waiting for all ACK messages before sending the next multicast, EIGRP paces multicast packets on the one or more interfaces with its neighbors using a timer called a multicast flow timer. The value indicated in the multicast flow timer is derived from the mean Smooth Round Trip Time (SRTT) of all neighbors on an interface. When there is a large number of neighbors that have a wide range of SRTTs, the multicast flow timer value is large, forcing EIGRP to pace the multicast packets very slowly. As a result, the faster neighbors are penalized by the slower neighbors.
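The penalty that slow neighbors impose on fast ones can be illustrated with a minimal sketch. It assumes, purely for illustration, that the timer value is the mean SRTT scaled by a constant; the function name `multicast_flow_timer` and the `multiplier` parameter are hypothetical and are not taken from any EIGRP implementation.

```python
from statistics import mean


def multicast_flow_timer(srtts_ms, multiplier=1.0):
    """Return a pacing interval (ms) derived from the mean SRTT on an interface.

    Assumption: the timer is simply multiplier * mean(SRTT). The text only
    says the value is "derived from" the mean SRTT, so the exact formula
    here is illustrative.
    """
    if not srtts_ms:
        raise ValueError("no neighbors on this interface")
    return multiplier * mean(srtts_ms)


# Nine fast neighbors (10 ms each) on one interface.
fast_only = [10] * 9
# The same nine neighbors plus a single slow neighbor (910 ms).
mixed = fast_only + [910]

print(multicast_flow_timer(fast_only))
print(multicast_flow_timer(mixed))
```

Under these assumed numbers, one slow neighbor raises the pacing interval for all ten neighbors by a factor of ten, which is the effect described above.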
Under normal conditions, EIGRP waits for acknowledgements from all neighbors before sending the next reliable multicast packet. If the multicast flow timer expires and EIGRP is ready to send the next packet when only a subset of neighbors have acknowledged the previous multicast packet, EIGRP enters a Multicast Exception condition. Under this condition, EIGRP continues to send the next multicast packet rather than waiting for all ACK messages. A method called Conditional-Receive (CR) is invoked to instruct the laggard neighbors not to accept the next multicast packet, which is intended for the faster neighbors. Normal multicast resumes when the laggard neighbors catch up.
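The sender-side decision at timer expiry can be sketched as follows. This is a simplified illustration, not EIGRP source code; the function name `split_on_timer_expiry` and the mode strings are hypothetical.

```python
def split_on_timer_expiry(neighbors, acked):
    """Partition neighbors when the multicast flow timer expires.

    neighbors: addresses of all neighbors on the interface.
    acked:     set of addresses that acknowledged the previous multicast.

    Returns (mode, fast, laggards). The mode is "normal" when every
    neighbor has acknowledged, otherwise "multicast-exception", in which
    case Conditional-Receive would be invoked for the laggards.
    """
    fast = [n for n in neighbors if n in acked]
    laggards = [n for n in neighbors if n not in acked]
    mode = "multicast-exception" if laggards else "normal"
    return mode, fast, laggards


neighbors = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(split_on_timer_expiry(neighbors, {"10.0.0.1", "10.0.0.3"}))
```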
CR works by multicasting a special hello packet (sometimes called an unreliable hello packet because an ACK message is not returned by the recipient) to the neighbors. The unreliable hello packet has a variable-length data field holding data that indicates the addresses of the laggard neighbors and the sequence number of the next reliable multicast packet. The special unreliable hello packet is also called a sequenced hello. The next reliable multicast packet is sent with the CR bit set and has the same sequence number specified in the sequenced hello. This special reliable multicast packet is called a CR packet. The laggard neighbors that have the matching addresses specified in the sequenced hello discard the CR packet without further processing. The faster neighbors go into the CR mode and accept the CR packet. Unicast packets without the CR bit are sent to the laggard neighbors until they catch up.
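The receiver-side behavior described above can be sketched as a small state model. The class and method names are hypothetical, and the choice to leave CR mode after accepting one CR packet is an assumption made for the sketch rather than something stated in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Neighbor:
    address: str
    cr_mode: bool = False
    expected_cr_seq: Optional[int] = None
    accepted: List[int] = field(default_factory=list)

    def on_sequenced_hello(self, laggard_addresses, next_seq):
        """Handle a sequenced hello listing laggards and the next sequence number."""
        if self.address in laggard_addresses:
            # Listed as a laggard: stay out of CR mode.
            self.cr_mode, self.expected_cr_seq = False, None
        else:
            # Not listed: enter CR mode and expect the announced sequence number.
            self.cr_mode, self.expected_cr_seq = True, next_seq

    def on_multicast(self, seq, cr_bit):
        """Return True if the reliable multicast packet is accepted."""
        if cr_bit:
            if self.cr_mode and seq == self.expected_cr_seq:
                self.accepted.append(seq)
                self.cr_mode = False  # assumption: CR mode ends after one packet
                return True
            return False  # laggards discard the CR packet
        self.accepted.append(seq)
        return True

    def on_unicast(self, seq):
        """Unicast packets (no CR bit) let a laggard catch up."""
        self.accepted.append(seq)
        return True


fast, slow = Neighbor("10.0.0.1"), Neighbor("10.0.0.2")
for n in (fast, slow):
    n.on_sequenced_hello({"10.0.0.2"}, next_seq=42)
print(fast.on_multicast(42, cr_bit=True), slow.on_multicast(42, cr_bit=True))
print(slow.on_unicast(42), slow.accepted)
```

In this model the laggard receives the same sequence number later by unicast, which mirrors the catch-up step in the text.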
This mechanism works well in networks where a single router can reach all the neighbors attached to a single interface through a link that is similar in speed for each of those neighbors, and when these links are relatively lossless, and bandwidths are relatively high compared to the amount of routing information to be transferred.
However, on networks with a large number of neighbors, reachable through links with varying speeds, this system presents a number of problems, including the following.
(1) CR divides the neighbors into two subsets, a multicast subset and a unicast subset. This is not efficient when there are many neighbors on an interface. The increased number of neighbors increases the range of travel times and increases the average travel time, thus increasing the value of the multicast flow timer. Many fast neighbors may be penalized by waiting too long for the multicast flow timer.
(2) However, if the flow timer is set at a smaller value, EIGRP frequently invokes the CR method and increases the number of laggard routers. When the number of laggard neighbors is large, unicasting the same routing information to many of them is not efficient. As EIGRP is required to support thousands of neighbors per interface, it clearly requires a more efficient delivery method.
(3) When there are many neighbors on an interface, the list of laggard neighbor addresses in the sequenced hello may become large. The interface maximum transmission unit (MTU), which specifies the maximum size of a data packet on an interface, may not be large enough for the sequenced hello to contain all needed neighbor addresses. EIGRP currently only supports an MTU of 1500 bytes, which has enough room for fewer than 300 neighbor addresses. As a result, EIGRP replicates a packet that indicates a sequence number to be ignored by a laggard neighbor and unicasts the packet to each laggard neighbor that has an address that is not included in the multicast sequenced hello.
(4) The large sequenced hello packets contribute to interface congestion and router load when processing long lists of neighbor addresses.
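The MTU limit in item (3) can be checked with rough arithmetic. The byte counts below are assumptions chosen for illustration and do not come from the text: a 20-byte IPv4 header, a 20-byte EIGRP header, a 4-byte header on the address list, and 5 bytes per IPv4 neighbor entry (one length byte plus four address bytes). The function names are hypothetical.

```python
def max_laggard_addresses(mtu=1500, ip_header=20, eigrp_header=20,
                          list_header=4, bytes_per_address=5):
    """Estimate how many laggard addresses fit in one sequenced hello.

    All overhead sizes are illustrative assumptions; real packets may
    carry additional fields that reduce the capacity further.
    """
    room = mtu - ip_header - eigrp_header - list_header
    return max(room // bytes_per_address, 0)


def unicast_overflow(num_laggards, capacity):
    """Number of laggards that cannot be listed and need a separate unicast."""
    return max(num_laggards - capacity, 0)


capacity = max_laggard_addresses()
print(capacity)
print(unicast_overflow(1000, capacity))
```

Under these assumptions the capacity comes out just under 300 addresses, consistent with the limit stated above, and with a thousand laggards most of them would need an individually unicast packet.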
Based on the foregoing, there is a clear need for techniques to multicast routing information, which techniques do not suffer one or more deficiencies of past approaches. In particular, there is a need to reduce the number of laggard neighbors of a sending node to fewer than 300 to properly implement CR in EIGRP and to reduce the congestion on a link caused by a large number of unicasts to laggard routers. There is also a particular need to shorten the value in the multicast flow timer for the fastest neighbors of a sending node.