In a distributed communication network, the time at which each individual component, such as nodes, access points, and routers, acts upon a common event is not synchronous. One of the biggest problems with data routing in a distributed network is that not all nodes have the same view of the network at the same time. There is an inherent delay involved in distributing notification of an event throughout the entire network. Examples of events which may cause time delays include network failure, deliberate changes in the network structure, and basic laws of physics.
At any given time, each node in a network holds a view of the status of all other active nodes. Whenever data is available for distribution, each node determines a route for forwarding data through the network based on that node's perception of the present condition of the network. A number of factors determine the route chosen by the node, including which nodes and links are active, link utilization, the traffic flow/distribution requirements, etc. Ideally, if all the nodes had the same view of the network at any given instant, each node would choose to route the data along the same paths through the network.
In reality, delays within the system often cause the nodes to have different views of the network, resulting in the nodes choosing different, i.e., non-optimal, paths for routing a particular set of data. Any time differences result in poor quality or incorrect routes, with the worst case being looped traffic. A routing loop may form when individual nodes compute the path's next hop based on differing views of the network topology. In a classic example, as shown in FIG. 1, for a network having three nodes (A, B, and C), node A transmits data to node C through node B. If the link between nodes B and C is broken, but node A has not yet learned of the breakage, node A transmits the data to node B assuming that the link A-B-C is the optimal route. Node B knows of the broken link and tries to reach node C via node A, thus sending the original data back to node A. Node A then receives the data that it originated back from node B and consults its routing table. Because node A still has not been informed of the break, its routing table indicates that node C is reachable via node B, so node A sends the data back to node B, creating an infinite loop. Routing loops unnecessarily tie up network resources and available bandwidth that would otherwise be free to route traffic.
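The three-node example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-node next-hop tables, the function name `forward`, and the `max_hops` cutoff (used solely to stop the simulation) are hypothetical and do not describe any particular routing protocol.

```python
def forward(tables, source, dest, max_hops=8):
    """Follow each node's own next-hop table; return the nodes visited."""
    path = [source]
    node = source
    for _ in range(max_hops):
        if node == dest:
            break
        # Each node decides independently, using only its own view.
        node = tables[node][dest]
        path.append(node)
    return path

# Stale view: A still believes C is reachable via B, while B already
# knows link B-C is down and tries to reach C via A.
stale_tables = {"A": {"C": "B"}, "B": {"C": "A"}}

# Consistent view before the break: A forwards to B, B forwards to C.
consistent_tables = {"A": {"C": "B"}, "B": {"C": "C"}}

looped_path = forward(stale_tables, "A", "C")
good_path = forward(consistent_tables, "A", "C")
print(looped_path)  # alternates between A and B until the cutoff
print(good_path)    # ['A', 'B', 'C']
```

With the stale tables the packet never reaches node C; it simply alternates between nodes A and B until the simulation cutoff is hit.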
For multicast traffic, route looping can be catastrophic. With multicast, a source has to send a packet only once, even if the packet is to be delivered to a large number of receivers; the packet follows a tree-like structure rooted at the origin. The nodes in the network replicate the packet as necessary to reach multiple receivers. A looping multicast packet continuously generates copies as it loops, which in turn generate additional copies. In the worst case, when looping occurs in this situation, thousands or even millions of copies of the same data packets can be continuously bounced around between nodes until the entire system is completely saturated and is unusable for actually routing other data traffic.
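The growth in copies can be illustrated with a small Python sketch. The replication tables below are hypothetical: `looping_state` assumes an inconsistent state in which each of three nodes believes the other two are downstream of it, and `tree_state` is a loop-free tree for comparison. The function name `copies_in_flight` is likewise an assumption made for this illustration.

```python
def copies_in_flight(replication, start, rounds):
    """Count packet copies in flight after each forwarding round."""
    in_flight = [start]
    counts = [len(in_flight)]
    for _ in range(rounds):
        next_round = []
        for node in in_flight:
            # A node emits one copy per entry in its replication list;
            # a node with no entry is a leaf and absorbs the packet.
            next_round.extend(replication.get(node, []))
        in_flight = next_round
        counts.append(len(in_flight))
    return counts

# Hypothetical inconsistent state: every node replicates to both others.
looping_state = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

# Loop-free tree rooted at A: B and C are leaves.
tree_state = {"A": ["B", "C"]}

loop_counts = copies_in_flight(looping_state, "A", 5)
tree_counts = copies_in_flight(tree_state, "A", 3)
print(loop_counts)  # [1, 2, 4, 8, 16, 32]
print(tree_counts)  # [1, 2, 0, 0]
```

In the loop-free case the copies are delivered and disappear, whereas in the looping case the number of copies doubles every round in this particular configuration.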
To combat the above-mentioned problems, a network will sometimes be configured to deliberately react slowly, or will require that the flow of certain traffic be disabled while the network “converges.” At the control level, the only remedies or preventative measures currently in place include trying to process messages as fast as possible, attempting to reduce packetization delay for control packets, etc. One “band-aid fix” for the looping problem is to insert a “time to live” (“TTL”) factor in data packets, which limits the amount of time or the number of hops that a data packet can survive before it is discarded. Because the TTL value is decremented once per hop, network designers set the TTL relatively high so that the packet reaches its destination before the TTL value reaches zero and the packet is discarded. However, the TTL value does not prevent looping or incorrect routing; it only minimizes the damage experienced by the network when these events occur. Also, not every protocol has a TTL field. For example, Ethernet frames do not include a TTL value.
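The effect of a hop-count TTL can be sketched as follows. This is a simplified model, not the behavior of any specific protocol: the next-hop tables and the function name `forward_with_ttl` are hypothetical, and the TTL is assumed to be decremented by one on each hop, with the packet discarded once the value reaches zero short of the destination.

```python
def forward_with_ttl(tables, source, dest, ttl):
    """Forward hop by hop; return (delivered, hops_taken)."""
    node = source
    hops = 0
    while node != dest:
        if ttl == 0:
            # TTL exhausted before reaching the destination: discard.
            return False, hops
        node = tables[node][dest]
        ttl -= 1   # one decrement per hop
        hops += 1
    return True, hops

stale_tables = {"A": {"C": "B"}, "B": {"C": "A"}}       # A and B loop
consistent_tables = {"A": {"C": "B"}, "B": {"C": "C"}}  # A -> B -> C

loop_result = forward_with_ttl(stale_tables, "A", "C", ttl=4)
good_result = forward_with_ttl(consistent_tables, "A", "C", ttl=4)
short_result = forward_with_ttl(consistent_tables, "A", "C", ttl=1)
print(loop_result)   # (False, 4): loop is cut off, not prevented
print(good_result)   # (True, 2)
print(short_result)  # (False, 1): TTL too low for a valid route
```

The looping packet still consumes four hops of bandwidth before it is discarded, which is why the TTL only bounds the damage. The last case shows why the initial value must be large enough for legitimate paths.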
Some protocols may disable multicast traffic for some given time period, in particular immediately following a topology change, to wait for the network to converge. This approach is a “safe” mechanism chosen by spanning tree protocols used by Ethernet, but the traffic is turned off considerably longer than necessary (e.g., seconds). In effect, this remedy discourages multicast traffic because it either prevents the broadcasting of multicast traffic at unpredicted times or creates a backlog of messages to be delivered when the restriction is lifted.
In addition, Reverse Path Forwarding Checks (“RPFC”) may be used to prevent loops by ensuring that the path back to the source of the packet is consistent with the interface where the packet arrived. Basically, RPFC causes the receiving node to look backwards to where the packet came from to verify that the source node of the packet is reachable from the present node via that particular interface, i.e., it checks the reverse path of the packet. If the check passes, the packet is forwarded on towards its destination node; otherwise, the packet is dropped. However, this approach requires a reverse path forwarding table in the routing hardware, which uses memory and consumes backplane transmission time. Additionally, RPFC does not guarantee loop-free forwarding; there are still rare cases where looping is possible. In particular, this approach is exceptionally susceptible to “headless” router operation, where the router control plane can “die” while forwarding is still permitted. Headless operation is highly desirable for networking equipment and cannot be simply disallowed.
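The check described above can be sketched as a simple table lookup. The table layout, the interface names, and the function name `rpf_check` are assumptions made for illustration; real routing hardware implements the lookup differently.

```python
def rpf_check(reverse_path_table, source, arrival_interface):
    """Accept a packet only if it arrived on the interface that this
    node would itself use to reach the packet's source."""
    expected = reverse_path_table.get(source)
    return expected is not None and expected == arrival_interface

# Hypothetical table: source S is reachable via interface "if0".
reverse_path_table = {"S": "if0"}

accepted = rpf_check(reverse_path_table, "S", "if0")      # forward
wrong_port = rpf_check(reverse_path_table, "S", "if1")    # drop
unknown_src = rpf_check(reverse_path_table, "X", "if0")   # drop
print(accepted, wrong_port, unknown_src)  # True False False
```

The sketch also hints at the headless-operation weakness: the check is only as good as the table, and if the control plane stops updating `reverse_path_table`, a stale entry will continue to pass or fail packets according to an outdated topology.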
Therefore, what is needed is a method and apparatus for routing packets in a distributed communication network which allows for the efficient and effective routing of packets through the network with minimal probability of looping.