1. Field of the Invention
The present invention is directed to networking technology. In particular, the invention is directed to providing a backup service to a group of computing devices located within a logical space.
2. Background
Dynamic routing occurs when routing devices talk to adjacent routing devices, informing each other of the networks to which each routing device is currently connected. The routing devices must communicate using a routing protocol that is run by an application instantiating the routing function, or a routing daemon. In contrast to static routing, the information placed into the routing tables (the routes) is added and deleted dynamically by the routing daemon as the routes that messages take through the system change over time. Additionally, other changes can occur to the routing information over time. For example, route preferences can change due to changes in network conditions such as delays, route additions and deletions, and network reachability issues.
The routing daemon adds a routing policy to the system, thus choosing which routes to insert into a routing table maintained by the system. In the case of multiple routes to the same destination, the daemon chooses which route is the best to use under the circumstances. Conversely, if a link goes down, the daemon can delete any routes that use that link and can find alternative routes if need be.
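The route selection behavior described above can be illustrated with a minimal sketch. The `Route` and `RoutingDaemon` names, the single integer cost, and the "lowest cost wins" policy are all assumptions made for illustration; they are not taken from any particular routing daemon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    destination: str  # destination network (hypothetical prefix notation)
    next_hop: str     # neighbor to forward to
    link: str         # local link the route depends on
    cost: int         # assumed policy metric: lower is better


class RoutingDaemon:
    """Hypothetical daemon that keeps one best route per destination."""

    def __init__(self):
        self.candidates = []  # every route learned so far
        self.table = {}       # destination -> currently selected route

    def _reselect(self):
        # Apply the policy: for each destination keep the lowest-cost route.
        best = {}
        for route in self.candidates:
            current = best.get(route.destination)
            if current is None or route.cost < current.cost:
                best[route.destination] = route
        self.table = best

    def add_route(self, route):
        self.candidates.append(route)
        self._reselect()

    def link_down(self, link):
        # Delete every route that uses the failed link, then fall back
        # to any alternative routes that remain.
        self.candidates = [r for r in self.candidates if r.link != link]
        self._reselect()


daemon = RoutingDaemon()
daemon.add_route(Route("10.0.0.0/24", "192.0.2.1", "eth0", cost=10))
daemon.add_route(Route("10.0.0.0/24", "192.0.2.2", "eth1", cost=20))
preferred_before = daemon.table["10.0.0.0/24"].link  # route over eth0 is selected
daemon.link_down("eth0")
preferred_after = daemon.table["10.0.0.0/24"].link   # falls back to eth1
```

In this sketch the alternative route is retained as a candidate even while it is not selected, which is what allows the daemon to recover without relearning the route.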
Thus, a change in a network topology typically impacts the performance of the system. First, when a link goes down, the other routing devices must determine that the link has ceased to operate. Next, the other routing devices in the system typically search for new routes that do not use the link that has been determined to be down. This possibly entails a rebuild of the internal routing tables of the routing devices neighboring the link. In addition to link failures, changes may also be precipitated by such events as node failures or network failures.
Next, when the link again becomes operational, the routing daemons on neighboring routing devices may need to place the route information back into the tables that they maintain. Again, this may entail effort in the form of determining metrics for routes associated with the again-operational link, as well as rebuilding any internal routing tables at the neighboring nodes or routing devices upon the reinstatement of service by the routing device at the node that it services.
Open Shortest Path First (OSPF) is a link-state protocol that implements dynamic routing on routing devices. In a link-state protocol, each routing device actively tests the status of its link to each of its neighbors, and sends this information to its other neighbors. This process is repeated for all the routing devices for nodes in the network.
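As a rough illustration of how link-state information spreads from one routing device to all others, the following sketch floods a single advertisement through a small topology. The `flood` function, the topology, and the advertisement format are hypothetical; real OSPF flooding also involves acknowledgments, sequence numbers, and aging, which are omitted here.

```python
from collections import deque


def flood(topology, origin, link_state):
    """Propagate one device's link-state information to every reachable device.

    topology:   device -> list of neighboring devices
    origin:     device that tested its links and originates the information
    link_state: the origin's view of its own links
    Returns each device's database as {device: {origin: link_state}}.
    """
    databases = {origin: {origin: link_state}}
    queue = deque([(origin, None)])
    while queue:
        device, sender = queue.popleft()
        for neighbor in topology[device]:
            if neighbor == sender:
                continue  # do not send the information back to where it came from
            database = databases.setdefault(neighbor, {})
            if origin not in database:
                # First copy received: store it and forward it onward.
                database[origin] = link_state
                queue.append((neighbor, device))
    return databases


topology = {
    "R1": ["R2", "R3"],
    "R2": ["R1", "R4"],
    "R3": ["R1", "R4"],
    "R4": ["R2", "R3"],
}
r1_links = {"R2": "up", "R3": "up"}
databases = flood(topology, "R1", r1_links)
```

After the call, every device in the example topology holds the same entry for R1, even though R4 is not adjacent to R1.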
Each routing device takes this link-state information and builds a complete routing table. This method can be used to quickly implement a dynamic routing system, especially in case of changes in the links in the network.
Such a routing system can also perform other functions relating to the network. These features can include calculating separate sets of routes for each type of service, load balancing between equal-cost routes to a destination, and/or calculation of routing metrics based on a cost. This cost can be based on throughput, round trip time, reliability, or other factors. In this manner a separate cost can be assigned for each type of service.
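A minimal sketch of per-service costs and equal-cost load balancing follows. The route data, the service names, and the use of a CRC-32 hash of a flow identifier to spread traffic are assumptions chosen for illustration, not a description of any specific implementation.

```python
import zlib

# Hypothetical candidate routes to one destination:
# next hop -> cost for each type of service (lower is better).
ROUTES = {
    "A": {"low_delay": 5, "high_throughput": 20},
    "B": {"low_delay": 5, "high_throughput": 10},
    "C": {"low_delay": 9, "high_throughput": 10},
}


def best_next_hops(routes, service):
    """Return every next hop that ties for the lowest cost for this service."""
    lowest = min(costs[service] for costs in routes.values())
    return sorted(hop for hop, costs in routes.items() if costs[service] == lowest)


def pick_next_hop(routes, service, flow_id):
    """Spread flows across the equal-cost next hops.

    Hashing the flow identifier keeps all packets of one flow on one path.
    """
    hops = best_next_hops(routes, service)
    return hops[zlib.crc32(flow_id.encode()) % len(hops)]


delay_hops = best_next_hops(ROUTES, "low_delay")
throughput_hops = best_next_hops(ROUTES, "high_throughput")
chosen = pick_next_hop(ROUTES, "low_delay", "198.51.100.7:443")
```

Because each type of service has its own cost, the set of preferred next hops in this example differs between the two services.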
Dynamic routing devices using mechanisms such as link-state protocols, like OSPF, can produce a more stable network by acting on network changes in a predictable and time-effective manner. Typically, the routing device can collect and advertise information about its neighboring routing devices. The routing device calculates and sorts its neighbors, finding all the reachable routing devices and the preferred path to each of those other routing devices.
Each routing device contains a routing mechanism that controls the forwarding process. Typically, the information is stored in a routing database. This database contains information about interfaces at the routing device that are operable, as well as status information about each neighbor to a routing device. This database is the same for all participating routing devices, and is promulgated by messages to the other routing devices. The routing mechanism may be implemented in software, hardware, or any combination thereof running on the routing device.
The information contained in the routing database describes the topology of the networks as a directed graph. Routing devices and networks can form the vertices of such a graph. Periodically, the information is broadcast (flooded) to all the routing devices in the system. An OSPF routing device also computes the shortest path to all the other routing devices in the system, treating itself as the working node (or the root node).
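The shortest-path computation over such a directed graph is commonly performed with Dijkstra's algorithm. The sketch below is one way to write it, assuming a hypothetical four-device topology with illustrative link costs; the function name and data layout are not drawn from any OSPF implementation.

```python
import heapq


def shortest_paths(graph, root):
    """Dijkstra's algorithm with the computing device as the root.

    graph: device -> {neighbor: cost of the directed link to that neighbor}
    Returns (distance to each reachable device, first hop toward each device).
    """
    distance = {root: 0}
    first_hop = {}
    visited = set()
    heap = [(0, root, None)]  # (cost so far, device, first hop used to get there)
    while heap:
        cost, device, hop = heapq.heappop(heap)
        if device in visited:
            continue  # a cheaper path to this device was already settled
        visited.add(device)
        if hop is not None:
            first_hop[device] = hop
        for neighbor, link_cost in graph.get(device, {}).items():
            new_cost = cost + link_cost
            if neighbor not in visited and new_cost < distance.get(neighbor, float("inf")):
                distance[neighbor] = new_cost
                # Directly attached neighbors are their own first hop.
                next_hop = neighbor if device == root else hop
                heapq.heappush(heap, (new_cost, neighbor, next_hop))
    return distance, first_hop


graph = {
    "R1": {"R2": 1, "R3": 4},
    "R2": {"R1": 1, "R3": 2, "R4": 7},
    "R3": {"R1": 4, "R2": 2, "R4": 3},
    "R4": {"R2": 7, "R3": 3},
}
distance, first_hop = shortest_paths(graph, "R1")
```

The `first_hop` mapping is what a forwarding table would typically be derived from, since a routing device only needs to know which neighbor to hand each packet to.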
In some typical systems, when the routing device that is a gateway to a number of other devices fails, a backup routing device is then brought online. However, in this case, the failure of the first device to respond to network traffic and/or routing configuration requests may cause the other routing devices in the network to determine that the routing device associated with a node is no longer reachable.
Typically, upon an indication of a cessation of operation of a routing device associated with a node, the dynamic process of regaining contact with the lost portions of the network may be initiated by the other routing devices that are coupled to the now nonfunctioning routing device. If a backup routing device responds, the neighboring routing devices in the system may have to recompute the values of the connection to the backup routing device now servicing the node. Based upon the new values available to them, these neighboring routing devices may have to recalculate routing costs and rebuild their internal routing tables.
Upon rebuilding the internal databases, these neighboring routing devices may then distribute the entries associated with the backup routing device to the other routing devices in the interconnected network. Further, upon receipt of the new values associated with the “new” connection to the now functioning backup routing device, the other routing devices in the defined network may then also have to recalculate routing costs and rebuild their routing tables.
During steady-state operations, the routing devices simply exchange update messages, like OSPF Hello messages. These messages are typically small datagrams that advertise only that a particular routing device is still up and running. During synchronization operations, however, a variety of complex messages can be exchanged, depending on the event that occurred, the state of the database, and other factors.
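The steady-state liveness check can be sketched as a table of the last time each neighbor was heard from. The `NeighborTable` class and the 40-second timeout are assumptions for illustration; timestamps are passed in explicitly so the example does not depend on a real clock.

```python
class NeighborTable:
    """Hypothetical tracker that declares a neighbor down after a silent period."""

    def __init__(self, dead_interval=40.0):
        self.dead_interval = dead_interval  # assumed timeout, in seconds
        self.last_hello = {}                # neighbor id -> time of last Hello

    def on_hello(self, router_id, now):
        # A Hello only says "I am still up", so recording its arrival time suffices.
        self.last_hello[router_id] = now

    def expire(self, now):
        """Remove and return neighbors that have been silent for too long."""
        down = sorted(
            router_id
            for router_id, seen in self.last_hello.items()
            if now - seen > self.dead_interval
        )
        for router_id in down:
            del self.last_hello[router_id]
        return down


table = NeighborTable(dead_interval=40.0)
table.on_hello("R2", now=0.0)
table.on_hello("R3", now=30.0)
down = table.expire(now=45.0)  # R2 has been silent for 45 s, R3 for only 15 s
```

A neighbor returned by `expire` would be the trigger for the more expensive synchronization activity described above.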
If an interface changes state, only a small amount of database activity is required to fully integrate the information into the area databases on all the routing devices within that area. If a new routing device is brought online, however, that routing device will have to discover all the routing devices, networks and interfaces within its area, and this process can consume a significant amount of network resources.
On broadcast and multi-access networks, many dynamic routing mechanisms support the use of a designated routing device. This lets new routing devices obtain complete copies of the database with minimal network impact. On point-to-point networks, however, each routing device has to obtain data from each of the other routing devices independently. This database-synchronization model represents what is perhaps the greatest challenge with running dynamic routing mechanisms in large, complex networks, since a significant amount of time can be spent maintaining database synchronization in the face of network stability problems.
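The difference in synchronization effort can be made concrete with a small calculation. The sketch assumes a simplified model in which every pair of routing devices synchronizes directly when no designated routing device is used, and in which each routing device synchronizes only with a single designated routing device otherwise; backup designated devices and retransmissions are ignored.

```python
def pairwise_exchanges(device_count):
    """Database exchanges when every pair of devices synchronizes directly."""
    return device_count * (device_count - 1) // 2


def designated_exchanges(device_count):
    """Database exchanges when every device synchronizes with one designated device."""
    return max(device_count - 1, 0)


# Compare the two models for a few illustrative network sizes.
comparison = {
    n: (pairwise_exchanges(n), designated_exchanges(n)) for n in (5, 10, 50)
}
```

Under these assumptions the pairwise count grows quadratically with the number of routing devices while the designated-device count grows linearly, which is why synchronization cost becomes a concern in large networks.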