This invention relates generally to generation and distribution of routing information in a communications network to create a fault-tolerant network and more particularly to methods of rerouting protected connections in a communications network in the event of a failure of a link or switch of the network.
Dependence on communications networks today is increasing as voice, video and data traffic over these networks increases. Communications networks, once thought to be a convenience, are now becoming a necessity. However, as this dependence on communications networks increases, the reliability of the communications network becomes a major issue. Failures in the network can cause the loss of large amounts of information and impact a significant amount of commerce that relies on the integrity of the network.
Recognition of the failure condition of a communications network and a response to the failure condition are important functions that must be integrated into the operations of any communication network. Parameters such as the time to recover from a failure condition, the probability of recovering from a failure condition, and the amount of capacity in the telecommunications network that must be dedicated to handling a failure condition give an indication of the quality of fault tolerance for a communications network.
One kind of communications network that is hard hit by a failure is a label switched network. Such a network has the characteristic that a circuit is used to interconnect source and destination elements outside the network. A circuit, in this context, is a physical or virtual connection established over a series of links and switches in the network. When a circuit is disrupted by a failure, all of the communications traveling over the circuit are disrupted and some kind of failure recovery system is needed to permit the original circuit traffic to flow through the network from its source to its destination.
Use of spare physical links in the network, which are normally idle, is one category of failure recovery method. This technique takes advantage of unused capacity in the network to bypass physical links in the network that have been determined to be out-of-service due to some failure condition.
An aim of the physical recovery technique is to be transparent to individual connections so that equipment located at the ends of a connection need not take any part in failure recovery. This type of recovery is also very fast because the amount of failure reporting and failure processing is independent of the number of connections traversing the network and because restoration capacity can be provided based on the physical configuration of the network rather than on the actual pattern of connections.
One drawback of the physical recovery technique is that double the capacity is required so that all connections using a particular failed link can be moved onto the spare link. The spare link remains idle during the time that it is not being used to restore a failure condition in the network. This limits flexibility in optimizing network usage and limits the sharing of the back-up link for different, independent failure conditions (known as restoration bandwidth multiplexing). Also, there is no ability to give higher priority to restoring some of the connections which required the use of the failed link. Thus, the physical recovery technique requires excess capacity in the network, capacity which could be used to improve the overall utilization of the network.
Another category of failure recovery is a technique by which one end of a connection takes responsibility for recovery of the failure interrupting the connection by re-signaling the connection through the network. Connection-based recovery allows for more efficient use of network resources because a series of links used for the recovery route can be completely independent of the original set of links in the primary route and because the granularity of recovery is smaller. A connection uses only a fraction of each link's capacity in its route through the network.
This category of recovery also has certain drawbacks. One drawback is the appreciable size of the processing and communication load placed on the network to re-establish all of the failed connections. Another drawback is the time to effect a restoration of the connections, typically more than a round trip time for each connection through the network from the source switch to the destination switch. However, this method does not require that the network have the capacity to reroute all connections, and so does not suffer from the disadvantage of idle capacity dedicated to failure recovery. If dedicated failure recovery capacity does exist in the network for this category of recovery, it can be shared for different connections and different failure conditions.
Based on the above, there is a need for a failure-recovery method and apparatus that rapidly restores one or more selected connections from a failure condition and that does not require double the capacity in the communications network to effect the restoration of disrupted connections.
The present invention is directed to satisfying the above need. A method in accordance with the present invention, for use in a communications network having switches and transmission links that connect to the switches, includes the following steps. Mapping information is provided to each switch to implement primary and recovery plans for a connection, where the recovery plans include routes derived under the assumption that switches associated with a failure are not present in the network. Each switch that detects a failure then transmits a failure report. Next, each switch determines a failure condition from the failure report, where the failure condition is a switch associated with the failure. Finally, each switch selects, based on the failure condition, mapping information that implements a recovery plan for the connection.
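By way of illustration only, the per-switch selection step described above might be sketched as follows. The sketch is not the claimed implementation. It assumes that a failure report names a single suspect switch in a hypothetical "suspect" field, and that mapping information is a (previous hop, next hop) pair. The class name SwitchAgent and all field names are likewise assumptions introduced for the example.

```python
class SwitchAgent:
    """Hypothetical model of one switch holding pre-loaded mapping information."""

    def __init__(self, name, primary_mapping, recovery_mappings):
        self.name = name
        # Mapping used while no failure has been reported.
        self.primary_mapping = primary_mapping
        # Pre-planned mappings, keyed by the switch assumed absent from the network.
        self.recovery_mappings = recovery_mappings
        self.active_mapping = primary_mapping

    def handle_failure_report(self, report):
        """Determine the failure condition and select the matching recovery plan."""
        # Assumption: the report identifies the switch associated with the failure.
        failure_condition = report["suspect"]
        plan = self.recovery_mappings.get(failure_condition)
        if plan is not None:
            # A local table lookup; no signaling exchange with other switches.
            self.active_mapping = plan
        return self.active_mapping


# Switch A normally forwards the connection toward B; if B fails it uses D.
agent_a = SwitchAgent(
    name="A",
    primary_mapping=(None, "B"),
    recovery_mappings={"B": (None, "D")},
)
selected = agent_a.handle_failure_report({"detected_by": "C", "suspect": "B"})
```

Because every switch performs the same lookup independently on receipt of the report, the selection can proceed in parallel across the network.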
Another method in accordance with the present invention includes, in a communications network having switches and transmission links that connect to the switches, the steps of creating a representation of the topology of the communications network and then injecting a failure into the network representation. Next, the representation is modified by removing from the representation a switch that is associated with the failure and all links connected to the switch. A recovery route through the representation of the modified network is then determined after which recovery actions, in the form of mapping information, are derived for each switch in the recovery route.
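The planning steps of this second method can be illustrated with a minimal sketch, offered under stated assumptions rather than as the actual implementation. The topology is assumed to be an adjacency map of switch names, the recovery route is found with a plain breadth-first search that ignores link capacity, and mapping information is assumed to be a (previous hop, next hop) pair per switch. All function names and the four-switch example network are hypothetical.

```python
from collections import deque


def remove_switch(topology, failed):
    """Return a copy of the topology without `failed` and all links connected to it."""
    return {
        node: {n for n in neighbors if n != failed}
        for node, neighbors in topology.items()
        if node != failed
    }


def find_route(topology, source, destination):
    """Breadth-first search; returns a list of switches, or None if unreachable."""
    if source not in topology or destination not in topology:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbor in sorted(topology[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def derive_mapping(route):
    """Recovery action per switch: (previous hop, next hop); None marks the network edge."""
    mapping = {}
    for i, switch in enumerate(route):
        prev_hop = route[i - 1] if i > 0 else None
        next_hop = route[i + 1] if i < len(route) - 1 else None
        mapping[switch] = (prev_hop, next_hop)
    return mapping


def plan_recovery(topology, source, destination, failed_switch):
    """Inject a failure, modify the representation, route, and derive the actions."""
    reduced = remove_switch(topology, failed_switch)
    route = find_route(reduced, source, destination)
    return None if route is None else derive_mapping(route)


# Hypothetical network: primary route A-B-C, with D offering an alternate path.
TOPOLOGY = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C"},
}
plan = plan_recovery(TOPOLOGY, "A", "C", failed_switch="B")
```

In this example the plan for a failure of switch B routes the connection through D. A fuller planner would repeat the procedure for each switch in the network and would also account for available capacity on each link, which this sketch omits.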
An advantage of the present invention is that the time-consuming process of finding recovery routes through the network for failed switches or links is performed during a pre-failure planning process, thus eliminating this time from an actual recovery and speeding the recovery process when a failure occurs. Independent recovery actions performed at each switch allow for restoration to proceed in parallel in each switch.
Another advantage is that half the capacity in the network does not need to be set aside to handle failures. Instead, the pre-failure planning uses the existing capacity and a smaller amount of idle capacity in the network to find recovery routes. Additionally, this method can pre-plan the successful rerouting of all connections, guaranteeing that there exists backup capacity to re-route all failed connections.
Another advantage is that either a central site or each switch in the network can implement the recovery plan because the recovery action at each switch is simple, being based only on the failure reports reported by switches in the network that detect a failure.