With the widespread deployment of fiber optic transmission systems in communications networks and the alarming rate of outages due to fiber cuts, there is a serious need to improve the process of restoring traffic disrupted by network failures. So, too, with recent advances in digital crossconnect switch systems (DCSs), there is an increasing interest in utilizing DCSs in network restoration. Such DCS based communications networks are generally organized into a mesh topology, as compared to some other network topologies, so as to realize economic benefits from greater sharing of the transmission facilities in the network.
Automatic restoration techniques for networks having a mesh topology are broadly grouped into two categories: (a) centralized and (b) distributed.
A centralized DCS-based network restoration method requires a central network control center having a database containing detailed information on the network topology and the available transmission resources within the network, and reliable communications links between the DCS nodes and the network control center. Network restoration is achieved by the central control center, which, once it has isolated a failure in the network, determines the alternate paths to be used and communicates appropriate instructions to the participating nodes. The centralized approach, in general, takes longer to restore failed connections than a distributed approach.
There are two basic distributed approaches for mesh network restoration. They are link restoration and path restoration.
The link restoration approach attempts to replace the affected link segment of a disrupted channel with one or more alternate route segments between the two end nodes of the disrupted link, irrespective of the number of traffic paths or circuits supported by the disrupted link. The path restoration approach, on the other hand, attempts to restore each disrupted path or circuit within a failed communications link independently of the other disrupted circuits, and generally can provide better utilization of the spare capacity than the link restoration method. A hallmark of the distributed restoration approaches is that, in general, they require very little information at each node in the network for the purposes of recovery from a failure and, by and large, rely upon flooding of messages throughout the network, upon locating the point of failure, to seek alternate paths and to reserve spare capacity for restoring the disrupted connections.
A distributed link restoration approach is described in U.S. Pat. No. 4,956,835, issued Sep. 11, 1990 in the name of Wayne D. Grover. Grover teaches a method whereby, upon locating the failed link, one of the custodial nodes bracketing the link assumes the role of a sender and the other a chooser. Then, on spare channels emanating from it, the sender sends out forward flooding or restoration signatures (or messages), each signature having a unique index, which are rebroadcast by the intermediate nodes and ultimately reach the chooser. As the forward flooding signatures travel on particular spare links, those spare links are reserved for potential use in restoration of the interrupted connections between the sender and the chooser. Each forward flooding signature arriving at the chooser signifies one potential alternate path that could be used for such restoration. From the available alternate paths the chooser makes one or more selections as may be necessary to restore the traffic lost on the failed link.
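By way of illustration only, the forward flooding just described may be sketched as follows. This is a simplified model under stated assumptions and is not the procedure of the cited patent: the function name `flood`, the representation of spare channels as a dictionary, the four-channel toy topology, the breadth-first processing order, and the rule that an intermediate node rebroadcasts on every free spare channel leading to a node not already on the signature's path are all hypothetical choices made for the example.

```python
from collections import deque

def flood(spares, sender, chooser):
    """Simplified forward flooding for one sender-chooser pair.

    spares maps a channel id to its two end nodes (hypothetical format).
    Returns (arrivals, reserved): arrivals is a list of
    (signature_index, [channel ids]) that reached the chooser, and
    reserved is the set of every spare channel a signature travelled on.
    """
    reserved = set()
    arrivals = []
    queue = deque()
    index = 0
    # The sender emits one signature, with its own index, per spare channel.
    for ch in sorted(spares):
        a, b = spares[ch]
        if sender in (a, b):
            index += 1
            far = b if sender == a else a
            reserved.add(ch)
            queue.append((index, far, [ch], {sender, far}))
    while queue:
        idx, node, path, visited = queue.popleft()
        if node == chooser:
            # Each arrival signifies one candidate alternate path.
            arrivals.append((idx, path))
            continue
        # Assumed rebroadcast rule: use every free spare channel that
        # leads to a node not already on this signature's path.
        for ch in sorted(spares):
            a, b = spares[ch]
            if ch in reserved or node not in (a, b):
                continue
            far = b if node == a else a
            if far in visited:
                continue
            reserved.add(ch)
            queue.append((idx, far, path + [ch], visited | {far}))
    return arrivals, reserved

# Hypothetical topology: S is the sender, C the chooser, D a dead end.
toy = {1: ("S", "A"), 2: ("A", "C"), 3: ("S", "B"), 4: ("B", "D")}
arrivals, reserved = flood(toy, "S", "C")
print(arrivals, sorted(reserved))
```

In this toy case a single signature reaches the chooser over channels 1 and 2, yet all four spare channels end up reserved, because the signature sent on channel 3 reserves channels 3 and 4 before dying at the dead end.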
An inherent characteristic of a distributed restoration technique such as the one described above is that, in general, far more spare capacity than necessary gets reserved for the restoration of a failed link. Because of this phenomenon, a shortcoming of the aforementioned link restoration technique is that, when a single failure cuts across multiple links such that multiple sender-chooser pairs simultaneously invoke recovery, even when there is sufficient spare capacity to restore traffic on all of the failed links, not all sender-chooser pairs might succeed in their quest to restore traffic on their respective failed links. This is due mainly to the misappropriation of the spare capacity.
This shortcoming is illustrated with the network in FIGS. 1A to 1B. In FIG. 1A, a cut at point 170 severs links 121 and 123 and consequently severs working channels 131 and 133, along with spare channels 151 and 153. For the sake of clarity, all working channels and spare channels are each assumed to possess transmission capacity of one unit. The object of restoration, as discussed herein, is to find alternate paths with sufficient spare capacity for rerouting the traffic on the severed working channels only. No attempt is made to restore the failed spare channels.
Sender 101 and chooser 104 in combination attempt to restore the traffic on working channel 131, whereas sender 102 and chooser 103 attempt to restore the traffic on working channel 133. Upon locating the fault, sender 102 sends out forward flooding or restoration signature 181 on spare channel 154, signature 182 on spare channel 155, and signature 183 on spare channel 156. Similarly, sender 101 sends out forward flooding signature 191 on spare channel 152. As a signature travels on a spare channel, the particular spare channel is reserved for the alternate path, if any, that the signature might ultimately help create.
The forward flooding signatures get rebroadcast as shown in FIG. 1B, where node 104 forwards signature 181 on spare channel 157 and node 105 forwards signature 182 on spare channel 158. Signatures 183 and 191 do not advance as there are no free spare channels available. At the next step, signature 182 also gets blocked due to lack of a spare channel. In FIG. 1B, forward flooding signature 181 finally reaches chooser 103, signifying that the path travelled by that signature, namely the path comprising spare channels 154 and 157, can be used to restore the traffic carried by working channel 133. However, the flooding actions of sender 102 are responsible for reserving additional spare channels 155, 156, and 158, which, although not used in the restoration of traffic on working channel 133, do block restoration of traffic on working channel 131. Put simply, the simultaneous restoration of the cut express link between node pair 101 and 104 is blocked by the restoration process of node pair 102 and 103. This blockage may be a temporary phenomenon for which some fixes can be devised, but cases can be demonstrated in the network where such a fix is not possible, thus rendering the method unreliable.
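The blocking effect described above may be reproduced in a small simulation. The sketch below is illustrative only and rests on assumptions: the node names, channel numbers and five-channel topology are hypothetical and are loosely patterned on the example rather than reproducing FIG. 1, the helper names `simultaneous_flood` and `connects` are invented for the sketch, signatures advance in lock-step rounds, and contention within a round is resolved in list order.

```python
def simultaneous_flood(spares, pairs):
    """Round-based flooding by several sender-chooser pairs that share
    one pool of spare channels (hypothetical model).

    spares maps a channel id to its two end nodes; pairs is a list of
    (sender, chooser). Returns (found, reserved), where found maps each
    pair to the channel paths whose signatures reached its chooser.
    """
    reserved = set()
    found = {pair: [] for pair in pairs}
    frontier = [(pair, pair[0], [], {pair[0]}) for pair in pairs]
    while frontier:
        next_frontier = []
        for pair, node, path, visited in frontier:
            if node == pair[1]:
                found[pair].append(path)
                continue
            # A signature reserves every free spare channel it is sent on.
            for ch in sorted(spares):
                a, b = spares[ch]
                if ch in reserved or node not in (a, b):
                    continue
                far = b if node == a else a
                if far in visited:
                    continue
                reserved.add(ch)
                next_frontier.append((pair, far, path + [ch], visited | {far}))
        frontier = next_frontier
    return found, reserved

def connects(spares, path, src, dst):
    """True if the channel sequence path leads from src to dst."""
    node = src
    for ch in path:
        a, b = spares[ch]
        if node not in (a, b):
            return False
        node = b if node == a else a
    return node == dst

# Hypothetical topology: pair (A, B) and pair (E, D) both lost a link.
spares = {1: ("A", "D"), 2: ("A", "X"), 3: ("A", "E"),
          4: ("D", "B"), 5: ("X", "D")}
found, reserved = simultaneous_flood(spares, [("E", "D"), ("A", "B")])
print(found, sorted(reserved))
```

Under these assumptions, pair (A, B) obtains the path over channels 1 and 4, while pair (E, D) obtains none, because its only signature stalls at node A after channels 1 and 2 have been reserved. The spare capacity is nevertheless sufficient for both pairs: channels 3, 2 and 5 connect E to D without touching channels 1 and 4, as `connects(spares, [3, 2, 5], "E", "D")` confirms.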