With the widespread deployment of fiber optic transmission systems and the alarming rate of outages due to fiber cuts, there is great interest in improving the process of restoring disrupted traffic from minutes to sub-seconds following a fiber cut. Automatic protection switching probably is the fastest technique and can switch the disrupted traffic to dedicated spare links in under 50 milliseconds. However, it also requires high dedicated spare channel capacity. With recent advances in digital cross-connect systems (DCS), there is increasing interest in using DCS in network restoration. But the centralized DCS-based network restoration method requires reliable telemetric links between the DCS nodes and the network operation center. Moreover, it is slower than distributed DCS-based network restoration, where the affected DCS nodes exchange messages directly to restore the disrupted traffic. The hybrid preplanned approach proposed in Bellcore's NETSPAR uses a distributed topology update protocol to identify the fault and then downloads a precomputed routing table according to the fault. The problem with the NETSPAR approach is that a great amount of memory is required for storing the routing tables.
There are two basic approaches to reroute the disrupted traffic due to a fiber span cut. The link restoration approach replaces the affected link segment of a disrupted channel by a spare path between the two disrupted ends. The path restoration approach releases each disrupted channel and lets the source and destination end of the channel re-establish the connection. With the additional release phase the path restoration approach takes more time than the link restoration. However, the path restoration approach can find more efficient spare paths with fewer link segments and can handle the node failure situation with the same logic. To achieve fast network restoration, the link restoration approach is used.
Routing algorithms proposed for computer networks find the shortest paths from each node to all other nodes in a network. The approximate distributed Bellman-Ford algorithm has polynomial message complexity and very fast response time. However, the goal of routing algorithms is quite different from that of distributed network restoration, which is to find the shortest paths between two disrupted nodes. The response time requirement of the network restoration algorithm is also almost two orders of magnitude more stringent than that imposed on routing algorithms.
The network restoration problem can be formulated as a maximum flow problem (see the book by T. H. Cormen et al. entitled "Introduction to Algorithms", The MIT Press, 1990; and "Distributed Link Restoration with Robust Planning" by J. E. Baker in Proceedings of GLOBECOM '91, pp. 306-311, Dec. 1991) where the goal is to find all the disjoint paths with the maximum flow between the two disrupted nodes. The efficient distributed maximum flow algorithm proposed by Goldberg and Tarjan ("A New Approach to the Maximum-Flow Problem", Journal of the Association for Computing Machinery, Vol. 35, No. 4, pp. 921-940, Oct. 1988) requires O(n^2) waves of messages and O(n^3) messages, where n is the number of nodes in the network. The Goldberg et al. algorithm, in finding the potential paths for restoration, can only be applied in small networks and is used in conjunction with a path selection algorithm to generate benchmark results against which the efficiency of other network restoration algorithms is compared.
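The maximum flow formulation can be illustrated with a minimal sketch. The code below is a centralized Edmonds-Karp computation, not the distributed Goldberg and Tarjan algorithm cited above, and it assumes that each link's spare channels can carry restored traffic in either direction. The function name `max_restorable_channels` and the `spare` dictionary layout are hypothetical, chosen only for illustration.

```python
from collections import deque


def max_restorable_channels(spare, src, dst):
    """Upper bound on restorable channels between two disrupted nodes.

    spare maps (u, v) node pairs to the number of spare channels on that
    link. The failed link itself is assumed to be absent from spare.
    """
    # Residual capacity in each direction of every link (assumption:
    # spare channels are usable in either direction).
    residual = {}
    for (u, v), cap in spare.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + cap
        residual[v][u] = residual[v].get(u, 0) + cap

    if src == dst or src not in residual or dst not in residual:
        return 0

    total = 0
    while True:
        # Breadth-first search for the shortest augmenting path.
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            return total  # no further spare path exists

        # Find the bottleneck spare capacity along the path.
        bottleneck = float("inf")
        v = dst
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Commit the flow and update the residual capacities.
        v = dst
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck
```

For example, with spare capacities `{("A", "C"): 2, ("C", "B"): 1, ("A", "D"): 1, ("D", "B"): 3, ("C", "D"): 1}` and a cut between A and B, the sketch reports that at most three channels can be rerouted.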
The first distributed network restoration method for a DCS-based fiber network was proposed by W. D. Grover in "The Self-healing Network: A Fast Distributed Restoration Technique For Networks Using Digital Cross-Connect Machines", Proceedings of GLOBECOM '87, pp. 28.2.1-28.2.6, 1987 and detailed in his 1989 Ph.D. dissertation for the Department of Electrical Engineering at University of Alberta entitled, "Self Healing Networks: A Distributed Algorithm For K-Shortest Link-Disjoint Paths In A Multi-Graph With Applications In Real Time Network Restoration". The protocol associated with the algorithm is called the Self-Healing Network (SHN) protocol. In the SHN protocol, one of the two DCS nodes, on detecting the fiber cut, becomes the Sender based on some arbitration rule, such as larger DCS network ID, and the other becomes the Chooser. Then, the request messages, called signatures, are sent out along all the spare channels on all outgoing fibers. These signatures each bear different indices and are broadcasted to the intermediate nodes between the Sender and the Chooser. On receiving a signature, the Chooser checks the index. If it is the first time that the Chooser has received a signature with this index number, then a reply signature is sent back through the same request path. Upon receipt of a reply signature, each of the intermediate nodes generates a switch command to the DCS to connect the ports of the two spare channels. This is called reverse linking. When the Sender receives the reply signature, it reconnects one of the disrupted channels to the new spare path and sends the information of the restored channel ID through the spare path back to the Chooser. The Chooser, on receiving the restored channel ID information from the spare path, reconnects the corresponding ports to restore the disrupted channel. Thus, the Grover protocol basically requires three message transmissions between the Sender and the Chooser for each disrupted channel.
In SHN, one signature is sent out for each spare channel connected to the Sender, and each signature carries its own index to distinguish it from the others. Signatures travelling along the same route compete for the processing resources of each DCS node they pass through.
The execution steps of Grover's approach for a single channel restoration route are shown with reference to FIG. 1. As shown, at Step 1, the Sender broadcasts a signature towards the Chooser. At Steps 2 and 3, the signature is propagated along the available spare channels on other outgoing links. At Step 4, after receiving the signature, the Chooser sends a reply signature back towards the Sender. At Steps 5 and 6, on receiving the reply signature, the intermediate node makes the DCS connection and forwards the reply signature along the restoration path. At Step 7, the Sender receives the reply signature and selects one disrupted channel to connect the restoration path. A mapping information message including the ID of this disrupted channel is then sent through the connected restoration path directly to the Chooser without the need for message processing in the intermediate nodes. Finally, at Step 8, on receiving the mapping information message, the Chooser connects the restoration path to the corresponding disrupted channel to complete the restoration.
Another distributed network restoration process for DCS-based fiber networks has been proposed by Yang and Hasegawa in "FITNESS: Failure Immunization Technology for Network Survivability," Proc. of GLOBECOM '88, pp. 47.3.1-47.3.6, Nov. 1988. This method became known as Bellcore's FITNESS approach. It also uses a Sender-Chooser relationship for the nodes adjacent to the cut fiber link. The FITNESS approach reduces the potentially large number of signatures that may be generated in SHN by requesting the aggregated maximum bandwidth that is allowed on a restoration route. Specifically, the restoration process is initiated by the Sender's broadcast of restoration request messages, called help messages, on all links which contain spare channels. Each help message contains the Sender address, Sender-Chooser pair ID, source of the message, destination of the message, requested bandwidth and hop count. The requested bandwidth is the minimum of the number of working channels lost due to the fiber link cut and the spare capacity of the particular link over which the specific help message is being broadcasted.
Help messages are selectively broadcasted by intermediate nodes, each of which maintains a table of the help messages it has received. This table contains the source of the help message, requested bandwidth and hop count of the path from the Sender. The first help message received is always broadcasted. Successive help messages are broadcasted only if the requested bandwidth is greater than that of all previously received messages. Received help messages with a requested bandwidth equal to that of earlier messages but with a lower hop count are not broadcasted. Instead, the table hop count and source entries are modified to reflect the discovery of the shorter path. Help messages with lower bandwidth than any table entries, or with equal bandwidth and higher hop count, are ignored. The requested bandwidth in a help message broadcasted by an intermediate node is the minimum of the arriving message's requested bandwidth and the spare capacity of the link over which the help message is being sent.
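The selective broadcast rule described above can be sketched as follows. This is an illustrative simplification rather than the FITNESS implementation: the table is reduced to a single best entry, a help message is not sent back over the link on which it arrived, and a message with equal bandwidth and equal hop count is ignored. These three choices, together with the names `HelpMessage` and `IntermediateNode`, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HelpMessage:
    source: str      # node that sent this copy of the help message
    bandwidth: int   # requested bandwidth, in channels
    hop_count: int   # hops travelled from the Sender


class IntermediateNode:
    def __init__(self, name, spare_by_link):
        self.name = name
        self.spare_by_link = spare_by_link  # neighbor -> spare channels
        self.best = None                    # simplified one-entry table

    def receive(self, msg):
        """Return the (neighbor, HelpMessage) pairs to send in response."""
        if self.best is None or msg.bandwidth > self.best.bandwidth:
            # First message, or a larger bandwidth than any seen so far.
            self.best = msg
            return self._broadcast(msg)
        if (msg.bandwidth == self.best.bandwidth
                and msg.hop_count < self.best.hop_count):
            # Shorter path with the same bandwidth: update the table only.
            self.best = msg
        return []  # everything else is ignored

    def _broadcast(self, msg):
        out = []
        for neighbor, spare in self.spare_by_link.items():
            # Assumption: skip the arrival link and links without spares.
            if neighbor == msg.source or spare == 0:
                continue
            requested = min(msg.bandwidth, spare)
            out.append((neighbor,
                        HelpMessage(self.name, requested, msg.hop_count + 1)))
        return out
```

A node with spare capacities `{"A": 4, "B": 2, "C": 0}` that receives a help message from A requesting three channels would forward a request for two channels to B only, since the link to B has two spare channels and the link to C has none.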
On detection of fiber link failure, the node which becomes the Chooser sets a fixed time-out. The length of time for time-out is determined empirically, with optimal choice being in the range of 250 to 350 msec. During time-out, the Chooser maintains a table of all received help messages. On the termination of time-out, the Chooser selects the table entry corresponding to the largest requested bandwidth and sends an acknowledgment message to the source of the selected help message.
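The Chooser's selection at the end of the time-out can be sketched in a few lines. The tuple layout, the 300 msec default (picked from within the 250 to 350 msec range given above) and the tie-break in favor of the lower hop count are assumptions for illustration; the description above specifies only that the entry with the largest requested bandwidth is selected.

```python
def choose_after_timeout(arrivals, timeout_ms=300):
    """Pick the help-message source to acknowledge once the time-out ends.

    arrivals is a list of (arrival_ms, source, bandwidth, hop_count)
    tuples; arrival_ms is measured from the detection of the failure.
    """
    # Only help messages received during the time-out are in the table.
    table = [a for a in arrivals if a[0] <= timeout_ms]
    if not table:
        return None
    # Largest requested bandwidth wins; ties go to the lower hop count
    # (the tie-break is an assumption, not stated in the description).
    best = max(table, key=lambda a: (a[2], -a[3]))
    return best[1]
```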
On receipt of an acknowledgment message, each intermediate node replies with a confirmation message and then sends the acknowledgment message to the next node along the path to the Sender. As this process continues, on receipt of a confirmation message, each node makes cross connections to restore lost working channels. If a single restoration path provides insufficient bandwidth to effect full restoration of all lost working channels, the Sender initiates a new wave of help messages. This process is repeated until all channels are restored, or no new paths can be found.
FIG. 2 shows the execution steps of the FITNESS algorithm on a restoration route. Note that these steps are similar to those of SHN except that at Step 4 the Chooser makes the DCS connection according to the available bandwidth and sends the mapping information message with the IDs of the restored channels to the Sender. This allows the FITNESS approach to have one less step than SHN. The messages in the FITNESS approach are longer than those in SHN. The request message contains the maximum capacity and hop count of the restoration route being explored. The acknowledgment message in the FITNESS approach contains the mapping information of a set of restored channels instead of one in SHN. The message transmission time in the FITNESS approach therefore is longer than that of SHN but the message processing frequency in each node is greatly reduced.
RREACT is another distributed approach to network restoration and is described in detail by some of the co-inventors of the instant invention in "RREACT: A Distributed Protocol for Rapid Restoration of Active Communication Trunks", UCCS Tech Report EAS-CS-92-18, Nov. 1992. This method also uses a Sender-Chooser and flooding approach as in the FITNESS and Self-Healing approaches. What distinguishes this method from the other approaches is that the restoration request messages, called seek messages, include information about the path which each seek message has traversed from the Sender node to the Chooser node. Seek messages are propagated at each node as they arrive. As they are transmitted from node to node, the node IDs of the nodes they have visited and the number of spare channels available over each link they have traversed are added to a variable length field in the seek message format. This path information is inspected at each node as a seek message is received. If the message has visited the node before, it is discarded.
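The seek message flooding described above can be sketched as a small simulation. The function name `flood_seek`, the in-memory message queue and the `spare` dictionary layout are hypothetical; a real network would deliver seek messages asynchronously, so the arrival order produced here is only one possibility.

```python
from collections import deque


def flood_seek(spare, sender, chooser):
    """Return the (node IDs, spare counts) carried by each seek message
    that reaches the Chooser.

    spare maps (u, v) node pairs to the spare channels on that link.
    """
    adjacency = {}
    for (u, v), channels in spare.items():
        if channels > 0:
            adjacency.setdefault(u, {})[v] = channels
            adjacency.setdefault(v, {})[u] = channels

    arrived = []
    # Each in-flight message: (receiving node, visited IDs, spare counts).
    in_flight = deque((nbr, [sender], [ch])
                      for nbr, ch in adjacency.get(sender, {}).items())
    while in_flight:
        node, visited, spares = in_flight.popleft()
        if node in visited:
            continue  # the message has visited this node before: discard
        if node == chooser:
            arrived.append((visited + [node], spares))
            continue
        # Append this node and the next link's spare count, then forward.
        for nbr, ch in adjacency[node].items():
            in_flight.append((nbr, visited + [node], spares + [ch]))
    return arrived
```

Because a message is discarded only when it revisits a node, every loop-free path between the Sender and the Chooser is reported, which is why the variable length path field grows with the network diameter.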
The effect of this flooding approach is that all possible paths between the Sender and Chooser nodes are discovered and information is received at the Chooser node to enable it to create a matrix describing the current topology of the network. As each seek message arrives at the Chooser node with a new path description, the matrix is updated and the path is selected, if possible, for use as the restoration path. The matrix is again updated to account for the spare channels taken to create the new restoration path.
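The Chooser's bookkeeping can be sketched as follows, with a dictionary of per-link spare counts standing in for the topology matrix. The function name `select_paths`, the input layout and the policy of taking as many channels as each arriving path allows are assumptions made for illustration, not details taken from the RREACT report.

```python
def select_paths(arrivals, lost_channels):
    """Choose restoration paths in the order seek messages arrive.

    arrivals is a list of (node IDs, spare counts per link) pairs, one per
    seek message. Returns a list of (node IDs, channels assigned).
    """
    remaining = {}   # link -> spare channels still free (the "matrix")
    selected = []
    needed = lost_channels
    for nodes, spares in arrivals:
        links = [frozenset(pair) for pair in zip(nodes, nodes[1:])]
        # Record links seen for the first time; known links keep their
        # current value so that earlier assignments are not forgotten.
        for link, channels in zip(links, spares):
            remaining.setdefault(link, channels)
        if needed == 0:
            continue
        usable = min(min(remaining[link] for link in links), needed)
        if usable > 0:
            # Account for the spare channels taken by the new path.
            for link in links:
                remaining[link] -= usable
            selected.append((nodes, usable))
            needed -= usable
    return selected
```

For instance, if three channels are lost and seek messages arrive describing the paths S-A-C with spare counts [2, 1], S-B-C with [1, 3] and S-A-B-C with [2, 1, 3], the sketch assigns one channel to each path, using the updated counts to confirm that the third path still has capacity on the shared links.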
The actual path building process of the RREACT approach is very similar to that of the FITNESS approach shown in FIG. 2.
Most of the existing distributed network restoration algorithms therefore utilize a three-phase restoration process. Each phase requires at least one traversal of the restoration route, so such a three-phase process takes time. Thus, in order to enhance the operational efficiency of a telecommunications network, the time required to restore a network from disrupted traffic needs to be dramatically reduced.