1. Field of the Invention
The present invention relates generally to fault recovery techniques for a communications network such as a mesh network.
2. Description of the Related Art
In current communications networks, fault recovery schemes such as Automatic Protection Switching (APS) and ring fault recovery have been extensively used. APS is described in Chapter 3 of “Fiber Network Service Survivability,” T. Wu, Artech House, 1992. Ring fault recovery is described in Chapter 4 of the same publication. APS concerns fault recovery for a link connecting adjacent nodes, and a working ring and a protection ring are provisioned in advance. When a failure occurs in the working ring, communication is restored by switching traffic to the protection ring. The ring fault recovery scheme is used in a mesh network where a plurality of nodes are interconnected by rings. The network is segmented into a number of rings. When a failure occurs in the network, fault recovery action is performed independently on a per-ring basis. While the APS method is only capable of recovering a network from link failure, the ring fault recovery scheme is capable of recovering from both link failure and node failure.
Attention has recently focused on the mesh fault recovery scheme, in which the whole network is treated as a single mesh rather than as multiple rings. While in the ring fault recovery scheme a number of rings cannot share a common backup resource, the mesh fault recovery scheme allows any combination of multiple paths in a mesh network to share a common backup resource if they meet certain criteria. Therefore, in most cases the mesh fault recovery scheme requires less backup resource than the ring fault recovery scheme.
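The sharing criteria are not specified here. As an illustration only, the following minimal sketch assumes one commonly used criterion: two working paths may share a backup resource when they have no link in common, so that a single link failure cannot affect both. The function name and the example paths are hypothetical.

```python
def can_share_backup(working_a, working_b):
    """Return True if two working paths have no link in common.

    Assumed criterion (not taken from the text): link-disjoint working
    paths cannot both fail on one link failure, so they may share backup.
    Each path is a list of node names in route order.
    """
    # Represent each link as an unordered node pair so direction is ignored.
    links_a = {frozenset(pair) for pair in zip(working_a, working_a[1:])}
    links_b = {frozenset(pair) for pair in zip(working_b, working_b[1:])}
    return links_a.isdisjoint(links_b)


PATH_1 = ["A", "B", "C"]
PATH_2 = ["D", "E", "F"]   # no link in common with PATH_1
PATH_3 = ["X", "B", "A"]   # reuses link A-B in the opposite direction

SHARE_1_2 = can_share_backup(PATH_1, PATH_2)
SHARE_1_3 = can_share_backup(PATH_1, PATH_3)
```

Under this assumption PATH_1 and PATH_2 could share a backup resource, whereas PATH_1 and PATH_3 could not, since a failure of link A-B would require backup for both at once.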
These fault recovery schemes are implemented primarily according to the SDH (synchronous digital hierarchy) and SONET (synchronous optical network) standards. However, the recent tendency is toward integrating the control plane of MPLS (multi-protocol label switching) technology with SDH/SONET transport networks. This integrated technology is known as GMPLS (generalized MPLS), and routers in a GMPLS network make their forwarding decisions according to timeslots, wavelengths or physical ports. The mesh fault recovery scheme can be implemented using GMPLS technology.
In a GMPLS network, each node uses a routing protocol to advertise link-state information, indicating the identity of its neighbors and its available network resources, to every other node of the network. Each node has its own topology database in which the advertised link-state information is stored and maintained. When a path is to be established, the initiation node of the path references its topology database and performs a route calculation for a possible route to the termination node of the path. When a route is determined, the initiation node sends a signaling message along the route so that the message reaches every node on the route.
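The route calculation performed by the initiation node can be sketched as follows. The text does not name an algorithm; this sketch assumes a shortest-path (Dijkstra) calculation of the kind typically used with link-state routing protocols. The topology database layout, the link costs and the node names are hypothetical.

```python
import heapq


def shortest_route(topology, source, target):
    """Dijkstra over a topology database of the form {node: {neighbor: cost}}.

    Returns the list of nodes from source to target, or None if the
    target is unreachable.
    """
    dist = {source: 0}
    prev = {}
    visited = set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, cost in topology.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None
    # Walk the predecessor chain back from the termination node.
    route = [target]
    while route[-1] != source:
        route.append(prev[route[-1]])
    return route[::-1]


# Hypothetical link-state database held by initiation node "A".
TOPOLOGY = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}
ROUTE = shortest_route(TOPOLOGY, "A", "D")
```

The resulting node list is the sequence along which the initiation node would send its signaling message.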
Mesh fault recovery using GMPLS is described in Internet draft draft-lang-ccamp-recovery-01.txt, submitted to the IETF by Jonathan P. Lang. In this document, fault recovery is classified into path-level recovery and span-level recovery. Path-level fault recovery is performed by the initiation and termination points of a path, and span-level fault recovery is performed between the adjacent nodes of a link. The fault recovery mode is classified into a 1+1 protection mode in which the traffic is simultaneously sent to the working and protection routes, a 1:1 protection mode in which the traffic is sent only to the working route, and shared modes such as the 1:N and M:N protection modes. When a failure occurs on a path, the network performs a fault location process for locating the failure. The initiation node of the faulty path selects an alternate route so that data may be rerouted around the trouble spot. The alternate route may pass through a node that shares the troubled path.
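The difference between the 1+1 and 1:1 protection modes described above can be sketched as follows. This is a simplified illustration only; the enum and function names are hypothetical, and the shared 1:N and M:N modes are not modeled.

```python
from enum import Enum


class ProtectionMode(Enum):
    ONE_PLUS_ONE = "1+1"
    ONE_TO_ONE = "1:1"


def routes_carrying_traffic(mode, working_ok):
    """Return the routes on which the source transmits traffic."""
    if mode is ProtectionMode.ONE_PLUS_ONE:
        # 1+1: traffic is always sent on both routes simultaneously;
        # the receiving end selects whichever copy is healthy.
        return ["working", "protection"]
    # 1:1: traffic uses the working route only, and is switched to the
    # protection route after the working route fails.
    return ["working"] if working_ok else ["protection"]


NORMAL_1P1 = routes_carrying_traffic(ProtectionMode.ONE_PLUS_ONE, True)
NORMAL_1TO1 = routes_carrying_traffic(ProtectionMode.ONE_TO_ONE, True)
FAILED_1TO1 = routes_carrying_traffic(ProtectionMode.ONE_TO_ONE, False)
```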
If a failure occurs on an incoming link to a node where data is split into working and protection paths, the network first identifies the troubled link and then performs a fault recovery operation on that faulty link in a span protection mode. If the working path fails, the network first identifies the faulty path and then performs fault recovery in a path protection mode, in which the termination node of these paths switches to the protection path so that data is rerouted around the faulty spot. Because the fault recovery mode differs depending on the location of the path failure, the fault locating process is an important requirement for the prior art communications network.
However, the need to perform a fault locating process places a burden on a network, particularly on optical networks where “optical transparency” is an important consideration in designing optical cross-connect systems. Specifically, if an optical network is required to identify the location of a failure on an optical path, the optical path must be monitored at strategic points along the path, and the number of such monitoring points would add complexity to the design of the optical system with an attendant increase in cost.
Another shortcoming of the prior art is that, with the span-level and path-level recovery schemes, the network cannot recover from a failure that occurs in an intermediate, or transit, node.
Additionally, in a large mesh network, many working and protection paths will be provided and configured in a complex pattern. The topology database of each node would be required to hold information on the node identities and on the node functions, that is, whether each node is an initiation or termination point of working and protection paths. Since the network topology tends to vary with time, the topology database must be updated in response to each topology variation. However, this is a formidable task to implement.
A further shortcoming of the prior art is that, since the attributes of the path that carries a user's traffic vary with time depending on which of the available routes it takes to the destination, the prior art path protection method is complex from the viewpoint of quality management for user services.
A still further shortcoming of the prior art is that it cannot perform fault recovery with a desired level of resource granularity. In a GMPLS network, in particular, a single optical link may carry sixty-four WDM channels, each transporting sixty-four TDM channels, each of which in turn carries multiple packet transmission paths. If the optical transmitter of a certain wavelength should fail, sixty-four TDM paths will be lost. If the granularity of fault recovery is equivalent to a TDM path, the recovery process will be repeated sixty-four times to restore all TDM paths. If fault recovery is performed at the granularity of a packet transmission path, the recovery process must be repeated a greater number of times. Since the recovery process of a single path involves the exchange of signaling messages, the signaling traffic would be enormous. The signaling channel usually has a narrow bandwidth, so recovery takes a long time. This is particularly true of shared path protection. On the other hand, if fault recovery is performed for a single TDM path failure at the granularity of a wavelength channel, the process may be simple and free from overwhelming signaling traffic. However, sixty-three other TDM paths would also be switched over to backup network resources, resulting in a substantial waste of network resources.
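The trade-off described above can be worked through numerically. The following sketch uses the sixty-four TDM paths per wavelength given in the text; the function name and the two-level granularity model are illustrative assumptions, and packet-level granularity is omitted because the number of packet paths per TDM channel is not stated.

```python
TDM_PATHS_PER_WAVELENGTH = 64  # figure taken from the text


def recovery_cost(failed_tdm_paths, granularity):
    """Return (recovery operations, healthy TDM paths needlessly switched)
    for a failure confined to one wavelength channel."""
    if granularity == "tdm":
        # One signaling exchange per failed TDM path; nothing else moves.
        return failed_tdm_paths, 0
    if granularity == "wavelength":
        # One exchange moves the whole wavelength, healthy paths included.
        return 1, TDM_PATHS_PER_WAVELENGTH - failed_tdm_paths
    raise ValueError(f"unknown granularity: {granularity}")


# Transmitter failure: all 64 TDM paths on the wavelength are lost.
TRANSMITTER_TDM = recovery_cost(64, "tdm")
TRANSMITTER_WAVELENGTH = recovery_cost(64, "wavelength")

# Single TDM path failure on an otherwise healthy wavelength.
SINGLE_TDM = recovery_cost(1, "tdm")
SINGLE_WAVELENGTH = recovery_cost(1, "wavelength")
```

For the transmitter failure, TDM granularity costs 64 recovery operations against 1 at wavelength granularity; for the single TDM path failure, wavelength granularity switches 63 healthy paths to backup resources.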
Additionally, in a multi-domain network, precise topology data is not exchanged between different domains. There exists a need for a mechanism to implement a fault recovery system that enables the domains to collaborate in a consistent manner when the network is affected by an inter-domain path failure.