In communications networks, there are two types of mechanisms for handling network failures: protection and restoration. Protection usually denotes fast recovery (e.g., <50 ms) from a failure without accessing a central server or database or attempting to know the full topology of the network. Typically, protection can be achieved either by triggering a preplanned action or by running a very fast distributed algorithm. By contrast, restoration usually denotes a more leisurely process (e.g., minutes) of re-optimizing the network after having collected precise topology and traffic information.
Protection can occur at several different levels, including automatic protection switching, line switching and path switching. The most basic protection mechanism is 1:N automatic protection switching (APS). APS can be used when there are at least N+1 links between two points in a network. N of these links are active while one is a spare that is automatically put in service when one of the active links fails. APS is a local action that involves no changes elsewhere in the network.
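The local nature of 1:N APS can be illustrated with a minimal sketch. The class name `ApsGroup`, its methods, and the link labels are hypothetical and chosen only for illustration; the sketch assumes a single spare that can protect one working link at a time.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ApsGroup:
    """Hypothetical 1:N protection group: N working links share one spare."""
    working: list[str]
    spare: str
    # Maps a failed working link to the spare that now carries its traffic.
    switched: dict[str, str] = field(default_factory=dict)

    def carrier(self, link: str) -> str:
        """Return the link currently carrying traffic assigned to `link`."""
        return self.switched.get(link, link)

    def on_failure(self, link: str) -> bool:
        """Move `link` onto the spare; return False if the spare is taken."""
        if link not in self.working:
            raise ValueError(f"{link} is not a working link of this group")
        if link in self.switched:
            return True  # this link is already on the spare
        if self.switched:
            return False  # assumption: the single spare protects one link only
        self.switched[link] = self.spare
        return True


group = ApsGroup(working=["w1", "w2", "w3"], spare="s")
group.on_failure("w2")
```

Only the two end points of the link group take part in the switch; no other node in the network needs to be informed, which is why the action can be preplanned and fast.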
Line switching is another protection mechanism that is similar to APS, except that the protection “line” is actually a multi-hop “virtual line” through the network. In line switching, all of the traffic using the failed line is switched onto the protection “virtual line”, which can potentially cause traffic loops in the network. An example of line protection switching is the SONET (synchronous optical network) bidirectional line switched ring (BLSR).
A third protection mechanism is path switching. In path switching, the protection provided in the network is path-specific, and traffic loops can generally be avoided. Path switching is generally the most bandwidth-efficient protection mechanism; however, it suffers from the so-called “failure multiplication” problem, wherein a single link failure causes many path failures. There are two approaches to path protection: passive and active.
In the passive approach, data is transmitted in parallel on both a working path and a protection path. The destination node selects between the two paths, without requiring any action from upstream nodes. Passive path switching is prevalent in SONET unidirectional path switched rings (UPSR), in which all of the traffic goes to (or comes from) a hub node. One drawback of the passive approach is that it wastes line and switch capacity, because every protected flow occupies both paths at all times.
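The selection made at the destination in the passive approach might be sketched as follows. The function name `select_path`, the path labels, and the non-revertive policy (staying on the current path while it is healthy) are assumptions made for illustration, not details taken from the description above.

```python
from typing import Dict, Optional


def select_path(current: str, healthy: Dict[str, bool]) -> Optional[str]:
    """Pick the copy of the signal that the destination should accept.

    `current` is "working" or "protection"; `healthy` reports, per path,
    whether the locally received signal is good. No upstream node is consulted.
    """
    if healthy.get(current, False):
        return current  # assumed non-revertive: do not switch back needlessly
    other = "protection" if current == "working" else "working"
    if healthy.get(other, False):
        return other
    return None  # both copies are bad; passive protection cannot help


selected = select_path("working", {"working": False, "protection": True})
```

Because both copies arrive continuously, the decision uses only information available at the destination, which is what makes the scheme fast and also what makes it costly in capacity.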
In the active approach, a message is sent toward the source (starting from the point of failure) to signal the failure and to request a switchover to a protection path at some recovery point. There are two basic ways of signaling the failure: explicit and implicit.
In the explicit method, the node discovering the failure sends a message upstream on all paths that use the failed element. This message should eventually reach a recovery point. Unfortunately, the process of scanning path lists and sending numerous distinct messages (possibly thousands in a large network) can be time-consuming. In the implicit method, the node discovering the failure broadcasts a notification message to every node in the network. That message contains the identity of the failed element. Upon receiving such a message, a node scans all the protection paths passing through it and takes appropriate actions for paths affected by the failure.
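The difference between the two signaling methods can be sketched in a few lines. The path table `PATHS`, the link labels, and both function names are hypothetical; the sketch models only which paths each method identifies, not the message transport itself.

```python
from collections import defaultdict

# Hypothetical path table: each path is the ordered list of links it uses.
PATHS = {
    "p1": ["A-B", "B-C", "C-D"],
    "p2": ["E-B", "B-C"],
    "p3": ["A-B", "B-F"],
}


def explicit_notifications(paths, failed_link):
    """Explicit method: the detecting node looks up every path using the
    failed link and prepares one upstream message per affected path."""
    index = defaultdict(list)  # link -> ids of the paths that traverse it
    for path_id, links in paths.items():
        for link in links:
            index[link].append(path_id)
    return [(path_id, failed_link) for path_id in sorted(index[failed_link])]


def implicit_scan(local_paths, failed_link):
    """Implicit method: a node that received the broadcast scans the paths
    passing through it and returns those affected by the failed link."""
    return sorted(p for p, links in local_paths.items() if failed_link in links)


messages = explicit_notifications(PATHS, "B-C")
affected_here = implicit_scan({"p1": PATHS["p1"], "p3": PATHS["p3"]}, "B-C")
```

In the explicit case the number of messages grows with the number of paths on the failed link, whereas in the implicit case a single broadcast is sent and the scanning work is spread across the receiving nodes.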
Except in very large networks where the number of links vastly exceeds the number of paths per link, the implicit method is generally faster because it requires fewer sequential message transmissions and because the propagation of messages takes place in parallel with recovery actions. However, having each node determine which of its paths use a failed network element can be a lengthy process, potentially more demanding than the explicit method's task of finding all paths that use the failed element.