This invention relates to a connection-oriented network, and more specifically to a method and an apparatus for resuming traffic rapidly after a failure in a network element.
Routing traffic using shortest path algorithms (e.g., an Interior Gateway Protocol (IGP) such as Open Shortest Path First (OSPF) or Intermediate System to Intermediate System (IS-IS)) contributes significantly to congestion problems in a network. Because IGP is topology-driven, bandwidth availability and traffic characteristics are not considered when making routing decisions. An overlay model, such as IP-over-ATM or IP-over-Frame-Relay, provides a virtual topology on top of the physical topology and can alleviate traffic congestion. The overlay model supports constraint-based routing to configure and maintain the virtual topology.
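The difference between topology-driven and constraint-based routing can be illustrated with a minimal sketch. The topology, link costs, and bandwidth figures below are hypothetical, and the function name `shortest_path` is introduced only for illustration; constraint-based routing is approximated here by pruning links that lack the requested bandwidth before running a standard shortest path search.

```python
import heapq

def shortest_path(links, src, dst, min_bw=0):
    """Dijkstra over links whose available bandwidth is at least min_bw.

    links maps (u, v) -> (cost, available_bandwidth); links are treated as
    bidirectional. Returns the node list of the cheapest path, or None.
    """
    adj = {}
    for (u, v), (cost, bw) in links.items():
        if bw >= min_bw:  # constraint: prune links that cannot carry the demand
            adj.setdefault(u, []).append((v, cost))
            adj.setdefault(v, []).append((u, cost))
    heap = [(0, src, [src])]
    visited = set()
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in adj.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (dist + cost, nxt, path + [nxt]))
    return None

# Hypothetical topology: the A-B-D path is cheapest but has little spare bandwidth.
LINKS = {
    ("A", "B"): (1, 10),
    ("B", "D"): (1, 10),
    ("A", "C"): (2, 100),
    ("C", "D"): (2, 100),
}

topology_driven = shortest_path(LINKS, "A", "D")              # ignores bandwidth
constraint_based = shortest_path(LINKS, "A", "D", min_bw=50)  # needs 50 units
```

With no bandwidth constraint the search selects the low-cost but low-capacity path through B; with a 50-unit requirement it selects the path through C, which is the behavior a purely topology-driven IGP cannot provide.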
In a connection-oriented network, such as X.25, Frame Relay, or ATM networks, packets are routed based on a virtual topology consisting of virtual circuits (routes). At the beginning of a transmission, a connection is established and every packet belonging to a given connection is transmitted through the same established route. In practice, a communications protocol, such as RSVP, signals a router to reserve bandwidth for real-time transmission.
In conventional connection-oriented networks, such as IP-over-ATM, each node communicates with every other node by a set of permanent virtual circuits (PVCs) that are configured across the ATM physical topology. In the conventional model, the nodes only have knowledge of the individual PVCs, which appear to them as simple point-to-point circuits between two nodes. Furthermore, the physical paths for the PVC overlay are typically calculated by an offline configuration utility on an as-needed basis, such as when congestion occurs or a new link is added. The PVC paths and attributes are globally optimized by the offline configuration utility based on link capacity and historical traffic patterns. The offline configuration utility can also calculate a set of secondary PVCs that is ready to respond to failure conditions.
The connection-oriented network has an advantage over other types of network models in that it does not require complete address information for every packet after the connection has been established. Instead, only a short connection identifier is included with each packet to define the virtual circuit to which the packet belongs. For example, in a Multiprotocol Label Switching (MPLS) framework, a label is attached to a packet as it enters the network. Forwarding decisions are based on the attached label without consulting the original packet headers.
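Label-based forwarding of this kind can be sketched as a per-node table lookup. The node names, label values, and the functions `forward` and `send` below are hypothetical, and the per-hop label swap is an assumption of the sketch rather than a detail stated above; the point shown is only that each hop consults the label and never the destination address.

```python
# Hypothetical per-node label tables: incoming label -> (outgoing label, next hop).
LABEL_TABLES = {
    "R1": {17: (22, "R2")},
    "R2": {22: (35, "R3")},
    "R3": {35: (None, "egress")},  # None: the label is removed at the last hop
}

def forward(packet, node):
    """Forward one hop using only the packet's label, never its dst_ip field."""
    out_label, next_hop = LABEL_TABLES[node][packet["label"]]
    return {**packet, "label": out_label}, next_hop

def send(packet, ingress_node, ingress_label):
    """Attach a label at the ingress node, then forward hop by hop."""
    packet = {**packet, "label": ingress_label}
    node, hops = ingress_node, []
    while node != "egress":
        hops.append(node)
        packet, node = forward(packet, node)
    return packet, hops

delivered, route = send({"dst_ip": "192.0.2.7", "payload": b"hello"}, "R1", 17)
```

Here `route` records the nodes traversed, and `delivered` carries the original fields unchanged with the label removed at the final hop.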
Internet Protocol traffic is widely carried over Synchronous Optical Network (SONET) lines, either using ATM as a management layer or over SONET directly. In such a network, failure of a network element will cause a loss of service until a new connection can be established.
SONET uses a self-healing ring architecture capable of rerouting traffic if a line goes down. The restoration time is on the order of 50 milliseconds. For service providers who need to provide voice over IP and other high reliability services, a fast reroute time compatible with the SONET restoration time of 50 milliseconds is required.
There are generally two conventional approaches to providing fast reroute, both requiring the use of signaling protocols. One approach is to signal the failure back to an ingress node where the packet enters the network. The ingress node recomputes and establishes an alternative virtual circuit as soon as possible. However, given that the signaling time required for a round trip across the continental United States is about 75 milliseconds, this approach is too slow to be compatible with SONET's restoration time of 50 milliseconds.
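The timing argument can be checked with simple arithmetic. The sketch below uses only the two figures quoted above; the helper `meets_budget` and its `recompute_ms` parameter are hypothetical, added to show that any path recomputation time at the ingress node only widens the gap.

```python
SONET_RESTORATION_MS = 50     # restoration time quoted above
ROUND_TRIP_SIGNALING_MS = 75  # cross-continental round trip quoted above

def meets_budget(signaling_ms, recompute_ms=0.0, budget_ms=SONET_RESTORATION_MS):
    """Return True if signaling plus recomputation fits within the budget."""
    return signaling_ms + recompute_ms <= budget_ms

# Even with zero recomputation time, signaling alone exceeds the budget.
shortfall_ms = ROUND_TRIP_SIGNALING_MS - SONET_RESTORATION_MS
ingress_reroute_ok = meets_budget(ROUND_TRIP_SIGNALING_MS)
```

Under these figures the ingress-signaled approach overshoots the SONET restoration time by 25 milliseconds before any recomputation is counted.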
In a second conventional approach, a master server monitors the network and pre-establishes alternative virtual circuits. The master server is notified of a failure and directs traffic to an alternative virtual circuit. However, the signaling between the master server and the failed elements still causes delay. Furthermore, if the failed node or link carries multiple virtual circuits, the resulting burst of signaling messages can create a peak in both the processing requirements and the bandwidth utilization.