The present invention relates to a network fault recovery method and apparatus and is particularly concerned with recovery at higher layers from physical layer faults.
Currently, the traffic reliability of large telecommunications networks such as core networks used for Internet service providers (ISPs) or for major corporate backbones is dependent upon the traffic protection resources built into the network elements. To ensure that the desired availability of network connections is maintained and protected, it is standard practice in the telecommunications industry to rely on routing algorithms for handling link or equipment failures. However, with a typical failure reaction time of 30 seconds, conventional routing protocols are inherently too slow for today's high speed networks. This results in unacceptable transmission downtime, particularly for video and voice transmission.
A faster solution conventionally used to protect network connections consists of implementing protection in the physical layer (layer 1) of the network by installing redundant equipment so that if one physical link fails, another can rapidly be switched into place.
In contrast to relying on the routing protocols for protecting the availability of network connections, the installation of redundant equipment results in a much faster failure reaction time which, for example in SONET rings, is usually in the neighbourhood of 50 milliseconds.
Redundancy of equipment has long been accepted by carrier grade networks as a way to ensure availability and reliability. However, networks not requiring carrier grade protection still desire rapid recovery from physical failures, particularly in high throughput links such as those carried in optical fiber, e.g. OC-192.
However, the use of redundant layer 1 equipment for protection presents a number of disadvantages. First, more network links must be installed. For example, current protection configurations which require the installation of additional fiber links between network nodes include dedicated protection (1 protection fiber for each fiber link, also referred to as 1:1 protection), shared protection (1 protection fiber for N fiber links, or 1:N protection) and ring protection.
The accommodation of multiple fiber links necessitates replicating some of the equipment relating to optical link budgets at each network node. Duplicating this equipment may prove to have a major impact on the overall cost of the network.
In addition to the high cost associated with installing additional equipment for traffic protection, another drawback of the use of redundant layer 1 equipment is that the additional bandwidth capacity created therefrom is exclusively dedicated to traffic protection and remains unused, or is pre-emptable, in the absence of network failures. This increases the cost of the bandwidth.
In view of the slow reaction time of the routing protocols, the high cost and the inefficient bandwidth management associated with the use of additional layer 1 equipment, it is desirable to provide a cost-effective and efficient protection mechanism which provides adequate reaction time to failures and maximizes the utilization of the available resources present in the network.
An object of the present invention is to provide an improved network fault recovery method and apparatus.
In accordance with the present invention, L1/L2/L3 integration and L1 cut-through path utilization are provided in an apparatus and method of fault recovery.
In accordance with an aspect of the present invention there is provided a switch which combines an IP router with L2 capabilities, and an L1 cross connect (optical or electrical).
In accordance with another aspect of the invention there is provided a network in which switches are configured with label switched paths (LSPs) that correspond to layer 1 (L1) cut-through paths.
Conveniently, a layer 2 (L2) cut-through path is overlaid on the L1 cut-through path and the L2 cut-through path is used for IP data flows.
Preferably, the L2 cut-through paths are defined as label switched paths (LSPs), and the L1 cut-through paths are each an end-to-end path established with L1 cross connects associated with each switch.
In accordance with another aspect of the present invention a method is provided in which upon failure of a physical link, all LSP endpoints associated with affected L1 cut-through paths are notified by physical detection methods.
Preferably, label switched paths (LSPs) are defined, each corresponding to a respective L1 cut-through path, the MPLS entity managing an LSP is notified of LSP failures that correspond to L1 cut-through path failure, and backup procedures are then executed to restore IP forwarding.
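By way of illustration only, the notification step described above may be sketched as a lookup from a failed physical link to the LSP endpoints whose L1 cut-through paths traverse that link. The sketch is not the claimed implementation: the names L1Path, affected_paths and notify_endpoints, and the representation of a link as a pair of node names, are assumptions introduced here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class L1Path:
    """Hypothetical record: one L1 cut-through path and the LSP overlaid on it."""
    lsp_id: str
    links: tuple  # ordered physical links, each a (node, node) pair

    @property
    def endpoints(self):
        # The LSP endpoints are the first and last nodes of the path.
        return (self.links[0][0], self.links[-1][1])


def _norm(link):
    # Treat a link as undirected so ("A", "B") matches ("B", "A").
    return tuple(sorted(link))


def affected_paths(paths, failed_link):
    """Return the L1 cut-through paths that traverse the failed link."""
    failed = _norm(failed_link)
    return [p for p in paths if any(_norm(l) == failed for l in p.links)]


def notify_endpoints(paths, failed_link):
    """Map each endpoint node to the LSP ids it must be told have failed."""
    notifications = {}
    for path in affected_paths(paths, failed_link):
        for node in path.endpoints:
            notifications.setdefault(node, []).append(path.lsp_id)
    return notifications


paths = [
    L1Path("lsp1", (("A", "B"), ("B", "C"))),
    L1Path("lsp2", (("A", "D"), ("D", "C"))),
    L1Path("lsp3", (("B", "C"), ("C", "E"))),
]
result = notify_endpoints(paths, ("C", "B"))
```

In this example only the endpoints of lsp1 and lsp3 receive a notification, since lsp2 does not traverse the failed link; each notified endpoint would then execute its backup procedure.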
According to an aspect of the present invention there is provided a method of fault recovery for a network including the steps of establishing a physical topology for the network, aligning a logical topology for the network with the physical topology, and using a fault indication from the physical topology to effect fault recovery in the logical topology.
In accordance with another aspect of the present invention there is provided an apparatus for data networking comprising a cross connect for switching at a physical layer, a router for redirecting data packets at a logical layer coupled to the cross connect, and a fault recovery mechanism responsive to a fault indication in the physical layer for effecting a recovery in the logical layer.
Conveniently, the router includes an internetworking protocol (IP).
Preferably, the internetworking protocol includes multi-protocol label switching (MPLS).
In accordance with another aspect of the present invention there is provided a network comprising a plurality of nodes, each node including a cross connect for switching at a physical layer, a router for redirecting data packets at a logical layer coupled to the cross connect and a fault recovery mechanism responsive to a fault indication in the physical layer for effecting a recovery in the logical layer, a plurality of physical connections between nodes via the respective cross connects, a plurality of logical routes between nodes via the respective routers, and an alternative logical route for use by the fault recovery mechanism.
In accordance with another embodiment of the present invention there is provided, in a network including a plurality of nodes and having a plurality of communications layers, a method of providing fault recovery comprising the steps of: aligning at least a first and second layer of the plurality of communications layers; for a given path in the first layer, defining a corresponding path in the second layer and an alternative path in the second layer, the alternative path in the second layer corresponding to an alternative path in the first layer disjoint from the given path; and on detection in the first layer of a fault in the given path, switching in the second layer from the corresponding path to the alternative path, whereby fault recovery in the network is provided.
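By way of illustration only, the steps of this method may be sketched as follows: an alternative path is precomputed that shares no link with the given path, and the second layer switches to it when a first layer fault is reported on the given path. The sketch assumes that "disjoint" means link-disjoint, uses a breadth-first search as one possible way of finding the alternative, and introduces the hypothetical names path_links, disjoint_alternative and ProtectedLsp; none of these choices is taken from the description above.

```python
from collections import deque


def path_links(path):
    """Return the set of undirected links traversed by a node sequence."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def disjoint_alternative(topology, primary):
    """Breadth-first search for a path sharing no link with `primary`.

    `topology` maps each node to its neighbours (assumed representation).
    Returns a node list, or None if no link-disjoint path exists.
    """
    banned = path_links(primary)
    src, dst = primary[0], primary[-1]
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in topology.get(node, ()):
            if nxt not in prev and frozenset((node, nxt)) not in banned:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


class ProtectedLsp:
    """Second layer path with a precomputed alternative (illustrative only)."""

    def __init__(self, topology, primary):
        self.primary = primary
        self.backup = disjoint_alternative(topology, primary)
        self.active = primary

    def on_l1_fault(self, failed_link):
        # Switch only when the fault lies on the primary path still in use.
        on_primary = frozenset(failed_link) in path_links(self.primary)
        if self.active is self.primary and self.backup and on_primary:
            self.active = self.backup
        return self.active


topology = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
lsp = ProtectedLsp(topology, ["A", "B", "C"])
active_after_fault = lsp.on_l1_fault(("B", "A"))
```

Because the alternative is computed before any fault occurs, the switch itself is a local change of the active path and does not wait for a routing protocol to reconverge.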
Advantages of the present invention include faster recovery from layer 1 failure than is provided by L3 routing algorithms, and integration of the layer 1, 2 and 3 networks into a common topology (a network management simplification and potential equipment cost saving).