In today's high-speed communication networks, each cable or fiber carries several thousand voice or data circuits. Such large network capacity provides advantages in terms of lower costs and greater payload flexibility. Fiber optic networks enjoy these advantages as well as others, such as improved transmission quality. Whatever the cable technology, wire or fiber, interruptions in communication service are not uncommon. Networks have been known to suffer damage from backhoes at construction sites, power augers, lightning, rodents, fires, train derailments, bullets, vandalism, car crashes, ship anchors, trawler nets, and diverse other mishaps. Given the ubiquity of communications today and their intimate involvement in business, medicine, finance, education, air traffic control, police and other government agencies, and other aspects of modern life, it is imperative that network operations be restored as quickly as possible in the event of a failure.
The precarious integrity of networks has been recognized, and approaches have been developed for restoring service after cable breaks. One such approach has been the implementation of automatic protection switching (APS). APS systems restore service by switching to a dedicated standby system. That is, there are two complete sets of links installed in an APS system so that each link has a back-up link ready and waiting to serve.
In the context of this specification, the term "link" is intended to indicate any communication path intermediate two adjacent nodes, or communication units, in a communication network. Adjacent nodes are communication units, such as cross-connect systems, that are connected by a span. There can be more than one link in a span; a span is the set of all links in parallel between two adjacent nodes.
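The link and span definitions can be made concrete with a small data model. The sketch below is illustrative only: the `Link` class, its fields, and the `group_into_spans` helper are hypothetical names, and it assumes each link is identified by an id and its two endpoint nodes. It groups parallel links between the same pair of adjacent nodes into one span.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical model of one communication path between two adjacent nodes."""
    link_id: str
    node_a: str
    node_b: str


def group_into_spans(links):
    """A span is the set of all links in parallel between the same two adjacent nodes."""
    spans = defaultdict(list)
    for link in links:
        # Sort the endpoints so that A-B and B-A map to the same span key.
        key = tuple(sorted((link.node_a, link.node_b)))
        spans[key].append(link.link_id)
    return dict(spans)


links = [Link("L1", "A", "B"), Link("L2", "B", "A"), Link("L3", "B", "C")]
spans = group_into_spans(links)
print(spans)  # {('A', 'B'): ['L1', 'L2'], ('B', 'C'): ['L3']}
```

Here nodes A and B are joined by a span of two links, while B and C are joined by a single-link span.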
Another recovery system approach is the self-healing ring. Self-healing rings (SHR) vary in the details of their implementation, but they can be conceptualized as an extension of either a 1:1 (100% redundancy) or a 1:N (one standby link shared among N working links) APS system. APS and SHR systems can effect recovery in 50 to 150 milliseconds. Such rapid recovery is very good, but the cost of such systems is prohibitively high except for the most critical of networks, such as certain banking, medical, or stock market systems.
Mesh networks have been recognized as useful in providing flexibility in recovering from network interruptions. A mesh network is a network in which each node may be connected to all other nodes in the network via links to adjacent nodes. By using intelligent internetworking devices, such as nodal multiplexers in a T-carrier network, transmissions may be routed over an alternative path should the primary (direct) path between two sites be interrupted. Such interruption may be occasioned by congestion, or by a physical or electrical failure.
Centralized restoration in a mesh network has been attempted, with the restoration path being calculated at a central location within the network using data stored at that central location. After the restoration path is determined, the information is promulgated throughout the network for implementation. Such centralized restoration systems have not succeeded in restoring network communications in less than one minute. With the high-capacity, time-sensitive information being carried on networks today, such a recovery time is unacceptably slow.
Distributive restoration in a mesh network is another approach that has been proposed for restoring a network. This distributed approach recognizes that the digital cross-connect switches employed at nodes in a mesh network are computers, and that they collectively represent considerable processing power embedded in a fabric of multiple communication links. In such a distributed approach, every node (digital cross-connect switch) acts to effect restoration as required in an apparently isolated manner, with no network-wide knowledge of the system. The independently deduced cross-connection decisions of the individual nodes, in the aggregate, constitute effective multipath rerouting plans.
Most distributive restoration systems are less costly than an APS or an SHR system. However, the trade-off is that their recovery time is not nearly as fast as that of the duplicative recovery systems. This stands to reason, since there is no dedicated link to which traffic can be routed with very little delay. Most distributive restoration systems depend upon flooding the network with messages once an interruption is detected. The flooding messages explore all routes then viable in the network. The route (a series of spans, denoted by a concatenation of nodes, that establishes a way through the network) is chosen according to some predetermined route-choosing criteria. Such criteria may include the first (shortest) path identified, the greatest-capacity path, the inclusion of specified nodes within the path, the greatest path length efficiency, the fastest path, or other parameters.
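The effect of flooding followed by route choice can be sketched as follows. This is a simplified, centralized model under stated assumptions, not any particular restoration protocol: the functions `explore_routes`, `bottleneck`, and `choose_route` are hypothetical, the network is assumed to be a map from each node to its neighbors and the spare capacity of the span to each, and only two of the criteria named above (shortest path and greatest-capacity path) are implemented. Enumerating every loop-free route stands in for the set of routes a flood of messages could discover.

```python
def explore_routes(graph, source, target):
    """List every loop-free route from source to target, i.e. the set of
    routes a flood of exploratory messages could discover.

    graph: node -> {neighbor: spare capacity of the span to that neighbor}.
    """
    routes = []
    stack = [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            routes.append(path)
            continue
        for neighbor in graph[node]:
            if neighbor not in path:  # a flood message never revisits a node
                stack.append(path + [neighbor])
    return routes


def bottleneck(graph, route):
    """A route can carry no more than its smallest-capacity span."""
    return min(graph[a][b] for a, b in zip(route, route[1:]))


def choose_route(graph, routes, criterion="shortest"):
    """Apply one predetermined route-choosing criterion."""
    if criterion == "shortest":
        return min(routes, key=len)  # fewest spans
    if criterion == "capacity":
        return max(routes, key=lambda r: bottleneck(graph, r))
    raise ValueError(f"unknown criterion: {criterion}")


# Span A-B has failed, so it is absent from the graph.
graph = {
    "A": {"C": 2, "D": 5},
    "B": {"C": 2, "E": 5},
    "C": {"A": 2, "B": 2, "D": 4},
    "D": {"A": 5, "C": 4, "E": 5},
    "E": {"B": 5, "D": 5},
}
routes = explore_routes(graph, "A", "B")
print(choose_route(graph, routes))              # ['A', 'C', 'B']
print(choose_route(graph, routes, "capacity"))  # ['A', 'D', 'E', 'B']
```

The two criteria disagree in this example: the shortest route passes through a low-capacity span, while the greatest-capacity route is one span longer.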
Distributed restoration systems that determine restoration paths after a failure is detected are rarely capable of effecting restoration in less than one second. Such a delay is still unacceptable.
Grover (W. D. Grover, "Distributed Restoration of the Transport Network", IEEE Network Management Into the 21st Century, Chapter 11, February 1994) proposes distributed preplanning for restoration using a digital restoration algorithm. According to Grover's proposal, a self-healing network protocol is executed for each possible span failure in the network. This is accomplished by a full execution of the self-healing network protocol, but without actually making any cross-connections to effect rerouting. Instead, each node is to record the cross-connections it would have made according to the self-healing network protocol, and save those cross-connections in a table. In this manner, each node will have stored in a table the instructions for that node's portion of the response to the self-healing network protocol for each and every span of the network. When a failure occurs, the network promulgates an alerting message, and any alerted node having non-null actions in its respective table makes the internal cross-connections between spare ports that are listed in its table.
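Grover's per-node table can be pictured as a lookup keyed by the failed span. The sketch below is a minimal model under stated assumptions: the class name `PreplannedNode`, the representation of a cross-connection as a pair of spare port numbers, and keying spans by a sorted node pair are all hypothetical, not Grover's actual data format. It shows the key property described above: at failure time a node performs no path computation, it only replays what it stored.

```python
# Hypothetical model of one node's preplan table: failed span -> the
# spare-port cross-connections this node recorded during a dry run of the
# self-healing protocol. An empty list is a "null action" for that span.
Span = tuple[str, str]
PortPair = tuple[int, int]


class PreplannedNode:
    def __init__(self, name: str, preplan: dict[Span, list[PortPair]]):
        self.name = name
        # Normalise keys so that span (A, B) and span (B, A) are the same entry.
        self.preplan = {tuple(sorted(span)): actions for span, actions in preplan.items()}
        self.cross_connections: list[PortPair] = []

    def on_alert(self, failed_span: Span) -> list[PortPair]:
        """Handle the alerting message for failed_span by replaying stored actions."""
        actions = self.preplan.get(tuple(sorted(failed_span)), [])
        self.cross_connections.extend(actions)  # no path computation at failure time
        return actions


node_c = PreplannedNode("C", {("B", "A"): [(3, 7), (4, 8)], ("B", "D"): []})
print(node_c.on_alert(("A", "B")))  # [(3, 7), (4, 8)]
print(node_c.on_alert(("D", "E")))  # []
```

The cost the following paragraph describes is hidden in the constructor argument: every node must hold an entry for every span of the network, built by a full protocol run per span.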
Grover's proposed distributed preplanning involves storing in a table at each node each and every connection that node must participate in for each and every failure case. Such a table requires a significant amount of computation to amass and a significant amount of time to complete. Grover himself acknowledges a window of vulnerability on the order of seventeen minutes for a 100-span network. According to Grover, alerting can be accomplished either by an activation loop established through all digital cross-connect system (DCS) nodes, or by disseminating the alert through simple flooding. Under either of Grover's alerting schemes, full promulgation of the message necessary to configure restoration, as each node "consults" its respective table to determine how to participate in the restoration, takes time and consumes network capacity as well. The complexity of constructing Grover's all-connection tables is a further cause for concern, because the more complex an operation is, the more fraught with opportunity for error it is. Said another way, as a general rule, the more complex a system, the less robust and reliable it is.
Further, Grover does not address how or when the system updates its information regarding which links in the network are actually spare links and available for use in restoration operations. He provides that links used in a restoration path are identified as "in use", but no allowance is made to identify when a link is not available for restoration operations for any other reason, such as a system reorientation, new subscribers on the system causing use of an additional (previously unused) link, or similar situations.
There is a need for a restoration system for a communication network that is robust and reliable. Such a system should have a relatively simple and efficient approach to identifying restoration paths through the network, alerting appropriate network locations of the need for their participation in restoration on a timely basis, and automatically updating the restoration information periodically in a self-learning mode of operation.
The invention is a method and apparatus for distributed managing of restoration paths within a communication network. The network includes a plurality of nodes connected by a plurality of internodal links, selected links of the plurality of internodal links being assigned links, other links being unassigned links. The preferred embodiment of the method of the present invention comprises the steps of: (a) establishing for selected nodes of the plurality of nodes a spare link catalog identifying each extant unassigned link connecting each selected node with an adjacent node; (b) operating each selected node as a probe originating node to send a probe message to each adjacent node over at least one unassigned link, preferably over one unassigned link; (c) evaluating the probe message at each adjacent node according to predetermined message handling criteria; (d) at each adjacent node, forwarding the probe message as a forwarded probe message to subsequent nodes adjacent to each adjacent node, or discarding the probe message, such forwarding or discarding being determined by the evaluating; (e) appending message content to the forwarded probe message indicating the node-to-node path sequence traversed by the probe message proceeding through the network; (f) evaluating the forwarded probe message at each subsequent node according to the predetermined message handling criteria; (g) repeating steps (d) through (f) for each forwarded probe message until the forwarded probe message is discarded or is received by the probe originating node as its own probe message; (h) the probe originating node noting each message receipt of its own probe message from other than an adjacent node, each message receipt including a recital of a restoration path in the node-to-node path sequence; and (i) recording each message receipt in a restoration path register in a data store at a plurality of selected storage nodes of the network.
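Steps (a) through (i) can be simulated for a single probe originating node. This is a sketch under explicit assumptions, not the claimed implementation: `run_probes` and `max_hops` are hypothetical names, the spare link catalog is reduced to "which adjacent nodes can be reached over at least one unassigned link", and the "predetermined message handling criteria" are assumed to be (1) discard a probe once it has used up a hop budget, (2) never send a probe straight back to the node it came from, and (3) never revisit a node except to return to the originator.

```python
from collections import deque


def run_probes(spare_links, origin, max_hops=6):
    """Simulate the probes sent by one probe originating node.

    spare_links: hypothetical spare link catalog, mapping each node to the
    set of adjacent nodes it reaches over at least one unassigned link.
    Returns the restoration path register entries noted by `origin`.
    """
    register = []
    # Step (b): one probe to each adjacent node; a message is (node, path so far).
    queue = deque((nbr, [origin, nbr]) for nbr in sorted(spare_links[origin]))
    while queue:
        node, path = queue.popleft()
        if node == origin:
            # Step (h): own probe returned; keep it only if it crossed more
            # than one node other than the originator.
            if len(set(path[1:-1])) > 1:
                register.append(path)  # step (i)
            continue
        # Steps (c)/(f), assumed criterion 1: discard probes out of hop budget.
        if len(path) - 1 >= max_hops:
            continue
        previous = path[-2]
        for nxt in sorted(spare_links[node]):
            # Assumed criteria 2 and 3: no immediate bounce-back, and no
            # revisiting a node except to return to the originator.
            if nxt == previous or (nxt in path and nxt != origin):
                continue
            queue.append((nxt, path + [nxt]))  # steps (d)/(e): forward, path appended
    return register


# A square A-B-C-D with a diagonal B-D, all spans having spare links.
spare = {"A": {"B", "D"}, "B": {"A", "C", "D"}, "C": {"B", "D"}, "D": {"A", "B", "C"}}
for entry in run_probes(spare, "A"):
    print(entry)
# ['A', 'B', 'D', 'A']
# ['A', 'D', 'B', 'A']
# ['A', 'B', 'C', 'D', 'A']
# ['A', 'D', 'C', 'B', 'A']
```

Each circuit through A is recorded twice, once per direction of travel. Lowering `max_hops` to 3 leaves only the two triangle entries, which shows how the handling criteria bound the size of the register.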
The method of the present invention may comprise the further steps of: (j) on detection of a link failure intermediate a first node and a second node, designating one of the first and second nodes as a sender node, and the other node as a receiver node; (k) operating the sender node to choose a selected restoration route from its restoration path register according to predetermined route selection criteria; (l) building a connection message at the sender node identifying the selected restoration route; (m) conveying the connection message to the receiver node; (n) establishing a bidirectional connection in each node intermediate the sender node and the receiver node in the selected restoration path; and (o) cooperatively orienting the sender node and the receiver node to effect communications via the selected restoration route.
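Steps (j) through (n) can be sketched on top of a restoration path register like the one described above. Several rules here are assumptions made only for illustration, since the method leaves them as "predetermined": the node with the lexically smaller id becomes the sender, the route selection criterion is fewest hops, and a sender-to-receiver route is derived from a stored circuit by taking the stretch before the receiver and the reversed stretch after it (the bare direct hop is the failed span itself and is skipped). The names `candidate_routes` and `restore` are hypothetical. Step (o), the endpoints switching traffic onto the route, is not modeled.

```python
def candidate_routes(register, sender, receiver):
    """Derive sender->receiver routes from full-circuit register entries (assumed rule)."""
    routes, seen = [], set()
    for circuit in register:
        if circuit[0] != sender or receiver not in circuit[1:-1]:
            continue
        i = circuit.index(receiver)
        # Stretch before the receiver, and the reversed stretch after it.
        for route in (circuit[: i + 1], circuit[i:][::-1]):
            if len(route) > 2 and tuple(route) not in seen:  # skip the failed direct span
                seen.add(tuple(route))
                routes.append(route)
    return routes


def restore(registers, node_a, node_b):
    """Steps (j)-(n) for a failed span between node_a and node_b (sketch)."""
    # Step (j), assumed rule: the node with the smaller id acts as sender.
    sender, receiver = sorted((node_a, node_b))
    routes = candidate_routes(registers.get(sender, []), sender, receiver)
    if not routes:
        return None  # no stored alternative route
    # Step (k), assumed route selection criterion: fewest hops.
    route = min(routes, key=len)
    # Step (l): the connection message names the selected route.
    message = {"sender": sender, "receiver": receiver, "route": route}
    # Steps (m)/(n): each intermediate node joins its upstream and downstream
    # neighbours in both directions (a bidirectional connection).
    cross_connects = {node: {prev: nxt, nxt: prev}
                      for prev, node, nxt in zip(route, route[1:], route[2:])}
    return message, cross_connects


registers = {"A": [["A", "B", "D", "A"], ["A", "D", "B", "A"],
                   ["A", "B", "C", "D", "A"], ["A", "D", "C", "B", "A"]]}
message, xc = restore(registers, "B", "A")
print(message["route"])  # ['A', 'D', 'B']
print(xc)                # {'D': {'A': 'B', 'B': 'A'}}
```

Because the routes come from a register built before the failure, the only work at failure time is a table scan and one connection message along the chosen route.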
The apparatus of the present invention is a communication network system having a distributed restoration capability, the system comprising: a plurality of communication nodes for generating and handling messages, at least some of the nodes including a data store; a plurality of internodal communication links connecting the plurality of nodes; and a spare link catalog distributively stored in at least some of the data stores and connected with the plurality of internodal links. The spare link catalog identifies each extant unassigned link connecting first selected nodes of the plurality of nodes with each node adjacent to each first selected node. The apparatus further comprises a restoration path register distributively stored in at least some of the data stores and connected with the plurality of internodal links. The restoration path register identifies alternate paths assignable from second selected nodes of the plurality of nodes to each node adjacent to each second selected node. The restoration path register includes node sequence information relating paths traversed by full-circuit probe messages sent by probe originating nodes of the plurality of nodes. Full-circuit probe messages are those probe messages which have been dispatched by a respective probe originating node, traversed more than one node other than the respective probe originating node, and returned to the respective probe originating node.
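One node's share of the two distributed structures, and the full-circuit test that governs what enters the register, might look like the following. This is a hypothetical layout, not the claimed data format: the `NodeDataStore` class, its field names, the link id strings, and the `assign_link` update rule (dropping a link from the catalog once it is assigned to traffic) are assumptions for illustration. The `is_full_circuit` predicate follows the definition given above.

```python
from dataclasses import dataclass, field


def is_full_circuit(path, origin):
    """True if the probe left `origin`, traversed more than one node other
    than `origin`, and came back to `origin`."""
    if len(path) < 2 or path[0] != origin or path[-1] != origin:
        return False
    return len(set(path[1:-1]) - {origin}) > 1


@dataclass
class NodeDataStore:
    """Hypothetical contents of one node's data store."""
    name: str
    # This node's share of the spare link catalog: adjacent node -> unassigned link ids.
    spare_links: dict = field(default_factory=dict)
    # This node's share of the restoration path register: node sequences.
    restoration_paths: list = field(default_factory=list)

    def assign_link(self, neighbor, link_id):
        """Drop a link from the catalog once it is assigned to traffic."""
        links = self.spare_links.get(neighbor)
        if links is not None:
            links.discard(link_id)
            if not links:
                del self.spare_links[neighbor]

    def record_probe(self, path):
        """Store a returned probe only if it is a full circuit."""
        if is_full_circuit(path, self.name):
            self.restoration_paths.append(list(path))
            return True
        return False


store = NodeDataStore("A", spare_links={"B": {"AB-3", "AB-4"}, "D": {"AD-1"}})
store.assign_link("D", "AD-1")
print(sorted(store.spare_links))                # ['B']
print(store.record_probe(["A", "B", "A"]))       # False
print(store.record_probe(["A", "B", "C", "A"]))  # True
```

A probe that merely bounces off one neighbor is rejected, because it visited only one node other than its originator and therefore describes no alternate path.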
The system responds to a disruption of communications on one of the internodal communication links intermediate a first node and a second node of the plurality of nodes by operating one of the first and second nodes to choose a selected restoration route from the restoration path register according to predetermined route selection criteria. That node communicates the selected restoration route to the other node, and the two nodes orient to cooperatively effect communications via the selected restoration route.
It is, therefore, an object of the present invention to provide a method and apparatus for distributed managing of restoration paths within a communication network that are robust and reliable.
A further object of the present invention is to provide a method and apparatus for distributed managing of restoration paths within a communication network which are simple and efficient in their approach to identifying restoration paths through the network.
Yet a further object of the present invention is to provide a method and apparatus for distributed managing of restoration paths within a communication network which are simple and efficient in their approach to alerting appropriate network locations of the need for their participation in restoration on a timely basis.
A further object of the present invention is to provide a method and apparatus for distributed managing of restoration paths within a communication network which are simple, efficient and fast in establishing restoration paths through the network.
Still a further object of the present invention is to provide a method and apparatus for distributed managing of restoration paths within a communication network which are simple and efficient in their approach to automatically updating restoration information periodically in a self-learning mode of operation.
Further objects and features of the present invention will be apparent from the following specification and claims when considered in connection with the accompanying drawings, in which like elements are labeled using like reference numerals in the various figures, illustrating the preferred embodiments of the invention.