1. Field of the Invention
The invention is related to the field of communications, and in particular, to fault recovery for communication networks comprised of switch nodes that establish communication paths.
2. Description of the Prior Art
FIG. 1 illustrates communication network 100 in an example of the prior art. Communication network 100 includes switch nodes 111-135. Within communication network 100, switch nodes 111-135 are coupled together by network links 301-340. Access to communication network 100 is provided by access links 201-210 that are coupled to respective switch nodes 111-115 and 131-135. Most of the switch nodes will typically have access links, but the number of access links depicted on FIG. 1 is restricted for clarity.
Switch nodes 111-135 could be optical cross-connect systems. Links 201-210 and 301-340 could be optical fibers or light waves carrying SONET/SDH signals. Switch nodes 111-135 could communicate over links 201-210 and 301-340 using Wavelength Division Multiplexing (WDM) systems and the Synchronous Optical Network (SONET) protocol. The SONET protocol may carry other protocols, such as Multi-Protocol Label Switching (MPLS), Asynchronous Transfer Mode (ATM), Internet Protocol (IP), Frame Relay (FR), Time Division Multiplex (TDM), or the like.
Communication network 100 also includes network control system 105. Network control system 105 exchanges control signals 106 with switch nodes 111-135. This exchange of control signals 106 could occur through intermediate systems that are not shown for clarity. Typically, the control signals 106 that are transferred from switch nodes 111-135 to network control system 105 include network information. The network information may include network topology, performance, and alarm information. The performance information may indicate or be used to determine: network efficiency, network availability, restoration success rate, restoration time, latency, jitter, and the like. The alarm information may indicate or be used to determine faults and other network problems.
Typically, the control signals 106 that are transferred from network control system 105 to switch nodes 111-135 include switching information. The switching information typically directs switch nodes 111-135 to change where they route incoming SONET signals. For example, switch node 122 may indicate to network control system 105 that link 312 has a fault, and in response, network control system 105 may direct switch node 122 to re-route SONET signals off of link 312 and onto links 307, 311, and/or 316.
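The alarm-and-response exchange in the example above could be sketched as follows. This is an illustrative sketch only. The `AlarmReport` and `SwitchingDirective` types, the `NODE_LINKS` table, and the `handle_alarm` function are hypothetical names that do not appear in the description, and the table holds only the links that the text associates with switch node 122.

```python
from dataclasses import dataclass

# Links that the text associates with switch node 122 (assumed, not exhaustive).
NODE_LINKS = {122: (307, 311, 312, 316)}


@dataclass(frozen=True)
class AlarmReport:
    """Hypothetical alarm information sent from a switch node."""
    node: int          # reporting switch node
    faulted_link: int  # link reported as faulted


@dataclass(frozen=True)
class SwitchingDirective:
    """Hypothetical switching information sent back to the switch node."""
    node: int
    avoid_link: int
    candidate_links: tuple


def handle_alarm(alarm: AlarmReport) -> SwitchingDirective:
    # Every other link known at the reporting node is a re-route candidate.
    candidates = tuple(
        link for link in NODE_LINKS[alarm.node] if link != alarm.faulted_link
    )
    return SwitchingDirective(alarm.node, alarm.faulted_link, candidates)


directive = handle_alarm(AlarmReport(node=122, faulted_link=312))
print(directive.candidate_links)
```

For the fault on link 312, the candidate links are 307, 311, and 316, which matches the re-routing options named in the example.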
In operation, communication network 100 establishes communication paths that are comprised of various links. A link extends between two adjacent switch nodes, and a communication path extends between two or more access links. For example, communication network 100 could establish a communication path from access link 202 to access link 207 through links 310-313 and switch nodes 112, 117, 122, 127, and 132.
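The relationship between links and a communication path could be modeled as shown below. This is a minimal sketch. The `Link` type and the `build_path` function are hypothetical names, and the pairing of each link with its two switch nodes is inferred from the order in which the links and nodes are listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    link_id: int
    node_a: int  # one adjacent switch node
    node_b: int  # the other adjacent switch node


def build_path(nodes, link_ids):
    """Pair each link with the two consecutive switch nodes it joins."""
    if len(link_ids) != len(nodes) - 1:
        raise ValueError("a path over N nodes needs exactly N-1 links")
    return [Link(lid, a, b) for lid, a, b in zip(link_ids, nodes, nodes[1:])]


# Path from access link 202 (at node 112) to access link 207 (at node 132).
working_path = build_path([112, 117, 122, 127, 132], [310, 311, 312, 313])
for link in working_path:
    print(link)
```

Under this pairing, link 312 extends between switch nodes 122 and 127.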
Link-Based Fault Recovery
FIG. 2 illustrates link-based protection for communication network 100 in an example of the prior art. For clarity, many components of communication network 100 that were shown on FIG. 1 are not shown on FIG. 2. Communication network 100 has established a communication path from access link 202 to access link 207 through links 310-313 and switch nodes 112, 117, 122, 127, and 132. The communication path could be a SONET OC-48 signal that contains 48 constituent STS-1 signals. The STS-1 signals may carry protocols such as MPLS, ATM, IP, FR, TDM or the like.
Consider that a fault occurs on link 312 (indicated by the “X” mark). Switch node 122 and/or switch node 127 detect the fault and transfer control signals 106 indicating the fault to network control system 105.
Network control system 105 is configured to provide link-based protection over pre-determined back-up communication routes. With link-based protection, bandwidth over predetermined back-up routes is reserved to protect each link. Note that for link-based protection, the back-up routes extend from one of the endpoints of link 312 to the other. The endpoints for link 312 are switch nodes 122 and 127.
In this example, link 312 is protected by three back-up communication routes that extend between link-endpoint switch nodes 122 and 127. A first back-up route uses switch nodes 121 and 126, and links 307, 303, and 308 to transfer five of the constituent STS-1 signals between switch nodes 122 and 127. A second back-up route uses switch nodes 123 and 128, and links 316, 321, and 317 to transfer ten of the constituent STS-1 signals between switch nodes 122 and 127. A third back-up route uses switch nodes 123, 124, 129, and 128 and links 316, 325, 330, 326, and 317 to transfer 33 of the constituent STS-1 signals between switch nodes 122 and 127. Thus, all 48 constituent STS-1 signals can be transferred between switch nodes 122 and 127 over the back-up routes in the link-based protective scheme.
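The division of the 48 constituent STS-1 signals among the three back-up routes could be checked with a short sketch such as the following. The `allocate_sts1` function is a hypothetical name, and the assignment of particular STS-1 numbers to particular routes is an assumption, since the description states only how many signals each route carries.

```python
OC48_STS1_COUNT = 48

# (route name, links as listed in the text, STS-1 signals carried)
BACKUP_ROUTES = [
    ("first", (307, 303, 308), 5),
    ("second", (316, 321, 317), 10),
    ("third", (316, 325, 330, 326, 317), 33),
]


def allocate_sts1(routes, total=OC48_STS1_COUNT):
    """Assign STS-1 numbers 1..total to the routes in order (assumed numbering)."""
    allocation = {}
    next_sts1 = 1
    for name, _links, count in routes:
        allocation[name] = range(next_sts1, next_sts1 + count)
        next_sts1 += count
    if next_sts1 - 1 != total:
        raise ValueError("back-up routes do not carry exactly %d STS-1s" % total)
    return allocation


allocation = allocate_sts1(BACKUP_ROUTES)
for name, sts1_numbers in allocation.items():
    print(name, sts1_numbers.start, "-", sts1_numbers.stop - 1)
```

The check confirms that 5 + 10 + 33 = 48, so that no constituent signal is left unprotected.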
In response to the alarm information indicating the fault on link 312, network control system 105 transfers switching information to switch nodes 121-129 to implement the reserved bandwidth over the three pre-determined back-up routes. In response to the switching information, switch nodes 121-129 implement the three back-up routes as described above. After implementation of the back-up routes, the communication path from access link 202 to access link 207 is again operational.
Alternatively, network control system 105 could be configured to provide link-based restoration instead of link-based protection. With restoration instead of protection, the bandwidth is not reserved before the fault, and network control system 105 must determine if the needed bandwidth is available on the back-up routes after the fault. Network control system 105 could determine that the needed bandwidth is available on the three back-up routes (even though the bandwidth was not reserved), and once this determination is made, the link-based restoration would be implemented in a similar fashion to the link-based protection example described above.
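The bandwidth determination that restoration requires after the fault could be sketched as follows. The `link_demand` and `restoration_feasible` functions are hypothetical names, and the spare-capacity figures are invented for illustration, since the description gives no spare capacity for any link.

```python
from collections import Counter

# (links on the route, STS-1 signals to carry), as listed in the text.
ROUTES = [
    ((307, 303, 308), 5),
    ((316, 321, 317), 10),
    ((316, 325, 330, 326, 317), 33),
]


def link_demand(routes):
    """Sum the STS-1 demand on each link across all back-up routes."""
    demand = Counter()
    for links, sts1_count in routes:
        for link in links:
            demand[link] += sts1_count
    return demand


def restoration_feasible(routes, spare_sts1):
    """True if every link has enough spare STS-1 slots for its summed demand."""
    return all(
        spare_sts1.get(link, 0) >= needed
        for link, needed in link_demand(routes).items()
    )


demand = link_demand(ROUTES)
# Hypothetical spare capacity: 48 free STS-1 slots on every back-up link.
spare = {link: 48 for link in demand}
feasible = restoration_feasible(ROUTES, spare)
print(dict(demand), feasible)
```

Because links 316 and 317 appear on both the second and third back-up routes, each of those links would need 43 spare STS-1 slots, not 10 or 33.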
Alternatively, network control system 105 could be configured to provide dynamic link-based protection/restoration as opposed to pre-determined link-based protection/restoration. With dynamic restoration/protection, the back-up routes are not pre-determined before the fault, and network control system 105 must determine the back-up routes after the fault. After the fault, network control system 105 could dynamically select the three back-up routes that are described above, and once this determination is made, the dynamic link-based protection or restoration would be implemented in a similar fashion to the link-based protection example described above.
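Dynamic selection of back-up routes after the fault could be sketched as a search over the network topology with the faulted link removed. The sketch below is illustrative only. The `simple_routes` function is a hypothetical name, the topology holds only the links named in the link-based example and not the full mesh of FIG. 1, and the endpoints of each link are inferred from the order in which the routes are described. The description does not specify any particular route-selection algorithm.

```python
# link id -> (switch node, switch node); endpoints inferred from the text.
LINKS = {
    303: (121, 126), 307: (122, 121), 308: (126, 127), 312: (122, 127),
    316: (122, 123), 317: (128, 127), 321: (123, 128), 325: (123, 124),
    326: (129, 128), 330: (124, 129),
}


def simple_routes(links, src, dst, failed_link):
    """Enumerate loop-free routes from src to dst that avoid failed_link."""
    adjacency = {}
    for link_id, (a, b) in links.items():
        if link_id == failed_link:
            continue
        adjacency.setdefault(a, []).append((b, link_id))
        adjacency.setdefault(b, []).append((a, link_id))

    routes = []

    def walk(node, visited, used_links):
        if node == dst:
            routes.append(tuple(used_links))
            return
        for neighbor, link_id in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                used_links.append(link_id)
                walk(neighbor, visited, used_links)
                used_links.pop()
                visited.remove(neighbor)

    walk(src, {src}, [])
    return sorted(routes, key=len)  # shortest routes first


routes = simple_routes(LINKS, src=122, dst=127, failed_link=312)
for route in routes:
    print(route)
```

On this reduced topology, the search yields exactly the three back-up routes described above. A fuller topology would yield more candidates, from which network control system 105 would have to choose.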
Path-Based Fault Recovery
FIG. 3 illustrates path-based protection for communication network 100 in an example of the prior art. For clarity, many components of communication network 100 that were shown on FIG. 1 are not shown on FIG. 3. Communication network 100 has established a communication path from access link 202 to access link 207 through links 310-313 and switch nodes 112, 117, 122, 127, and 132. The communication path could be a SONET OC-48 signal that contains 48 constituent STS-1 signals.
Consider that a fault occurs on link 312 (indicated by the “X” mark). Switch node 122 and/or switch node 127 detect the fault and transfer control signals 106 indicating the fault to network control system 105.
Network control system 105 is configured to provide path-based protection over pre-determined back-up communication routes. With path-based protection, bandwidth over predetermined back-up routes is reserved to protect each communication path. Note that for path-based protection, the back-up routes extend from one of the endpoints of the communication path to the other. The endpoints for the communication path are switch nodes 112 and 132.
In this example, the communication path is protected by three back-up communication routes that extend between path-endpoint switch nodes 112 and 132. A first back-up route uses switch nodes 111, 116, 121, 126, and 131 and links 305, 301, 302, 303, 304, and 309 to transfer five of the constituent STS-1 signals between switch nodes 112 and 132. A second back-up route uses switch nodes 113, 114, 119, 124, 129, 128, and 127 and links 314, 323, 328, 329, 330, 326, 317, and 313 to transfer ten of the constituent STS-1 signals between switch nodes 112 and 132. A third back-up route uses switch nodes 113, 118, 123, 128, and 133 and links 314, 319, 320, 321, 322, and 318 to transfer 33 of the constituent STS-1 signals between switch nodes 112 and 132. Thus, all 48 constituent STS-1 signals can be transferred between switch nodes 112 and 132 over the back-up routes.
In response to the alarm information indicating the fault on link 312, network control system 105 transfers switching information to switch nodes 111-114, 116-119, 121-124, 126-129, and 131-133 to implement the reserved bandwidth over the three pre-determined back-up routes. In response to the switching information, switch nodes 111-114, 116-119, 121-124, 126-129, and 131-133 implement the three back-up routes as described above. After implementation of the back-up routes, the communication path from access link 202 to access link 207 is again operational.
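The set of switch nodes that receive switching information could be derived as sketched below. The sketch assumes that the set is the union of the switch nodes on the original communication path and the switch nodes on the three back-up routes. That reading is an inference and is not stated in the description, and the `nodes_to_notify` function is a hypothetical name.

```python
WORKING_PATH_NODES = [112, 117, 122, 127, 132]

# Back-up routes between path endpoints 112 and 132, as listed in the text.
BACKUP_ROUTE_NODES = [
    [112, 111, 116, 121, 126, 131, 132],
    [112, 113, 114, 119, 124, 129, 128, 127, 132],
    [112, 113, 118, 123, 128, 133, 132],
]


def nodes_to_notify(working, backups):
    """Union of the working-path nodes and every back-up route's nodes."""
    notified = set(working)
    for route in backups:
        notified.update(route)
    return sorted(notified)


notified = nodes_to_notify(WORKING_PATH_NODES, BACKUP_ROUTE_NODES)
print(notified)
```

Under this assumption, the result contains 19 switch nodes and matches the ranges 111-114, 116-119, 121-124, 126-129, and 131-133 given above.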
Alternatively, network control system 105 could be configured to provide path-based restoration instead of path-based protection. With restoration instead of protection, the bandwidth is not reserved before the fault, and network control system 105 must determine if the needed bandwidth is available on the back-up routes after the fault. Network control system 105 could determine that the needed bandwidth is available on the three back-up routes (even though the bandwidth was not reserved), and once this determination is made, the path-based restoration would be implemented in a similar fashion to the path-based protection example described above.
Alternatively, network control system 105 could be configured to provide dynamic path-based protection/restoration as opposed to pre-determined path-based protection/restoration. With dynamic restoration/protection, the back-up routes are not pre-determined before the fault, and network control system 105 must determine the back-up routes after the fault. After the fault, network control system 105 could dynamically select the three back-up routes that are described above, and once this determination is made, the dynamic path-based protection/restoration would be implemented in a similar fashion to the path-based protection example described above.
Sub-Path-Based Fault Recovery
FIG. 4 illustrates sub-path-based protection for communication network 100 in an example of the prior art. For clarity, many components of communication network 100 that were shown on FIG. 1 are not shown on FIG. 4. Communication network 100 has established a communication path from access link 202 to access link 207 through links 310-313 and switch nodes 112, 117, 122, 127, and 132. The communication path could be a SONET OC-48 signal that contains 48 constituent STS-1 signals.
Consider that a fault occurs on link 312 (indicated by the “X” mark). Switch node 122 and/or switch node 127 detect the fault and transfer control signals 106 indicating the fault to network control system 105.
Network control system 105 is configured to provide sub-path-based protection over pre-determined back-up communication routes. With sub-path-based protection, bandwidth over predetermined back-up routes is reserved to protect sub-paths of a given communication path. Note that for sub-path-based protection, the back-up routes do not extend between the link endpoints (nodes 122 and 127), and the back-up routes do not extend between the path endpoints (nodes 112 and 132). Instead, the back-up routes extend from at least one switch node that is not a path endpoint or a link endpoint.
In this example, a sub-path formed by links 311-312 is protected by a back-up communication route that extends between switch nodes 117 and 127. The back-up route uses switch nodes 118, 119, 124, 129, and 128 and links 315, 324, 329, 330, 326, and 317 to transfer all 48 of the constituent STS-1 signals between switch nodes 117 and 127.
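The distinction among link-based, path-based, and sub-path-based back-up routes turns on where the back-up route terminates, and could be sketched as follows. The `classify_backup_route` function is a hypothetical name, and the "unclassified" result covers cases that the description does not address.

```python
def classify_backup_route(route_ends, link_ends, path_ends):
    """Classify a back-up route by the switch nodes at which it terminates."""
    ends = frozenset(route_ends)
    if ends == frozenset(link_ends):
        return "link"
    if ends == frozenset(path_ends):
        return "path"
    # Sub-path: at least one end is neither a link nor a path endpoint.
    if ends - (frozenset(link_ends) | frozenset(path_ends)):
        return "sub-path"
    return "unclassified"


LINK_ENDS = (122, 127)  # endpoints of faulted link 312
PATH_ENDS = (112, 132)  # endpoints of the communication path

for ends in [(122, 127), (112, 132), (117, 127)]:
    print(ends, classify_backup_route(ends, LINK_ENDS, PATH_ENDS))
```

The back-up route between switch nodes 117 and 127 is classified as sub-path-based because switch node 117 is neither a link endpoint nor a path endpoint.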
In response to the alarm information indicating the fault on link 312, network control system 105 transfers switching information to switch nodes 117-119, 122, 124, and 127-129 to implement the reserved bandwidth over the pre-determined back-up route. In response to the switching information, switch nodes 117-119, 122, 124, and 127-129 implement the back-up route as described above. After implementation of the back-up route, the communication path from access link 202 to access link 207 is again operational.
Alternatively, network control system 105 could be configured to provide sub-path-based restoration instead of sub-path-based protection. With restoration instead of protection, the bandwidth is not reserved before the fault, and network control system 105 must determine if the needed bandwidth is available on the back-up route after the fault. Network control system 105 could determine that the needed bandwidth is available on the back-up route (even though the bandwidth was not reserved), and once this determination is made, the sub-path-based restoration would be implemented in a similar fashion to the sub-path-based protection example described above.
Alternatively, network control system 105 could be configured to provide dynamic sub-path-based protection/restoration as opposed to pre-determined sub-path-based protection/restoration. With dynamic restoration/protection, the back-up route is not pre-determined before the fault, and network control system 105 must determine the back-up route after the fault. After the fault, network control system 105 could dynamically select the back-up route that is described above, and once this determination is made, the dynamic sub-path-based protection/restoration would be implemented in a similar fashion to the sub-path-based protection example described above.
Problems
For illustrative purposes, communication network 100 has been used above to demonstrate multiple fault-recovery techniques (link, path, and sub-path). In practice, communication network 100 would only implement a single one of these fault-recovery schemes. As such, communication network 100 would not effectively benefit from the capability offered in a hybrid mix of link, path, and sub-path fault-recovery techniques. In addition, communication network 100 does not process network information to adaptively modify coordination among link, path, and sub-path fault-recovery schemes.
Communication network 100 carries numerous protocols that correspond to multiple different service offerings. For example, one of the STS-1 link constituents in the above examples could transfer digital voice information between telecommunication platforms, while another one of the STS-1 constituents could transfer IP packets between Internet routers. Due to the diverse services supported by the different link constituents, the link constituents have differing Quality-of-Service (QoS) requirements. For example, the amount of time that is allowed to recover from a fault may be much lower for the voice traffic than for the IP traffic. Unfortunately, communication network 100 cannot effectively benefit from an adaptive mix of fault-recovery techniques that are capable of meeting the QoS requirements of the link constituents on the communication path.