The present invention relates generally to telecommunications systems and their methods of operation and, more particularly, to a method and system for dynamically restoring communications traffic through a telecommunications network and, even more specifically, to a distributed restoration method and system for restoring communications traffic flow in response to sensing a failure within spans of the telecommunications network.
Whether caused by a backhoe, an ice storm or a pack of hungry rodents, losing a span or bundle of communication channels such as DS3 and SONET telephone channels means losing significant revenues. After the first 1.5 seconds of an outage, there is also a significant risk that the outage may disable one or more local offices in the network due to an excess of carrier group alarms.
Several techniques are commonly used to restore telecommunications networks, three of which are well known. The first is called route diversity. Route diversity addresses the situation of two cables running between a source and a destination. One cable may take a northward path, while the other takes a southward path. If the northward path fails, traffic may be sent over the southward path, or vice versa. This is generally a very high quality restoration mechanism because of its speed. A problem with route diversity, however, is that it is generally very expensive to employ.
The second technique, the use of rings, also provides for network restoration. This is particularly attractive when a large number of stations are connected together. These stations may be connected in a ring. Thus, if any one connection of the ring fails, traffic may be routed in the direction around the ring that does not include the failure. A ring may therefore survive one cut and still be connected. A disadvantage of rings is that the nodes of the telecommunications network must be connected in a circular manner. Without the circular configuration that a ring requires, this type of restoration is not possible.
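The claim that a ring may survive one cut and still be connected can be checked with a short sketch. The four-node topology and the function names below are hypothetical and chosen only for illustration; they are not taken from any particular network.

```python
from collections import deque

def is_connected(nodes, links):
    """Return True if every node is reachable from the first node over the links."""
    adjacency = {node: set() for node in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)

def survives_any_single_cut(nodes, links):
    """Return True if the network stays connected after removing any one link."""
    return all(
        is_connected(nodes, links[:i] + links[i + 1:])
        for i in range(len(links))
    )

# Hypothetical four-station example: a closed ring and the same stations in a chain.
nodes = ["A", "B", "C", "D"]
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
chain = ring[:-1]  # same stations without the closing link D-A

ring_ok = survives_any_single_cut(nodes, ring)
chain_ok = survives_any_single_cut(nodes, chain)
```

In this sketch the ring tolerates any single cut, while the chain, which lacks the circular configuration, does not.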
The third method of network restoration, mesh restoration, entails re-routing traffic through the network in any way possible. Thus, mesh restoration uses spare capacity in the network to re-route traffic over spare or underutilized connections. Mesh restoration generally provides the lowest quality of service in the sense that it generally requires a much longer time than does route diversity or ring restoration to restore communications. On the other hand, mesh restoration has the attraction of not requiring as much spare capacity as do route diversity or ring restoration.
In performing network restoration using mesh restoration, two techniques are possible. One is known as centralized restoration; the other is known as distributed restoration. In centralized mesh restoration, a central computer controls the entire process and all of the associated network elements. All of the network elements report to and are controlled by the central computer. The central computer ascertains the status of the network, calculates alternative paths and sends commands to the network elements to perform network restoration. In some ways, centralized mesh restoration is simpler than distributed mesh restoration. In distributed mesh restoration, there is no central computer controlling the entire process. Instead, the network elements, specifically the cross-connects, communicate among themselves, sending messages back and forth to determine the optimum restoration path. Distributed mesh restoration, therefore, performs a level of parallel processing by which a single restoration program operates on many computers simultaneously. Thus, while the computers associated with the network elements are geographically distributed, parallel processing still occurs. There is still one set of instructions that runs on many machines working together to restore the network.
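As a rough illustration of the path calculation that a central computer might perform in the centralized variant, the following sketch searches for an alternate route that uses only spans with spare capacity and avoids the failed span. It is a minimal sketch under stated assumptions: the function name, the spare-capacity table and the breadth-first search are hypothetical choices made for illustration, and it does not represent the distributed message exchange among cross-connects.

```python
from collections import deque

def find_restoration_path(spare, failed_span, source, target):
    """Breadth-first search for a fewest-hop path over spans with spare capacity.

    spare maps a (node, node) span to its number of spare channels (assumed data).
    Returns the list of nodes on the alternate path, or None if none exists.
    """
    failed = frozenset(failed_span)
    adjacency = {}
    for (a, b), capacity in spare.items():
        # Skip the failed span and any span with no spare channels.
        if capacity > 0 and frozenset((a, b)) != failed:
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)

    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical spare-capacity table for a small mesh of four cross-connects.
spare = {("A", "B"): 2, ("B", "C"): 1, ("A", "D"): 1, ("D", "C"): 0, ("B", "D"): 1}

reroute_ab = find_restoration_path(spare, ("A", "B"), "A", "B")
reroute_bc = find_restoration_path(spare, ("B", "C"), "B", "C")
```

Here a cut on span A-B can be bypassed through D, whereas a cut on span B-C cannot be restored because the only other span reaching C has no spare channels.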
The present invention thus comprises the concept of connecting a plurality of nodes, such as cross-connects, in a communication circuit network with control channels interconnecting all nodes, and with spare capacity between a sufficient number of nodes to accommodate at least some rerouting of traffic as quickly as possible upon detection of a break in a traffic span in the network, so as to restore circuit continuity within a predetermined maximum time.
It is thus an object of the present invention to provide an improved communication failure detection, isolation and recovery scheme or algorithm.