Today's communication systems for delivering data from a source to a destination are typically high-speed, high-capacity systems that handle many different types of user services. One example of such a system is an optical fiber network used for telephone traffic or for other data exchange. These advanced networks can transport vast amounts of information at any given time. The networks are conventionally made of links connected to nodes that route information over the network to a desired destination.
In currently deployed telecommunication networks, failures occur often, and sometimes with serious consequences. An analysis of failures in the Public Switched Telephone Network (PSTN) has shown that human error, acts of nature, equipment failure and overloads are the major sources of failure. The impact of the failures can be measured in terms of how many times a particular failure occurred, the duration of the outage, the number of customers affected, and the number of customer minutes lost during that outage (a customer minute is defined as the outage in minutes times the number of customers affected). During the period of April 1992 through March 1994, the average number of customers affected due to cable cuts or cable component failures was 216,690, costing 2,643 million customer minutes. Similarly, the average number of customers affected due to an equipment failure was 1,836,910, costing 3,544.3 million customer minutes. Cable cuts and equipment failures account for approximately half of the failures encountered in the network during that period.
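The customer-minute metric defined above can be sketched in a few lines. The function name and the outage figures in the example are hypothetical, chosen only to illustrate the definition; they are not data from the PSTN analysis.

```python
def customer_minutes(outage_minutes, customers_affected):
    """Customer minutes = outage duration in minutes x customers affected."""
    return outage_minutes * customers_affected

# Hypothetical outage (illustrative numbers only, not from the text):
# a 45-minute outage affecting 20,000 customers.
example = customer_minutes(45, 20_000)
print(example)  # 900000 customer minutes
```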
The recent interest in restoration techniques for fiber optic networks is largely because fiber optic transmission systems are cable-based technologies that are subject to frequent damage. Furthermore, due to the introduction of wavelength division multiplexing (WDM) in commercial networks, each fiber can carry an extremely high volume of traffic, and such failures can potentially affect a large number of customers, causing devastating effects to the network. Commercially available fiber optic transmission systems can currently use 2.5 Gb/sec per channel and up to 64 channels per fiber. This translates to potentially 160 Gb/sec per fiber. Taking into account that on average each cable carries 96 fibers, there is a possibility that a cable cut will result in the loss of about 15 Tb/sec. This in turn translates to the loss of 240 million voice circuits. These numbers can potentially grow much larger for future networks with ≤400 fibers/cable with 80 (and possibly more) channels at OC-192 (10 Gb/sec) data rates. Thus, for a WDM transport infrastructure to become a reality, it is imperative that the reliability issue be studied and resolved in advance.
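The capacity figures above can be checked with a short calculation. This is a sketch only: the per-channel rate, channel count and fiber count come from the paragraph, while the 64 kb/sec voice-circuit rate is an assumption (the text does not state it) that happens to reproduce the 240 million figure. The names are illustrative.

```python
# Figures taken from the paragraph above.
GBPS_PER_CHANNEL = 2.5
CHANNELS_PER_FIBER = 64
FIBERS_PER_CABLE = 96

# Assumption (not stated in the text): one voice circuit occupies 64 kb/sec.
VOICE_CIRCUIT_BPS = 64_000

def cable_capacity_bps(gbps_per_channel, channels_per_fiber, fibers_per_cable):
    """Aggregate bit rate carried by one cable, in bits per second."""
    return gbps_per_channel * 1e9 * channels_per_fiber * fibers_per_cable

fiber_gbps = GBPS_PER_CHANNEL * CHANNELS_PER_FIBER
cable_bps = cable_capacity_bps(GBPS_PER_CHANNEL, CHANNELS_PER_FIBER, FIBERS_PER_CABLE)
voice_circuits = cable_bps / VOICE_CIRCUIT_BPS

print(fiber_gbps)        # 160.0 Gb/sec per fiber
print(cable_bps / 1e12)  # 15.36 Tb/sec per cable, i.e. "about 15 Tb/sec"
print(voice_circuits)    # 240000000.0 voice circuits
```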
The most prevalent form of communication failure is the accidental disruption of buried telecommunication cables. Aerial cables are also affected, but not at the same rate as those in the ground. Fiber cuts may result, among other reasons, from construction work, lightning, rodent damage, fires, train derailments, vandalism, car crashes and human error. One of the main reasons why telecommunication cables in the ground are so susceptible to failures is that they are buried in the same public rights-of-way as all other utility transport media (water, gas pipes, television cables, etc.). As a result, when craftspeople for one utility company try to work on their transport medium, they usually affect the rest of the buried transmission media as well.
Examples of fiber cuts that severely affected network operation can be found throughout the short history of fiber optic networks. During the 1980s in particular, when all the major telecommunications companies were laying most of their fibers in the ground, cable cuts were almost an everyday occurrence. One of the most devastating cable cuts occurred in 1987, when a 12-fiber lightwave cable that was a major backbone carrier facility (the term facility is defined as a fiber system that is used to transport optical signals) was severed. Because of that cut, an estimated 100,000 connections were almost immediately lost. Manual restoration of some of the capacity on physically diverse routes took several hours, and the failure cost the telephone company millions of dollars. A fault recovery system for link failures (as well as node failures in networks with planar topologies) is addressed in commonly assigned U.S. application Ser. No. 09/331437 identified above.
Equipment failures, even though not as common as cable cuts, can also devastate the network. The term equipment encompasses anything from transmitters, receivers, simple network switches, and all other peripheral equipment in a network node (such as protection switches, power devices, network management devices, etc.) up to a Central Office. In general, equipment failures affect many more customers than cable cuts, since all the connections passing through the failed network node are lost. Equipment failures can result from natural phenomena (earthquakes, floods, fires), human error, and hardware degradation. Examples of well-known equipment failures that had a devastating effect on the network include the 1988 fire that destroyed Illinois Bell's Hinsdale switching office, the 1987 fire that damaged N.Y. Telephone's Brunswick switching office and the 1987 AT&T switching computer failure in Dallas. The 1988 Illinois fire is rated as the worst telecommunications disaster in U.S. history, since it happened on one of the busiest days of the year (Mother's Day) and took more than a month to completely repair. All three failures cost the telephone companies millions of dollars in business lost by the customers during the outage and, more importantly, diminished the customers' confidence in the companies' services.
Management of these networks is an important but difficult problem. Even though failures cannot be avoided, quick detection, identification and restoration of a fault make the network more robust and its operation more reliable, and ultimately increase the level of confidence in the services it provides. Failure restoration in particular is a crucial aspect of the successful deployment of today's telecommunication networks. A network fault that goes unattended for a long period of time can cause both tangible and intangible losses for the company that provides the service, as well as for its clients. A long outage may cause extensive revenue and asset losses, seriously affect the services provided, and thus damage the credibility and reputation of an organization. A prolonged outage is considered so important that in 1992 it was specified that any outage resulting in the loss of telephone service to 50,000 or more customers for more than 30 minutes has to be reported to the FCC. While prolonged outages can be particularly harmful, even brief ones can be troublesome. This is not only due to the economic impact of the outage time but also because of the vital services that are currently supported in the network. Whereas previously in telephone networks an outage meant that a telephone caller had to hang up and try again later, an outage nowadays may have a devastating impact. Thus, the current trend is toward more and more networks that are virtually uninterruptible.
Fault restoration is defined as the process of re-establishing traffic continuity in the event of a failure condition affecting that traffic, by rerouting the signals on diverse facilities when a failure occurs. A network is defined as survivable if it is capable of failure restoration in the event of a failure condition. The degree of survivability is determined in part by the network's ability to survive network switch equipment failures. In order for a network to be survivable, it must topologically be able to allow rerouting around a fault condition.
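The topological condition in the last sentence can be illustrated with a small check: a network can reroute around any single switch failure only if the remaining nodes stay connected once that switch is removed. The sketch below is an illustration under simple assumptions (an undirected graph held as an adjacency dictionary, hypothetical function names, no capacity constraints). It is not the fault recovery method of this document.

```python
from collections import deque

def is_connected(adj, removed=frozenset()):
    """Breadth-first search: are all nodes outside `removed` mutually reachable?"""
    nodes = [n for n in adj if n not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)

def survives_any_single_switch_failure(adj):
    """True if removing any one node leaves the remaining nodes connected."""
    return all(is_connected(adj, removed={n}) for n in adj)

# Hypothetical topologies: a four-node ring and a three-node chain.
ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

print(survives_any_single_switch_failure(ring))   # True
print(survives_any_single_switch_failure(chain))  # False: losing B isolates A from C
```

A check of this kind only establishes that an alternate route exists in the topology; whether traffic can actually be restored also depends on spare capacity and on the recovery mechanism itself.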
It would be advantageous to provide a network fault recovery system and method which corrects for switch failures so that information originally routed through the failed switch would be properly transmitted from its source to its destination. It would be further advantageous to have the fault recovery system and method apply to any type of network topology (planar and non-planar topologies).