The Internet has migrated from a best-effort service model to an integrated service model to support data, voice, and video applications such as Internet Protocol (“IP”) television (“IPTV”), Voice-over-IP (“VoIP”), and video on demand. Given the immense scale of networks and the monetary cost of Internet downtime, service survivability is of paramount importance to any network provider. For some financial and retail businesses, the cost of Internet downtime can exceed a million dollars an hour. Considering the enormous amount of data carried on the Internet, robustness and reliability are essential in any modern network.
Recovery schemes have been defined and analyzed for network fault tolerance. Network fault tolerance is primarily concerned with the smallest amount of damage that can disconnect a network, reduce its performance to unacceptable levels, or cause a total network failure. One of the most popular fault tolerance schemes is path protection switching, which uses pre-assigned capacity between nodes for protection. In dedicated protection, the resources for the recovery entity are pre-assigned for the sole use of the protected transport path. In shared protection, the resources for the recovery entities of several services are shared. The resources may be shared as 1:n (one recovery entity protecting n working entities) or m:n (m recovery entities protecting n working entities) and are shared on individual links.
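The difference between dedicated and shared protection can be illustrated with a small capacity calculation on a single link. The following Python sketch is illustrative only and is not taken from any particular scheme: the function names are hypothetical, overhead is taken to mean reserved protection capacity divided by total working capacity, and the shared case assumes that at most m of the n protected services fail at the same time (which is reasonable only when their working paths do not share a failure).

```python
def dedicated_backup_capacity(working_demands):
    """Dedicated protection: every service reserves its own backup capacity."""
    return sum(working_demands)


def shared_backup_capacity(working_demands, m=1):
    """m:n shared protection on one link (hypothetical model).

    Assumes at most m of the n services fail simultaneously, so it is
    enough to reserve capacity for the m largest demands.
    """
    largest_first = sorted(working_demands, reverse=True)
    return sum(largest_first[:m])


def protection_overhead(backup_capacity, working_demands):
    """Reserved protection capacity as a fraction of working capacity."""
    return backup_capacity / sum(working_demands)


# Four services of 10 bandwidth units each (made-up numbers).
demands = [10, 10, 10, 10]
dedicated = dedicated_backup_capacity(demands)
shared = shared_backup_capacity(demands, m=1)

print(protection_overhead(dedicated, demands))  # 1.0, i.e. 100% overhead
print(protection_overhead(shared, demands))     # 0.25, i.e. 25% overhead
```

Under these assumptions, dedicated protection reserves as much capacity as the working traffic itself, whereas 1:n sharing reserves only enough for the largest single service. The saving depends on knowing which services can fail together, and tracking that is one source of the added complexity of shared schemes.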
There are two major issues with previous popular fault tolerance schemes. The first issue is that it is difficult to strike a balance between network resource efficiency and the simplicity of the protection scheme. In other words, previous fault tolerance schemes are either too costly in terms of network resource usage or too complicated to be applied in practice. For example, a dedicated path protection scheme is the simplest scheme, with a single protection entity for each working entity, but it incurs a 100% protection overhead. Even with improvements, a dedicated path protection scheme remains resource intensive. Protection costs are generally around 80% of the cost of the working connections. Given limited resources and ever-growing user demands, it is practically impossible to provide dedicated protection for each connection within a given network. On the other hand, network operators prefer simple and straightforward approaches. Shared path protection schemes, which save network resources by sharing the protection bandwidth among disjoint connections, are generally very complicated and difficult to implement. Moreover, the recovery time of shared path protection schemes can be 5 seconds or more, while the typical required recovery time is around 50 milliseconds. Thus, shared protection is rarely implemented in practice, and more focus is placed on providing end-to-end connection availability based on dedicated protection. It is desirable for network service providers to have a protection scheme that is simple, easy to implement, and efficient in terms of running time and network resource usage.
The second major issue with previous protection schemes is the lack of traffic differentiation for protection. This matters because not all network locations or connections are equally important. Some network locations or connections are more significant than others and should be given higher protection priority. For example, the 9/11 tragedy in New York City made clear the extent to which an increasing dependence on telecommunication networks permeates day-to-day operations. After the collapse of the World Trade Center towers, three New York counties lost their connection to the statewide computer system when a major telecommunication hub located at ground zero failed. With a combined population of 3.7 million, all three counties had significant interaction with the state, making the service failure a noteworthy interruption during this tragedy. The communication hub at ground zero was therefore much more important than equipment operated in less populous areas.
Targeted attacks on the most connected nodes have the potential to fragment scale-free networks, revealing a significant lack of network survivability. Clearly, some network locations and connections deserve higher priority and better protection than others. Moreover, most previous protection and restoration schemes were designed for all-or-nothing protection. These schemes are overkill for data traffic. Although provisioning two disjoint paths provides better network survivability, it imposes at least a 100% protection bandwidth overhead. Not all applications or transmissions require the same level of fault tolerance performance. While voice generates constant bit rate traffic, data traffic is bursty, which gives data applications the advantage that they can continue operating, possibly at lowered performance, even if the capacity along the path is reduced. For example, a wide-area enterprise storage network, while slowed down, can still function if failures reduce the underlying network capacity by 50%. In other words, unlike voice, which has a binary up-or-down service condition, data services can survive gradual degradation as the available bandwidth is reduced. In many practical situations, it is helpful to run an application with reduced quality of service. For example, a black-and-white video conference tool, or one with decreased resolution, may still be very useful if there is not sufficient bandwidth for full-color video. For such applications, instead of providing fast and full protection, a main goal for network service providers is to provide adaptive and reliable connections with finer granularity of protection.
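The contrast between binary and gradually degrading services can be made concrete with a toy model. The Python sketch below is an illustration under stated assumptions, not a description of any real protocol: the function name and the traffic labels "cbr" and "elastic" are hypothetical, constant bit rate traffic is assumed to be usable only when its full required bandwidth is available, and elastic data traffic is assumed to degrade in proportion to the bandwidth that remains.

```python
def delivered_fraction(traffic_kind, required_bw, available_bw):
    """Fraction of the requested service still delivered after a failure.

    Hypothetical model:
    - "cbr" (e.g. voice): all-or-nothing, up only with full bandwidth.
    - "elastic" (e.g. bulk data): degrades in proportion to bandwidth.
    """
    if required_bw <= 0:
        raise ValueError("required_bw must be positive")
    if traffic_kind == "cbr":
        return 1.0 if available_bw >= required_bw else 0.0
    if traffic_kind == "elastic":
        return max(0.0, min(1.0, available_bw / required_bw))
    raise ValueError(f"unknown traffic kind: {traffic_kind}")


# A failure halves the capacity available to a 100-unit demand.
print(delivered_fraction("cbr", 100, 50))      # 0.0: the service is down
print(delivered_fraction("elastic", 100, 50))  # 0.5: slower but still working
```

In this model, protecting only part of an elastic connection's bandwidth still leaves a usable service after a failure, which is the intuition behind offering protection at a finer granularity than all-or-nothing.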