The present invention relates generally to techniques for improving network survivability and more particularly to techniques for determining a spare capacity allocation and techniques for optimizing a network restoration scheme.
Telecommunication networks occupy a critical role in today's society. The failure of a network, such as a telephone network, a medical database network, a banking computer network, a military network, an air traffic control network or the Internet, among others, could have catastrophic consequences.
Most networks are comprised of multiple nodes (such as computers, routers, servers, and switching devices, among others) interconnected by multiple links (such as fiber optic cables and wireless relay stations, among others). Information (such as network status information, user data, and component address information, among others) originates at a source node, flows over a set of sequentially connected links, and terminates at a destination node. A node may act as both a source node and a destination node. For example, a computer (source node) requests information from a router (destination node) over a fiber optic cable (link). The router (now acting as the source node) sends the requested information back over the fiber optic cable (link) to the computer (now acting as the destination node). A source node and its corresponding destination node are referred to as a “node pair”. It should be noted that the terms “data,” “traffic,” and “traffic demand” are intended to be synonymous with the term “information” hereinafter.
The route traveled by the information from the source node to the destination node is called a “path.” A path may include the source and destination nodes and one or more links. Furthermore, the path may contain intermediate nodes which pass the information to the next link or node. The path that is normally used for communication by a node pair is called the working path. Most networks establish the shortest path between the source node and the destination node as the working path. Thus, a route requiring one link to connect a node pair is preferred as the working path over a route requiring two links to connect the node pair.
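The shortest-path selection of a working path described above can be sketched as follows. This is a minimal illustration, assuming that "shortest" means fewest links and that links are bidirectional; the function name shortest_path and the four-node example network are hypothetical and do not come from the disclosure.

```python
from collections import deque

def shortest_path(links, source, destination):
    """Return a minimum-hop path as a list of nodes, or None if unreachable."""
    # Build an undirected adjacency map from the list of links.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    # Breadth-first search reaches nodes in order of increasing hop count.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            break
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)

    if destination not in previous:
        return None

    # Walk back from the destination to the source to recover the path.
    path = []
    node = destination
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]

# Hypothetical network: A and C are joined directly and also via B.
links = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
working_path = shortest_path(links, "A", "C")
```

In this example the one-link route A-C is chosen over the two-link route A-B-C, matching the preference stated above.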
A failure in any network component along a path, however, may prevent communication between the node pair. For example, a link that has failed within the working path will not allow information from the source node to reach the destination node. One or more component failures can easily cripple a network, thereby causing widespread communication failures. Network designers, to create a “survivable network”, establish backup paths which act as detours when a component in the working path fails. The ability of a network to continue operating after a failure is known as a network's “survivability” or “survivability level”.
Network designers must concentrate on providing cost-efficient spare capacity reservation at an acceptable survivability level. The “survivability level” gauges the percentage of network traffic that can be restored upon a failure. The ratio of the total spare capacity over the total working capacity, called the “network redundancy,” is used to measure the cost efficiency of spare capacity allocation. Network redundancy is highly dependent on the network topology, as well as the algorithms used to determine the amount and placement of the spare capacity. It should be noted that “determine”, as referred to in this disclosure, is intended to include calculating, inferring, and assuming, among others. The goal of survivable network design is to provide the maximum network survivability at the minimum network redundancy.
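The two measures defined above reduce to simple ratios. The sketch below is illustrative only: the function names, the per-link capacity dictionaries, and the numeric values are assumptions made for the example, not figures from the disclosure.

```python
def network_redundancy(working_capacity, spare_capacity):
    """Total spare capacity divided by total working capacity."""
    total_working = sum(working_capacity.values())
    total_spare = sum(spare_capacity.values())
    if total_working == 0:
        raise ValueError("total working capacity must be positive")
    return total_spare / total_working

def survivability_level(restored_traffic, affected_traffic):
    """Percentage of the traffic affected by a failure that is restored."""
    if affected_traffic == 0:
        return 100.0  # nothing was affected, so nothing was lost
    return 100.0 * restored_traffic / affected_traffic

# Hypothetical per-link capacities, in arbitrary bandwidth units.
working = {"e1": 10, "e2": 20, "e3": 10}
spare = {"e1": 5, "e2": 10, "e3": 5}

redundancy = network_redundancy(working, spare)   # 20 / 40
level = survivability_level(30, 40)               # 30 of 40 units restored
```

Here the redundancy is 0.5, meaning one unit of spare capacity is reserved for every two units of working capacity.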
Many techniques have been developed and implemented to maximize network survivability. Traditional network survivability techniques have two components: survivable network design and network restoration. These components are complementary to each other and cooperate to achieve seamless service operation (i.e., the prevention of service disruptions) when a network component fails.
The first component of traditional network survivability techniques is survivable network design. Survivable network design refers to the incorporation of survivability strategies into the network design phase to mitigate the impact of a set of specific failure scenarios. A major component related to the creation of a survivable network is called spare capacity allocation (“SCA”). SCA refers to providing adequate resources within the network which enable traffic to be rerouted when a specific component fails (a link failure for example).
Designers face two major challenges related to SCA. The first challenge is determining how much spare capacity should be provisioned for the network. The second challenge is determining where that spare capacity should be located within the network. Several algorithms have been developed to assist designers in meeting these challenges. These algorithms are classified as either centralized or distributed algorithms. Centralized algorithms are implemented by a central controller, or processor, which has global information regarding the network. Centralized algorithms, although easy to implement, fail to adequately deal with the dynamic bandwidth provisioning and traffic fluctuations present in current networks such as ATM backbone networks and the Internet. Distributed algorithms, on the other hand, are implemented within each node through which network traffic travels. Distributed algorithms adequately address dynamic bandwidth provisioning and traffic fluctuations, but generally require more resources to implement and are not easily scaled to a changing network. Current algorithms require a large amount of computational time to achieve the desired result. Furthermore, the results obtained may not accurately approximate the optimal spare capacity requirement.
Therefore, there exists a need for a backup path routing scheme that is feasible, scalable, adaptive, much faster, and near global optimal in redundancy reduction.
The second component of traditional network survivability techniques is network restoration. Network restoration refers to rerouting traffic flow that has been affected by a network device failure. The affected traffic is detoured using either pre-planned spare capacity routing or dynamic fault-tolerant routing. In pre-planned spare capacity routing, the affected traffic is detoured to backup paths that have adequate spare capacity provisioned in the SCA design phase. Pre-planned spare capacity routing guarantees that service restoration occurs and minimizes the duration of the failure impact. Pre-planned spare capacity routing, however, requires allocating additional spare capacity, some of which may never be used. As such, the cost of implementing pre-planned spare capacity routing is relatively high.
Dynamic fault-tolerant routing, on the other hand, does not have spare capacity pre-allocated in anticipation of a specific failure. Instead, dynamic fault-tolerant routing establishes a backup path after a failure occurs using any available spare capacity. Although service restoration is not guaranteed and the duration and range of the failure impact are not minimized, dynamic fault-tolerant routing reduces the amount of spare capacity needed, thereby reducing the implementation cost for the network.
With both pre-planned and dynamic routing, affected traffic is detoured using path restoration, link restoration, or fragment restoration. Path restoration refers to rerouting traffic within the end nodes (i.e., the source and destination nodes). Path restoration spreads the failure influence to a larger area within the network, but has a slow failure response speed. Link restoration refers to rerouting the traffic in the nodes adjacent to the failure (i.e., not necessarily within the source and destination nodes). Link restoration has a faster failure response speed, but has a significant impact on the area within the network that is close to the failure. Link restoration only patches the “hole” introduced by the failure. Finally, fragment restoration reroutes traffic within the nodes between the traffic end node (i.e., the source or destination node) and the node adjacent to the failure. Fragment restoration falls somewhere between link and path restoration for both the restoration speed and the area impacted by a failure.
The selection of backup paths in a network restoration scheme is classified either as failure dependent or failure independent. Failure dependent restoration depends on the failure state (i.e., which node or link has failed), meaning that different network failures are protected by different backup paths. It requires the network nodes to save additional network state information to achieve better utilization. Failure independent restoration, on the other hand, does not depend on the failure state, and therefore is easier to implement than failure dependent restoration. However, failure independent restoration usually requires additional spare capacity to implement.
A need exists, therefore, for a network restoration scheme that is adaptable to the current operational state of the network and is capable of providing a guaranteed level of service restoration without requiring additional spare capacity to implement.
The discussion of the present invention focuses on mesh-type networks. It should be noted, however, that the present invention is applicable to other network types and that the use of mesh-type networks is in no way intended to limit the scope of the present invention. A mesh-type network refers to an at least two-connected plain (or flat) network. Mesh-type networks exist mainly in backbone and interconnection networks and possess great advantages with respect to flexibility, providing survivable intelligence, and the ability to improve utilization.
Traditional network designers, given traffic demand locations and network QoS requirements, are responsible for several tasks related to mesh-type networks. One task, called “topology design”, requires a designer to distribute and interconnect the network nodes and links. Topology design establishes the node locations and the link connectivity within the network's topology. Nodes and links must be located such that, should a portion of the network fail, the remaining portion of the network remains operational. This network characteristic is defined as two-link, or two-node, connectivity. A two-link(node) connected topology has at least two link(node)-disjoint paths between the nodes of any origin-destination node pair. Link-disjoint is discussed later in reference to FIG. 1.
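As a brief illustration of the disjointness property mentioned above, the following sketch tests whether two paths between the same node pair share a link or an intermediate node. It assumes paths are given as ordered lists of node names over bidirectional links; the helper names and example paths are hypothetical.

```python
def path_links(path):
    """Return the set of undirected links traversed by a node-list path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def are_link_disjoint(first, second):
    """True if the two paths have no link in common."""
    return path_links(first).isdisjoint(path_links(second))

def are_node_disjoint(first, second):
    """True if the paths share no link and no intermediate node."""
    shares_no_inner_node = set(first[1:-1]).isdisjoint(second[1:-1])
    return are_link_disjoint(first, second) and shares_no_inner_node

# Two routes between A and D that avoid each other entirely.
working = ["A", "B", "D"]
backup = ["A", "C", "D"]

# Two routes that use different links but both pass through node X.
through_x = ["A", "X", "D"]
also_through_x = ["A", "B", "X", "C", "D"]
```

The second pair is link-disjoint but not node-disjoint, so it would survive the failure of any single link but not the failure of node X.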
Another important task, called “network synthesis”, requires the network designer to provide sufficient resources (such as bandwidth, buffers, etc.) to transport all of the traffic demand with the guaranteed QoS requirements from the source node to the destination node while minimizing the total network cost. Network synthesis determines the traffic routing and resource dimensioning for the given network topology. Two problems related to network synthesis are resource design (or, capacity design when capacity is the main parameter of interest) and flow assignment. Multi-commodity flow (MCF) models (as are known in the art) are used for these problems. Additional constraints (such as traffic QoS guarantees, node buffer thresholds and link capacity limits, among others) can be applied to the model as well. In many cases, the MCF model is NP-hard (in practice, its solution time is topology-dependent and does not scale well with the number of links, number of nodes, and number of node pairs supporting traffic flow). Therefore, the scalability and application of the MCF model requires a good approximation method to find a near optimal solution for the network within a short period of time.
Additionally, designers use approximation methods to solve traditional network design problems and models. Traditional approximation methods include flow deviation, greedy construction, and Lagrangian methods. Modern heuristic methods include Simulated Annealing (“SA”), Genetic Algorithm (“GA”) and Tabu Search (“TS”) methods. All of these methods are used for the network synthesis problems mentioned above. Designers also use mixed integer programming (MIP) models to formulate the network's link/node disjoint requirements, especially the link/node disjoint requirements between a working route and its backup routes. Link/node disjoint characteristics are additional constraints introduced by survivability requirements in the traditional network design problem. All of these approximation methods, however, suffer from one or more drawbacks, such as large computational time requirements and insufficient spare capacity allocation optimization results, among others.
Therefore, a need exists for an approximation method that can quickly and accurately approximate the optimal spare capacity requirement for a network.
The present invention offers a method for deriving a backup path routing resource provisioning template for a network. The method is feasible, scalable, adaptive, much faster, and near global optimal in redundancy reduction. The method includes determining a working path for each traffic flow in the network, aggregating the working paths into a first matrix, determining a backup path for each traffic flow, aggregating the backup paths into a second matrix, and deriving the resource provisioning template from the first and second matrices.
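The aggregation steps named above can be sketched as follows. This summary does not specify how the two matrices are combined, so the sketch rests on clearly labeled assumptions: each matrix is a flow-by-link incidence matrix, and the template entry at row i, column j accumulates the demand of every flow whose working path uses link j and whose backup path uses link i. The function names, link numbering, and three-link example are hypothetical.

```python
def incidence_matrix(paths, num_links):
    """Aggregate paths (lists of link indices) into a flow-by-link 0/1 matrix."""
    matrix = []
    for path in paths:
        row = [0] * num_links
        for link in path:
            row[link] = 1
        matrix.append(row)
    return matrix

def provisioning_template(working, backup, demand):
    """Assumed combination rule: entry [i][j] is the capacity needed on
    link i to carry the flows rerouted onto it when link j fails."""
    num_links = len(working[0])
    template = [[0] * num_links for _ in range(num_links)]
    for flow, amount in enumerate(demand):
        for i in range(num_links):
            if not backup[flow][i]:
                continue
            for j in range(num_links):
                if working[flow][j]:
                    template[i][j] += amount
    return template

# Hypothetical triangle network: link 0 = A-B, link 1 = B-C, link 2 = A-C.
working_paths = [[0], [1], [2]]           # one direct link per flow
backup_paths = [[2, 1], [0, 2], [0, 1]]   # the two-link detour for each flow
demand = [1, 1, 2]

first_matrix = incidence_matrix(working_paths, 3)
second_matrix = incidence_matrix(backup_paths, 3)
template = provisioning_template(first_matrix, second_matrix, demand)
```

Under these assumptions, reading the template column for link 2 shows that two units must be available on each of links 0 and 1 if link 2 fails.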
Additionally, the present invention offers a method for determining a minimum spare capacity required for one of a plurality of links in a network. The method includes creating a spare provision matrix and determining the minimum spare capacity required for one of the plurality of links in the network using the spare provision matrix.
The present invention also offers a method for determining a total spare capacity allocation for a mesh-type network. The method includes creating a spare provision matrix related to the network, and deriving the spare capacity allocation for the network from the spare provision matrix.
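One way to read a per-link and a total spare capacity out of a spare provision matrix is sketched below. The interpretation is an assumption, since this summary does not define the matrix entries: row i is taken to hold the capacity link i needs under each single-link failure, so the minimum spare capacity for a link is the largest entry in its row, and the total allocation is the sum of those maxima. The function names and matrix values are hypothetical.

```python
def spare_capacity_per_link(spare_provision):
    """Minimum spare capacity per link: the worst case over single failures."""
    return [max(row) for row in spare_provision]

def total_spare_capacity(spare_provision):
    """Total allocation: the sum of the per-link minimum spare capacities."""
    return sum(spare_capacity_per_link(spare_provision))

# Hypothetical matrix for a three-link network.
# Entry [i][j]: capacity needed on link i when link j fails.
spare_provision = [
    [0, 1, 2],
    [1, 0, 2],
    [1, 1, 0],
]

per_link = spare_capacity_per_link(spare_provision)
total = total_spare_capacity(spare_provision)
```

Taking the row maximum rather than the row sum reflects the assumption that only one link fails at a time, so spare capacity reserved for one failure can be shared with another.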
The present invention offers a method for successively approximating the optimal spare capacity allocation needed by a mesh-type network. The method includes selecting a traffic flow and determining the link cost associated with the traffic flow, where the link cost is associated with an incremental spare capacity. The method further comprises determining a backup path using the link cost, and notifying the rest of the network of the backup path. The method is then repeated for a next traffic flow.
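The successive loop described above can be sketched in a centralized form. This is a rough illustration under stated assumptions, not the disclosed procedure: the link cost is taken to be the extra spare capacity a link would need if the flow's backup used it, the backup path is found with Dijkstra's algorithm and kept link-disjoint from the working path, and updating a shared matrix stands in for notifying the network. All names and the example network are hypothetical.

```python
import heapq

def cheapest_path(links, cost, source, destination):
    """Dijkstra over undirected links; links whose cost is None are skipped.
    Returns the path as a list of link indices, or None if unreachable."""
    adjacency = {}
    for k, (a, b) in enumerate(links):
        if cost[k] is not None:
            adjacency.setdefault(a, []).append((b, k))
            adjacency.setdefault(b, []).append((a, k))
    best, via, heap = {source: 0}, {}, [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, k in adjacency.get(node, []):
            candidate = dist + cost[k]
            if candidate < best.get(nxt, float("inf")):
                best[nxt], via[nxt] = candidate, (node, k)
                heapq.heappush(heap, (candidate, nxt))
    if destination not in best:
        return None
    path, node = [], destination
    while node != source:
        node, k = via[node]
        path.append(k)
    return path[::-1]

def successive_backup_routing(links, flows, rounds=3):
    n = len(links)
    matrix = [[0] * n for _ in range(n)]  # [backup link][failed link]
    backups = [None] * len(flows)

    def update(flow, backup, sign):
        for i in backup:
            for j in flow["working"]:
                matrix[i][j] += sign * flow["demand"]

    for _ in range(rounds):
        for f, flow in enumerate(flows):
            if backups[f] is not None:
                update(flow, backups[f], -1)  # withdraw the old backup path
            cost = []
            for i in range(n):
                if i in flow["working"]:
                    cost.append(None)  # keep the backup link-disjoint
                    continue
                needed = max(
                    matrix[i][j] + (flow["demand"] if j in flow["working"] else 0)
                    for j in range(n)
                )
                cost.append(needed - max(matrix[i]))  # incremental spare capacity
            backup = cheapest_path(links, cost, flow["src"], flow["dst"])
            if backup is None:
                raise ValueError("no link-disjoint backup path exists")
            backups[f] = backup
            update(flow, backup, +1)  # stands in for notifying the network
    return backups, [max(row) for row in matrix]

# Hypothetical triangle network: link 0 = A-B, link 1 = B-C, link 2 = A-C.
links = [("A", "B"), ("B", "C"), ("A", "C")]
flows = [
    {"src": "A", "dst": "B", "demand": 1, "working": [0]},
    {"src": "B", "dst": "C", "demand": 1, "working": [1]},
    {"src": "A", "dst": "C", "demand": 2, "working": [2]},
]
backups, spare = successive_backup_routing(links, flows)
```

Because a link's cost is zero whenever its existing reservation already covers the new flow, the search tends to steer backup paths onto links where spare capacity can be shared.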
The present invention further offers a method for dynamically creating a fault management table, indexed by link l, that is used by a network having a total number of links equal to L. The method includes storing a failure impact matrix M_l within the fault management table, storing a vector V_l within the fault management table, and storing a spare capacity s_l within the fault management table.
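A possible in-memory layout for one entry of such a table is sketched below. This summary names the three stored items but does not define their contents, so the layout is an assumption: the failure impact matrix maps each other link to the flows that would be rerouted onto this link if that other link failed, the vector holds the resulting capacity per failure, and the spare capacity is the largest vector element. The sketch derives the vector and spare capacity on demand rather than storing them. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class FaultManagementEntry:
    """Hypothetical table entry kept for a single link l."""
    link: int
    # Assumed M_l: failed link -> {flow id: demand rerouted onto this link}.
    impact: dict = field(default_factory=dict)

    def add_flow(self, flow_id, demand, working_links):
        """Record a flow whose backup path crosses this link."""
        for failed in working_links:
            self.impact.setdefault(failed, {})[flow_id] = demand

    @property
    def vector(self):
        """Assumed V_l: capacity needed on this link for each failed link."""
        return {failed: sum(row.values()) for failed, row in self.impact.items()}

    @property
    def spare(self):
        """Assumed s_l: the worst case over all single-link failures."""
        return max(self.vector.values(), default=0)

# Entry for link 0 in a hypothetical three-link network.
entry = FaultManagementEntry(link=0)
entry.add_flow("f1", demand=1, working_links=[1])  # f1 detours here if link 1 fails
entry.add_flow("f2", demand=2, working_links=[2])  # f2 detours here if link 2 fails
```

Keeping one such entry per link lets each link's spare capacity be recomputed locally when a flow's backup path is added or withdrawn.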
Additionally, the present invention offers an apparatus for operation within a network having a plurality of nodes interconnected by a plurality of links to facilitate a plurality of traffic flows within the network. The apparatus includes an input link carrying information related to the nodes, links, and traffic flows, a processor operable to receive said information from the input link and operable to derive a spare provision matrix from the information, and an output link operable to receive information from the processor and carry the information related to the spare provision matrix to at least one other node.
Finally, the present invention offers an apparatus for operation within a network having a plurality of nodes interconnected by a plurality of links to facilitate a plurality of traffic flows within the network. The apparatus includes an input port for carrying information related to the nodes, links, and traffic flows, a processor to receive the information from the input port, to derive a fault management table from the information and to dynamically implement a successive survivability restoration scheme, and an output port to receive information from the processor and output the information related to the fault management table and the successive survivability scheme to at least one other node.