This is the first application filed for the present invention.
Not Applicable.
The present invention relates to data communications networks having a mesh topology, and in particular to optimized fault notification in an overlay mesh network by correlating knowledge of network topology to paths traversing the network.
Modern data communications networks are designed using a layered architecture, in which each layer provides a constellation of services and functionality, most of which are based upon a particular view of the network topology. A first layer of the network architecture (typically referred to as layer-1, or the physical layer) provides services and functionality related to the physical transport of data through physical links (e.g. optical fibre, co-axial cable etc.) between physical nodes of the network. The layer-1 view of the network topology involves knowledge of physical nodes and links, which in optical networks are typically arranged in a Bi-directional Line Switched Ring (BLSR) topology. Within this environment, the Synchronous Optical Network (SONET) protocol (and/or its European equivalent, the Synchronous Digital Hierarchy (SDH) protocol) is generally employed to accomplish transport of data through the network.
Typically, at least two logical layers (commonly referred to as layer-2 and layer-3) overlay the physical layer. These logical layers provide enhanced connectivity services, and are typically based on a mesh view of the network, in which every node may, in principle, have a direct connection to every other node. As is well known in the art, such a mesh topology, in combination with applicable communications protocols (e.g. Internet Protocol (IP) and Asynchronous Transfer Mode (ATM)), can provide a highly flexible and fault tolerant network architecture.
The mesh network architecture has been proposed as a means for improving physical layer performance, with IP, ATM or some other mesh-based communications protocol (e.g. Multi-Protocol Label Switching (MPLS) or Constraint-based Routing Label Distribution Protocol (CR-LDP)) used to control the transport of data. However, an advantage of the BLSR topology is that failure of physical network resources (e.g. the severing of an optical fibre link) can be detected very rapidly, and data traffic rerouted to restore communications paths with minimal delay (typically on the order of 50-100 ms). By contrast, layer-2/3 mesh protocols generally require a very long time to detect a physical resource failure (on the order of seconds), and still more time (again, on the order of seconds) to discover an alternate communications path for restoring communications. Such a long restoration delay is frequently unacceptable to network service providers. Consequently, efforts have been made to improve the restorative performance of layer-2/3 protocols. For example, IETF draft <draft-shew-lsp-restoration-00.txt>, dated October 1999, proposes that alarm messages generated in layer-1 can be passed to layers 2 and 3 at intersection points of the layer-1 (typically BLSR) and layer-2/3 (typically mesh) networks. This facilitates rapid notification of layer-1 network resource failures and allows more rapid response of the layer-2/3 local path repair functionality. However, this draft does not address the slow response of layer-2/3 local path repair algorithms, and restoration delays remain unacceptably long. Accordingly, while the mesh topology is used for logical connectivity in layers 2 and 3, the BLSR network topology remains popular for physical data transport.
While the ring topology is capable of rapidly detecting physical resource failures and re-routing traffic to restore communications, ring networks suffer from the disadvantage that this functionality necessitates only partial utilization of network resources. In particular, in order to ensure that traffic can be re-routed around a failed resource, the total network capacity of ring networks is divided between working resources (i.e. resources that are utilized for data transport during normal operations) and protection resources, which are held in reserve and are only used to carry data traffic in the event of failure of a physical resource in some portion of the network. Because layer-1 protocols cannot distinguish between different types of data, all of the data traffic within a ring must be treated with the same priority and guaranteed level of service. This requirement results in a 1:1 ratio between working resources and protection resources in the network, so that the utilization of network resources under normal operating conditions is about 50%. Since physical resource failures are generally quite rare, it is common for approximately 50% of the total network transport capacity to be idle at any one time.
Due to increasing user demand for bandwidth, the costs associated with provisioning a ring network are forcing network service providers to search for alternative means of providing protection resources, so that the proportion of the total network resources devoted to failure protection can be reduced. Various methods for reducing protection resources have been proposed. For example, co-pending and co-assigned U.S. patent application Ser. No. 09/471,139, filed on Dec. 23, 1999 and entitled METHOD OF DEACTIVATING PROTECTION FIBER RESOURCES IN AN OPTICAL RING NETWORK, teaches a method of reducing overall provisioned protection resources by eliminating duplication of protection fibre on spans that carry traffic of two adjacent rings. On such spans, a single protection fibre can be provided. The single protection fibre is shared between both rings, thereby providing a 2:1 ratio of working bandwidth to protection fibre on those spans. The success of this arrangement relies on the low probability that simultaneous physical failures will cause both rings to attempt to switch their respective working traffic onto the single protection fibre. On spans that are not shared between adjacent rings, which may comprise a majority of spans within the network, a 1:1 ratio must still be maintained, and this tends to diminish the overall improvement in the utilization of the total network bandwidth capacity.
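By way of illustration only, the ratios quoted above can be expressed as span utilization. The symbols U, W and P (utilization, working capacity and protection capacity on a span) are introduced here for convenience and do not appear elsewhere in this description.

```latex
U = \frac{W}{W + P}
\qquad
\text{1:1 span: } U = \frac{1}{1 + 1} = 50\%
\qquad
\text{2:1 shared span: } U = \frac{2}{2 + 1} \approx 66.7\%
```

Under these assumptions, a network in which most spans remain at 1:1 has an overall utilization that stays close to 50%, which is the limitation noted above.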
Accordingly, a method and apparatus for rapidly compensating for physical network resource failures, while allowing efficient utilization of network resources during normal operations, remains highly desirable.
An object of the present invention is to provide a method and apparatus for rapidly compensating for physical network resource failures.
A further object of the present invention is to provide a method and apparatus for compensating for physical network resource failures, in which a requirement for explicit provisioning of protection bandwidth is substantially eliminated.
A further object of the present invention is to provide a method and apparatus for compensating for physical network resource failures in which a graceful degradation of service is possible in an event of simultaneous physical network resource failures.
Accordingly, an aspect of the present invention provides a method of compensating for network resource failures in a communications network. A set of at least two communications paths is established across the network between a source node and a destination node. The set of communications paths is at least partially physically diverse. At the source node, a path database is established to maintain information identifying network resources traversed by each communications path. The source node also monitors a status of each one of the set of communications paths, and the data traffic is load-balanced across those ones of the set of communications paths having an operational status.
A further aspect of the present invention provides a node of a communications network adapted for compensating for network resource failures in the communications network. The node comprises: means for establishing a set of two or more communications paths across the network to a destination node, the set of communications paths being at least partially physically diverse; a path database adapted to maintain information identifying network resources traversed by each communications path; means for monitoring a status of each one of the set of communications paths; and means for load-balancing the data traffic across those ones of the set of communications paths having an operational status.
Another aspect of the present invention provides a system for compensating for network resource failures in a communications network comprising a plurality of nodes interconnected by links. The system comprises: means for establishing a set of two or more communications paths across the network between a source node and a destination node, the set of communications paths being at least partially physically diverse; a path database adapted to maintain information identifying network resources traversed by each communications path; means for monitoring a status of each one of the set of communications paths; and means for load-balancing the data traffic across those ones of the set of communications paths having an operational status.
In embodiments of the invention, each communications path may be any one of an ATM-SVC, a Label Switched Path (LSP) and a SONET/SDH path-level connection.
Each one of the set of communications paths may be established by launching a path setup message from the source node. A resource allocation message may be received at the source node, the resource allocation message including information identifying a network resource traversed by the communications path. A path array may be updated on the basis of the information identifying the network resource.
The resource allocation message may be generated by a node in the communications path as network resources of a downstream hop are allocated to the communications path.
In embodiments of the invention, the path array comprises, for each communications path, a respective path record having a path identifier field and a plurality of resource fields, each resource field corresponding to one or more network resources which may be traversed by a communications path. Each network resource may include any one or more of: a physical network link; a physical network node; a logical connection between a pair of network nodes; a SONET/SDH section; and a SONET/SDH line. Each resource field may correspond to a single network resource, or alternatively may correspond to a logical combination of two or more network resources.
The path array may be updated by inserting an indicator flag into the respective resource field corresponding to the resource identified in the resource allocation message. The indicator flag may be a binary "1".
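By way of illustration only, the path array and its update on receipt of a resource allocation message might be sketched as follows. The resource names, the `PathRecord` and `PathArray` classes, and the `on_resource_allocation` method are hypothetical and are not taken from this description; the sketch also assumes that each resource field corresponds to a single network resource.

```python
from dataclasses import dataclass, field

# Hypothetical resource identifiers; each one owns one resource field (column).
RESOURCES = ["link-A-B", "link-B-C", "link-A-D", "link-D-C"]
RESOURCE_INDEX = {name: i for i, name in enumerate(RESOURCES)}


@dataclass
class PathRecord:
    """One path record: a path identifier plus one flag per resource field."""
    path_id: str
    resource_fields: list = field(default_factory=lambda: [0] * len(RESOURCES))
    operational: bool = True


class PathArray:
    """Path database held at the source node (illustrative only)."""

    def __init__(self):
        self.records = {}

    def on_resource_allocation(self, path_id, resource):
        # Create the record on first use, then insert the indicator flag
        # (a binary 1) into the field of the resource named in the message.
        record = self.records.setdefault(path_id, PathRecord(path_id))
        record.resource_fields[RESOURCE_INDEX[resource]] = 1


# Simulated resource allocation messages arriving during path setup.
paths = PathArray()
for path_id, resource in [
    ("P1", "link-A-B"), ("P1", "link-B-C"),
    ("P2", "link-A-D"), ("P2", "link-D-C"),
]:
    paths.on_resource_allocation(path_id, resource)
```

In this sketch a fixed column order is chosen up front so that a resource identifier maps directly to a field position, which keeps later lookups by failed resource simple.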
In embodiments of the invention, the physical diversity of each of the set of communications paths can be verified. Verification of the physical diversity of each of the set of communications paths may be accomplished by adding the resource fields of the respective path records.
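A minimal sketch of such a verification is given below, on the assumption that each path record holds one binary flag per resource. The function name `shared_resources` and the sample records are hypothetical. Adding the resource fields column by column yields, for each resource, the number of paths that traverse it; any total greater than one marks a resource shared by two or more paths.

```python
def shared_resources(records):
    """Add the resource fields column by column and return the indices
    of the fields whose total exceeds 1 (resources used by several paths)."""
    totals = [sum(column) for column in zip(*records.values())]
    return [i for i, total in enumerate(totals) if total > 1]


# Hypothetical path records: one list of flags per path.
records = {
    "P1": [1, 1, 0, 0],
    "P2": [0, 0, 1, 1],
    "P3": [1, 0, 0, 1],
}

# P1 and P2 share no resource, so they are fully diverse.
diverse_pair = shared_resources({k: records[k] for k in ("P1", "P2")})

# Adding P3 introduces overlap on fields 0 and 3 (partial diversity only).
with_p3 = shared_resources(records)
```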
In embodiments of the invention, the status of each one of the set of communications paths may be monitored by: receiving a failure notification message including information indicative of a failed network resource; and setting a non-operational status of each communications path that traverses the failed network resource.
The failure of a network resource may be detected at a detecting node proximal the failed resource, possibly using a conventional layer-1 link failure detection technique. The failure notification message may be generated in the detecting node and propagated toward the source node. The failure notification message may be a substantially conventional layer-1 link failure alarm message, including a link identifier indicative of the failed resource. The failure notification may be propagated within layer-1 until it is received at the source node. At each node within the network, the failure notification may be launched into every upstream link from which the node is receiving traffic, so that the failure notification is propagated toward every edge node sourcing traffic destined for points downstream of the failed resource.
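The upstream propagation described above can be illustrated with the following sketch. The `upstream` map (for each node, the neighbours from which it is receiving traffic), the node names and the function `nodes_notified` are assumptions made for this example; an actual layer-1 alarm message would also carry the identifier of the failed link, which is omitted here.

```python
from collections import deque


def nodes_notified(upstream, detecting_node):
    """Flood a failure notification hop by hop into every upstream link,
    starting at the detecting node, and return every node it reaches."""
    notified = {detecting_node}
    queue = deque([detecting_node])
    while queue:
        node = queue.popleft()
        for neighbour in upstream.get(node, ()):
            if neighbour not in notified:  # forward each alarm only once
                notified.add(neighbour)
                queue.append(neighbour)
    return notified


# Hypothetical topology: A and E feed B, B feeds C, and A also feeds D.
upstream = {"C": ["B"], "B": ["A", "E"], "D": ["A"]}

# Node C detects the failure of its downstream link.
reached = nodes_notified(upstream, "C")
```

Here the edge nodes A and E, which source traffic passing through C, both receive the notification, while D, which lies on an unaffected branch, does not.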
A non-operational status of each communications path that traverses the failed network resource may be set by: searching the path array to identify each path that traverses the failed resource; and updating a respective status field associated with each identified path to indicate a non-operational status of the corresponding path. Alternatively, the path array can be searched, and a respective path record associated with each identified path deleted from the path array.
Searching of the path array may be accomplished by means of conventional searching algorithms using a resource identifier of the failed resource as an array index. Thus the identifier of the failed resource can be used to identify a resource field corresponding to the failed resource, and then the path array searched to identify path records having a resource indicator in the identified resource field. Alternatively, the path array can be searched by: defining a mask of the failed resource; and ANDing the mask with the path array.
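One possible form of the mask-based search is sketched below, assuming that the resource fields of each path record are packed into an integer with one bit per resource. The names `mark_failed_paths`, `path_bits` and `status`, and the status strings, are hypothetical.

```python
def mark_failed_paths(path_bits, status, failed_bit):
    """AND a one-bit mask of the failed resource with each path record and
    set a non-operational status on every path whose result is non-zero."""
    mask = 1 << failed_bit
    for path_id, bits in path_bits.items():
        if bits & mask:
            status[path_id] = "non-operational"
    return status


# Hypothetical path array: bit i is set when the path traverses resource i.
path_bits = {"P1": 0b0011, "P2": 0b1100, "P3": 0b1001}
status = {path_id: "operational" for path_id in path_bits}

# A failure notification names resource 0 as the failed resource.
mark_failed_paths(path_bits, status, failed_bit=0)
```

The alternative of deleting the path record, rather than updating a status field, would replace the assignment inside the loop with removal of the entry from the array.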
In embodiments of the invention, the data traffic may be load-balanced by prioritizing the traffic based on one or more of predetermined service level agreements and QoS requirements. A flow of lower-priority traffic may be reduced such that a total bandwidth requirement is less than or equal to an aggregate bandwidth capacity of those ones of the set of communications paths that are in an operational state. The flow of lower priority traffic may be reduced to a minimum amount calculated using a predetermined protection guarantee, which may form part of a service level agreement.
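The following sketch shows one way such a priority-based reduction could be computed. It is not the only possible policy. The flow names, the `priority`, `demand` and `guaranteed` fields, and the function `allocate` are hypothetical, and the sketch assumes that the sum of the guaranteed minimums fits within the surviving capacity.

```python
def allocate(flows, path_capacity, path_status):
    """Grant every flow its guaranteed minimum, then hand out the remaining
    capacity of the operational paths in priority order (0 is highest)."""
    capacity = sum(
        cap for path_id, cap in path_capacity.items()
        if path_status[path_id] == "operational"
    )
    grant = {f["name"]: min(f["guaranteed"], f["demand"]) for f in flows}
    spare = capacity - sum(grant.values())
    for f in sorted(flows, key=lambda f: f["priority"]):
        if spare <= 0:
            break
        extra = min(f["demand"] - grant[f["name"]], spare)
        grant[f["name"]] += extra
        spare -= extra
    return grant


# Three paths of 100 units each; P1 has been marked non-operational.
path_capacity = {"P1": 100, "P2": 100, "P3": 100}
path_status = {"P1": "non-operational", "P2": "operational", "P3": "operational"}

flows = [
    {"name": "gold", "priority": 0, "demand": 120, "guaranteed": 120},
    {"name": "silver", "priority": 1, "demand": 100, "guaranteed": 40},
    {"name": "bronze", "priority": 2, "demand": 80, "guaranteed": 0},
]

grant = allocate(flows, path_capacity, path_status)
```

With 200 units of surviving capacity, the highest-priority flow keeps its full demand, the middle flow is reduced but stays above its guaranteed minimum, and the lowest-priority flow, which has no guarantee, is reduced to zero. The total granted bandwidth does not exceed the aggregate capacity of the operational paths.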
An advantage of the present invention is that data traffic is continuously load-balanced across the available operational communications paths. As a result, data traffic can be automatically routed away from non-operational communications paths, thereby providing automatic compensation for physical network resource failures, without having to specifically provision protection bandwidth. By placing the path topology information (i.e. the path array), and the load-balancing functionality within the source node, the time required to respond to a network resource failure is minimized, and the need for centralized network management is reduced.