With respect to data communications, as represented in FIG. 1, a data network 10 may be generally characterized by a plurality of nodes 12 interconnected, for example, by trunks 14 and/or by virtual circuits 16 through which two nodes are communicatively coupled, whereby data is transferred between the nodes 12 of the network 10. The nodes 12 embody processors which effect various functions within the network 10. Certain nodes 12 may be provisioned with links 18, each of which functions as a data communications service access line, typically used by customer premises equipment (not shown) to access the data network 10. A Frame Relay service carrying data between network nodes 12, as implemented by Nortel's Magellan Passport product, is an example of conventional data communications.
Data networks 10 have evolved rapidly in the last few years, for instance, increasing by two or three orders of magnitude in both the speed of data transfer (e.g., from 9600 bits/second to 50 megabits/second) and the number of virtual circuits 16 that are present in such networks (e.g., from 500 to 500,000 or higher). As a consequence, technical problems need to be solved in order to create, from the technology base of earlier, smaller networks, reliable data networks 10 that are so much faster and more complex.
A significant problem is the processing demand placed on the network nodes 12 to supervise and control the state of a large number of network entities, such as the trunks 14, the virtual circuits (VCs) 16 and the corresponding data link connection identifiers (DLCIs) paired with each VC, the links 18 (i.e., access lines), and the like.
Network speeds are much faster. As the volume of control messages is relatively low in comparison to the activity each can produce, in high speed networks the demands on a single node 12 for control of a large number of network entities seem to arrive instantaneously. The network node 12 may not have sufficient processing resources or capacity to process so many demands, and that node is thereby overloaded.
The various nodes 12 in the data network 10 have no easy means to communicate with each other in relation to the large numbers of network entities which require supervision. Also, modern software practices typically implement each network entity in software as an object that acts independently of other objects, even in the same processor. Furthermore, all load demands typically are generated as quickly as possible because, according to conventional wisdom, it is preferred to have the network 10 react as quickly as possible to network affecting events.
Network events can occur that trigger supervisory activities on large numbers of the network entities. These activities typically are initiated in response to some event occurring within the data communications network 10 in respect of a particular node, and that node in turn may broadcast respective control messages, for the network entities affected by the event, to one or more other nodes. Examples of possible network events include:
activation or outage of a data communications service containing a large number of DLCIs and corresponding VCs 16, for example, a link 18 which previously was not functioning now becoming operational, or a link 18 which previously was functioning now becoming non-operational;
disconnection of the trunks 14 connecting a network node 12 to other nodes, and their subsequent restoration; and
reset of a network node 12 that handles a large number of network entities, and its subsequent restoration.
Such events can easily produce very large demands, in terms of processing resources, on different nodes 12 in the network 10. If not handled properly, overloading of these nodes can cause either failure in the restoration of the required activity, for example, by the activity taking so long that it exceeds the time-out values for replies, or, in extreme cases, failure of other nodes 12 by exhaustion of memory or queue resources on those nodes.
Since such drastic actions are usually triggered by initial errors, these instantaneous large resource demands, referred to herein as tidal waves, make the required stability of data networks (the ability to keep operating and the ability to recover from failures) difficult to achieve. For example, initial failure of a particular network node 12 can produce tidal waves through the network 10, which waves then cause other network nodes 12 to fail, thereby resulting in complete paralysis of the entire network 10.
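By way of illustration only, the overload described above may be sketched as a simple model in which one control message per VC arrives at a receiving node at the same instant. The VC counts of 500 and 500,000 are taken from the figures given earlier; the function name, the queue capacity, the processing rate and the time-out value are hypothetical and are assumed solely for the purpose of the example.

```python
from collections import deque


def simulate_burst(num_vcs, queue_capacity, msgs_per_tick, timeout_ticks):
    """Model a burst of per-VC control messages hitting one node.

    All parameters are illustrative assumptions, not values from any product.
    Returns a tuple (dropped, timed_out, served_in_time).
    """
    queue = deque()
    dropped = 0
    # Every affected VC generates its control message at once (tick 0).
    for vc in range(num_vcs):
        if len(queue) < queue_capacity:
            queue.append(vc)
        else:
            dropped += 1  # queue resources on the node are exhausted

    served_in_time = 0
    timed_out = 0
    tick = 0
    while queue:
        tick += 1
        # The node can process only a fixed number of messages per tick.
        for _ in range(min(msgs_per_tick, len(queue))):
            queue.popleft()
            if tick <= timeout_ticks:
                served_in_time += 1
            else:
                timed_out += 1  # reply would arrive after the time-out
    return dropped, timed_out, served_in_time


if __name__ == "__main__":
    # Earlier, smaller network: the burst is absorbed.
    print(simulate_burst(500, 1000, 100, 10))
    # Larger network: most messages are dropped or exceed the time-out.
    print(simulate_burst(500_000, 100_000, 100, 10))
```

Under these assumed values, the smaller burst is served entirely within the time-out, whereas in the larger burst most messages are either dropped for lack of queue space or served only after the time-out has expired.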
Therefore, measures need to be taken to safeguard large data networks from such drastic failures.
A direct approach to preventing tidal waves is for network nodes 12 that are low in resources to inform other nodes. Such congestion messages should cause the node 12 that is the potential originator of a tidal wave to slow down, thereby preventing damaging tidal waves. However, technology to monitor and inform on resource utilization, especially on a fast and transient basis, does not exist in standard form across the data network 10. The overloading can occur so fast that messages asking the originator node 12 to slow down can easily arrive too late. Also, since there are so many network entities, such resource backing messages can easily themselves cause tidal waves if each network entity is responsible for checking resources.
If, instead, a smaller number of supervisory entities in a network node are responsible, then it is very costly for each supervisor to find exactly which of the numerous entities are responsible for resource exhaustion and to ask them to desist. The complexity of any such system is very high, and it is very costly to produce such a system and to verify its correct operation.
Another approach to preventing tidal waves is for any generating network node to query the receiver network node and make sure that it receives a proceed signal before starting the activity that involves the numerous network entities. This approach generates traffic and resource demand in itself, during times when the data network may be severely stressed, and may thus itself be a destabilizing factor. Since there are numerous network entities that are independent of each other, such an authorization system will be as costly to network resources as the original activity, because a query has to be sent to authorize each transaction. Such a system will significantly slow down the speed with which the needs of the numerous network entities are serviced. A large amount of complexity is also introduced to handle cases where proceed signals are not acknowledged, since the sender will have to go through its lists of network entities and try again several times.
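The overhead of the query and proceed approach described above may be illustrated, under stated assumptions, by counting messages. The sketch assumes one query, one proceed signal and one activity message per network entity, and one additional query for each attempt that goes unanswered; the function name and the `failed_attempts` mapping are hypothetical and are not drawn from any existing system.

```python
def handshake_cost(num_entities, failed_attempts=None):
    """Count messages when every entity must be authorized individually.

    failed_attempts maps an entity index to the number of queries for that
    entity that received no proceed signal (an assumed, illustrative input).
    """
    failed_attempts = failed_attempts or {}
    queries = proceeds = activity = 0
    for entity in range(num_entities):
        misses = failed_attempts.get(entity, 0)
        queries += misses + 1  # each unanswered query forces a retry
        proceeds += 1          # the proceed signal that finally arrives
        activity += 1          # the original per-entity transaction
    return {
        "queries": queries,
        "proceeds": proceeds,
        "activity": activity,
        "total": queries + proceeds + activity,
    }


if __name__ == "__main__":
    print(handshake_cost(1000))
    print(handshake_cost(3, {0: 2}))
```

Even with no lost signals, the authorization traffic in this model is at least twice the size of the original activity, and each unanswered query adds further load at a time when the network may already be stressed.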
Hence, a methodology that prevents generation of tidal waves within a data communications network is desired.