A complex network typically has built into its functionality the ability to maintain and control the connections that it supports. For example, when a user effectively asks to send information to a particular destination (e.g., through the sending of a “connection request” to the network), a network should be able to intelligently inquire as to whether or not sufficient resources exist within the network to transport the information and, if so, establish the connection so that the information can be transported. Moreover, the network should also be able to monitor the status of the connection (and, on a larger scale, the network itself) so that, if an event arises that causes the connection to be interrupted, the network can take appropriate action(s) (e.g., re-route the connection, tear down the connection and ask the user to resend the information, etc.).
The equipment that forms the nodes of the network (e.g., the routers and/or switches that accept customer traffic from various copper and/or fiber optic lines and re-direct the customer traffic onto copper and/or fiber optic lines) is typically constructed with specific functional capabilities that allow these intelligent tasks to be performed. Typically, each network node is designed to have a “signaling control unit” that is responsible for processing connection setup/teardown procedures as well as connection maintenance procedures. Often, although not a strict requirement, the signaling control unit is also responsible for the execution of a routing algorithm that allows its corresponding node to “figure out” (in light of the network's overall topology/configuration (or changes thereto)) where received traffic is to be forwarded.
The signaling control units of the various nodes are designed to send “signaling” messages to one another so that the network as a whole can successfully perform these connection and network related configuration and maintenance tasks. A problem may arise, however, if a certain type of event (or chain of events) causes a “flood” of these messages to be sent to a particular signaling control unit (e.g., the signaling control unit of a specific node within the network) in a short amount of time. Specifically, if the magnitude of the incoming flood of messages exceeds a signaling control unit's capacity for handling these messages, the signaling control unit is likely to fail in the performance of its connection and/or network management related services.
FIG. 1 illustrates one type of event where a “flood” of signaling messages is sent to a particular signaling control unit. According to the example of FIG. 1, network node 1011 is communicatively coupled to nodes 1012 through 101N through networking lines 1022 through 102N−1, respectively. According to the simple example of FIG. 1, the “primary” signaling control function 1051 of node 1011 includes, amongst its various tasks and responsibilities, a smaller sub-function that may be referred to as the Received Status Request Function 106. A status request is a type of signaling message that asks the node to which the message was sent to report, to the node that sent the message, the status of a particular connection. The status request includes an embedded entry that identifies the particular connection to which the status request pertains.
Under normal operating conditions, the Received Status Request Function 106 is responsible for handling every status request that node 1011 is expected to respond to. Note that the Received Status Request Function 106 includes a queue 107 and a status request engine (SRE) 108. As a status request can be sent to node 1011 from any of nodes 1012 through 101N, queue 107 is responsible for gathering and queuing each received status request regardless of its sending source (a feature that FIG. 1 attempts to capture through input flow 115). Whenever the status request engine 108 is able to handle a “next” status request, a “next” status request is removed from the queue 107 and is processed by the status request engine 108.
The processing of a status request as performed by the status request engine 108 entails: 1) inquiring, internally within node 1011, into the status of the connection to which the status request referred (a process flow that FIG. 1 attempts to capture through the “Connection OK?” request flow 109); and, 2) once an understanding of the status of the connection at issue is gained, initiating the formation of a signaling message (that is to be sent to the node that sent the status request) that reports the status of the connection from the perspective of node 1011 (a process flow that FIG. 1 attempts to capture through response flow 110).
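By way of illustration only, the queue-and-engine arrangement described above might be modeled as in the following Python sketch. The class names, the field names and the use of a set of connection identifiers to stand in for the internal “Connection OK?” inquiry are assumptions made for this example; they are not taken from FIG. 1.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusRequest:
    sender: str          # node that sent the request (hypothetical field name)
    connection_id: int   # embedded entry identifying the connection


@dataclass(frozen=True)
class StatusResponse:
    destination: str     # node that sent the original status request
    connection_id: int
    connection_ok: bool


class ReceivedStatusRequestFunction:
    """Illustrative model of queue 107 feeding status request engine 108."""

    def __init__(self, active_connections):
        self.queue = deque()  # plays the role of queue 107
        # Assumption: the node's connection state is just a set of ids.
        self.active_connections = set(active_connections)

    def receive(self, request):
        # Input flow 115: requests are queued regardless of sending node.
        self.queue.append(request)

    def process_next(self):
        # The engine removes the "next" request when it can handle one.
        if not self.queue:
            return None
        request = self.queue.popleft()
        # Request flow 109: internal inquiry into the connection's status.
        ok = request.connection_id in self.active_connections
        # Response flow 110: report addressed to the requesting node.
        return StatusResponse(request.sender, request.connection_id, ok)
```

In this sketch the requests are served strictly in arrival order, which is one plausible reading of the “next” status request; the text does not state the queuing discipline.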
Note that node 1011 is implemented with redundant signaling control functions 1051 and 1052. In a typical implementation, control function 1051 is implemented with a first electronic card and control function 1052 is implemented with a second electronic card. Under normal operating conditions, one of the control functions (e.g., signaling control function 1051) is deemed “primary” and the other control function (e.g., 1052) is deemed “inactive” or “on standby”. Redundant signaling control functions are used because of the importance of signaling to a working network. Here, if the “primary” control function 1051 suffers a significant failure (e.g., if a semiconductor chip used to implement the primary control function 1051 stops working), node 1011 is designed to automatically “switchover” to control function 1052 for the implementation of its signaling control tasks. That is, upon a significant failure by primary control function 1051, control function 1052 is converted from being a secondary/standby control function to the primary control function of node 1011.
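The switchover behavior just described can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `ControlFunction` and `Node` classes, the role strings and the `switchover` method are hypothetical names introduced here, and failure detection itself is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ControlFunction:
    name: str
    role: str            # "primary", "standby" or "failed" (assumed labels)
    healthy: bool = True


class Node:
    """Hypothetical node with redundant signaling control functions."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def switchover(self):
        """Handle a significant failure of the current primary."""
        if self.standby is None or not self.standby.healthy:
            raise RuntimeError("no healthy standby control function")
        failed = self.primary
        failed.healthy = False
        failed.role = "failed"
        # The standby is converted into the node's primary control function.
        self.standby.role = "primary"
        self.primary, self.standby = self.standby, None
        return self.primary
```

For example, a node built with `ControlFunction("cf_1051", "primary")` and `ControlFunction("cf_1052", "standby")` would, after `switchover()`, report `cf_1052` as its primary and have no remaining standby.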
Because the switchover to a new primary control function (and/or the failure of the former primary control function) may cause temporary disruption to the signaling tasks of node 1011, node 1011 broadcasts to its neighboring nodes 1012 through 101N that it has undergone a “switchover” to a new primary control function. The broadcast is illustrated in FIG. 1 by the sending of N−1 signaling messages 1031 through 103N−1 to nodes 1012 through 101N, respectively. According to various signaling control implementations, the receipt of a signaling message that indicates a node has undergone a control function switchover causes a recipient of such a signaling message to send a status inquiry, to the node that underwent a control function switchover, for each connection that is carried by both the recipient of the signaling message and the sender of the signaling message.
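To make the resulting message volume concrete, the following Python sketch enumerates the status inquiries that a single switchover broadcast would trigger. The function name and the dictionary layout (neighbor name mapped to the set of connection identifiers it shares with the switched node) are assumptions made for illustration.

```python
def status_requests_after_switchover(switched_node, shared_connections):
    """Return (sender, destination, connection_id) tuples for the inquiries.

    shared_connections maps each neighboring node to the ids of the
    connections it carries jointly with switched_node (hypothetical layout).
    """
    requests = []
    for neighbor, connection_ids in shared_connections.items():
        # Each neighbor receives one switchover message and answers with
        # one status inquiry per connection shared with the switched node.
        for connection_id in sorted(connection_ids):
            requests.append((neighbor, switched_node, connection_id))
    return requests
```

With two neighbors sharing three and two connections respectively, the single broadcast yields five inquiries aimed at the switched node; the total grows with the number of shared connections, not merely with the number of neighbors.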
According to the example of FIG. 1, this causes a “flood” of status request messages (represented collectively by status request message trains 1041 through 104N−1) to be sent to node 1011, as a status request message for each connection carried by node 1011 and nodes 1012 through 101N collectively is sent from nodes 1012 through 101N collectively to node 1011. In many instances, the queue of control function 1052 that is equivalent to queue 107 of control function 1051 (not shown in FIG. 1) is not designed with a depth that is sufficient to queue all of the incoming status request messages; and/or, the status request engine of control function 1052 does not have the processing power to process the flood of status request messages within a reasonable amount of time.
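The queue-depth shortfall can be illustrated with a simple Python sketch. It assumes, purely for the sake of the example, that requests arriving at a full queue are discarded and that the engine drains nothing while the burst arrives; the text does not specify either behavior, and `enqueue_burst` is a hypothetical helper.

```python
from collections import deque


def enqueue_burst(queue_depth, burst):
    """Offer a burst of requests to a queue of fixed depth.

    Returns (queued, dropped). Assumption: arrivals that find the queue
    full are discarded, and no request is processed during the burst.
    """
    queue = deque()
    dropped = []
    for request in burst:
        if len(queue) < queue_depth:
            queue.append(request)
        else:
            dropped.append(request)
    return queue, dropped
```

Under these assumptions, a queue of depth 3 offered a burst of 5 requests keeps the first 3 and loses the last 2, so the two corresponding connections never receive a status response.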
According to various signaling control function implementations, if a response to a status inquiry is not received within a specific amount of time, the sending node of the status inquiry message is designed to tear down the connection on the assumption that the connection has already been dropped (i.e., that the node that failed to respond to the status inquiry message is no longer supporting the connection). In the example of FIG. 1, the failure of control function 1052 to adequately handle the flood of incoming status inquiries should cause nodes 1012 through 101N to begin to drop those connections whose corresponding status request messages were not responded to or were not responded to on time. Note that, in such a situation, these connections are apt to be dropped inadvertently. That is, the connections themselves are fully operational (i.e., were not catastrophically affected by the switchover event) and therefore should not be dropped; it is merely the shortcoming in the capacity of the Received Status Request Function of control function 1052 to handle the flood of status requests that has caused these properly functioning connections to be dropped.
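The timeout rule applied by an inquiring node might be sketched as follows. This Python example is illustrative only: the function name, the use of dictionaries keyed by connection identifier and the representation of times as plain numbers are assumptions, and the timeout value is left to the caller because the text does not give one.

```python
def connections_to_tear_down(sent_at, responded_at, timeout):
    """Return ids of connections whose status response missed the deadline.

    sent_at:      connection id -> time the status inquiry was sent
    responded_at: connection id -> time a response arrived (absent if none)
    timeout:      maximum allowed delay, in the same time units
    """
    torn_down = []
    for connection_id, t_sent in sent_at.items():
        t_resp = responded_at.get(connection_id)
        # No response at all, or a late response, triggers teardown.
        if t_resp is None or t_resp - t_sent > timeout:
            torn_down.append(connection_id)
    return torn_down
```

Note that the rule looks only at response timing. A healthy connection whose inquiry was lost or delayed at the overloaded node is indistinguishable, from the inquiring node's side, from a connection that has truly failed.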