1. Field of the Invention
The invention relates to the field of networks. More specifically, the invention relates to network elements.
2. Background of the Invention
A network element hosts multiple processes to maintain data for network communication. These processes relay information to each other with inter-process communication (IPC). The middleware of the network element will maintain process identification numbers for the processes running on the network element. One process will communicate directly with another process using these process identification numbers. Often within a network element, multiple processors run different operating systems.
If a process attempts to communicate with a process that is dead, the requesting process continues passing requests to the dead process. The requesting process detects the failure of the dead process only through a response to the request or via timeouts. Although the operating system can detect when a process dies, it does not immediately communicate the state of the dead process to other processes.
One method of IPC utilizes heartbeat messaging between processes. Once communication is established between two processes on a network element, the two processes periodically transmit heartbeat messages or signals indicating that they are alive and running. Death of one of the processes is detected by the other process when a heartbeat message has not been received within a given time period. Once a process is dead, however, the living process has no way of learning that the dead process has restarted. In addition, if both communicating processes die, different scenarios can occur when they restart. If both processes restart within the same time period, then they will both send requests. If one process restarts while the other remains dead, then the restarted process will repeatedly transmit requests to the dead process until the dead process restarts.
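By way of illustration only, heartbeat-based death detection of the kind described above might be sketched as follows. The class name `HeartbeatMonitor`, its methods, and the timeout value are hypothetical and are not taken from any particular network element; timestamps are passed in explicitly so that the logic can be followed without a real clock.

```python
class HeartbeatMonitor:
    """Hypothetical helper: tracks the last heartbeat seen from each peer process."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # assumed heartbeat timeout, in seconds
        self.last_seen = {}          # peer name -> time of most recent heartbeat

    def record_heartbeat(self, peer, now):
        """Note that a heartbeat from `peer` arrived at time `now`."""
        self.last_seen[peer] = now

    def is_presumed_dead(self, peer, now):
        """A peer is presumed dead when no heartbeat arrived within the timeout."""
        last = self.last_seen.get(peer)
        if last is None:
            return False             # communication was never established
        return now - last > self.timeout_s
```

In such a sketch the monitor can only presume death from silence. It learns nothing about a restart until the restarted process sends a new heartbeat, which reflects the limitation noted above.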
Processes communicate with each other to disseminate information. One process on a network element may gather information about the interfaces of the network element while another process gathers routing information. This information is exchanged and/or passed on to other processes to facilitate processing and transmission of network traffic.
When a process requires information from another process, the process will send an IPC message to the other process requesting information or data. The other process will then pass a response back to the requesting process with the requested data.
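The request and response exchange described above might be sketched, under simplifying assumptions, as follows. Threads and in-memory queues stand in for separate processes and the IPC transport, and the names `responder` and `request_data` are hypothetical.

```python
import queue
import threading


def responder(requests, responses, table):
    """Answer each request with the requested entry until a None sentinel arrives."""
    while True:
        key = requests.get()
        if key is None:              # sentinel: the responder stops, as if it had died
            break
        responses.put((key, table.get(key)))


def request_data(requests, responses, key, timeout_s):
    """Send one request and wait for the reply; None indicates a timeout."""
    requests.put(key)
    try:
        return responses.get(timeout=timeout_s)
    except queue.Empty:
        return None                  # no response within the time period


if __name__ == "__main__":
    req, resp = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=responder, args=(req, resp, {"eth0": "up"}), daemon=True
    )
    worker.start()
    print(request_data(req, resp, "eth0", 1.0))
```

When the responder is no longer running, the requester in this sketch learns of the failure only when its wait times out.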
If a requesting process does not receive a response within a certain time period, then the requesting process will mark the data from the timed out process as stale. Since the requesting process is unaware of the state of the timed out process, it sets a long timer on the stale data. When the timer expires, the stale data is removed.
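The stale-data handling described above might be sketched as follows. The class name `StaleDataCache`, its methods, and the timer length are hypothetical, and times are passed in explicitly rather than read from a clock.

```python
class StaleDataCache:
    """Hypothetical cache of data received from peer processes."""

    def __init__(self, stale_lifetime_s):
        self.stale_lifetime_s = stale_lifetime_s  # assumed length of the long timer
        self.entries = {}   # peer name -> {"data": ..., "stale_since": time or None}

    def store(self, peer, data):
        """Record fresh data received from `peer`."""
        self.entries[peer] = {"data": data, "stale_since": None}

    def on_request_timeout(self, peer, now):
        """Mark the peer's data stale and start the timer; the data stays in use."""
        entry = self.entries.get(peer)
        if entry is not None and entry["stale_since"] is None:
            entry["stale_since"] = now

    def purge_expired(self, now):
        """Remove stale data whose timer has expired; return the affected peers."""
        expired = [
            peer for peer, entry in self.entries.items()
            if entry["stale_since"] is not None
            and now - entry["stale_since"] >= self.stale_lifetime_s
        ]
        for peer in expired:
            del self.entries[peer]
        return expired
```

Between the timeout and the expiry of the timer, the data in this sketch remains available to the requesting process even though the process that supplied it may already be dead.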
Unfortunately, without information about the state of the timed out process, the requesting process cannot function intelligently. The stale data may be used beyond its useful life. Traffic processed with the stale data may be dropped or delayed. The period during which the data should be considered stale begins at some point before the timeout and lasts until the timer expires. The amount of traffic impacted increases in proportion to the length of this period.