In computer networking, keep-alive (KA) messages or packets (also sometimes referred to as hello messages) are commonly used for a variety of different purposes including to check connectivity and the health of network devices. For example, a particular network device may transmit keep-alive messages to other network devices (e.g., to the neighbors of the particular network device) at regular time intervals. A network device receiving the keep-alive messages may use the messages to determine the health of the sender of the messages and also to check connectivity to the sender of the messages (e.g., check whether a link between the particular network device and the network device receiving the messages is operational), and the like. If a network device, such as a router, stops receiving keep-alive messages from a neighbor, after a set period (sometimes referred to as the dead interval), the router may assume the neighbor network device has gone down or there is something wrong with the connectivity to the neighbor network device, and take responsive actions. For example, if the recipient network device determines that a link is down due to not receiving keep-alive messages from a particular network device, the recipient network device may use a different path to route data until the link is up again.
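The dead-interval check described above can be illustrated with a minimal sketch. The class name `NeighborMonitor`, its method names, and the 40-second dead interval in the usage example are hypothetical choices made for illustration; they are not taken from any particular protocol or device implementation.

```python
from dataclasses import dataclass, field


@dataclass
class NeighborMonitor:
    """Tracks when each neighbor was last heard from (hypothetical helper)."""

    # Seconds without a keep-alive before a neighbor is presumed down.
    dead_interval: float
    # Maps a neighbor identifier to the time its last keep-alive arrived.
    last_seen: dict = field(default_factory=dict)

    def on_keepalive(self, neighbor_id, now):
        # Record the arrival time of a keep-alive from this neighbor.
        self.last_seen[neighbor_id] = now

    def down_neighbors(self, now):
        # A neighbor is presumed down once the dead interval has elapsed
        # since its most recent keep-alive.
        return sorted(
            n for n, t in self.last_seen.items()
            if now - t > self.dead_interval
        )


monitor = NeighborMonitor(dead_interval=40.0)
monitor.on_keepalive("r1", now=0.0)
monitor.on_keepalive("r2", now=0.0)
monitor.on_keepalive("r1", now=30.0)   # r2 has gone silent
suspects = monitor.down_neighbors(now=45.0)
```

In this usage example only `r2` exceeds the dead interval at time 45, so a device using such a check could begin routing around the link to `r2` while continuing to treat `r1` as reachable.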
A network device may receive and transmit different types of keep-alive messages corresponding to different protocols that involve sending of keep-alive messages. Examples of protocols that involve transmission of keep-alive messages at regular intervals include Intermediate System-Intermediate System (IS-IS), Resource Reservation Protocol (RSVP), Multiple Spanning Tree Protocol (MSTP), Link Aggregation Control Protocol (LACP), Open Shortest Path First (OSPF), Unidirectional Link Detection (UDLD), Generic Routing Encapsulation (GRE), Rapid Spanning Tree Protocol (RSTP), and others. A network device may open and maintain a session (“keep-alive network session”) to facilitate the transmission of keep-alive messages. Different such keep-alive network sessions may be opened and maintained by a network device for different protocols. Several of the sessions may be maintained in parallel. For each session, the network device is configured to transmit keep-alive messages at regular pre-defined time intervals specified by the protocol associated with the session. A keep-alive message transmitted for a session may identify the associated protocol and may also comprise information identifying the session for which the message has been transmitted.
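As a rough sketch of how several keep-alive network sessions with different intervals might be serviced in parallel, the following example orders transmissions with a priority queue. The `KeepAliveSession` type, the `schedule` function, and the interval values are assumptions made for illustration only; intervals are expressed in whole milliseconds to avoid floating-point drift, session identifiers are assumed to be unique, and the first message of each session is assumed to be due one interval after time zero.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepAliveSession:
    session_id: int      # identifies the session in each transmitted message
    protocol: str        # protocol associated with the session
    interval_ms: int     # transmission interval for this session


def schedule(sessions, horizon_ms):
    """Return (time_ms, protocol, session_id) for each message due by horizon_ms."""
    # Each heap entry is keyed by the next due time, then by session id.
    heap = [(s.interval_ms, s.session_id, s) for s in sessions]
    heapq.heapify(heap)
    sent = []
    while heap and heap[0][0] <= horizon_ms:
        due, sid, session = heapq.heappop(heap)
        sent.append((due, session.protocol, sid))
        # Re-arm the session for its next interval.
        heapq.heappush(heap, (due + session.interval_ms, sid, session))
    return sent


sessions = [
    KeepAliveSession(session_id=1, protocol="OSPF", interval_ms=10_000),
    KeepAliveSession(session_id=2, protocol="UDLD", interval_ms=500),
]
timeline = schedule(sessions, horizon_ms=1_500)
```

Each tuple in the resulting timeline carries both the protocol and the session identifier, mirroring the description of a keep-alive message that identifies its protocol and its session.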
As indicated above, keep-alive messages for a session must be sent at predefined time intervals, where the duration of the time interval is typically defined by the keep-alive protocol corresponding to that session. For example, for both the OSPF protocol and the IS-IS protocol, keep-alive messages are typically transmitted every ten seconds. For some other protocols, keep-alive messages may have to be transmitted every second.
As networks have become faster, and in order to detect network problems sooner, the time intervals for sending keep-alive messages have become shorter. These periodic time intervals can be on the order of milliseconds (msecs) or even shorter. For example, for the UDLD protocol, the periodic time interval may be 500 msecs. As another example, some protocols may have a periodic time interval of 100 msecs. Such reduced time intervals are becoming problematic for network devices that are not capable of handling the transmission of keep-alive messages within such short time intervals.
The problem is further compounded for network devices that provide high availability (HA) by supporting non-stop routing (NSR) and/or non-stop forwarding (NSF). In such a network device, the data forwarding or routing functionality provided by the network device is expected to continue without much impact even when the network device experiences certain events (e.g., a soft reboot, a software upgrade, certain component failures) that impact the functionality of the network device. Such NSR or NSF functionality is typically provided using redundant subsystems. In a typical setup, a network device provides redundant subsystems for performing data forwarding or routing functions, and these subsystems are configured to operate according to the active-standby model of operation. In such implementations, one of the subsystems operates in an “active” mode and performs a set of networking functions, while the other subsystem operates in a “standby” mode in which the set of functions performed by the active subsystem is not performed. In response to certain events, a failover or switchover may occur that causes the subsystem that was operating in the standby mode to start operating in the active mode and take over performance of the functions performed in active mode. The subsystem that was previously operating in the active mode may then operate in the standby mode. This enables the set of networking functions performed by the network device to continue to be performed without significant interruption.
In conventional network devices, transmission of keep-alive messages is handled by the subsystem operating in active mode. However, the failover or switchover itself may take a few seconds or even a few minutes. During this time period, keep-alive messages may not be sent by the network device until the new active subsystem becomes fully functional (because the previous active subsystem is no longer active and the previous standby subsystem is in the process of being “brought up” in active mode). This can be problematic, for example, for keep-alive protocols requiring keep-alive messages to be sent at time intervals on the order of milliseconds. This may cause one or more devices in the network receiving the keep-alive messages to incorrectly assume that a particular keep-alive network session is no longer active or has been dropped, that the sender network device is down, or that a link is no longer operational.
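The scale of the timing problem can be seen with a short back-of-the-envelope calculation. The function names below are hypothetical, and the model is deliberately simplified: it assumes that no keep-alive messages at all are sent during the switchover, and that a receiver drops the session once a full dead interval passes without a message. The 300 msec and 40 second dead intervals in the usage example are illustrative assumptions, not values taken from any specific protocol.

```python
def missed_keepalives(switchover_ms, interval_ms):
    # Number of whole transmission intervals that elapse during the switchover,
    # i.e. roughly how many keep-alive messages are never sent.
    return switchover_ms // interval_ms


def session_survives(switchover_ms, dead_interval_ms):
    # Simplified model: the receiver keeps the session only if the gap in
    # keep-alive messages is shorter than its dead interval.
    return switchover_ms < dead_interval_ms


# A 2-second switchover with a 100 msec keep-alive interval.
missed = missed_keepalives(switchover_ms=2_000, interval_ms=100)

# Assumed dead intervals: 300 msec for a fast session, 40 seconds for a slow one.
fast_ok = session_survives(switchover_ms=2_000, dead_interval_ms=300)
slow_ok = session_survives(switchover_ms=2_000, dead_interval_ms=40_000)
```

Under these assumptions, a session with a ten-second interval and a long dead interval would ride out a two-second switchover, while a session with a 100 msec interval would miss about twenty messages and be declared down by its peer.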