Operators of both enterprise and service provider networks strive to deliver 100% user access to data resources. Network devices, such as switches and routers, that are deployed in critical parts of enterprise and service provider networks must achieve close to 100% high availability (HA).
To achieve HA, network devices include redundant route processors. Redundancy on a network device requires two route processors per chassis and a redundancy protocol to synchronize information between these two instances.
The route processor that boots first becomes the Active or, in the case where both boot together, they negotiate which becomes the Active. The Active is responsible for control-plane and forwarding decisions. The second processor is the Standby, which does not participate in the control-plane or data-plane decisions. The Active synchronizes configuration and protocol state information to the Standby. As a result, the Standby is ready to take over the Active responsibilities if the Active fails. This “take-over” process from the Active to the Standby is referred to as switchover.
The switchover time has three (or four, depending on implementation) parts. The first and most important part is the time it takes to continue forwarding after a switchover event. This is hardware architecture dependent but can be as little as zero seconds and as much as 2 seconds. The second part is how long it takes the protocol applications to greet their peers after the switchover event. This part is secondary because it is “protected” by timers that allow a certain amount of time to elapse without hearing from a peer before the neighbor declares it down. This time is typically 5-6 seconds and the timers are typically greater than that. The third part is how long it takes NSF routing protocols to reconverge. This time is hidden by continuing to forward using the Forwarding Information Base that was current at the time of failure until the tables are updated due to convergence. The optional fourth part depends on configuration and on whether an upgrade is occurring that requires line cards to be reloaded. This part does not apply to multi-chassis configurations. It is the time it takes to update and reconfigure the line cards after a switchover with new images and data when required. This part is referred to as “Minimum Disruptive Reload” (MDR) and can take from as little as 0.5 seconds to several seconds. This disrupts forwarding and affects the first part, although the two parts are not concurrent (i.e., they typically appear as two separate small interruptions of service).
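The relationship among these parts can be illustrated with a minimal sketch. The function and parameter names are hypothetical and are not part of any described implementation; the sketch assumes only what is stated above, namely that a peer keeps the adjacency up if it is greeted before its timer expires, and that the forwarding gap and the MDR gap are separate interruptions whose durations add.

```python
def peer_stays_up(greet_time_s: float, hold_timer_s: float) -> bool:
    """Return True if the restarted protocol greets its peer before the
    peer's timer (hypothetical name: hold timer) expires."""
    return greet_time_s < hold_timer_s


def total_forwarding_outage(forwarding_gap_s: float, mdr_gap_s: float = 0.0) -> float:
    """Sum the two non-concurrent forwarding interruptions: the gap before
    forwarding resumes (part one) and the optional MDR gap (part four)."""
    return forwarding_gap_s + mdr_gap_s
```

For example, with a 2 second forwarding gap and a 0.5 second MDR, `total_forwarding_outage(2.0, 0.5)` gives 2.5 seconds of interrupted forwarding, experienced as two separate interruptions rather than one.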
Various techniques have been developed to reduce the switchover time. The assignee of the present application has developed a set of features called NonStop Forwarding (NSF) with Stateful SwitchOver (SSO).
NSF with SSO reduces the mean time to repair (MTTR) by allowing extremely fast switchovers with 0 to 3 seconds of packet loss. NSF with SSO can be deployed in the most critical parts of an enterprise or service provider network. It is an essential feature for single points of termination in the network, and it minimizes downtime when voice over IP (VoIP), video, and other packet loss-sensitive applications are involved.
In-Service Software Upgrade (ISSU) is SSO for different versions of the Internetwork Operating System (IOS®). For example, the Standby on a network device could be upgraded to run a later version of IOS® than that run by the Active.
ISSU provides a process and supporting infrastructure that allow customers to easily manage an upgrade or downgrade while preserving the existing SSO execution model. An important feature of ISSU is allowing applications to add, delete, and modify message types and data content as required while maintaining interoperation between different releases of the system and of the applications.
This is achieved by providing an ISSU infrastructure that allows applications to provide dynamic message transformation functions to upgrade or downgrade messages as appropriate based on session negotiation results. The ISSU infrastructure provides services for managing sessions between peer endpoints to establish a connection and message transformation services for transforming messages between versions as required.
The requirement for message transformation services is illustrated in FIG. 1A. FIG. 1A depicts an example of an ISSU client, i.e., a component that utilizes services of the ISSU infrastructure, that has a message of type 4 which has four versions. Note that version 2 includes fields “E” and “F” that are not present in version 1. Thus, if a first endpoint sends version 1 of message type 4 to a second endpoint expecting version 2 of message type 4, a message transformation function must be called to transform version 1 into version 2 by adding the fields “E” and “F”. This all takes place automatically due to the negotiation done when the message session is established between the two endpoints.
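A transformation of this kind might be sketched as follows. This is an illustration only, not the ISSU infrastructure's actual interface: the function names, the dictionary representation, the assumed version 1 fields “A” through “D”, and the zero default for the added fields are all hypothetical.

```python
from typing import Dict

# Assumption: version 1 carries fields "A".."D"; the text states only that
# version 2 adds "E" and "F".
ADDED_IN_V2 = ("E", "F")


def upgrade_v1_to_v2(msg: Dict[str, int]) -> Dict[str, int]:
    """Transform a version 1 message into version 2 by adding "E" and "F"
    with an assumed default value of 0. The input is not modified."""
    if msg.get("version") != 1:
        raise ValueError("expected a version 1 message")
    out = dict(msg)
    out["version"] = 2
    for field in ADDED_IN_V2:
        out[field] = 0
    return out


def downgrade_v2_to_v1(msg: Dict[str, int]) -> Dict[str, int]:
    """Transform a version 2 message into version 1 by dropping "E" and "F"."""
    if msg.get("version") != 2:
        raise ValueError("expected a version 2 message")
    out = {k: v for k, v in msg.items() if k not in ADDED_IN_V2}
    out["version"] = 1
    return out
```

In this sketch the endpoint that receives a message in a version other than the one it expects calls the matching function; which function to call would follow from the version agreed during session negotiation.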
Another problem occurs for messages that include “foreign-owned fields” as depicted in FIG. 1B. The “foreign-owned field” problem arises when a message that an ISSU client owns contains one or more data fields that are managed and versioned by some other outside entity, referred to here as a “foreign owner”. The format and exact contents of the “foreign-owned field” are unknown to the using ISSU client. The client knows only the definition of the field, which it typically gets from an opaque typedef in a header file that the owner publishes. The field value(s) will often be used to access or reference data owned and managed by another feature or service, and are used by passing them to that service or feature. All of this occurs with the user(s) of the foreign-owned field data treating it as an opaque value.
A concrete example of such a relationship exists for all routing clients that make use of a foreign-owned identification field (FO_id) in their messages. They obtain the value and use it, but have no knowledge of what it should look like on the peer when different versions of the image are being executed on the Active and Standby units during an ISSU upgrade. Any message-based communication system can use this infrastructure, including two different processes executing on the same processor that communicate using messages.
FIG. 1B shows an example of an ISSU client that has a message of type 4 which has four versions. Version 1 of the client message uses version 1 of the FO_id, version 2 of the client message uses version 2 of the FO_id, and versions 3 and 4 of the client message use version 3 of the FO_id. The version of the FO_id that is required for a specific version of the client message was determined by the implementation of the foreign owner of FO_id and the client in the first image that supported the client with the particular version of its message type 4.
In this example, the first IOS images that were released supported client message type 4, version 1. Then, in a subsequently released IOS image the client message type 4 version 1 was changed to version 2 where the fields “E” and “F” were added. Coincidentally, the FO_id was also changed to version 2 which is used in the client's type 4, version 2 message.
In yet a later released version of IOS and of the ISSU client, the client's message type 4 is upgraded to version 3. Coincidentally, the FO_id version was also changed to version 3 in that release. As shown, version 3 of the client message uses version 3 of the FO_id and had its message version incremented because the size of the foreign-owned field, “C”, has changed between client message version 2 and client message version 3. The client message had to be upgraded to version 3 because the FO_id changed size and therefore the offsets and possibly the alignment in the client message also changed. For that reason the message had to be versioned to the next version number, otherwise its peer would not know that the offsets and alignment had changed.
In the next IOS release, the client upgrades its message type 4 to version 4 because it added field “G”. However, the version of the FO_id did not change in that release, so the client type 4 message version 4 still uses version 3 of the FO_id.
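The version pairing described for FIG. 1B can be summarized in a small lookup, sketched below. The table values come from the example above; the names and the helper function are hypothetical and serve only to show when two peers would disagree on the FO_id format.

```python
# Client message type 4 version -> FO_id version, per the FIG. 1B example.
FO_ID_VERSION_FOR_MSG4 = {1: 1, 2: 2, 3: 3, 4: 3}


def fo_id_versions_differ(sender_msg_version: int, receiver_msg_version: int) -> bool:
    """Return True if the two client message versions embed different
    FO_id versions, so the foreign-owned field cannot be copied as-is."""
    sender_fo = FO_ID_VERSION_FOR_MSG4[sender_msg_version]
    receiver_fo = FO_ID_VERSION_FOR_MSG4[receiver_msg_version]
    return sender_fo != receiver_fo
```

Note that client message versions 3 and 4 share FO_id version 3, so `fo_id_versions_differ(3, 4)` is false even though the client message itself changed, while `fo_id_versions_differ(2, 3)` is true.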
The “foreign-owned field” versions are managed separately and may be different from the user message versions. The user of the field has no context as to how to version the “foreign data”. If the using client versions its message containing a “foreign-owned field” and sends it to the peer, the value, size and format of the “foreign-owned field” may be incorrect if not transformed. But the using client's transformation functions have no knowledge of how to accomplish this task. In order to allow the using client to transform the foreign-owned data, the data could be tracked by each using client in concert with the foreign owner. But this process would be unscalable, error-prone and inelegant.
Thus, the challenges of managing versioned messages when various fields are managed by different components, and therefore are versioned without respect to one another, require a scalable solution that is not error-prone.