As the number of Internet users and Internet-based mission-critical applications increases daily at an unprecedented pace, service-provider and enterprise customers are demanding greater reliability and availability. When every minute of downtime can mean millions of dollars in lost revenue and embarrassing headlines, companies are eagerly looking for solutions to make their systems highly available. High-Availability (HA) networking products help customers increase uptime and protect financial performance, reputation, and customer loyalty.
Redundancy is one of the key methodologies used to increase system availability. One HA feature is to include both Active and Standby route processors in the router. When the Active Route Processor (RP) fails or is requested to switch over, the Standby RP takes over so that the system continues processing and forwarding. Switchover occurs when system control and routing protocol execution are passed from the Active RP to the Standby RP. A “hitless” switchover implies no loss of sessions and continued forwarding of traffic during the switchover. The system maintains the appearance of a single router with a single management interface to the outside world at all times, so that when a failure occurs, the migration of control to another processor is not visible.
Existing systems employ such methodology to deal with route processor failure and increase the system availability. However, there are still areas in this process that can be optimized. In the switchover process, the time from initial failure to first packet transmission can be broken down as follows:
1. Time to identify failure
2. Time to load and boot software on Standby RP
3. Time to load new configuration on Standby RP
4. Time to reset and reload line cards
5. Time to load new configuration on line cards
6. Time to learn routes and pass keepalive message
7. Route convergence
8. Time to begin forwarding again
9. Time to reestablish layer 2 services.
The simplest approach, in which all of these steps are performed, is called Cold Standby. It implies that the entire system loses function for the duration of the restoration: all sessions and all traffic flowing through the router are lost during the recovery time. The benefit of Cold Standby is that the router restarts without manual intervention, rebooting with the Standby RP taking control of the router.
Various processor redundancy schemes eliminate one or more of the above steps. For example, steps 2 and 3 can be eliminated if the Active and Standby RPs both boot and initialize upon powerup. If the line cards are kept up during switchover, then steps 4 and 5 can be eliminated. A “hot” Standby RP is fully initialized and synchronized with the Active RP and is able to implement a hitless switchover. The time taken by steps 6 and 7 can be hidden by reducing step 8 to zero or near-zero time. Step 9 can be reduced to a few seconds.
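The way redundancy schemes remove steps from the switchover sequence can be sketched as follows. This is only an illustrative model: the function name `remaining_steps` and its two flags are hypothetical, and the step numbers simply mirror the list above. It does not model timing, nor the hiding of steps 6 and 7 behind continued forwarding.

```python
from typing import List

# Step numbers and descriptions follow the switchover breakdown listed above.
SWITCHOVER_STEPS = {
    1: "identify failure",
    2: "load and boot software on Standby RP",
    3: "load new configuration on Standby RP",
    4: "reset and reload line cards",
    5: "load new configuration on line cards",
    6: "learn routes and pass keepalive message",
    7: "route convergence",
    8: "begin forwarding again",
    9: "reestablish layer 2 services",
}


def remaining_steps(standby_preinitialized: bool, line_cards_kept_up: bool) -> List[int]:
    """Return the step numbers still performed at switchover (hypothetical helper)."""
    eliminated = set()
    if standby_preinitialized:
        # Both RPs boot and initialize upon powerup, so steps 2 and 3 drop out.
        eliminated |= {2, 3}
    if line_cards_kept_up:
        # Line cards stay up during switchover, so steps 4 and 5 drop out.
        eliminated |= {4, 5}
    return [step for step in SWITCHOVER_STEPS if step not in eliminated]


# Cold Standby performs every step; the other call removes steps 2 to 5.
cold_standby = remaining_steps(False, False)
preinitialized = remaining_steps(True, True)
```

With neither flag set, the result is the full Cold Standby sequence; with both set, only steps 1 and 6 through 9 remain.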
HA features are especially important on edge routers because these routers do not benefit from redundant network architecture topologies that core routers typically benefit from, and, therefore, are likely to be a single point of failure in a network. Customers see the downtime as a major obstacle to their business goals and customer relations. However, it is not always possible to build equipment and circuit redundancy throughout the entire network. Therefore the availability initiatives of an edge router must concentrate on features that will:
1. Isolate any errors on one part of the router from affecting the rest of the system.
2. Allow a faulty processor to switch over to any redundant processors in the event of a failure.
3. Minimize the switchover time between processors.
To ensure that a “hot” Standby processor is able to take up where the Active left off when a switchover occurs, both the Active and the Standby Control Processors must be configured identically at all times. This is necessary so that applications and system services that depend on configured resources have the same resources available on the Standby as they had on the Active before the switchover. Synchronization of the interface and controller state from the Active to the Standby for a set of shared interfaces is also required to enable a transparent, or “hitless”, switchover between the redundant control processors. All resources and states remain intact during switchover, so that forwarding can continue and the control plane on the Standby RP can quickly and transparently recover the interfaces for the protocols and features using them.
IOS® is Cisco's Internetwork Operating System software, which delivers intelligent networking services on Cisco routers. Stateful IOS services and protocols running on the Active checkpoint state data to the Standby, ensuring that it is always current and capable of taking over where the Active left off when a switchover takes place. In some architectures, a Forwarding Processor (FP) is packaged with the RP so that they fail as a unit. In such architectures, the FP packaged with the Standby RP must be kept synchronized with the FP packaged with the Active RP.
In systems that provide this functionality, the individual HA-aware applications and system services are expected to create and maintain the necessary resources and to checkpoint any state associated with those resources to the Standby as the state changes on the Active. In existing systems, this only works when all elements of the system are properly instrumented and are HA-aware. For IOS®, applications are allowed to migrate to HA-aware implementations as demand requires; consequently, most applications and services are not HA-aware and have not been modified to create the required resources on the Standby as well as on the Active.
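The difference between an HA-aware service that checkpoints its state and an HA-unaware one can be illustrated with a minimal sketch. The classes `RouteProcessor` and `RedundantPair` and their methods are hypothetical names chosen for illustration; they are not IOS interfaces, and the state is reduced to a simple key/value store.

```python
class RouteProcessor:
    """Hypothetical stand-in for an RP holding application state."""

    def __init__(self, name):
        self.name = name
        self.state = {}


class RedundantPair:
    """Hypothetical Active/Standby pair, for illustration only."""

    def __init__(self):
        self.active = RouteProcessor("RP-A")
        self.standby = RouteProcessor("RP-B")

    def ha_aware_update(self, key, value):
        # An HA-aware service changes state on the Active and checkpoints
        # the same change to the Standby.
        self.active.state[key] = value
        self.standby.state[key] = value

    def ha_unaware_update(self, key, value):
        # An HA-unaware service changes state on the Active only.
        self.active.state[key] = value

    def switchover(self):
        # Control passes from the Active to the Standby.
        self.active, self.standby = self.standby, self.active


pair = RedundantPair()
pair.ha_aware_update("session-1", "up")
pair.ha_unaware_update("session-2", "up")
pair.switchover()
# The new Active holds only the checkpointed state.
survived = sorted(pair.active.state)
```

After the switchover in this sketch, the new Active knows about `session-1` but not `session-2`, which is the gap left when some services are not HA-aware.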
Many other attempts at providing this functionality use hardware redundancy only, which typically affects the software interface and requires an extended recovery time accompanied by a service interruption. Other hardware/software implementations do not allow “HA-aware” and “HA-unaware” features and applications to co-exist, and therefore require enormous software changes in the code base in order to support an HA environment.