High availability is essential for critical networking resources such as edge routers. An edge router typically serves as a single point of communication between computers on a network and computers outside the network. When a processor or communication process within the edge router fails, communication between the network and outside networks is cut off. In response, edge routers have been equipped with redundant resources that activate upon failure.
One problem with redundant resources is that communications are disrupted while the edge router restores contact with network nodes. During operation, applications, higher-layer protocols, lower-layer protocols, and the like form complex layers of interdependent data. For example, edge routers using Border Gateway Protocol (BGP) to make routing decisions can require both a BGP session and an underlying Transmission Control Protocol (TCP) session. To restore operations after a failure, the redundant resources typically must reestablish communication with network nodes on several different levels (e.g., establish new TCP and BGP sessions) before resuming communications. While the resultant downtime may be less than that of rebooting or otherwise repairing failed resources, the edge router is nevertheless unavailable during this time. One approach to reducing downtime is to replicate all data transactions to the standby resources for a faster transition.
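The layered dependency described above can be illustrated with a minimal sketch. It is a toy model only, not an actual router interface: the `Standby` class, the `LAYERS` ordering, and the `takeover_steps` method are hypothetical names introduced for illustration, and the sketch assumes that a higher-layer session can be continued only if every layer beneath it is also continued.

```python
from dataclasses import dataclass, field

# Assumed ordering of the layers a BGP speaker depends on, lowest first.
LAYERS = ("TCP", "BGP")


@dataclass
class Standby:
    """Toy model of a standby resource (hypothetical, for illustration only)."""

    # Layers whose session state was replicated to the standby before the failure.
    restored: set = field(default_factory=set)

    def takeover_steps(self):
        """Return the sessions that must be re-established, in order,
        before the standby can resume communications."""
        steps = []
        for layer in LAYERS:
            # A layer is continued only if its state was replicated and no
            # lower layer had to be rebuilt (assumption stated in the lead-in).
            if layer in self.restored and not steps:
                continue
            steps.append(layer)
        return steps


if __name__ == "__main__":
    print(Standby().takeover_steps())                  # standby with no replicated state
    print(Standby({"TCP", "BGP"}).takeover_steps())    # standby with full session state
```

Under these assumptions, a standby with no replicated state must rebuild every layer in order, whereas a standby holding state for both layers has nothing to re-establish, which is the difference between the disruptive transition and a stateful one.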
However, data replication requires significant resources. For example, current edge routers replicate data by brute force, using high-bandwidth data channels to send duplicate input, output, and other data to standby resources. As a result, the processor inherits an additional burden that affects ordinary operations. Alternatively, specialized hardware can be dedicated to off-load the replication tasks, but this increases the complexity and expense of processor design and requires significant silicon area. Furthermore, modern and future network bandwidths, operating at speeds of 10 Gb/s, 40 Gb/s, and beyond, exacerbate these design requirements. Thus, current high availability techniques require a trade-off between downtime and the requirements of data replication.
Accordingly, there is a need for a robust networking device that maintains statefulness between an active process and/or processor and a standby process and/or processor with reduced checkpointing data. Furthermore, such a device should perform stateful switchovers that continue existing BGP and TCP sessions.