1. Field of the Invention
This invention relates generally to hybrid ASCII-binary configuration file management for asynchronous checkpointing and auditing of embedded system software.
2. Description of Related Art
Telecommunication service providers often advertise the reliability of their services by listing the percentage of time per year that their equipment provides full service. When calculating system downtime, service providers may include hardware outages, software failures, and software upgrade periods. For high availability (HA) systems, system downtime must be kept to a minimum.
Currently, there are two common categories of HA systems: some have “5-nines” availability, while others possess “6-nines” availability. A 5-nines system must be available 99.999% of the time, which translates to roughly five minutes of system downtime per year. A 6-nines system must be available 99.9999% of the time, which translates to about thirty seconds of system downtime per year.
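The downtime figures follow directly from the availability percentage. As a quick check, the following sketch converts a number of nines into an annual downtime budget. It assumes an average year of 365.25 days, and the function name is illustrative only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: average year length

def downtime_budget_seconds(nines: int) -> float:
    """Allowed downtime per year for an availability with `nines` nines."""
    unavailability = 10.0 ** (-nines)  # e.g. 5 nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability

# Print the annual budget for 4-, 5-, and 6-nines systems.
for n in (4, 5, 6):
    print(f"{n}-nines: {downtime_budget_seconds(n):.1f} s/year")
```

Under this assumption, the 5-nines budget comes to a little over five minutes per year and the 6-nines budget to a little over thirty seconds per year, consistent with the figures above.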
To ensure that HA systems meet their guaranteed availability, redundancy schemes are frequently used to provide protection from both hardware and software failures. In a 1+1 redundancy scheme, one piece of redundant equipment is provided for each active piece of equipment. Alternatively, to allow for cost savings, a service provider may utilize one redundant device for each set of N active devices.
In addition to the redundant hardware, HA systems must also include software that manages the transfer of dynamic software object state data to a redundant piece of hardware upon failure of the active hardware. Redundant hardware without corresponding software support may produce a “cold start” when the backup hardware is brought into service. When such a start occurs, services will be interrupted and all service-related, dynamic-persistent state data may be lost.
Even worse, substantial service restoration time may elapse before the redundant hardware becomes active. Service restoration time may include periods to reboot a system with a saved configuration, reestablish connections to network peers, and reestablish active services. Depending upon configuration, it may take several minutes to restore services after a cold start. Due to such outage periods, a system with a cold start can never achieve better than 4-nines availability.
In contrast, a system that requires 6-nines availability must meet very stringent software requirements. The system must have a downtime of less than 50 ms for application restarts, “warm start” of software applications, and controlled failover from an active mode to a standby mode. In addition, the system must take no longer than 5 seconds for software upgrades and uncontrolled failovers.
In addition to these time-based requirements, software packages for HA systems must meet a number of additional requirements. First, the software must maintain high application performance, as telecommunications devices often service thousands of calls per second and tens of thousands of routes or MPLS tunnels per second. Second, the software must checkpoint application state data, while maintaining consistency across multiple applications and between the control and data planes. Embedded systems will not function properly without maintaining data consistency across multiple application processes. Third, the software must allow addition of HA features to third-party and legacy software that was not designed for HA systems.
In current systems, software support for hardware redundancy is accomplished using multiple Cooperating Application Processes (CAPs), with each CAP implementing a functional component. These components may include network protocols, hardware forwarding plane management, and dynamic object state information. The functional components exchange data through inter-process communication (IPC), such that the individual components form a cohesive whole. In addition, a standby control plane CAP operates in parallel for each CAP, thereby allowing a quick changeover upon hardware failure.
Asynchronous checkpointing is used to ensure data consistency among the CAPs. The checkpointing process ensures data consistency between active and standby control plane CAPs, across active CAPs, and between the control plane and data plane. In addition, asynchronous checkpointing allows system consistency validation on failover. In this checkpointing scheme, each CAP checkpoints only a subset of the object data record, including configuration files, which contain instructions used to manage functionality of the network element.
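The general pattern can be sketched as follows. This is a minimal, single-process illustration and not a description of any particular implementation: the class names, the in-memory queue standing in for IPC, and the object keys are all hypothetical. The active CAP queues updates for a chosen subset of its objects without waiting for the standby, and an audit later reports any checkpointed object on which the two sides disagree.

```python
from collections import deque

class ActiveCap:
    """Hypothetical active CAP that checkpoints a subset of its objects."""
    def __init__(self, checkpointed_keys):
        self.state = {}
        self.checkpointed_keys = set(checkpointed_keys)
        self.pending = deque()  # updates not yet delivered to the standby

    def update(self, key, value):
        self.state[key] = value
        if key in self.checkpointed_keys:
            # Asynchronous: queue the update and return without waiting.
            self.pending.append((key, value))

class StandbyCap:
    """Hypothetical standby CAP that mirrors checkpointed objects."""
    def __init__(self):
        self.state = {}

    def apply(self, key, value):
        self.state[key] = value

def drain(active, standby):
    """Deliver queued updates to the standby (stands in for IPC)."""
    while active.pending:
        standby.apply(*active.pending.popleft())

def audit(active, standby):
    """Return checkpointed keys whose values differ on the standby."""
    return sorted(k for k in active.checkpointed_keys
                  if active.state.get(k) != standby.state.get(k))

active, standby = ActiveCap({"route:10.0.0.0/8", "tunnel:42"}), StandbyCap()
active.update("route:10.0.0.0/8", "next-hop 192.0.2.1")
active.update("counter:rx", 12345)   # not in the checkpointed subset
before = audit(active, standby)      # standby lags until the queue drains
drain(active, standby)
after = audit(active, standby)       # no mismatches once updates arrive
```

In this sketch the audit run before draining reports the route object as inconsistent, and the audit run afterwards reports nothing, which mirrors the kind of consistency validation that would be performed on failover.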
In current schemes, configuration files are in either ASCII format or binary format, not a combination of the two. This results in significant inefficiencies, as ASCII-based systems require real-time conversion, which consumes resources and slows processing. In addition, ASCII-based files can consume a significant amount of storage space. On the other hand, binary-based schemes improve performance, but make it more difficult for the network operator to modify configuration files.
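To make the trade-off concrete, the following sketch encodes one hypothetical configuration record both as an ASCII command line and as a packed binary record. The record layout, field names, and command syntax are assumptions chosen for illustration and are not taken from any particular system.

```python
import ipaddress
import struct

# Hypothetical record: interface index, IPv4 address, prefix length, MTU.
RECORD = struct.Struct("!HIBH")  # network byte order, no padding, 9 bytes

def to_ascii(ifindex, addr, prefix, mtu):
    """Human-editable form: one command line per record."""
    return f"interface {ifindex} address {addr}/{prefix} mtu {mtu}\n".encode("ascii")

def from_ascii(line):
    """Parsing requires tokenizing and converting each field from text."""
    tok = line.decode("ascii").split()
    addr, prefix = tok[3].split("/")
    return int(tok[1]), addr, int(prefix), int(tok[5])

def to_binary(ifindex, addr, prefix, mtu):
    """Compact form: fixed-width fields packed back to back."""
    return RECORD.pack(ifindex, int(ipaddress.IPv4Address(addr)), prefix, mtu)

def from_binary(blob):
    """Decoding is a single fixed-layout unpack."""
    ifindex, raw, prefix, mtu = RECORD.unpack(blob)
    return ifindex, str(ipaddress.IPv4Address(raw)), prefix, mtu

rec = (7, "192.0.2.1", 24, 1500)
text, blob = to_ascii(*rec), to_binary(*rec)
assert from_ascii(text) == rec and from_binary(blob) == rec
print(len(text), len(blob))  # sizes in bytes of the two encodings
```

The ASCII line can be read and edited in any text editor but must be tokenized and converted on every load, whereas the binary record is several times smaller and decodes with one fixed-layout unpack, at the cost of being opaque to the operator.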
Accordingly, there is a need for a configuration file framework that minimizes the use of CPU-intensive file parsing and command line conversion logic. In addition, there is a need for a configuration file framework that allows for incremental replication of per-object checkpointed configuration data and automated per-object audits. Furthermore, there is a need to provide these performance benefits while still allowing the user to edit the configuration file easily.
The foregoing objects and advantages of the invention are illustrative of those that can be achieved by the various exemplary embodiments and are not intended to be exhaustive or limiting of the possible advantages that can be realized. Thus, these and other objects and advantages of the various exemplary embodiments will be apparent from the description herein or can be learned from practicing the various exemplary embodiments, both as embodied herein or as modified in view of any variation that may be apparent to those skilled in the art. Accordingly, the present invention resides in the novel methods, arrangements, combinations, and improvements herein shown and described in various exemplary embodiments.