1. Field of the Invention
This invention relates to the field of high-availability computer systems and, more particularly, to checkpointing application components.
2. Description of the Related Art
As web-based applications become more important to business and industry, system failure becomes more expensive, and highly reliable systems assume a greater importance. For example, a web site may handle financial, production, sales, marketing or media applications. Failure to provide these applications to clients for even a few minutes could mean thousands or millions of dollars in lost income.
One way to provide applications on a highly reliable basis is a distributed system with a plurality of redundant components. In a distributed system, a plurality of servers may be connected by a load balancer and a network. Each server may execute one or more application components which comprise a web application. In addition, each application component may have one or more redundant backup components located on a separate server. In the event of a server failure, a redundant backup component located on a still-operational server may be activated, and execution of the application may continue with little to no break in service.
In order to maintain backup application components, the current state of active application components may be checkpointed to a backup store. Current data values for each application component running on a server may be stored in a data file or other data object. In the event of a server failure, the data objects may then be retrieved by fail-over components on another server. By restoring the data values of each failed component, a backup component may in effect “become” the failed component, and thus be able to continue any actions that the failed component was carrying out prior to failure.
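By way of illustration only, the checkpoint and restore cycle described above may be sketched as follows. The sketch assumes standard Java object serialization and an in-memory map standing in for the backup store; the names CartState, BackupStore and CheckpointDemo are hypothetical and do not correspond to any particular server's interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical state of one application component.
class CartState implements Serializable {
    private static final long serialVersionUID = 1L;
    final String sessionId;
    int itemCount;

    CartState(String sessionId, int itemCount) {
        this.sessionId = sessionId;
        this.itemCount = itemCount;
    }
}

// Hypothetical backup store; a map stands in for a file or database.
class BackupStore {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    // Serialize the component's current data values into a data object.
    void checkpoint(String componentId, Serializable state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        }
        objects.put(componentId, bytes.toByteArray());
    }

    // Rebuild the state on a fail-over server; returns null if no checkpoint exists.
    Object restore(String componentId) throws IOException, ClassNotFoundException {
        byte[] data = objects.get(componentId);
        if (data == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}

class CheckpointDemo {
    public static void main(String[] args) throws Exception {
        BackupStore store = new BackupStore();
        store.checkpoint("cart", new CartState("session-42", 3));

        // Simulated fail-over: a backup component restores the saved values.
        CartState backup = (CartState) store.restore("cart");
        System.out.println(backup.sessionId + " " + backup.itemCount); // prints "session-42 3"
    }
}
```

In this sketch the restored object is a distinct instance carrying the same data values, which is the sense in which a backup component may "become" the failed component.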
In order to minimize the complexity and effort needed to create a web application, servers may provide built-in checkpoint functionality to application components, thereby saving programmers the effort of creating their own checkpoint mechanisms. Some servers may checkpoint application components by replicating the component state in memory (as opposed to a persistent store) on another server for a secondary instance of the application component. If the primary component fails, the secondary component takes over. However, this limits flexibility in the choice of fail-over components. For example, load balancing fail-overs may not be possible. Other servers may checkpoint all application components to a persistent store at periodic time intervals. Although checkpointing to a persistent store may provide more flexibility in managing the fail-over of application components, checkpointing at periodic intervals may result in unnecessary checkpointing in some cases and may miss application state changes in other cases. Typically, conventional application servers apply the same checkpointing technique to all application components running on that server. Thus, all application components on a particular server are typically checkpointed the same way.
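The drawbacks of periodic checkpointing noted above may be illustrated with a small simulation. The sketch below is hypothetical: it models time as integer ticks, takes a checkpoint every fixed number of ticks, and counts checkpoints taken when nothing had changed, state changes overwritten before any checkpoint captured them, and changes not yet saved when a failure occurs. The class and method names are assumptions introduced for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class PeriodicCheckpointSketch {
    // Counts produced by one simulated run.
    static final class Result {
        final int redundantCheckpoints;  // checkpoints taken with no change since the last one
        final int overwrittenChanges;    // intermediate states never captured by any checkpoint
        final int changesLostAtFailure;  // changes made after the last checkpoint before failure

        Result(int redundant, int overwritten, int lost) {
            this.redundantCheckpoints = redundant;
            this.overwrittenChanges = overwritten;
            this.changesLostAtFailure = lost;
        }
    }

    // changeTicks: ticks at which the component state changes.
    // interval: a checkpoint is taken whenever tick % interval == 0.
    // failureTick: the server fails after this tick completes.
    static Result simulate(Set<Integer> changeTicks, int interval, int failureTick) {
        int redundant = 0;
        int overwritten = 0;
        int pending = 0; // changes since the last checkpoint
        for (int tick = 1; tick <= failureTick; tick++) {
            if (changeTicks.contains(tick)) {
                pending++;
            }
            if (tick % interval == 0) {
                if (pending == 0) {
                    redundant++;                 // nothing new to save
                } else {
                    overwritten += pending - 1;  // only the latest state is saved
                }
                pending = 0;
            }
        }
        return new Result(redundant, overwritten, pending);
    }

    public static void main(String[] args) {
        Set<Integer> changes = new HashSet<>(Arrays.asList(1, 2, 3, 12));
        Result r = simulate(changes, 5, 13);
        // prints "redundant=1 overwritten=2 lost=1"
        System.out.println("redundant=" + r.redundantCheckpoints
                + " overwritten=" + r.overwrittenChanges
                + " lost=" + r.changesLostAtFailure);
    }
}
```

With changes at ticks 1, 2, 3 and 12, an interval of 5 and a failure after tick 13, the checkpoint at tick 10 saves nothing new, two of the three early changes are never individually captured, and the change at tick 12 is lost.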
In addition, some application components may contain data structures that are not readily serializable to be checkpointed to a persistent store. As a result, such components may not be checkpointed at all.
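As one hedged example of the serialization problem, the sketch below assumes standard Java object serialization, in which writing an object that holds a non-serializable field (here a java.lang.Thread) throws java.io.NotSerializableException. The WorkerComponent class and the canCheckpoint helper are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical component whose state includes a live thread.
class WorkerComponent implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name = "worker";
    final Thread worker = new Thread(() -> { }); // Thread is not Serializable
}

class SerializabilityCheck {
    // Returns true if the state can be written with Java serialization.
    static boolean canCheckpoint(Object state) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(state);
            return true;
        } catch (NotSerializableException e) {
            return false; // some reachable object cannot be serialized
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canCheckpoint("plain string state")); // prints "true"
        System.out.println(canCheckpoint(new WorkerComponent())); // prints "false"
    }
}
```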