1. Field of the Invention
This invention pertains generally to enterprise computer systems, embedded computer systems, and computer systems in general, and more particularly to transparent multiprocess checkpointing and automatic restoration for computer applications.
2. Description of Related Art
In a number of settings, high availability service for complex computer applications is a nonnegotiable requirement. Examples include Internet and corporate data centers, financial services, telecommunications, government systems and medical systems.
Providing high availability service generally requires that, when an application executing on a first server encounters a failure for any of a number of reasons, a failover process is executed and application execution continues automatically on a second server, which continues to interact with the client system. The effort involved in achieving such availability and reliability can be one of the most expensive and time-consuming aspects of application development and can even delay deployment of an application.
Therefore, there is a need for methods, systems and procedures for achieving high availability and reliability through a cost-effective, easy-to-use software infrastructure, rather than through prolonged custom coding, lengthy development time and substantial expenditure.
Checkpointing is an integral part of high availability. It provides for the capture (e.g., periodically) of the complete state of an application process, including, but not limited to, its memory pages, stack contents, open files, sockets, pipes, and other state information that is retained, for instance, in the kernel on behalf of the process. Later, the application can be restored to the same state it was in when the checkpoint was taken, even when the restoration is performed on a different computer system.
Traditionally, the problem of saving the state of an application has been approached from within the application, taking advantage of its full knowledge of application internals, data structures, and semantics. However, this approach is highly intrusive, since the application code itself needs to be modified to operate in the desired high availability manner.
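By way of illustration only, the application-level approach described above can be sketched as follows. The sketch assumes a hypothetical application whose entire state is a step counter and a running total; the function names `save_checkpoint`, `load_checkpoint` and `run`, and the JSON file format, are assumptions made for this example and are not taken from any cited work. It shows why the approach is intrusive: the checkpoint and restore calls must be written into the application's own processing loop.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    """Write the state to a temporary file, then atomically rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic, so a reader never sees a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Rebuild the state dictionary from the most recent checkpoint file."""
    with open(path, "r") as f:
        return json.load(f)


def run(values, path):
    """Sum the values, checkpointing after every step and resuming if possible."""
    if os.path.exists(path):
        state = load_checkpoint(path)  # restart after a failure
    else:
        state = {"step": 0, "total": 0}  # fresh start
    # Skip the values that were already processed before the last checkpoint.
    for value in values[state["step"]:]:
        state["total"] += value
        state["step"] += 1
        save_checkpoint(state, path)
    return state
```

In this sketch the application developer decides what constitutes state and when it is saved. Kernel-held state such as open sockets and pipes is not captured at all, which is one limitation of saving state from within the application.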
Accordingly, checkpointing has been used in various forms over the years, and more recently, lower level transparent checkpointing has been described in which checkpointing software forms a layer between the application and the operating system. For example, William R. Dieter and James E. Lumpp, “User-level Checkpointing for LinuxThreads Programs”, In Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference, pp. 81-92, June 2001, incorporated herein by reference in its entirety, describe a user-level checkpointing library for single-process multi-threaded applications.
However, these checkpointing mechanisms are not amenable to use with high availability services for most applications and suffer from a number of drawbacks. Current forms of checkpointing either require modifications to the application, and are thus not transparent, or work only for single-process applications, and they do not support transparent incremental checkpointing.
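For context, incremental checkpointing is generally understood to mean saving only the portions of state that have changed since the previous checkpoint, rather than the complete state each time. A minimal sketch of that general idea follows. It assumes a fixed-size memory image held in a byte buffer, a page size of 4096 bytes, and change detection by comparing per-page hashes; all names are hypothetical. Hash comparison is used here only because it is easy to demonstrate in a self-contained example, and it is not presented as the mechanism of any particular checkpointing system.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


def page_digests(memory):
    """Return one SHA-256 digest per PAGE_SIZE-sized page of the buffer."""
    return [
        hashlib.sha256(memory[start:start + PAGE_SIZE]).digest()
        for start in range(0, len(memory), PAGE_SIZE)
    ]


def incremental_checkpoint(memory, previous_digests):
    """Return (changed_pages, digests) for the current memory image.

    changed_pages maps a page index to a copy of that page's bytes, and
    contains only pages whose digest differs from the previous checkpoint.
    Passing an empty list as previous_digests yields a full checkpoint.
    """
    digests = page_digests(memory)
    changed = {}
    for index, digest in enumerate(digests):
        is_new = index >= len(previous_digests)
        if is_new or previous_digests[index] != digest:
            start = index * PAGE_SIZE
            changed[index] = bytes(memory[start:start + PAGE_SIZE])
    return changed, digests


def apply_checkpoint(image, changed_pages):
    """Copy the saved pages into a same-sized bytearray, in place."""
    for index, page in changed_pages.items():
        start = index * PAGE_SIZE
        image[start:start + len(page)] = page
```

Restoration under this sketch applies the full checkpoint first and then each incremental checkpoint in the order it was taken. The amount of data saved per checkpoint is proportional to the number of modified pages rather than to the total size of the image.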
The present invention overcomes these shortcomings and provides transparent incremental and multiprocess checkpointing based on a user-level library and a loadable kernel module.