1. Field of the Invention
This invention pertains to software-based checkpointing of applications running on computer systems, computer networks, telecommunications systems, embedded computer systems, wireless devices such as cell phones and PDAs, and more particularly to methods, systems and procedures (i.e., programming) for checkpointing and checkpoint-restoration of applications where the core checkpointing service is performed as a kernel service.
2. Description of Related Art
In many environments one of the most important features is to ensure that a running application continues to run even in the event of one or more system or software faults. Mission critical systems in telecommunications, military, financial and embedded applications must continue to provide their services even in the event of hardware or software faults. The autopilot on an airplane is designed to continue to operate even if some of the computers and instrumentation are damaged; the 911 emergency phone system is designed to operate even if the main phone system is severely damaged, and stock exchanges deploy software that keeps the exchange running even if some of the routers and servers go down. Today, the same expectations of “fault-free” operations are being placed on commodity computer systems and standard applications.
Checkpointing is a general technique used to capture some or all of an application's state and preserve the state for use at a later time. The application state can, by way of example, be used to recover a crashed application and to migrate, i.e. move, an application from one server to another.
The present invention builds on the teachings in U.S. patent application Ser. No. 13/096,461 wherein Havemose (“Havemose”) teaches SYSTEM AND METHOD FOR HYBRID KERNEL- AND USER-SPACE CHECKPOINTING. The present invention further builds on the teachings in U.S. patent application Ser. No. 13/920,889 wherein Havemose (“Havemose”) teaches SYSTEM AND METHOD FOR HYBRID KERNEL- AND USER-SPACE CHECKPOINTING USING A CHARACTER DEVICE. In these two patent applications Havemose teaches systems and methods for checkpointing multi-process applications using a hybrid kernel-mode and user-mode checkpointer and the use of a character device for checkpointing. In U.S. Pat. No. 7,293,200 Neary et al. (Neary) disclose “Method and system for providing transparent incremental and multiprocess checkpoint to computer applications”. In U.S. patent application Ser. No. 12/334,660 Backensto et al. (Backensto) teach METHOD AND SYSTEM FOR PROVIDING CHECKPOINTING TO WINDOWS APPLICATION GROUPS providing similar checkpointing services to Windows applications, and in Ser. No. 12/334,634 Havemose (Havemose) teaches METHOD AND SYSTEM FOR PROVIDING COORDINATED CHECKPOINTING TO A GROUP OF INDEPENDENT COMPUTER APPLICATIONS. Neary, Havemose and Backensto use a user-space checkpointer combined with interception and functionality to adjust links to libraries and files for checkpoint restore.
OpenVZ (http://en.wikipedia.org/wiki/OpenVZ) approaches checkpointing differently by providing checkpointing using a custom kernel. In other words, checkpointing is provided using a custom operating system.
Virtual Machine technology, such as VMware®, XEN®, and KVM, offers similar features, often using terminology such as snapshot and live migration. Virtual machine technology, however, is an entire additional software layer sitting under the operating system, which adds overhead and management complexity.
The prior art thus requires functionality running in user space (Neary, Havemose and Backensto), a custom operating system (OpenVZ), or a commitment to a hardware virtualization platform (VMware, XEN and KVM). Having a checkpointer with extensive user-space components makes the checkpointer highly dependent on system libraries and requires constant updating as user libraries change between releases of the operating system. Relying on a custom operating system requires applications to be customized for the custom operating system, which can reduce the number of applications available to customers. Finally, a commitment to hardware/system virtualization can be expensive and can change the deployment and management model of applications.
There is therefore a need for a checkpointing service that runs fully transparently to the applications, runs on standard operating systems, and operates without requiring a hardware virtualization layer. The present invention provides checkpointing as a kernel service, generally loaded as a loadable kernel module, working along with user-space interceptors. The kernel service may be dynamically loaded into the kernel and provides checkpointing services universally to all applications without requiring any application customizations or customization to the underlying operating system. Interceptors are loaded as part of loading the application. The kernel module checkpointer can be further optimized by modifying the kernel. Likewise, no hardware or system virtualization technology is required, which keeps memory and hardware requirements minimal.