1. Field of the Invention
This invention relates generally to a method and apparatus for portable checkpointing and specifically to a method and apparatus for pre-compiling user software, written in a general-purpose programming language, to enable portable checkpoints in a Universal Checkpoint Format (UCF). Some of the concepts related to this invention are disclosed in the following two reports by the inventors: Volker Strumpen, Balkrishna Ramkumar, Portable Checkpointing and Recovery in Heterogeneous Environments, Dept. of Electrical and Computer Engineering, University of Iowa, Technical Report No. 96-61-1, June 1996 and B. Ramkumar and V. Strumpen, Portable Checkpointing for Heterogeneous Architectures, Proceedings of the 27th Fault-Tolerant Computing Symposium, Jun. 25-27, 1997. Both of these reports are incorporated herein by reference.
2. Background of the Related Art
As Internetworking matures, worldwide distributed computing will become more prevalent. Simple probabilistic analysis suggests that such large geographically distributed systems will exhibit a high probability of single failures, even if each individual component is quite reliable. Due to the difficulties associated with programming such systems today, local area networks (LANs) are still used heavily for long running simulations. Even on such systems, failures occur frequently due to a variety of reasons including network failures, process failures, and even administration downtime. Thus, fault tolerance is fast becoming an essential feature for networked programming systems.
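The probabilistic argument can be made concrete under a simplifying assumption that is not stated in the text above: each of n components fails independently with the same probability p during a run. The values of p and n below are illustrative only.

```latex
% Probability that at least one of n independent components fails
P(\text{at least one failure}) = 1 - (1 - p)^{n}

% Illustrative values: p = 10^{-3}, n = 1000
1 - (0.999)^{1000} \approx 1 - e^{-1} \approx 0.63
```

Under these assumptions, a system of one thousand components that are each 99.9% reliable would still see a failure in roughly two out of three runs.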
Large distributed systems are inherently heterogeneous in nature. Even LANs today often consist of a mixture of binary incompatible hardware components and operate with an even larger variety of operating systems, or different versions of the same operating system. Providing fault tolerance in such environments is a key technical challenge, especially since it requires that checkpointing and recovery be portable across the constituent architectures and operating systems.
A checkpoint is the state of a computation, saved partway through its execution. A checkpoint can be restored and the computation can be recovered from that state. Portable checkpoints are machine independent checkpoints based on the automatic generation of checkpointing and recovery code.
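As a rough illustration of this definition, the following Python sketch saves the state of a simple loop partway through and resumes from it. The function names, the loop, and the use of JSON text as a machine-independent encoding are assumptions made for the example; they are not the Universal Checkpoint Format or the pre-compiler-generated code of the invention.

```python
import json

def save_checkpoint(i, acc):
    """Encode the loop state as JSON text, which does not depend on the
    byte order or alignment of the machine that wrote it."""
    return json.dumps({"i": i, "acc": acc})

def restore_checkpoint(text):
    """Decode the loop state from a checkpoint produced by save_checkpoint."""
    state = json.loads(text)
    return state["i"], state["acc"]

def run(n, start=0, acc=0.0, stop_at=None):
    """Sum i * 0.5 for i in [start, n).

    If stop_at is given, stop before processing that index and return
    (None, checkpoint) instead of (result, None).
    """
    for i in range(start, n):
        if stop_at is not None and i == stop_at:
            return None, save_checkpoint(i, acc)
        acc += i * 0.5
    return acc, None

# Uninterrupted run.
full, _ = run(10)

# Interrupted run: checkpoint before index 4, then recover and finish.
_, ckpt = run(10, stop_at=4)
i, acc = restore_checkpoint(ckpt)
resumed, _ = run(10, start=i, acc=acc)
```

The resumed run yields the same result as the uninterrupted one because the checkpoint captures everything the remaining iterations depend on, here the loop index and the accumulator.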
The subject of checkpoints has been investigated by several researchers, especially in the field of fault tolerance. Unfortunately, no one has been able to develop the technology (software or otherwise) that provides for machine independent state generation and restoration for general-purpose programming languages.
In the present invention, a user software program is pre-compiled with a source-to-source pre-compiler before a native compiler generates the machine dependent object code. This object code may now generate portable checkpoints of the program state on a stable storage medium at a desired frequency. The checkpoint can be recovered on a binary incompatible machine, possibly with a different processor and operating system.
Some application areas of this technology are support of fault tolerance in heterogeneous computer networks, migrating processes to binary incompatible machines for load balancing or load redistribution, suspension of execution of a program for subsequent execution at a later time on a possibly different configuration of machines, or retrospective diagnostics and debugging.
This method provides a cost-effective solution to computationally intensive problems where dependability is critical, either because a quick response time is essential, or because failures result in higher operation costs. Important application areas include, e.g., air-traffic control, battlefield virtual reality simulation, hardware design, and VLSI design and test. Current technology requires companies (e.g., IBM, Intel, Boeing) to invest heavily in replicated hardware, or spend substantial effort and time running long and complex simulations to identify and debug flaws and potential weaknesses in their product designs.
The problem of reliability in computing systems has been studied in many different forms. The evaluation of the performability of degradable computing systems was first addressed in a seminal paper by Meyer, J. F., On evaluating the performability of degradable computing systems, IEEE Transactions on Computers, 29(8):720-731, August 1980.
Reliable computing has also received attention in the context of parallel and distributed systems, ranging from hardware and/or interconnection network-specific solutions, language-specific solutions, and algorithm-specific solutions to application-specific solutions. A good survey of checkpointing and rollback techniques can be found in: (1) Deconinck G., Vounckx J., Cuyvers R., Lauwereins R., Survey of Checkpointing and Rollback Techniques. Technical Report 03.1.8 and 03.1.12, ESAT-ACAA Laboratory, Katholieke Universiteit, Leuven, Belgium, June 1993 and (2) Elnozahy E. N., Johnson D. B., Wang Y. M. A Survey of Rollback-Recovery Protocols in Message-Passing Systems. Computing Surveys, 1996 (submitted). Also Technical Report CMU-CS-96-181, School of Computer Science, Carnegie Mellon University.
There has also been work on optimizing the checkpointing and recovery process (Beck M., Plank J. S., Kingsley G. Compiler-assisted checkpointing. Technical Report CS-94-269, University of Tennessee, December 1994, submitted to FTCS 95). Beck et al. classify checkpointing optimizations into two categories: latency hiding optimizations and memory exclusion optimizations. Latency hiding optimizations make a copy of the checkpoint in main memory and overlap the task of writing the checkpoint to stable storage with useful computation. Compression algorithms have been used to reduce the amount of data to be checkpointed, although it has been shown that compression is only beneficial in systems exhibiting contention for secondary storage.
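The latency hiding idea described above can be sketched in Python as follows. This is only an illustration under assumed names (`checkpoint_async`, the `state` dictionary, the file name `ckpt.bin`), with `pickle` and a background thread standing in for whatever storage format and overlap mechanism a real system would use.

```python
import copy
import os
import pickle
import tempfile
import threading

def checkpoint_async(state, path):
    """Copy `state` in main memory, then write the copy to `path` in a
    background thread so the caller can keep computing."""
    snapshot = copy.deepcopy(state)  # synchronous in-memory copy

    def write():
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)

    writer = threading.Thread(target=write)
    writer.start()
    return writer

state = {"step": 0, "values": [0.0] * 4}
path = os.path.join(tempfile.mkdtemp(), "ckpt.bin")

state["step"] = 3
writer = checkpoint_async(state, path)

# Useful computation continues and mutates the live state while the
# snapshot is being written.
state["step"] = 4
state["values"][0] = 1.0

writer.join()
with open(path, "rb") as f:
    saved = pickle.load(f)
```

Because the write operates on the in-memory copy, later mutations of the live state do not leak into the checkpoint; the cost is the extra memory for the copy.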
Memory exclusion optimizations include incremental checkpointing, compiler assistance to reduce the frequency and volume of checkpoints, and user-directed checkpointing. The use of hardware support to identify memory pages that have changed since the last checkpoint has been proposed (Elnozahy E. N., Johnson D. B., Zwaenepoel W. The performance of consistent checkpointing. IEEE Symposium on Reliable and Distributed Systems, pages 39-47, October 1992). These pages are then copied to secondary storage using copy-on-write while program execution continues. While yielding very low checkpointing overhead, a primary disadvantage of this method is that it is restricted to binary compatible hardware and operating systems.
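Incremental checkpointing can be illustrated with a small Python sketch. The cited work relies on hardware support to detect modified pages; the sketch below substitutes a software comparison of per-page hashes, which is an assumption made purely so the example is runnable. The page size and function names are likewise hypothetical.

```python
import hashlib

PAGE_SIZE = 4096  # hypothetical page size, chosen for illustration

def page_digests(memory):
    """Return one SHA-256 digest per PAGE_SIZE-byte page of `memory`."""
    return [
        hashlib.sha256(memory[off:off + PAGE_SIZE]).digest()
        for off in range(0, len(memory), PAGE_SIZE)
    ]

def incremental_checkpoint(memory, previous_digests):
    """Return (changed_pages, digests), where changed_pages maps a page
    index to the bytes of every page whose digest differs from the
    previous checkpoint (or that did not exist then)."""
    digests = page_digests(memory)
    changed = {}
    for index, digest in enumerate(digests):
        if index >= len(previous_digests) or digest != previous_digests[index]:
            start = index * PAGE_SIZE
            changed[index] = bytes(memory[start:start + PAGE_SIZE])
    return changed, digests

memory = bytearray(4 * PAGE_SIZE)

# First checkpoint: no previous digests, so every page is saved.
full, digests = incremental_checkpoint(memory, [])

# Modify one byte in page 1; the next checkpoint saves only that page.
memory[PAGE_SIZE + 10] = 0xFF
delta, digests = incremental_checkpoint(memory, digests)
```

The second checkpoint contains a single page rather than four, which is the volume reduction that memory exclusion aims for.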
The use of compilers to assist in the checkpointing process was first proposed by Li and Fuchs (Li C-C. J., Fuchs W. K. CATCH--Compiler-assisted Techniques for Checkpointing. In International Symposium on Fault Tolerant Computing, pages 74-81, 1990 and Li C-C. J., Stewart E. M., Fuchs W. K. Compiler Assisted Full Checkpointing. Software-Practice and Experience, 24 no. 10:871-886, October 1994), where the compiler identifies points in the program where checkpoints may potentially be taken, and heuristics are used to determine which of these checkpoints will be activated. Beck et al. propose extensions to the transparent libckpt library for automatic uniprocessor checkpointing. They support compiler directives that may be provided by the programmer (or a static analyzer) to optimize the frequency of checkpointing and the amount of information that needs to be checkpointed, by identifying memory that can be excluded from being checkpointed. This work does not address portability.
Elnozahy et al. (Elnozahy E. N., Johnson D. B., Zwaenepoel W. The performance of consistent checkpointing. In IEEE Symposium on Reliable and Distributed Systems, pages 39-47, October 1992) and Plank et al. (Plank J. S., Beck M., Kingsley G., Li K. Libckpt: Transparent Checkpointing under Unix. In Proceedings of the Usenix Winter Technical Conference, San Francisco, Calif., January 1995) have proposed efficient implementation techniques to minimize the overhead of checkpointing to a few percent of the execution time. The techniques developed in these references rely on efficient page-based bulk copying and hardware support to identify memory pages modified since the last checkpoint. Unfortunately, these optimizations are restricted to binary compatible hardware and operating systems.
The issue of portability across heterogeneous architectures has been addressed in the language community (Franz M. Kaashoek Code generation on the Fly: A Key to Portable Software. PhD thesis, Institute for Computer Systems, ETH Zurich, 1994 and Gosling J. The Java Language Environment. Technical Report, Sun Microsystems, Mountain View, Calif., 1995, white paper). Languages like Java provide an interpreter-based approach to portability where the program byte code is first "migrated" to the client platform for local interpretation. Unfortunately, such methods severely compromise performance since they run at least an order of magnitude slower than comparable C programs. Another possibility is "compilation on the fly", which provides portability by compiling the source code on the desired target machine immediately prior to execution. This technique requires the construction of a complex language environment. Moreover, to date neither interpreter-based systems nor compilation on the fly are explicitly designed to support fault tolerance.
The idea of stack mobility has been explored by researchers in a limited context. Theimer and Hayes (Theimer M. M., Hayes B. Heterogeneous Process Migration by Recompilation. In Proceedings of the 11th International Conference on Distributed Computing Systems, pages 18-25, July 1991) present a recompilation-based approach to heterogeneous process migration. Upon migration, their technique translates the state of a program into a machine independent state. A migration program that represents this state is then generated and can be compiled on a target machine. When run, the machine independent migration program recreates the process. Rather than compiling a migration program each time that a checkpoint is to be taken, the present method instruments the original program with code that barely affects the runtime during normal execution. This avoids the overhead of compiling a migration program and is conceptually much simpler. Moreover, Theimer and Hayes make several assumptions, including one that "the state of a program at any migration point is sufficiently well-specified to allow its complete translation between machine-dependent and machine-independent forms." What constitutes a migration point, and how this program state is identified and translated, are not discussed.
Richards and Ramkumar (Richards, R. J., Ramkumar B. Blocking Entry Points in Message-Driven Parallel Systems. In International Conference on Parallel Processing, August 1995) report the transformations needed to support runtime stack mobility for small tasks in a portable parallel language called ELMO. The technique relied on explicit programmer support for marshalling and unmarshalling complex data structures. The transformations were developed for task migration in portable parallel programming environments for homogeneous networks, and the work did not discuss fault tolerance or checkpointing.
Zhou et al. (Zhou S., Stumm M., Li K., Wortman D. Heterogeneous Distributed Shared Memory. IEEE Transactions on Parallel and Distributed Systems, 3 no. 5:540-554, September 1992) describe the Mermaid system for distributed shared memory on heterogeneous systems. This system is not fault tolerant, but generates data representation conversion routines automatically for all shared memory objects. This paper provides a detailed treatment of conversion. A major difference from the present invention is the conversion code generation for complex data types. Whereas Mermaid uses "utility software" to generate this code, the present invention utilizes the information provided by the abstract syntax tree to this end. Another design decision of Mermaid is the dedication of a page of memory to a particular data type. Although the authors defend this method in the context of dynamically allocated shared memory, such an organization is clearly impractical for the runtime stack, which also has to be converted when saving a checkpoint. Moreover, the poor data locality caused by this data organization is likely to result in a significant loss in performance.
Seligman and Beguelin (Seligman E., Beguelin A. High-Level Fault Tolerance in Distributed Programs. Technical Report CMU-CS-94-223, Carnegie Mellon University, December 1994) have developed checkpointing and restart methods in the context of the Dome C++ environment. Dome provides checkpointing at multiple levels, ranging from high level user-directed checkpointing that sacrifices transparency for portability and low overhead, to low level checkpointing that is transparent but results in non-portable code and requires larger checkpoints. Dome's checkpointing is designed for portability, but requires that the program be written in the form of a main loop that computes and checkpoints alternately. This obviates the need to store the runtime stack. Our approach, on the other hand, provides a general mechanism to save the runtime stack.
In contrast to these other methods, our invention presents a novel method and apparatus for portable checkpointing in heterogeneous network environments. Programs can be checkpointed on one machine running UNIX, and transparently recovered on a machine with different byte ordering and data alignment. The present invention provides a new, efficient portable checkpointing and recovery mechanism that offers both portable program execution and fault tolerance in heterogeneous environments.
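To see why differing byte ordering and data alignment make raw memory images non-portable, consider the following Python sketch using the standard `struct` module. The value, the helper names `to_portable` and `from_portable`, and the choice of big-endian as the agreed order are assumptions for illustration; they do not describe the Universal Checkpoint Format itself.

```python
import struct

VALUE = 0x01020304

# The same 32-bit integer laid out by a little-endian and a big-endian machine.
little = struct.pack("<I", VALUE)
big = struct.pack(">I", VALUE)

# A big-endian reader that interprets little-endian bytes without
# conversion recovers a different number.
misread = struct.unpack(">I", little)[0]

def to_portable(value):
    """Write a 32-bit unsigned integer in one agreed byte order (big-endian here)."""
    return struct.pack(">I", value)

def from_portable(blob):
    """Read a 32-bit unsigned integer written by to_portable on any machine."""
    return struct.unpack(">I", blob)[0]

# Alignment: a 1-byte field followed by a 4-byte integer occupies 5 bytes
# when packed with an explicit byte order (no padding), whereas native
# layout may insert padding, so the size depends on the platform.
packed_size = struct.calcsize("<bi")
native_size = struct.calcsize("@bi")
```

Fixing both the byte order and the padding rules in the stored format is what lets a reader on a different architecture reconstruct the original values.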
The above references are incorporated by reference herein where appropriate for appropriate teachings of additional or alternative details, features and/or technical background.