The invention relates generally to Java virtual machines and, more particularly, to a Java virtual machine with built-in support for fault-tolerant operation.
Large-scale, complex computer systems are brought into use by integrating software programs with a hardware platform. Such systems must be reliable and, because many of them are "servers," they are expected to run continuously for long periods of time without failure or interruption. As a result, various techniques, such as the use of special-purpose redundant hardware, are employed to ensure continuous service. Such techniques provide what is often collectively referred to as "fault tolerance" because they enable such systems to mask, i.e., recover from, a fault, such as the failure of a hardware component.
Fault tolerance may also be obtained through software techniques that run on less expensive commodity hardware. Frequently, such techniques use "checkpoints," wherein the system status of one instance of an application is copied to a backup instance, so that the backup instance can take over processing using the copied status as its starting point.
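To make the checkpoint idea concrete, here is a minimal Java sketch, assuming a single primary and a single backup living in the same process and a full-state snapshot taken after every event. The class names `AppState`, `PrimaryInstance`, and `BackupInstance` are hypothetical; a real system would transmit the snapshot over a network and would have to cope with partial failures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application state: a per-key counter (e.g., calls per subscriber).
final class AppState {
    final Map<String, Integer> counters = new HashMap<>();

    // Copy is sufficient here because Integer values are immutable.
    AppState copy() {
        AppState c = new AppState();
        c.counters.putAll(counters);
        return c;
    }
}

// Hypothetical backup instance: keeps the latest checkpoint and can take over.
final class BackupInstance {
    private AppState lastCheckpoint = new AppState();

    void receiveCheckpoint(AppState snapshot) {
        lastCheckpoint = snapshot;
    }

    // On failover, resume processing from the copied status (no further backup here).
    PrimaryInstance takeOver() {
        return new PrimaryInstance(lastCheckpoint.copy(), null);
    }
}

// Hypothetical primary instance: applies each event, then checkpoints to the backup.
final class PrimaryInstance {
    private final AppState state;
    private final BackupInstance backup;

    PrimaryInstance(AppState state, BackupInstance backup) {
        this.state = state;
        this.backup = backup;
    }

    void handleEvent(String key) {
        state.counters.merge(key, 1, Integer::sum);
        if (backup != null) {
            backup.receiveCheckpoint(state.copy()); // full snapshot after each event
        }
    }

    int count(String key) {
        return state.counters.getOrDefault(key, 0);
    }
}
```

Because the backup holds its own copy, a failure of the primary after any completed event leaves the backup with exactly the status produced by that event.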
A telecommunication network is an example of a complex system that must be reliable. Telecommunication networks facilitate communications between a large number of public and private communications systems by providing numerous functions such as switching, accounting, time management, and the like. A telecommunications network provides such functions through network switches, or nodes, interconnected by links, or channels, of transmission media such as wire, fiber-optic cable, or radio waves. Some of the nodes are connected to one or more users.
Modern telecommunication networks require complex, automated switching and, to that end, software programs are written to provide reliable, dependable performance and efficient use of resources, as well as service features and functions, such as Call Waiting, Caller ID, and the like. Such systems may be configured in a number of different ways depending on what types of transmission media are used, what types of users are served, and what mix of features is purchased. As a result of such complexities and the large number of different configurations, it is difficult to operate such systems reliably and dependably. Software programs that operate such systems must therefore be extremely reliable and achieve a very high degree of fault tolerance.
A programming language adapted for implementing software for such systems is "Java," which was introduced by Sun Microsystems, Inc., of Palo Alto, Calif. Java has been described as an object-oriented, distributed, interpreted, robust, secure, portable, architecture-neutral, multithreaded, and dynamic computer language.
To obtain fault tolerance for software systems using Java, application software may be written such that all fault-tolerance capabilities, including the derivation of checkpoints, are built into the application program by its developer. However, experience has shown that this may not be an optimal solution. In many cases, application programs are changed without correspondingly changing the portions of the programs that implement the checkpointing, so that the checkpoints are inaccurate and the system state copied to the backup is corrupt. In addition, checkpointing mechanisms developed in application software may be intrusive to the source code (because additional code is added in ways that obscure how the system works under normal conditions) or may introduce additional inefficiencies into the program.
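The first failure mode above can be illustrated with a minimal, hypothetical Java sketch of hand-written, application-level checkpointing. The class `CallRecord` and its fields are invented for illustration: a field added after `checkpoint()` was written is silently left out of the checkpoint, so the backup's restored copy is stale even though no error is raised.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application class whose developer hand-writes its checkpoint logic.
final class CallRecord {
    int durationSeconds;
    String callerId;      // original field, included in checkpoint()
    boolean callWaiting;  // field added later; checkpoint() was NOT updated

    // Hand-written checkpoint: copies only the fields the developer remembered.
    Map<String, Object> checkpoint() {
        Map<String, Object> data = new HashMap<>();
        data.put("durationSeconds", durationSeconds);
        data.put("callerId", callerId);
        // Intentional omission for illustration: callWaiting is never copied.
        return data;
    }

    // Restore on the backup; the missing field silently falls back to a default.
    static CallRecord restore(Map<String, Object> data) {
        CallRecord r = new CallRecord();
        r.durationSeconds = (Integer) data.get("durationSeconds");
        r.callerId = (String) data.get("callerId");
        r.callWaiting = (Boolean) data.getOrDefault("callWaiting", false);
        return r;
    }
}
```

Nothing in the compiler or runtime ties `checkpoint()` to the class's actual fields, which is why such code drifts as the application evolves.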
Accordingly, a continuing search has been directed to the development of methods and mechanisms within the JVM that allow the JVM to support checkpointing in ways that are less intrusive, more efficient, and more likely to compute accurate checkpoint data.
According to the present invention, a method is disclosed for reliably and efficiently supporting fault-tolerance mechanisms within a Java virtual machine (JVM) by modifying the JVM itself. Such modifications to a first JVM permit the first JVM to use internal information maintained by the first JVM to checkpoint objects that are created, modified, and/or deleted during the process of responding to an event of a transaction. The checkpointed objects are sent to and stored in a second JVM such that the second JVM may take over the responsibilities of the first JVM should the first JVM fail. The application-level programmer is thus relieved of the burden of incorporating checkpointing into the source code and/or object code of an application program.
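The following is a rough application-level simulation of the per-event bookkeeping described above, not the modified JVM itself. Standard Java offers no hook for observing object creation, modification, and deletion, so the hypothetical `TrackedHeap` class stands in for internal JVM state and requires explicit `create`, `write`, and `delete` calls. At the end of each event, only the touched objects, plus the identifiers of deleted ones, are sent as a hypothetical `Checkpoint` to a hypothetical `BackupStore` representing the second JVM.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Delta sent from the first JVM to the second (hypothetical format).
final class Checkpoint {
    final Map<Long, Map<String, Object>> changed; // created or modified objects
    final Set<Long> deleted;                      // ids of deleted objects

    Checkpoint(Map<Long, Map<String, Object>> changed, Set<Long> deleted) {
        this.changed = changed;
        this.deleted = deleted;
    }
}

// Stand-in for bookkeeping a modified JVM would keep internally.
// Objects are modeled as id -> field map; field values are assumed immutable.
final class TrackedHeap {
    final Map<Long, Map<String, Object>> objects = new HashMap<>();
    private final Set<Long> dirty = new HashSet<>();
    private final Set<Long> deleted = new HashSet<>();
    private long nextId = 1;

    long create() {
        long id = nextId++;
        objects.put(id, new HashMap<>());
        dirty.add(id);
        return id;
    }

    void write(long id, String field, Object value) {
        objects.get(id).put(field, value);
        dirty.add(id);
    }

    void delete(long id) {
        objects.remove(id);
        dirty.remove(id);
        deleted.add(id);
    }

    // At the end of an event, emit only what changed while handling it.
    Checkpoint endEvent() {
        Map<Long, Map<String, Object>> changed = new HashMap<>();
        for (long id : dirty) {
            changed.put(id, new HashMap<>(objects.get(id)));
        }
        Checkpoint cp = new Checkpoint(changed, new HashSet<>(deleted));
        dirty.clear();
        deleted.clear();
        return cp;
    }
}

// Stand-in for the second JVM's replica of checkpointed objects.
final class BackupStore {
    final Map<Long, Map<String, Object>> replica = new HashMap<>();

    void apply(Checkpoint cp) {
        replica.putAll(cp.changed);
        for (Long id : cp.deleted) {
            replica.remove(id);
        }
    }
}
```

In this sketch, each checkpoint carries only the objects touched by one event rather than a full snapshot. In the approach described above, the tracking happens inside the JVM, so the application would not need explicit calls like these.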