Referring to FIG. 1, the present invention concerns interactions and interdependencies of agents 102-1 through 102-N cooperating in a distributed processing computer system 100. Depending on the operating system used, each agent 102 may be a thread or process, and thus is a unit that executes a computation or program. Some of the agents 102-1 through 102-N may be executing on one data processing unit while others are executing at remote sites on other data processing units. More generally, agents can be hosted on different computer systems using different operating systems.
In addition to application processes executed by agents 102, the distributed system 100 also includes "external" devices 104 (i.e., external to the agents 102) with which messages are exchanged, journal processes 106 that record state information on stable (non-volatile) storage 108, and at least one restart manager process 109 that restarts other processes in the system after a failure. Application processes use journal services to record state information on stable storage. To further protect against failures, journal processes 106 often store data on two or more non-volatile storage media to compensate for the unreliability of storage devices such as magnetic disks. Typically, data written to a journal service is recorded on stable storage in the order received, and data stored on stable storage cannot be modified, making write operations to stable storage irrevocable.
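The journal behavior described above (records kept in the order received, never modified, and mirrored onto two or more media) can be illustrated with a minimal sketch. The class name `AppendOnlyJournal`, its methods, and the use of in-memory lists in place of real non-volatile devices are assumptions made purely for illustration; they are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Record = Tuple[str, str]  # (agent identifier, state name), a hypothetical record shape


@dataclass
class AppendOnlyJournal:
    """Hypothetical journal service that mirrors each record onto several media."""

    replicas: int = 2
    _media: List[List[Record]] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # In-memory lists stand in for independent non-volatile storage media.
        self._media = [[] for _ in range(self.replicas)]

    def append(self, agent_id: str, state: str) -> int:
        """Record a state on every medium in arrival order; return its position."""
        for medium in self._media:
            medium.append((agent_id, state))
        return len(self._media[0]) - 1

    def read(self, medium_index: int = 0) -> Tuple[Record, ...]:
        """Return an immutable copy; the class offers no update or delete."""
        return tuple(self._media[medium_index])


journal = AppendOnlyJournal(replicas=2)
journal.append("102-1", "started")
journal.append("102-2", "started")
journal.append("102-1", "prepared")
```

Because the only mutating operation is `append`, a record that has been written cannot later be revised, which mirrors the irrevocable nature of writes to stable storage.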
In this document the terms "journal," "journal process" and "journalling process" are used interchangeably. All refer to a process for storing information on stable storage to enable consistent recovery and completion of a distributed computation after a failure of the computer system, a part of the computer system, or any process running on that computer system.
The restart manager process 109 is used when a computer system is powered on or reset after a system failure. It uses information stored on stable storage 108 to determine the state in which each application process 102 is to be restarted. A communications path or bus 110 interconnects the various processes 102, 106, 109 and devices 104 in the system 100.
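One simple way a restart manager might derive restart states from recorded information is to scan the journal records in order and keep the last state recorded for each agent. The sketch below assumes a hypothetical record shape of (agent identifier, state name) and a hypothetical function name `restart_states`; the specification does not prescribe either.

```python
from typing import Dict, Iterable, Tuple


def restart_states(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each agent to the most recent state found in the journal records."""
    last: Dict[str, str] = {}
    for agent_id, state in records:
        # Records are assumed to be in the order written, so later ones win.
        last[agent_id] = state
    return last


# Hypothetical journal contents read back after a failure.
recorded = [
    ("102-1", "started"),
    ("102-2", "started"),
    ("102-1", "prepared"),
]
plan = restart_states(recorded)
```

Agents that never wrote a record do not appear in the result; how such agents are restarted is left open here.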
Each agent's application program is, in the context of the present invention, considered to be a finite state machine which progresses through a sequence of internal states. Complex computations are mapped into simpler sets of states suitable for synchronization with other computations.
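As an illustration of viewing an agent as a finite state machine, the following minimal sketch uses a transition table keyed by (current state, event). The state names, event names, and the class `AgentFSM` are hypothetical and chosen only to make the idea concrete.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("idle", "request"): "working",
    ("working", "done"): "finished",
    ("working", "error"): "aborted",
}
FINAL_STATES: Set[str] = {"finished", "aborted"}


class AgentFSM:
    """An agent that progresses through a sequence of internal states."""

    def __init__(self, initial: str = "idle") -> None:
        self.state = initial
        self.history: List[str] = [initial]

    def step(self, event: str) -> str:
        """Apply one event; reject events that are undefined in the current state."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {self.state!r} on {event!r}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state

    def is_final(self) -> bool:
        return self.state in FINAL_STATES


agent = AgentFSM()
agent.step("request")
agent.step("done")
```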
Application processes execute user-defined programs and synchronize their execution by exchanging messages. In any particular application process, a set of protocols defines the types of messages sent, as well as the constraints that apply to them, i.e., the circumstances under which each message type is to be sent and/or received. Such constraints define order and coexistence requirements between messages.
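The notion of order and coexistence constraints between messages can be made concrete with a small checker over a message trace. The interpretation used here is an assumption: an order constraint requires one message type to precede another, and a coexistence constraint either requires or forbids the joint presence of two message types. The message names and the function `violations` are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical constraints for a commit-style protocol.
ORDER: Set[Tuple[str, str]] = {("prepare", "commit")}     # first must precede second
REQUIRES: Set[Tuple[str, str]] = {("commit", "prepare")}  # first implies second present
EXCLUDES: Set[Tuple[str, str]] = {("commit", "abort")}    # the two never appear together


def violations(trace: List[str]) -> List[str]:
    """Return a description of every constraint the message trace breaks."""
    problems: List[str] = []
    seen = set(trace)
    for before, after in sorted(ORDER):
        both = before in seen and after in seen
        if both and trace.index(before) > trace.index(after):
            problems.append(f"{before} must precede {after}")
    for msg, needed in sorted(REQUIRES):
        if msg in seen and needed not in seen:
            problems.append(f"{msg} requires {needed}")
    for first, second in sorted(EXCLUDES):
        if first in seen and second in seen:
            problems.append(f"{first} excludes {second}")
    return problems
```

For example, under these assumed constraints, `violations(["prepare", "commit"])` returns an empty list, while a trace containing both `commit` and `abort` is reported as a violation.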
Computer processes can fail due to software errors or hardware failures. Failures can cause messages and process state information stored in a computer's volatile memory to be corrupted, lost, or otherwise unusable. However, state information recorded on external devices such as disks, terminals, etc. remains in existence independent of process failures. Accordingly, state transitions are called external actions if they cause information to be recorded on external devices.
If state information has been recorded on external devices, execution of an agent may have to continue after a process or system failure if the computation being performed by the agent was interrupted and the agent has not already entered a final state. To continue processing and consistently complete protocols in the presence of failures, the "process state" of each agent typically needs to be stored on stable (nonvolatile) storage. To compensate for lost messages and to ensure protocol termination, a message may need to be recorded on stable storage and sent repeatedly until received.
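Recording a message on stable storage and sending it repeatedly until it is received can be sketched as follows. The function `send_until_received`, the `transmit` callback that reports whether delivery succeeded, the attempt limit, and the simulated lossy channel are all assumptions for illustration; a real system would persist the message durably and use acknowledgements with timeouts.

```python
from typing import Callable, List, Tuple


def send_until_received(
    message: str,
    journal: List[str],
    transmit: Callable[[str], bool],
    max_attempts: int = 10,
) -> int:
    """Record the message first, then resend until delivery is confirmed.

    Returns the number of attempts that were needed.
    """
    journal.append(message)  # recorded before the first send, so it survives a crash
    for attempt in range(1, max_attempts + 1):
        if transmit(message):
            return attempt
    raise RuntimeError(f"{message!r} not received after {max_attempts} attempts")


def make_lossy_channel(drops: int) -> Tuple[Callable[[str], bool], List[str]]:
    """Simulated channel that loses the first `drops` transmissions."""
    remaining = [drops]
    delivered: List[str] = []

    def transmit(msg: str) -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False  # message lost in transit
        delivered.append(msg)
        return True

    return transmit, delivered


stable_log: List[str] = []
channel, inbox = make_lossy_channel(drops=2)
attempts = send_until_received("commit", stable_log, channel)
```

In this run the first two transmissions are lost and the third succeeds, so the message is journalled once and delivered once.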
It is a premise of the present invention, as well as a premise of most distributed computer processing systems, that processes have to continue or resume execution even after a failure if resources are left in an intermediate state. Such situations arise when processes are interrupted while performing multiple related external actions such as dispensing money at a teller machine, updating secondary storage, or setting machinery. Premature termination of such processes would potentially leave devices in an intermediate, usually inconsistent state, cause machinery to be blocked, allow money to be withdrawn incorrectly, or cause other kinds of inconsistencies. Extensive studies of these types of scenarios have been made in the area of transaction processing systems and database systems.
To ensure that an interrupted process can continue execution, it is common practice to use a "journalling process" to store state information regarding each intermediate state of each constituent process. The problem addressed by the present invention concerns the high cost of journalling state information for the intermediate states of an application process. In particular, each journalling operation uses scarce system resources, and also slows down the progress of the application process because of the requirement that state information be stored on stable storage before the actions associated with a subsequent state transition are performed.
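The cost at issue can be illustrated by counting forced journal writes along one execution path. In the sketch below, each state in a given set is written to the journal before the actions of the next transition would run. The state names, the function `run_with_journalling`, and in particular the choice of which states to journal are hypothetical inputs; the sketch does not show how such a set would be determined.

```python
from typing import List, Set


def run_with_journalling(path: List[str], must_journal: Set[str], journal: List[str]) -> int:
    """Walk a path of states, journalling selected states before moving on.

    Returns the number of journal writes performed.
    """
    writes = 0
    for state in path:
        if state in must_journal:
            journal.append(state)  # must complete before the next transition's actions
            writes += 1
        # The actions associated with leaving `state` would be performed here.
    return writes


path = ["start", "prepared", "committing", "done"]

# Journalling every intermediate state.
full_log: List[str] = []
full_cost = run_with_journalling(path, set(path), full_log)

# Journalling only a hypothetical subset of states.
reduced_log: List[str] = []
reduced_cost = run_with_journalling(path, {"prepared", "done"}, reduced_log)
```

Under these assumptions the first run performs four journal writes and the second performs two, which is the kind of per-computation saving that becomes significant at scale.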
It has been recognized in the past that many protocols can be modified so as to reduce the associated journalling requirements. For instance, there are a number of variations on the so-called "two phase commit" protocol used in transaction processing, designed to avoid journalling one or more states that would otherwise have been considered to require such journalling. For systems handling millions of transactions, avoiding one journalling step per transaction is a significant savings.
In the past, such adjustments to protocols to avoid journalling have been performed manually on an ad hoc basis. The present invention provides an automated system and method for identifying the states of each agent participating in a distributed computation that must be journalled and the states that do not need to be journalled.
In contrast to other techniques used to ensure correct execution of protocols in the presence of failures, the present invention does not require a process state to be checkpointed on each send operation, nor does it require processes to execute a snapshot protocol. Rather, the present invention assumes that the behavior derived for each finite state machine ensures correct execution.