The present invention is related to U.S. patent application Ser. No. 07/601,990, filed Oct. 23, 1990, now U.S. Pat. No. 5,329,626, entitled Rule Driven Transaction Management System and Method, which is assigned to the same assignee as the present invention and which is hereby incorporated by reference.
Referring to FIG. 1, the present invention concerns interactions and interdependencies of agents 102-1 through 102-N cooperating in a distributed processing computer system 100. Depending on the operating system used, each agent 102 may be a thread or process, and thus is a unit that executes a computation or program. Some of the agents 102-1 through 102-N may be executing on one data processing unit while others are executing at remote sites on other data processing units. More generally, agents can be hosted on different computer systems using different operating systems. For the purposes of the present discussion, it is sufficient to assume that there is a communications path or bus 110 which interconnects all the agents in the system 100.
In addition to agents which represent computer processes, a distributed system 100 may also include agents which represent "external" devices 104. The infrastructure of a distributed computer system also contains, among other well known components, journal services (herein called journal processes) 106 that record state information on stable, non-volatile storage 108, and at least one restart manager process 109 that restarts other processes in the system after a failure. Application processes use the journal services to record their state information on stable storage. To further protect against failures, journal processes 106 often store data on two or more non-volatile storage media to compensate for the unreliability of storage devices such as magnetic disks. Typically, data written to a journal service is recorded on stable storage in the order received, and the data stored on stable storage cannot be modified, making write operations to stable storage irrevocable operations.
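By way of illustration only, the journaling behavior described above (records kept in the order received, never modified, and written to two or more media) can be sketched as follows. The class name `Journal`, its methods, and the use of in-memory lists as stand-ins for non-volatile storage media are assumptions made for this sketch and are not taken from the described system.

```python
class Journal:
    """Hypothetical append-only journal; each replica stands in for one storage medium."""

    def __init__(self, replicas=2):
        if replicas < 1:
            raise ValueError("at least one replica is required")
        # In-memory lists are placeholders for stable, non-volatile storage.
        self._replicas = [[] for _ in range(replicas)]

    def append(self, record):
        """Record data on every replica in arrival order; return its sequence number."""
        for replica in self._replicas:
            replica.append(record)
        return len(self._replicas[0]) - 1

    def read_all(self, replica=0):
        """Return an immutable snapshot; no method updates or deletes a record."""
        return tuple(self._replicas[replica])


journal = Journal(replicas=2)
first = journal.append("agent 102-1: state S1")
second = journal.append("agent 102-1: state S2")
```

Because the sketch exposes only `append` and `read_all`, a write cannot be revoked once made, which mirrors the irrevocable nature of writes to stable storage.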
The restart manager process 109 is used when a computer system is powered on or reset after a system or process failure. It uses information stored on stable storage 108 to determine the state in which each application process 102 is to be restarted. A communications path or bus 110 interconnects the various processes 102, 106, 109 and devices 104 in the system 100.
Each agent's application program is, in the context of the present invention, considered to be a finite state machine which progresses through a sequence of internal states. Complex computations are mapped into simpler sets of states suitable for synchronization with other computations.
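As a minimal sketch of this view, an agent can be modeled as a table of state transitions keyed by the current state and the triggering event. The state names, event names, and the class `AgentStateMachine` below are hypothetical and chosen only for illustration.

```python
class AgentStateMachine:
    """Hypothetical finite state machine view of one agent."""

    def __init__(self, transitions, initial):
        # transitions maps (current_state, event) to the next state.
        self._transitions = dict(transitions)
        self.state = initial
        self.history = [initial]

    def trigger(self, event):
        """Apply an event; reject it if no transition is defined for the current state."""
        key = (self.state, event)
        if key not in self._transitions:
            raise ValueError(f"event {event!r} is not allowed in state {self.state!r}")
        self.state = self._transitions[key]
        self.history.append(self.state)
        return self.state


# Illustrative transition table (assumed, not taken from the described system).
TRANSITIONS = {
    ("active", "prepare"): "prepared",
    ("active", "abort"): "aborted",
    ("prepared", "commit"): "committed",
    ("prepared", "abort"): "aborted",
}

agent = AgentStateMachine(TRANSITIONS, initial="active")
agent.trigger("prepare")
agent.trigger("commit")
```

Only the coarse states in the table are visible to other agents; whatever computation the agent performs between transitions stays internal.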
Application processes execute user-defined programs and synchronize their execution by exchanging messages. In any particular application process, a set of protocols defines the types of messages sent, as well as the applicable constraints thereon--i.e., the circumstances under which each message type is to be sent and/or received. Such constraints define order and coexistence requirements between events (i.e., state transitions).
In commercial data processing systems, transaction processing is one of the most important distributed applications. Transaction processing systems execute a predefined number of protocols. Among them is a protocol which enables components in a transaction processing system to reach agreement on the outcome of a transaction. The protocol is known as the "two phase commit" protocol. The details of that protocol are not important to the present discussion. What is important is that the use of a fixed number of predefined protocols makes it possible to "optimize message flows" by writing the application software to avoid sending agent-to-agent messages in certain predefined situations.
For instance, the coordinator in a "two phase commit" protocol sends abort notifications only to those participants from which it did not receive an abort notification. The coordinator thus exploits its local knowledge that the participants which themselves sent an abort notification already know the outcome of the transaction and therefore need not be notified.
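This optimization can be sketched, under simplifying assumptions, as a filter over the replies the coordinator has collected. The function name `abort_recipients`, the participant names, and the vote encoding ("yes", "abort", or `None` for no reply yet) are hypothetical.

```python
def abort_recipients(votes):
    """Return the participants that still need an abort notification.

    votes maps a participant name to "yes", "abort", or None (no reply yet).
    A participant that sent "abort" already knows the outcome and is skipped.
    """
    return sorted(name for name, vote in votes.items() if vote != "abort")


# Illustrative replies gathered by the coordinator of an aborting transaction.
votes = {"participant-A": "yes", "participant-B": "abort", "participant-C": None}
recipients = abort_recipients(votes)
```

Here `recipients` contains participant-A and participant-C only, so one agent-to-agent message is saved for each participant that initiated the abort.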
Other optimizations that can be used in transaction processing systems are concerned with journaling only those messages which must be sent reliably, e.g., in the "presumed commit" version of the two phase commit protocol only abort notifications are journaled. See U.S. patent application Ser. No. 08/051,523, filed Apr. 22, 1993, now U.S. Pat. No. 5,371,889, which is hereby incorporated by reference. An agent which voted "yes" on the outcome of a transaction infers commitment of the transaction from not receiving an abort notification. The coordinator, by knowing when participants are enabled to vote on the outcome of a transaction, delays journaling actions until sending "request-to-commit" notifications. This avoids journaling actions for those transactions which are aborted before starting the two phase commit protocol. In accordance with the present invention, by exploiting knowledge about an external event's pre-conditions, other external actions of an agent can be delayed and sometimes avoided altogether.
Most of the above optimizations implicitly exploit knowledge about the state of the other participant. The prior art does not teach how to optimize message flows in a distributed computation system that is not executing a fixed number of predefined protocols for which optimizations can be defined manually.
In the present invention, the protocols which bind the agents of a distributed computation are highly variable because the application program or programs being used specify the agent dependencies to be used from a library of predefined agent dependency types. The messages to be sent between agents of a distributed computation are dynamically determined at run time based on the agent dependencies that the application program has established. As a result, there is no obvious simple method to optimize message flows since one cannot determine what messages to avoid sending until one knows what agent dependencies have been selected.
The present invention is based on the concept that many possible message optimizations in a distributed computation system whose protocols are determined at run time would require knowledge of the status of events associated with the distributed computation and also require each agent to have some knowledge of the event/condition rules used by other ones of the agents. In the Rule Driven Transaction Management System and Method referenced above, protocols between agents are represented by event/condition pairs. Each agent is configured to operate as a finite state machine insofar as other agents are concerned. Upon triggering an event that indicates a state transition, an agent sends a message to all participating agents whose events are constrained by conditions that refer to the triggered event. For example, if an event a in Agent 1 depends on event b in Agent 2, a notification is sent by Agent 2 to Agent 1 (i.e., by the process triggering event b to the process triggering event a) upon the occurrence of event b. In the context of the present invention, agents can also be programmed to send a notification to other agents to indicate that a condition cannot be satisfied.
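The notification rule in the example above can be sketched as a lookup from a triggered event to the events whose conditions refer to it. The class `DependencyTable`, its method names, and the representation of an event as an (agent, event name) pair are assumptions made for illustration; they are not the data structures of the referenced system.

```python
from collections import defaultdict


class DependencyTable:
    """Hypothetical registry of event/condition dependencies between agents."""

    def __init__(self):
        # Maps a triggering (agent, event) pair to the (agent, event) pairs
        # whose conditions refer to it.
        self._dependents = defaultdict(set)

    def add_dependency(self, dependent, trigger):
        """Register that the event `dependent` is constrained by the event `trigger`."""
        self._dependents[trigger].add(dependent)

    def notifications_for(self, trigger):
        """Return the (agent, event) pairs to notify when `trigger` occurs."""
        return sorted(self._dependents.get(trigger, ()))


table = DependencyTable()
# Event a in Agent 1 depends on event b in Agent 2.
table.add_dependency(dependent=("Agent 1", "a"), trigger=("Agent 2", "b"))
to_notify = table.notifications_for(("Agent 2", "b"))
```

When Agent 2 triggers event b, the lookup yields Agent 1 as the only recipient. An event to which no condition refers yields an empty list, so no message is sent for it.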