1. Field of the Invention
The present invention relates generally to distributed computing, and more particularly to a transaction processing system in which component operations in related transactions are distributed so that at least one operation in a second transaction is performed before a first transaction having a conflicting operation is committed. The present invention specifically concerns a method and apparatus for scheduling the performance of the conflicting operations according to available resources and ensuring that the results of the conflicting operations are committed in the same order as the order of performance of the conflicting operations.
2. Description of the Background Art
A desirable feature of a computing system is the ability to recover from partial system failures that interrupt memory write operations. If an application program has a memory write operation in progress at the time of the system failure, it is most likely that the memory record will become erroneous. To enable the recovery of memory records after a partial system failure, it is necessary for the application program to keep backup copies of the records in nonvolatile memory. When the computing system is restarted, the memory records to be recovered are replaced with the backup copies.
To facilitate the making of backup copies and the recovery of memory records, the operating system typically provides an established set of memory management procedures that can be invoked or called from an application program to define a "recovery unit." The recovery unit consists of program statements between a "START" statement and a "COMMIT" statement. All of the statements in the "recovery unit" must be completed before the memory records modified by the statements in the recovery unit are made available for subsequent processing. The "START" statement corresponds to the making of a backup copy in nonvolatile memory, and the "COMMIT" statement corresponds to switching of the backup copy with a modified version. The statements in the "recovery unit" specify operations in a single "transaction." Upon recovering from a partial system error, inspection of the nonvolatile memory will reveal that the operations in the single "transaction" are either all completed, or none of them are completed.
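By way of illustration only, a recovery unit of this kind can be sketched as follows. The class name `RecoveryUnit`, the JSON file format, and the use of an atomic file replacement to stand in for the "switching" step are all assumptions made for the sketch; they are not taken from any particular operating system. In the sketch the committed file on disk plays the role of the backup copy until "COMMIT" replaces it.

```python
import json
import os
import tempfile


class RecoveryUnit:
    """Hypothetical recovery unit: START loads a record, COMMIT swaps it in."""

    def __init__(self, path):
        self.path = path
        self.working = None

    def start(self):
        # START: read the committed record; the file on disk is left untouched
        # and serves as the backup copy while the working copy is modified.
        with open(self.path) as f:
            self.working = json.load(f)
        return self.working

    def commit(self):
        # COMMIT: write the modified version to a temporary file in the same
        # directory, force it to disk, then switch it with the backup copy.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.working, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)  # the old or the new version, never a mix
        self.working = None
```

If a failure occurs before `commit` reaches `os.replace`, the file at `path` still holds the unmodified record, so none of the operations in the unit appear completed.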
In a distributed computing system, the operations in a single transaction may modify files in different data bases, and the files may be shared by other processes. During the operation of the transaction, the files may be inconsistent for a time, although the files will be consistent upon completion of the transaction. A typical example is a transfer of funds from one account to another, in which a first account is debited, and at a slightly later time, another account is credited. During the interim, the two accounts are inconsistent because the sum of the two accounts does not represent the total funds in the two accounts. Due to inconsistency when files are being modified by a transaction, it is known to prevent other processes from accessing the files until the modification is finished. Recoverability can be assured in this example by performing commitment for both files at the same time and place. By changing a single flag, for example, the backup copies of each file can be replaced at the same time with the modified versions of the files. In many instances, however, it is desirable to distribute the operations in a transaction among multiple processors or processes in a computing system, and to commit the transaction by committing the operations in each process or processor while permitting some variability between the times of commitment. In these instances, an "atomic commitment protocol" is typically used to ensure recoverability. The protocol requires the exchange of information about the state of the transaction between the processors or processes. To identify the transaction being performed, the transaction is typically assigned a unique "transaction identification number."
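The single-flag commitment mentioned above for the funds-transfer example can be sketched as follows. The class `ShadowedAccounts` and its in-memory dictionaries are hypothetical stand-ins for files in nonvolatile memory; the sketch only illustrates how one flag change can commit both accounts at the same time.

```python
class ShadowedAccounts:
    """Two versions of every record; one flag selects the committed version."""

    def __init__(self, balances):
        self.versions = [dict(balances), dict(balances)]
        self.current = 0  # the single commit flag

    def read(self, account):
        # Readers see only the committed version.
        return self.versions[self.current][account]

    def transfer(self, src, dst, amount):
        shadow = 1 - self.current
        self.versions[shadow] = dict(self.versions[self.current])
        self.versions[shadow][src] -= amount
        # Between these two updates the shadow copy is inconsistent,
        # but it is not yet visible through read().
        self.versions[shadow][dst] += amount
        self.current = shadow  # changing the flag commits both accounts


accounts = ShadowedAccounts({"first": 100, "second": 0})
accounts.transfer("first", "second", 30)
total = accounts.read("first") + accounts.read("second")
```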
A widely used atomic commitment protocol is known as the "two-phase commit protocol." In a somewhat elementary example of this protocol, one processor or process in the computing system is assigned the role of a coordinator which initiates a transaction. To begin a transaction, the coordinator sends a "prepare" command to all of the processors or processes participating in the transaction.
Upon receipt of the "prepare" command, each processor or process participating in the transaction performs a "START" operation by first placing "write locks" on memory accessed by the transaction, writes the transaction identification number into permanent memory to remember that it is prepared for the transaction, and then sends an acknowledgement back to the coordinator processor, but does not yet perform its part of the transaction. The coordinator waits for acknowledgements from all of the participants. When the coordinator receives acknowledgements from all of the participants, the coordinator records in permanent memory a list of the participants and a notation that the transaction is now being completed, and then the coordinator sends "commit" commands to all of the participants. The coordinator, however, may receive a message from a participant indicating that it cannot prepare for the transaction, or the coordinator may fail to receive acknowledgements from all of the participants after a predetermined time period, possibly after the coordinator has retransmitted the "prepare" command. In this case the coordinator transmits an "abort" command to all of the participants.
Upon receipt of the "commit" command, each participant checks its permanent memory for the transaction identification number to determine whether the participant has prepared for the transaction, and if it has, it performs its part of the transaction, and then performs a "COMMIT" operation to update the state of permanent memory and clear the transaction ID from permanent memory in one "atomic" step, and erase the write locks. Then the participant sends an acknowledgement back to the coordinator. When the coordinator receives acknowledgments from all of the participants, it erases the list of participants from permanent memory, and the transaction is finished.
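The elementary two-phase commit exchange described above can be sketched as follows. This is a minimal single-process illustration under stated assumptions: the names `Participant` and `run_two_phase_commit` are hypothetical, a Python set and dictionary stand in for permanent memory, message passing is replaced by direct method calls, and timeouts and retransmission of the "prepare" command are omitted.

```python
class Participant:
    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.prepared = set()  # stands in for permanent memory
        self.locked = False    # stands in for the write locks
        self.state = 0

    def prepare(self, txn_id):
        if not self.can_prepare:
            return False              # "cannot prepare" message
        self.locked = True            # place write locks
        self.prepared.add(txn_id)     # remember that it is prepared
        return True                   # acknowledgement

    def commit(self, txn_id, work):
        if txn_id not in self.prepared:
            return False
        self.state = work(self.state)   # perform its part of the transaction
        self.prepared.discard(txn_id)   # clear the transaction ID
        self.locked = False             # erase the write locks
        return True                     # acknowledgement

    def abort(self, txn_id):
        self.prepared.discard(txn_id)
        self.locked = False


def run_two_phase_commit(txn_id, participants, work, log):
    # Phase 1: send "prepare" to every participant and collect the replies.
    acks = [p.prepare(txn_id) for p in participants]
    if not all(acks):
        for p in participants:
            p.abort(txn_id)
        return "aborted"
    # Record the participant list before sending any "commit" command.
    log[txn_id] = [p.name for p in participants]
    # Phase 2: send "commit" to every participant.
    for p in participants:
        p.commit(txn_id, work)
    del log[txn_id]  # all acknowledged: erase the list
    return "committed"
```

For example, `run_two_phase_commit(1, [Participant("a"), Participant("b")], lambda s: s + 1, {})` returns `"committed"`, whereas including a participant constructed with `can_prepare=False` causes every participant to receive an abort.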
In many distributed computing systems, the processors or processes are permitted to perform multiple transactions simultaneously. In the usual case each processor or process performs transactions that are local to the processor or process, and also performs portions of global transactions. In a distributed data base system, for example, local data base queries and edits may occur locally, and some of the modifications may be made globally. A direct application of the two-phase commit protocol described above may perform satisfactorily in such a system, so long as the global transactions can be given a high priority with respect to the local transactions. But the use of read and write locks may unnecessarily restrict local transactions that could be processed concurrently.
Additional complexity is introduced when it is desired to process global transactions concurrently across multiple processors or processes in a distributed computing system. It is impractical to permit a processor or process to view a global picture of all the conflicts in all of the other processors or processes. Without a global picture, however, it is difficult for a processor or process to ensure that there is a correlation between its serializability order and the serializability orders of the other processors or processes. Time-stamping of transaction requests and data updates is one method that has been used to address this problem of concurrency control. In general, concurrency control in a distributed computing system has been achieved at the expense of restricted autonomy of the local processors or processes, or by locking.
The problem of global deadlock also has to be addressed whenever global transactions are performed concurrently. One known solution is to provide a global transaction manager that decides whether or not to dispatch concurrent global transaction requests. An example is described in Y. Breitbart et al., "Reliable Transaction Management in a Multidatabase System," Proc. of the ACM SIGMOD Conf. on Management of Data, Atlantic City, N.J., June 1990, pp. 215-224. The global scheduler keeps track of global transaction requests for local locks on data items by using a global lock mechanism. Each global data item has a global lock associated with it. A global transaction that needs only to read a data item requests a global read-lock. Locks are conflicting if they are requested by two different transactions on the same data item and at least one of the requested locks is a write-lock. If two global transactions request conflicting global locks, the scheduler will prevent one of the transactions from proceeding because it knows that the two transactions will cause a conflict at the local site. The scheduler uses strict two-phase locking for allocating global locks to global transactions, and maintains a global "wait-for graph." The global "wait-for graph" is a directed graph G=(V,E) whose set of vertices V is a set of global transactions, and an edge T_i → T_j belongs to E if and only if global transaction T_i waits for a global lock allocated to global transaction T_j. If a global transaction waits for a global lock, then the transaction state becomes "blocked" and the transaction is included in the global "wait-for graph." The transaction becomes active again only after it can obtain the global locks that it was waiting for. To avoid global deadlocks, the global "wait-for graph" is always made acyclic.
To ensure data consistency in the presence of failures, the scheduler also uses a "commit graph" and a "wait-for-commit graph" to determine when to schedule a commit operation. The commit graph CG=<TS,E> is an undirected bipartite graph whose set of nodes TS consists of a set of global transactions (transaction nodes) and a set of local sites (site nodes). Edges from E may connect only transaction nodes with site nodes. An edge (T_i,S_j) is in E if and only if transaction T_i was executing at site S_j, and the commit operation for T_i has been scheduled for processing. After the commit operation for T_i is completed, T_i is removed from the commit graph along with all edges incident to T_i. Global database consistency is assured if the commit graph does not contain any loops. The wait-for-commit graph is a directed graph G=(V,E) whose set of vertices V consists of a set of global transactions. An edge T_i → T_j is in E if and only if T_i has finished its execution, but its commit operation is still pending and T_j is a transaction whose commit operation should be completed or aborted before the commit of T_i can be scheduled. The scheduler uses the following algorithm for constructing the wait-for-commit graph, and in scheduling a commit operation of transaction T_i:
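The lock-conflict rule and the acyclic global wait-for graph described above can be sketched as follows. The names `locks_conflict` and `WaitForGraph` are hypothetical, `locks_conflict` assumes that both requests refer to the same data item, and refusing an edge that would close a cycle is only one assumed way of keeping the graph acyclic; the cited scheduler may resolve such a request differently.

```python
def locks_conflict(txn_a, mode_a, txn_b, mode_b):
    # Two requests on the same data item conflict if they come from different
    # transactions and at least one of them is a write-lock.
    return txn_a != txn_b and "write" in (mode_a, mode_b)


class WaitForGraph:
    def __init__(self):
        self.edges = {}  # waiting transaction -> set of lock holders

    def _reachable(self, start, target):
        # Depth-first search along the directed wait-for edges.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def try_wait(self, waiter, holder):
        # The edge waiter -> holder closes a cycle exactly when the waiter
        # is already reachable from the holder.
        if self._reachable(holder, waiter):
            return False
        self.edges.setdefault(waiter, set()).add(holder)
        return True

    def release(self, txn):
        # Remove a finished transaction and every edge that points to it.
        self.edges.pop(txn, None)
        for holders in self.edges.values():
            holders.discard(txn)
```

With edges T1 → T2 and T2 → T3 already present, `try_wait("T3", "T1")` returns `False` because the new edge would complete a cycle, that is, a global deadlock.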
1. For each site S_k in which T_i is executing, temporarily add the edge T_i → S_k to the commit graph.
2. If the augmented commit graph does not contain a cycle, then the global commit operation is submitted for processing, and the temporary edges become permanent.
3. If the augmented commit graph contains a cycle, then:
a) The edges T_i → T_i1, . . . , T_i → T_im are inserted into the wait-for-commit graph. The set {T_i1, T_i2, . . . , T_im} consists of all the transactions which appear in the cycle which was created as a result of adding the new edges to the commit graph.
b) Remove the temporary edges from the commit graph.
The transaction T_i, however, need not necessarily wait for the completion of every transaction T_ik such that T_i → T_ik. It may be ready to be scheduled for a commit operation after some of the transactions T_il such that T_i → T_il (0<l<r) successfully commit (and in some cases, a successful commit of only one such transaction would be sufficient to schedule the transaction's commit).
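The commit-scheduling algorithm described above can be sketched as follows. This is an illustration under stated assumptions and not the cited authors' implementation: the class `CommitScheduler` and its method names are hypothetical, and instead of adding temporary edges and then removing them, the sketch tests in advance whether the new edges would close a cycle. Because the commit graph is acyclic beforehand, the new edges close a cycle exactly when two of the transaction's sites are already connected, and the transaction nodes on the connecting path are the transactions that appear in that cycle.

```python
from collections import deque
from itertools import combinations


class CommitScheduler:
    def __init__(self):
        self.adj = {}              # commit graph: node -> set of neighbours
        self.wait_for_commit = {}  # T_i -> set of T_j it waits for

    def _path(self, src, dst):
        # Breadth-first search in the undirected commit graph.
        prev = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = prev[node]
                return path
            for nxt in self.adj.get(node, ()):
                if nxt not in prev:
                    prev[nxt] = node
                    queue.append(nxt)
        return None

    def schedule_commit(self, txn, sites):
        t = ("T", txn)
        site_nodes = [("S", s) for s in dict.fromkeys(sites)]
        blockers = set()
        for a, b in combinations(site_nodes, 2):
            path = self._path(a, b)
            if path:  # edges (t, a) and (t, b) would close a cycle
                blockers.update(n[1] for n in path if n[0] == "T")
        if blockers:
            # Step 3: record the wait-for-commit edges; add nothing.
            self.wait_for_commit[txn] = blockers
            return False
        # Step 2: no cycle, so the edges become permanent.
        for s in site_nodes:
            self.adj.setdefault(t, set()).add(s)
            self.adj.setdefault(s, set()).add(t)
        self.wait_for_commit.pop(txn, None)
        return True

    def commit_completed(self, txn):
        # Remove the transaction node and all edges incident to it.
        t = ("T", txn)
        for s in self.adj.pop(t, set()):
            self.adj[s].discard(t)
        for blockers in self.wait_for_commit.values():
            blockers.discard(txn)
```

As a usage example, after T1 is scheduled at sites A and B and T2 at sites B and C, a request to schedule T3 at sites A and C is refused and T3 is recorded as waiting for T1 and T2. Once `commit_completed("T1")` is called, the same request for T3 succeeds even though T2 is still pending, which illustrates that a successful commit of only one such transaction can be sufficient.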