1. Field of the Invention
The present invention relates generally to distributed computing, and more particularly to a transaction processing system in which component operations in related transactions are distributed so that at least one operation in a second transaction is performed before a first transaction having a conflicting operation is committed. The present invention specifically concerns a method and apparatus for scheduling the commitment of conflicting global transactions in a distributed transaction processing system without restricting the commit order of local transactions.
2. Description of the Background Art
A desirable feature of a computing system is the ability to recover from partial system failures that interrupt memory write operations. If an application program has a memory write operation in progress at the time of the system failure, it is most likely that the memory record will become erroneous. To enable the recovery of memory records after a partial system failure, it is necessary for the application program to keep backup copies of the records in nonvolatile memory. When the computing system is restarted, the memory records to be recovered are replaced with the backup copies.
To facilitate the making of backup copies and the recovery of memory records, the operating system typically provides an established set of memory management procedures that can be invoked or called from an application program to define a "recovery unit." The recovery unit consists of program statements between a "START" statement and a "COMMIT" statement. All of the statements in the "recovery unit" must be completed before the memory records modified by the statements in the recovery unit are made available for subsequent processing. The "START" statement corresponds to initiating the making of a backup copy in nonvolatile memory, and the "COMMIT" statement corresponds to switching of the backup copy with a modified version. The statements in the "recovery unit" specify operations in a single "transaction." Upon recovering from a partial system error, inspection of the nonvolatile memory will reveal that the operations in the single "transaction" are either all completed, or none of them are completed.
In a distributed computing system, the operations in a single transaction may modify files in different data bases, and the files may be shared by other processes. During the operation of the transaction, the files may be inconsistent for a time, although the files will be consistent upon completion of the transaction. A typical example is a transfer of funds from one account to another, in which a first account is debited, and at a slightly later time, another account is credited. During the interim, the two accounts are inconsistent because the sum of the two accounts does not represent the total funds in the two accounts. Due to inconsistency when files are being modified by a transaction, it is known to prevent other processes from accessing the files until the modification is finished. Atomicity can be assured in this example by performing commitment for both files at the same time and place. By changing a single flag, for example, the backup copies of each file can be replaced at the same time with the modified versions of the files. In many instances, however, it is desirable to distribute the operations in a transaction among multiple processors or processes in a computing system, and to commit the transaction by committing the operations in each process or processor while permitting some variability between the times of commitment. In these instances, an "atomic commitment protocol" is typically used to ensure atomicity. The protocol requires the exchange of information about the state of the transaction between the processors or processes. To identify the transaction being performed, the transaction is typically assigned a unique "transaction identification number."
A widely used atomic commitment protocol is known as the "two-phase commit protocol." In a somewhat elementary example of this protocol, one processor or process in the computing system is assigned the role of a coordinator which initiates the commit process of a transaction. For this purpose, the coordinator sends a "prepare" command to all of the processors or processes participating in the transaction.
Upon receipt of the "prepare" command, each processor or process participating in the transaction checks whether the operation can be completed successfully, writes an indication of the decision to acknowledge successful completion together with the transaction identification number into permanent memory to remember that it is prepared for the transaction, and then sends an acknowledgement back to the coordinator processor, but does not yet commit its results for the transaction. The coordinator waits for acknowledgements from all of the participants. When the coordinator receives acknowledgements from all of the participants, the coordinator records in permanent memory a list of the participants and a notation that the transaction is now being completed, and then the coordinator sends "commit" commands to all of the participants. The coordinator, however, may receive a message from a participant indicating that it cannot prepare for the transaction, or the coordinator may fail to receive acknowledgements from all of the participants after a predetermined time period, possibly after the coordinator has retransmitted the "prepare" command. In this case the coordinator transmits an "abort" command to all of the participants.
Upon receipt of the "commit" command, each participant checks its permanent memory for the transaction identification number to determine whether the participant has prepared for the transaction, and, if it has, it then performs a "COMMIT" operation to write its results into permanent memory and clear the transaction ID from permanent memory in one "atomic" step. Then the participant sends an acknowledgement back to the coordinator. When the coordinator receives acknowledgments from all of the participants, it erases the list of participants from permanent memory, and the transaction is finished.
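The exchange of "prepare", "commit" and "abort" commands described above can be illustrated with a minimal in-memory sketch. The names `Participant`, `two_phase_commit` and `can_prepare` are hypothetical and are introduced only for illustration; the `log` set merely stands in for permanent memory, and retransmission, timeouts and the coordinator's own permanent record are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical participant; `log` stands in for permanent memory."""
    name: str
    can_prepare: bool = True
    log: set = field(default_factory=set)        # IDs of prepared transactions
    committed: list = field(default_factory=list)

    def prepare(self, txid):
        if not self.can_prepare:
            return False            # participant cannot prepare
        self.log.add(txid)          # remember the prepared state, then acknowledge
        return True

    def commit(self, txid):
        if txid in self.log:        # commit only if previously prepared
            self.log.discard(txid)  # clear the ID and record the result together
            self.committed.append(txid)
        return True                 # acknowledgement to the coordinator

    def abort(self, txid):
        self.log.discard(txid)


def two_phase_commit(txid, participants):
    """Return "commit" or "abort" for one transaction."""
    # Phase 1: send "prepare" to every participant and collect the replies.
    votes = [p.prepare(txid) for p in participants]
    if not all(votes):
        for p in participants:
            p.abort(txid)
        return "abort"
    # Phase 2: every participant acknowledged, so send "commit" to all.
    for p in participants:
        p.commit(txid)
    return "commit"
```

In this sketch a single participant that cannot prepare causes every participant to receive "abort", which corresponds to the all-or-nothing outcome the protocol is intended to provide.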
Additional complexity is introduced when it is desired to process global transactions concurrently across multiple processors or processes in a distributed computing system. It is well known that global serializability is not guaranteed merely by ensuring that each processor or process achieves local serializability, because local transactions may introduce indirect conflicts between distributed global transactions. It is impractical to permit a processor or process to view a global picture of all the conflicts in all of the other processors or processes. Without a global picture, however, it is difficult for a processor or process to ensure that there is a correlation between its serializability order and the serializability orders of the other processors or processes. Time-stamping of transaction requests and data updates is one method that has been used to address this problem of concurrency control. In general, concurrency control in a distributed computing system has been achieved at the expense of restricted autonomy of the local processors or processes, or by locking.
The problem of global deadlock also has to be addressed whenever global transactions are performed concurrently. One known solution is to provide a global transaction scheduler that decides whether or not to dispatch concurrent global transaction requests. An example is described in Y. Breitbart et al., "Reliable Transaction Management in a Multidatabase System", Proc. of the ACM SIGMOD conf. on Management of Data, Atlantic City, N.J., June 1990, pp. 215-224. The global scheduler keeps track of global transaction requests for local locks on data items by using a global lock mechanism. Each global data item has a global lock associated with it. A global transaction that needs only to read a data item requests a global read-lock. Locks are conflicting if they are requested by two different transactions on the same data item and at least one of the requested locks is a write-lock. If two global transactions request conflicting global locks, the scheduler will prevent one of the transactions from proceeding because it knows that the two transactions will cause a conflict at the local site. The scheduler uses strict two-phase locking for allocating global locks to global transactions, and maintains a "global wait for graph." The "global wait for graph" is a directed graph G=(V,E) whose set of vertices V is a set of global transactions and in which an edge T_i → T_j belongs to E if and only if global transaction T_i waits for a global lock allocated to global transaction T_j. If a global transaction waits for a global lock, then the transaction state becomes "blocked" and the transaction is included in the "global wait for graph." The transaction becomes active again only after it can obtain the global locks that it was waiting for. To avoid global deadlocks, the "global wait for graph" is always made acyclic.
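One way to keep such a wait-for graph acyclic can be sketched as follows: tentatively add the new edge, test for a cycle, and withdraw the edge if a cycle appears. This is only an illustration under assumed names (`has_cycle`, `try_wait`) and an assumed dictionary representation; the cited reference is not quoted as using this particular test.

```python
def has_cycle(edges):
    """edges maps each node to the set of nodes it waits for."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(n):
        color[n] = GREY                      # n is on the current search path
        for m in edges.get(n, ()):
            c = color.get(m, WHITE)
            if c == GREY or (c == WHITE and visit(m)):
                return True                  # reached a node already on the path
        color[n] = BLACK                     # n is fully explored
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(edges))


def try_wait(wfg, waiter, holder):
    """Add the edge waiter -> holder only if the graph stays acyclic."""
    wfg.setdefault(waiter, set()).add(holder)
    if has_cycle(wfg):
        wfg[waiter].discard(holder)          # the wait would deadlock; refuse it
        return False
    return True
```

A caller that receives `False` would have to resolve the conflict some other way, for example by aborting one of the transactions; that policy is outside the scope of this sketch.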
To ensure data consistency in the presence of failures, the scheduler also uses a "commit graph" and a "wait-for-commit graph" to determine when to schedule a commit operation. The commit graph CG=<TS,E> is an undirected bipartite graph whose set of nodes TS consists of a set of global transactions (transaction nodes) and a set of local sites (site nodes). Edges from E may connect only transaction nodes with site nodes. An edge (T_i,S_j) is in E if and only if transaction T_i was executing at site S_j, and the commit operation for T_i has been scheduled for processing. After the commit operation for T_i is completed, T_i is removed from the commit graph along with all edges incident to T_i. Global database consistency is assured if the commit graph does not contain any loops. The wait-for-commit graph is a directed graph G=(V,E) whose set of vertices V consists of a set of global transactions. An edge T_i → T_j is in E if and only if T_i has finished its execution, but its commit operation is still pending and T_j is a transaction whose commit operation should be completed or aborted before the commit of T_i can be scheduled. The scheduler uses the following algorithm for constructing the wait-for-commit graph, and in scheduling a commit operation of transaction T_i:
1. For each site S_k in which T_i is executing, temporarily add the edge T_i → S_k to the commit graph.
2. If the augmented commit graph does not contain a cycle, then the global commit operation is submitted for processing, and the temporary edges become permanent.
3. If the augmented commit graph contains a cycle then:
a) The edges T_i → T_i1, ..., T_i → T_im are inserted into the wait-for-commit graph. The set {T_i1, T_i2, ..., T_im} consists of all the transactions which appear in the cycle which was created as a result of adding the new edges to the commit graph.
b) Remove the temporary edges from the commit graph.
The transaction T_i, however, need not necessarily wait for the completion of every transaction T_ik such that T_i → T_ik. It may be ready to be scheduled for a commit operation after some of the transactions T_ik such that T_i → T_il (0<l<r) successfully commit (and in some cases, a successful commit of only one such transaction would be sufficient to schedule the transaction's commit).
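The commit-scheduling steps above can be sketched as follows. The sketch relies on two assumptions that are not stated in the source: the transaction being scheduled is not yet a node of the commit graph, and the commit graph is acyclic before the call. Under those assumptions, adding the temporary edges closes a cycle exactly when two of the transaction's sites are already connected, so the test is made before any edge is inserted, which has the same effect as adding temporary edges and removing them again. The names `path` and `schedule_commit` and the tagged-tuple node representation are hypothetical.

```python
from collections import deque
from itertools import combinations


def path(adj, src, dst):
    """Return the list of nodes from src to dst in an undirected graph, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        n = queue.popleft()
        if n == dst:
            out = []
            while n is not None:
                out.append(n)
                n = prev[n]
            return out[::-1]
        for m in adj.get(n, ()):
            if m not in prev:
                prev[m] = n
                queue.append(m)
    return None


def schedule_commit(commit_graph, wait_for_commit, txn, sites):
    """Return True if txn's commit is submitted, False if it must wait.

    commit_graph: node -> set of neighbours; nodes are ("T", name) or ("S", name).
    wait_for_commit: transaction name -> set of transactions it waits for.
    """
    blockers = set()
    # Two already-connected sites mean the new edges would close a cycle.
    for a, b in combinations(sorted(sites), 2):
        p = path(commit_graph, ("S", a), ("S", b))
        if p:
            blockers.update(name for kind, name in p if kind == "T")
    if blockers:
        # Step 3: record the wait-for-commit edges; no edge was made permanent.
        wait_for_commit.setdefault(txn, set()).update(blockers)
        return False
    # Step 2: no cycle, so the edges become permanent.
    t = ("T", txn)
    for s in sites:
        commit_graph.setdefault(t, set()).add(("S", s))
        commit_graph.setdefault(("S", s), set()).add(t)
    return True
```

Removing a transaction node after its commit completes, and re-examining waiting transactions at that point, are left out of the sketch.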
Global serializability can be guaranteed in a distributed transaction processing system by enforcing a "commitment ordering" for all transactions. In Yoav Raz, U.S. patent application Ser. No. 07/703,394, filed May 21, 1991, and entitled "Commitment Ordering For Guaranteeing Serializability Across Distributed Transactions," it was shown that if global atomicity of transactions is achieved via an atomic commitment protocol, then a "commitment ordering" property of transaction histories is a sufficient condition for global serializability. The "commitment ordering" property occurs when the order of commitment is the same as the order of performance of conflicting component operations of transactions. Moreover, it was shown that if all of the local processes were "autonomous," i.e., they do not share any concurrency control information beyond atomic commitment messages, then "commitment ordering" is also a necessary condition for global serializability.
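The "commitment ordering" property can be illustrated with a small check over a recorded history. The function name and the input representation below are assumptions made for illustration: `conflicts` lists pairs (a, b) meaning that an operation of transaction a preceded a conflicting operation of transaction b, and `commit_order` lists committed transactions in the order in which they committed.

```python
def satisfies_commitment_ordering(conflicts, commit_order):
    """True if every pair of committed, conflicting transactions
    committed in the same order as their conflicting operations."""
    position = {t: i for i, t in enumerate(commit_order)}
    return all(
        position[a] < position[b]
        for a, b in conflicts
        if a in position and b in position   # only committed transactions count
    )
```

The check is retrospective; a scheduler enforcing the property would have to decide the commit order before commitment rather than verify it afterwards.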
In some applications, it is desirable for local serializability to be guaranteed by pre-existing mechanisms in the processors or processes in a distributed transaction processing system. In this case, it is desirable to provide a mechanism which does not violate the autonomy of the local processors or processes and guarantees global serializability if the local processors or processes assure local serializability. The solution to this problem is described in Georgakopoulos et al., "On Serializability of Multidatabase Transactions through Forced Local Conflicts," Proceedings of the Seventh Int. Conf. on Data Engineering, Kobe, Japan, April 1991.
Georgakopoulos et al. first classify known methods of concurrency control in distributed transaction processing systems into several groups, including observing the execution of the global transactions at each local processor or process, controlling the submission and execution order of the global transactions, limiting the membership in the system to processors or processes which use strict schedulers, assuming the possibility of conflicts among global transactions whenever they execute at the same processor or process, modifying the local processors or processes, and rejecting serializability as the correctness criterion. Georgakopoulos et al. then describe an "optimistic ticket method" (OTM) which is said not to violate local autonomy and guarantees global serializability if the participating local processors or processes assure local serializability. OTM is said to use "tickets" to determine the relative serialization order of the subtransactions of global transactions at each local processor or process (i.e., an LDBS). A ticket is a (logical) timestamp whose value is stored as a regular data item in each LDBS. Each subtransaction of a global transaction is required to issue a "Take-A-Ticket" operation which consists of reading the value of the ticket and incrementing it through regular data manipulation operations. The value of a ticket and all operations on tickets issued at each LDBS are subject to the local concurrency control and other database constraints. Only the subtransactions of global transactions have to take tickets; local transactions are not affected. To maintain global consistency, OTM must ensure that the subtransactions of each global transaction have the same relative serialization order in their corresponding LDBSs.
Since the relative serialization order of the subtransactions at each LDBS is reflected in the value of their tickets, the basic idea in OTM is to allow the subtransactions of each global transaction to proceed but commit them only if their ticket values have the same relative order in all participating LDBSs. This requires that the LDBS support a visible "prepared to commit state" for all subtransactions of global transactions. The prepared to commit state is "visible" if the application program can decide whether the transaction should commit or abort.
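The ticket idea can be sketched as follows. The class `LDBS` and the function `same_relative_order` are hypothetical names used only for illustration; in particular, the sketch ignores the local concurrency control that, according to the description above, governs the read and the increment of the ticket in a real LDBS.

```python
class LDBS:
    """Hypothetical local database; the ticket is an ordinary data item."""

    def __init__(self):
        self.ticket = 0

    def take_a_ticket(self):
        value = self.ticket        # read the current ticket value
        self.ticket = value + 1    # increment it for the next subtransaction
        return value


def same_relative_order(tickets_g1, tickets_g2):
    """Each argument maps a site name to the ticket taken there.

    True if the two global transactions are ordered the same way
    at every site where both of them took a ticket.
    """
    shared = tickets_g1.keys() & tickets_g2.keys()
    orders = {tickets_g1[s] < tickets_g2[s] for s in shared}
    return len(orders) <= 1
```

For example, tickets {a: 0, b: 1} and {a: 1, b: 0} order the two transactions differently at sites a and b, so the function returns `False` for that pair.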
It is said that OTM processes a multidatabase transaction G as follows. Initially, it sets a timeout for G and submits its subtransactions to their corresponding LDBSs. All subtransactions are allowed to interleave under the control of the LDBSs until they enter their prepared to commit state. If they all enter their prepared to commit states, they wait for the OTM to validate G. The validation can be performed using a Global Serialization Graph (GSG) test. The nodes in GSG correspond to "recently" committed global transactions. In its simplest form, the set of recently committed global transactions in OTM does not contain transactions committed before the oldest of the currently active global transactions started its execution. For any pair of recently committed global transactions G_i^c and G_j^c, GSG contains a directed edge G_i^c → G_j^c if at least one subtransaction of G_i^c was serialized before (obtained a smaller ticket than) the subtransaction of G_j^c in the same LDBS. Similarly, if the subtransaction of G_j^c in some LDBS was serialized before the subtransaction of G_i^c, a directed edge G_i^c ← G_j^c connects their nodes in GSG.
Initially, GSG contains no cycles. During the validation of G, OTM first creates a node for G in GSG. Then, it attempts to insert edges between G's node and nodes corresponding to every recently committed multidatabase transaction G^c. More specifically, if the ticket obtained by a subtransaction of G at some LDBS is smaller (larger) than the ticket of the subtransaction of G^c there, an edge G → G^c (G ← G^c) is added to GSG. If all such edges can be added without creating a cycle in GSG, G is validated. Otherwise, G does not pass validation, its node, together with all incident edges, is removed from the graph and G is restarted.
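The validation step can be sketched as follows. The function names `reaches` and `validate` and the dictionary representations are assumptions made for illustration; the pruning of transactions that are no longer "recent" is not shown. Because the graph is acyclic before the validation, any new cycle must pass through the node for G, so it suffices to test whether G can reach itself.

```python
def reaches(graph, start, target):
    """True if target is reachable from start along at least one edge."""
    stack, seen = list(graph.get(start, ())), set()
    while stack:
        n = stack.pop()
        if n == target:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(graph.get(n, ()))
    return False


def validate(gsg, committed_tickets, g, g_tickets):
    """gsg: node -> set of successors (acyclic on entry).
    committed_tickets: recently committed transaction -> {site: ticket}.
    g_tickets: {site: ticket} for the transaction g being validated.
    """
    gsg.setdefault(g, set())                       # create the node for g
    for gc, tickets in committed_tickets.items():
        for site in tickets.keys() & g_tickets.keys():
            if g_tickets[site] < tickets[site]:
                gsg[g].add(gc)                     # g was serialized first there
            else:
                gsg.setdefault(gc, set()).add(g)   # gc was serialized first there
    if reaches(gsg, g, g):
        # Validation fails: remove g's node together with all incident edges.
        gsg.pop(g, None)
        for successors in gsg.values():
            successors.discard(g)
        return False
    committed_tickets[g] = g_tickets               # g may now commit
    return True
```

A caller that receives `False` would restart G, as described above.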
G is also restarted if at least one LDBS forces a subtransaction of G to abort for local concurrency control reasons (e.g., local deadlock), or its timeout expires (e.g., global deadlock). Alternatively, OTM may set a new timeout and restart only the subtransactions that did not report prepared to commit in time. If more than one of the participating LDBSs uses a blocking mechanism for concurrency control, the timeouts above are necessary to resolve global deadlocks. An alternative approach is to maintain a wait-for graph (WFG) having LDBSs as nodes. Then, if a cycle is found in the WFG and the cycle involves LDBSs that use a blocking technique to synchronize conflicting transactions, a deadlock is possible. Dealing with deadlocks in MDBSs is said to constitute a problem for further research.
Georgakopoulos et al. disclose a refinement for "rigorous" LDBSs called "implicit tickets." Under a "strict" scheduler, no transaction can read or write a data item until all transactions that previously wrote it commit or abort. A "rigorous" scheduler guarantees "strictness" and also does not allow transactions to write a data item until the transactions that previously read it either commit or abort. It is said that rigorous schedulers guarantee that for any pair of transactions T_i and T_j, such that T_i is committed before T_j, T_i also precedes T_j in the serialization order corresponding to the execution schedule. The "implicit ticket method" (ITM) is said to take advantage of the fact that if all LDBSs produce rigorous schedules, then ticket conflicts can be eliminated. To guarantee global serializability in the presence of local transactions, ITM requires the following conditions to be satisfied: 1) all local database systems use rigorous transaction management mechanisms; 2) each multidatabase operation has at most one subtransaction at each LDBS; and 3) each subtransaction has a visible prepared to commit state.
Accordingly, workers skilled in the art have been working for a considerable period of time to solve the problem of guaranteeing global serializability without significantly limiting the autonomy of existing local processors or processes, and without limiting concurrency or imposing unnecessary overhead.