Sync point managers (SPMs) provide an operating system service that makes it possible for distributed transaction programs (TPs) to perform atomic transactions, i.e., logical units of work (LUWs) consisting of data updates that are either all committed (made permanent) or all backed out (undone). An atomic transaction is identified by a logical unit of work identifier, called an LUWID in this description, and all accesses to data involved in a transaction are associated with its LUWID. Thus a database manager, when asked by an SPM to commit a transaction, uses the LUWID to identify the data updates that are to be made permanent. Some database managers also allow two programs participating in the same transaction, i.e., using the same LUWID, to access the same data resources simultaneously, which reduces the probability of deadlocks between those programs.
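To make the role of the LUWID concrete, the following is a minimal sketch of a toy resource manager that tags every pending update with the LUWID of its transaction and commits or backs out exactly the updates recorded under one LUWID. The class `ToyDatabaseManager` and the string LUWIDs such as `"NETA.LU1.0001"` are hypothetical illustrations, not the interface or identifier format of any real database manager.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyDatabaseManager:
    """Hypothetical resource manager that groups pending updates by LUWID."""
    committed: dict = field(default_factory=dict)
    # LUWID -> list of (key, value) updates not yet made permanent
    pending: dict = field(default_factory=lambda: defaultdict(list))

    def update(self, luwid: str, key: str, value: int) -> None:
        # Every data access is associated with the LUWID of its transaction.
        self.pending[luwid].append((key, value))

    def commit(self, luwid: str) -> None:
        # Make permanent exactly the updates recorded under this LUWID.
        for key, value in self.pending.pop(luwid, []):
            self.committed[key] = value

    def backout(self, luwid: str) -> None:
        # Discard (undo) the updates recorded under this LUWID.
        self.pending.pop(luwid, None)


db = ToyDatabaseManager()
db.update("NETA.LU1.0001", "balance", 100)   # hypothetical LUWID strings
db.update("NETA.LU1.0002", "balance", 999)
db.commit("NETA.LU1.0001")
db.backout("NETA.LU1.0002")
print(db.committed)  # {'balance': 100}
```

Because the commit request carries only the LUWID, two unrelated programs that accidentally shared one LUWID would have their updates committed or backed out together, which is why the identifier must be unique per transaction.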
As a result of these uses of the LUWID, it is vital that all programs participating in the same transaction use the same LUWID, and that any two programs not participating in the same transaction, which therefore will not commit together, have different LUWIDs. This includes programs that were once working together on the same transaction but are no longer connected because their communications connection was taken down, either normally or because of a failure.
In unchained transaction systems, each transaction (LUW) is explicitly started and terminated by a TP, and a new LUWID for each transaction is sent by its associated SPM to the SPMs of the other TPs involved in the transaction. Unchained transactions are used when not all work of the programs needs to be treated as part of an atomic transaction. Consequently, work that occurs between the end of one transaction and the explicit start of the next cannot be committed or backed out in a coordinated fashion. Also, with unchained transactions there tends to be a fixed hierarchical relationship between programs, since the protocols for distributing the LUWID require that the LUWID always flow down the tree of TPs, that is, the tree in which each branch is created when a program establishes a connection to another program.
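The unchained protocol can be sketched as a tree in which an explicit begin at one TP generates a fresh LUWID that is pushed down to every subordinate, while work outside an explicit begin/end pair carries no LUWID at all. This is a simplified model under stated assumptions: the class `UnchainedTP`, its methods, and the integer LUWIDs drawn from a global counter are hypothetical and stand in for the SPM-to-SPM messages.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical source of fresh LUWIDs at the initiating SPM.
_new_ids = itertools.count(1)


@dataclass
class UnchainedTP:
    name: str
    children: list = field(default_factory=list)
    luwid: Optional[int] = None  # None: current work is outside any LUW

    def connect(self, child: "UnchainedTP") -> "UnchainedTP":
        # Each connection a program creates adds a branch below it.
        self.children.append(child)
        return child

    def begin_transaction(self) -> None:
        # Explicit start: this TP's SPM generates a fresh LUWID ...
        self._receive(next(_new_ids))

    def _receive(self, luwid: int) -> None:
        # ... which flows only *down* the tree to subordinate SPMs.
        self.luwid = luwid
        for child in self.children:
            child._receive(luwid)

    def end_transaction(self) -> None:
        # Explicit end: work done before the next begin is not coordinated.
        self.luwid = None
        for child in self.children:
            child.end_transaction()


root = UnchainedTP("A")
b = root.connect(UnchainedTP("B"))
c = b.connect(UnchainedTP("C"))

root.begin_transaction()
print(root.luwid, b.luwid, c.luwid)  # 1 1 1
root.end_transaction()
print(c.luwid)                       # None
root.begin_transaction()
print(c.luwid)                       # 2
```

Since a LUWID only reaches the TPs below the one that began the transaction, the program at the top of the tree must remain the one that starts transactions, which is the fixed hierarchy described above.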
Chained transaction systems are those in which the end of a transaction at any TP inherently marks the beginning of the next transaction at that TP, so no TP needs to start the next transaction explicitly. With chained transactions, all work performed by any program in a tree is part of some atomic transaction. Consequently, the LUWID for a new transaction is generated implicitly by incrementing the LUWID of the previous transaction, without any explicit action by the TPs and without any messages between SPMs. For the same reason, no fixed hierarchical relationship is required between the programs involved in the transaction.
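The chained rule can be illustrated with a minimal sketch in which each TP derives its next LUWID locally by incrementing a sequence number, so partners that shared the old LUWID arrive at the same new one without exchanging any messages. The `Luwid` structure (an origin name plus a sequence number) and the `ChainedTP` class are hypothetical simplifications, not the LUWID format of any particular protocol.

```python
from typing import NamedTuple


class Luwid(NamedTuple):
    """Hypothetical LUWID: an origin name plus a sequence number."""
    origin: str
    sequence: int

    def successor(self) -> "Luwid":
        # Chained rule: the next LUWID is derived locally by incrementing.
        return self._replace(sequence=self.sequence + 1)


class ChainedTP:
    def __init__(self, name: str, luwid: Luwid) -> None:
        self.name = name
        self.luwid = luwid

    def commit(self) -> None:
        # Ending one transaction implicitly begins the next; no message
        # to any partner SPM is needed to agree on the new LUWID.
        self.luwid = self.luwid.successor()


shared = Luwid("A", 7)
tps = [ChainedTP(name, shared) for name in "ABC"]
for tp in tps:
    tp.commit()
print({tp.luwid for tp in tps})  # {Luwid(origin='A', sequence=8)}
```

Any TP could have initiated the commit; every participant still lands on the same successor LUWID, which is why no fixed hierarchy is needed.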
In distributed transaction systems, partner TPs can exist in different nodes of the system connected by communications links, or within one node of the system. These TPs form a tree of programs whose data updates will either commit or back out together. A problem arises in chained systems when a transaction tree is broken apart. Subtrees can become separated because of failures in the underlying communications connections, or because an error is detected that causes one TP or the operating system (OS) to terminate the connection abnormally. Subtrees can also become separated because a TP chooses to terminate its connection to another TP normally. In chained transactions, special action is required to prevent TPs in severed subtrees, which can no longer commit together, from proceeding with the same LUWID, since both sides normally generate their next LUWID by incrementing the LUWID that they shared for the previous transaction. If these severed TPs were not prevented from proceeding with the same LUWID, data damage could occur; that is, data that should be committed could be backed out, and vice versa.
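The collision can be demonstrated with a small sketch: after a break, each severed subtree naively applies the chained increment rule to the LUWID it last shared, and a check then finds pairs of TPs on opposite sides of the break that would carry the same next LUWID. The `Luwid` tuple and both helper functions are hypothetical; the sketch only shows the problem, not any remedy for it.

```python
from typing import NamedTuple


class Luwid(NamedTuple):
    """Hypothetical LUWID: an origin name plus a sequence number."""
    origin: str
    sequence: int

    def successor(self) -> "Luwid":
        return self._replace(sequence=self.sequence + 1)


def next_luwids_after_break(shared, left_subtree, right_subtree):
    """Each TP's next LUWID when both severed subtrees independently
    apply the chained increment rule to the LUWID they last shared."""
    return {tp: shared.successor() for tp in left_subtree + right_subtree}


def collisions(next_ids, left_subtree, right_subtree):
    """Pairs of TPs from different subtrees that would share a LUWID even
    though they can no longer commit together."""
    return [(l, r) for l in left_subtree for r in right_subtree
            if next_ids[l] == next_ids[r]]


# Tree A-B-C with the B-C connection severed.
left, right = ["A", "B"], ["C"]
ids = next_luwids_after_break(Luwid("A", 7), left, right)
print(collisions(ids, left, right))  # [('A', 'C'), ('B', 'C')]
```

Every cross-subtree pair collides, so a resource manager shared by TP C and TP A could commit or back out C's updates together with A's, which is the data damage described above.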
The conventional solution to this problem is to dismantle all nodes of both severed subtrees after the tree breaks, and then rebuild a new tree. While this prevents data damage, it is very expensive in terms of the work that must be performed and the system time lost in re-creating the connections. Therefore a need exists to ensure that subtrees separated for any reason proceed with different LUWIDs without completely dismantling the tree.