In distributed data processing systems, two-phase commit protocols are used to coordinate transactions. Distributed transactions are widely used today to allow a computer system to interact with several resources and apply changes to them while ensuring data consistency. A two-phase commit protocol requires that all resources in a distributed system agree to commit a transaction before it is committed. This ensures that either all resources commit the transaction or all resources abort it. As a result, the changes either all succeed (the transaction is committed) or all fail (the transaction is rolled back). When the transaction is committed, all the changes are made permanent; when it is rolled back, all changes already made are undone.
A distributed transaction involves three parties: an application program, a transaction manager and one or more resource managers. The application program uses a set of resource managers to execute its function, and uses the transaction manager interface to define the transaction boundaries. Resource managers are responsible for managing transactional resources. In a prepare phase, the transaction manager first attempts to prepare all the resources enlisted in the transaction by polling every resource manager to determine whether it is ready to commit. If all resource managers agree to commit, the transaction manager then starts a commit phase to complete the transaction at all resources; otherwise, it instructs all resource managers to roll back.
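The two phases described above can be illustrated with a minimal in-memory sketch. All names here (`ResourceManager`, `prepare`, `commit`, `rollback`, `two_phase_commit`) are hypothetical. Each resource manager holds a single integer value, and the prepare vote is simulated by a flag. The sketch omits the durable logging, timeouts and crash handling that a real transaction manager requires.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ResourceManager:
    """Hypothetical in-memory resource manager holding one transactional value."""
    name: str
    value: int = 0
    will_vote_yes: bool = True      # simulates this participant's prepare vote
    pending: Optional[int] = None   # change staged in phase 1, not yet visible

    def prepare(self, new_value: int) -> bool:
        # Phase 1: stage the change and vote "ready to commit" (True) or not.
        if not self.will_vote_yes:
            return False
        self.pending = new_value
        return True

    def commit(self) -> None:
        # Phase 2: make the staged change permanent.
        self.value = self.pending
        self.pending = None

    def rollback(self) -> None:
        # Discard any staged change; committed state is left untouched.
        self.pending = None


def two_phase_commit(resources: List[ResourceManager], changes: Dict[str, int]) -> bool:
    """Transaction-manager role: prepare every resource, then commit all or roll back all."""
    # Phase 1 (prepare): poll each resource manager; a single "no" aborts.
    for rm in resources:
        if not rm.prepare(changes[rm.name]):
            for other in resources:
                other.rollback()
            return False
    # Phase 2 (commit): every resource voted yes, so commit everywhere.
    for rm in resources:
        rm.commit()
    return True


db = ResourceManager("db")
queue = ResourceManager("queue")
print(two_phase_commit([db, queue], {"db": 10, "queue": 20}))  # True
print(db.value, queue.value)                                   # 10 20

queue.will_vote_yes = False
print(two_phase_commit([db, queue], {"db": 99, "queue": 99}))  # False
print(db.value, queue.value)                                   # 10 20 (unchanged)
```

In the second call, `db` has already staged `99` when `queue` votes no. The abort path therefore rolls back every resource, not only the one that refused.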
In case of a failure during the commit phase of the two-phase commit protocol, the data becomes inconsistent for some time, until recovery takes place: some resources may already have committed the transaction while others have not. A number of existing solutions are known to ensure failure recovery for two-phase commit operations.
In the article entitled “Inferring a Serialization Order for Distributed Transactions”, IEEE Paper, ISBN: 0-7695-2570-9, a failure recovery solution is proposed that relies on automatically executing the commit order by data partition identifier. The prepare log entries of all database partitions are merged into one log, these entries are sorted by partition identifier, and the commit is performed according to this sorting. In U.S. Pat. No. 6,363,401, data inconsistency is avoided by committing to the available resources and trying to commit to the unavailable resources later, when they become available. In U.S. Pat. No. 5,319,773, the failure recovery procedure comprises retrying asynchronously to commit the failed resources while the application operates. In U.S. Pat. No. 5,319,774, a failure recovery solution is disclosed that performs cyclic attempts to commit the failed resources. All of these solutions propose recovery procedures that limit data inconsistency in case of failure. However, none of them addresses the data inconsistency that persists until recovery takes place.
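The inconsistency window, and the general "commit what is reachable, retry the rest later" idea shared by several of the cited approaches, can be illustrated with the following sketch. It is a generic illustration under stated assumptions and does not reproduce the method of any cited patent or paper. The names `ResourceManager`, `commit_phase` and `retry_failed` are hypothetical, and a resource failure after a successful prepare is simulated with an `available` flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceManager:
    """Hypothetical resource manager whose reachability can be toggled."""
    name: str
    value: int = 0
    pending: Optional[int] = None
    available: bool = True   # False simulates a crash or network partition

    def prepare(self, new_value: int) -> bool:
        # Phase 1: stage the change and vote yes.
        self.pending = new_value
        return True

    def commit(self) -> bool:
        # Phase 2: returns False when the resource cannot be reached.
        if not self.available:
            return False
        if self.pending is not None:
            self.value = self.pending
            self.pending = None
        return True


def commit_phase(resources: List[ResourceManager]) -> List[ResourceManager]:
    """Commit at every reachable resource; return the ones still to be committed."""
    return [rm for rm in resources if not rm.commit()]


def retry_failed(failed: List[ResourceManager]) -> List[ResourceManager]:
    """One recovery attempt: retry the commit at resources that failed earlier."""
    return [rm for rm in failed if not rm.commit()]


a, b = ResourceManager("a"), ResourceManager("b")
a.prepare(5)
b.prepare(5)             # phase 1 succeeds at both resources
b.available = False      # b fails after voting yes
failed = commit_phase([a, b])
print(a.value, b.value)  # 5 0  (inconsistent until recovery)

b.available = True       # b becomes reachable again
failed = retry_failed(failed)
print(a.value, b.value, failed)  # 5 5 []
```

Between `commit_phase` and the successful `retry_failed`, a reader of `a` sees the new value while a reader of `b` still sees the old one. A real recovery manager would run `retry_failed` periodically or asynchronously, and would persist the list of pending resources so that it survives a coordinator crash.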
There is accordingly a need for a method and a system that efficiently control data inconsistency, in case of a failure during a two-phase commit procedure, until data recovery has taken place.