1. The Field of the Invention
The present invention relates to transaction processing. More specifically, the present invention relates to maintaining correct transaction results when transaction management configurations change.
2. Background and Relevant Art
Computer systems and related technology affect many aspects of society. Indeed, the computer system's ability to process information has transformed the way we live and work. Computer systems now commonly perform a host of tasks (e.g., word processing, scheduling, and database management) that prior to the advent of the computer system were performed manually. More recently, computer systems have been coupled to one another and to other electronic devices to form both wired and wireless computer networks over which the computer systems and other electronic devices can transfer electronic data. As a result, many tasks performed at a computer system (e.g., voice communication, accessing electronic mail, transaction processing, Web browsing, and printing documents) include the exchange of electronic messages between a number of computer systems and/or other electronic devices via wired and/or wireless computer networks.
A feature of most, if not all, transaction processing systems is what is commonly referred to as a two-phase commit protocol. A two-phase commit protocol enables a number of different components (as part of the same transaction) to do some processing and agree on an outcome. The two-phase commit protocol also enables different components to be returned to their pre-transaction state when some error conditions occur. Since a single transaction can update many different components (e.g., databases), the two-phase commit protocol is designed to ensure that either all of the components are updated or none of them are, so that the components remain consistent. That is, the two-phase commit protocol attempts to maintain the atomicity of transactions by executing transactions in two phases, a prepare phase and a commit phase.
In a prepare phase, a transaction coordinator (e.g., a transaction manager) identifies what resources are necessary to a transaction and what components (e.g., resource managers) should be contacted to access the necessary resources. The transaction coordinator can then attempt to contact the components by sending a request to prepare message (hereinafter referred to simply as a “prepare message”) requesting that the components commit to performing an operation on the necessary resource according to the transaction.
Components that are in a state (or that subsequently transition into a state) capable of performing operations requested in a prepare message indicate this capability by sending a prepare complete message to the transaction coordinator. A prepare complete message further indicates that a component will remain in a state capable of applying the requested operations even if the component subsequently fails. Along with sending a prepare complete message, a component can record a log entry in a log file. The log entry indicates that the prepare complete message was sent and potentially includes some state information that allows the component to return to the prepare complete state during failure recovery. When all the contacted components (i.e., each component that was sent a prepare message) respond with prepare complete messages, the transaction coordinator can then proceed to a commit phase.
However, if any component does not respond or responds that it is not capable of performing operations according to the transaction, the transaction coordinator may abort the transaction. Alternately, the transaction coordinator can attempt to contact another component with access to the necessary resource (by sending a prepare message to the component) to request performance of the operations that would otherwise have been performed by the non-responsive or negatively responding component.
In a commit phase (after a prepare phase is successful), the transaction coordinator sends a commit (or abort) transactional message (hereinafter referred to as a “commit (or abort) message”) to all components participating in the transaction (i.e., any component from which the transaction coordinator received a prepare complete message). Reception of a commit message causes a component to perform any operations that were indicated as being prepared in a previous corresponding prepare complete message. A component can also write appropriate log entries for any performed operations to a log, including a commit entry. After a component successfully performs the indicated operations, the component sends a commit complete transactional message (hereinafter referred to as a “commit complete message”) to the transaction coordinator. After receiving commit complete messages from all contacted components (each component that was sent a commit message), the transaction coordinator can advance its beginning of log past the commit record.
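The prepare and commit phases described above can be sketched as follows. This is a minimal illustration only: the names `Component`, `can_prepare`, and `run_two_phase_commit` are hypothetical, messages are modeled as direct method calls, and the case of a component that never responds is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical component (e.g., a resource manager) in a transaction."""
    name: str
    can_prepare: bool = True
    log: list = field(default_factory=list)
    state: str = "idle"

    def prepare(self, txn_id: str) -> bool:
        # Record a log entry before answering, so that the promise to
        # apply the operations survives a later failure.
        if self.can_prepare:
            self.log.append(("prepared", txn_id))
            self.state = "prepared"
        return self.can_prepare

    def commit(self, txn_id: str) -> None:
        self.log.append(("committed", txn_id))
        self.state = "committed"

    def abort(self, txn_id: str) -> None:
        self.log.append(("aborted", txn_id))
        self.state = "aborted"


def run_two_phase_commit(txn_id: str, components: list) -> str:
    # Prepare phase: every contacted component must report prepare complete.
    votes = [c.prepare(txn_id) for c in components]
    outcome = "commit" if all(votes) else "abort"
    # Commit phase: deliver the same outcome to every component.
    for c in components:
        if outcome == "commit":
            c.commit(txn_id)
        else:
            c.abort(txn_id)
    return outcome
```

In this sketch a single negative response forces an abort for every component, which is the all-or-nothing behavior the protocol is intended to provide.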
It may be that a component fails after sending a prepare complete message (and thus does not immediately receive a corresponding commit (or abort) message). Thus, when the component is restarted, the component can replay log entries to re-create the state that existed before the failure occurred. Accordingly, a two-phase commit protocol used along with logs can facilitate recovery from some errors that occur during a transaction.
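Log replay after a restart can be illustrated with a short sketch. The log format used here, a list of `(kind, txn_id)` tuples, and the function name `replay_log` are assumptions made for illustration, not a format taken from any particular system.

```python
def replay_log(entries):
    """Rebuild {txn_id: state} from (kind, txn_id) log entries, in order."""
    states = {}
    for kind, txn_id in entries:
        if kind == "prepared":
            # Prepare complete was sent, but no outcome has been logged yet.
            states[txn_id] = "in doubt"
        elif kind in ("committed", "aborted"):
            # A later outcome entry supersedes the prepared entry.
            states[txn_id] = kind
    return states
```

A transaction whose last entry is `prepared` is left "in doubt": the component promised to apply the operations but cannot tell from its own log what the coordinator decided.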
In some environments, recovery cookies are utilized to attempt to address component failures after prepare complete messages have been sent. In these environments, a prepare message includes (or is sent along with) a recovery cookie. The recovery cookie includes information necessary for transaction recovery (e.g., the identity of the transaction coordinator) in the event that a component fails. Each component included in a transaction is sent a corresponding recovery cookie. Components store recovery cookies (e.g., in log entries) for presentation back to the transaction coordinator when it is necessary to recover a transaction (e.g., after a component failure).
To recover a transaction (e.g., when a component restarts), a component typically issues reenlist calls specifying the recovery cookie it received. Reenlist calls are essentially calls to the transaction coordinator (identified from the recovery cookie) requesting state for transactions that were open when the component failed. For example, a reenlist call can request an indication of whether or not a transaction for which the component sent a prepare complete message was actually committed. After a component issues all the reenlist calls it intends to issue, it can issue a recovery-complete call to the transaction coordinator. In response, the transaction coordinator can clean up any internal state that it still holds for the component but of which the component is unaware.
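The reenlist and recovery-complete exchange can be sketched as below. The classes `RecoveryCookie` and `Coordinator`, their fields, and the method names are hypothetical stand-ins; a real recovery cookie would typically be an opaque value, and the calls would be remote rather than local.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryCookie:
    """Hypothetical cookie: identifies the coordinator and the transaction."""
    coordinator_id: str
    txn_id: str


class Coordinator:
    def __init__(self, coordinator_id: str):
        self.coordinator_id = coordinator_id
        self.outcomes = {}  # txn_id -> "commit" or "abort"
        self.pending = {}   # component name -> txn_ids it may ask about

    def issue_cookie(self, component: str, txn_id: str) -> RecoveryCookie:
        # Sent with (or inside) the prepare message.
        self.pending.setdefault(component, set()).add(txn_id)
        return RecoveryCookie(self.coordinator_id, txn_id)

    def reenlist(self, component: str, cookie: RecoveryCookie):
        # Returns the recorded outcome, or None if it is unknown.
        self.pending.get(component, set()).discard(cookie.txn_id)
        return self.outcomes.get(cookie.txn_id)

    def recovery_complete(self, component: str) -> set:
        # Clean up state the component never asked about; return it
        # here only so the cleanup is observable in this sketch.
        return self.pending.pop(component, set())
```

Returning `None` for an unknown transaction is a design choice of this sketch; it leaves the caller "in doubt" rather than guessing an outcome.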
Unfortunately, even when using recovery cookies, not all errors that occur during a transaction can be sufficiently addressed. In some environments, it is possible for transaction management configurations to change over time and for a transaction coordinator to “forget” state related to a transaction. For example, an old log can be loaded at a transaction coordinator (e.g., before a component can initiate recovery) resulting in an ancestor instance of the transaction coordinator. If a recovering component issues a reenlist call to the ancestor transaction coordinator, the ancestor transaction coordinator may have no memory of the identified transaction. For example, the ancestor transaction coordinator may not recognize a recovery cookie issued by the transaction coordinator before the old log was loaded. Thus, the ancestor transaction coordinator may respond incorrectly (e.g., indicating an abort when a commit is in fact the correct response). Alternatively, the ancestor transaction coordinator may not provide the component any answer and the component may remain “in doubt”. Manual intervention, for example, from an administrator, may be required to transition the component out of the “in doubt” state.
At least one mechanism has been developed for automatically identifying that a current instance of a transaction manager is different from the instance of the transaction manager that sent a prepare message. Such a mechanism can be used to transition a component out of an “in doubt” state. One particular mechanism, commonly referred to as amnesia detection, utilizes a set of algorithms during component recovery to detect transaction coordinator instances. For example, the set of algorithms can detect a different instance of a transaction coordinator (e.g., resulting from loading an older log file) and/or can detect a different transaction coordinator (e.g., when a component connects to a different transaction manager).
When a different instance or a different transaction coordinator is detected during recovery, the set of algorithms can cause the component to become operational without harming the component. That is, the set of algorithms causes the component to become operational, while leaving the component “in doubt” with respect to transactions initiated by other instances or other transaction coordinators. Thus, the component can recover in a manner that does not cause data corruption or other harm to the component (e.g., harm that can result from aborting a transaction that was actually committed or committing a transaction that was actually aborted).
Unfortunately, recovery mechanisms that utilize amnesia detection do not ensure that a component recovers into the state that would have resulted had there been no failure. That is, during recovery amnesia detection does not identify the correct answer for transactions. For example, amnesia detection does not ensure that a component that was sent a commit (or abort) message from a different instance of a transaction coordinator will implement the commit (or abort) during recovery. Thus, while amnesia detection can mitigate component harm during recovery, manual intervention may be required to remove or correct “in doubt” transactions that remain after recovery.
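The effect of amnesia detection on recovery can be illustrated with a sketch. The detection step is reduced here to comparing instance identifiers, which is an assumption made purely for illustration; the actual set of algorithms is not described in this text, and the function and parameter names are hypothetical.

```python
def recover_with_amnesia_detection(prepared, current_instance, outcomes):
    """Resolve only transactions prepared by the current coordinator instance.

    prepared: {txn_id: instance id that sent the prepare message}
    current_instance: instance id of the coordinator now being contacted
    outcomes: {txn_id: "commit" or "abort"} known to the current instance
    """
    resolved, in_doubt = {}, []
    for txn_id, instance in prepared.items():
        if instance == current_instance and txn_id in outcomes:
            resolved[txn_id] = outcomes[txn_id]
        else:
            # A different instance (or coordinator) was detected: do not
            # guess, so that no incorrect commit or abort is applied.
            in_doubt.append(txn_id)
    return resolved, in_doubt
```

The component becomes operational without applying a wrong outcome, but every transaction in the returned `in_doubt` list still lacks a correct answer, which is the limitation noted above.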
To increase performance, some environments distribute transaction coordinator functionality across a number of specialized transaction coordinators. Each specialized transaction coordinator can be assigned specific tasks. Thus, when implementing a transaction, a component may communicate with a number of different specialized transaction coordinators. Further, a component may or may not be aware of the specialized transaction coordinators it is communicating with to coordinate the transaction.
Additionally, the configuration of transaction managers can change. For example, transaction managers may fail and/or (re)start (potentially at different times) and the specified tasks assigned to a specialized transaction coordinator may change. It may also be that, from time to time, a database is moved between different systems. Thus, recovery cookies and amnesia detection algorithms, which can recover from or mitigate the effects of a failure between a component and a single transaction manager, are typically not sufficient for recovering from or mitigating a failure in a multiple transaction coordinator environment.
In at least one mechanism, transaction coordinators drive the recovery process. A transaction coordinator monitors a component and, during recovery, the transaction coordinator can query the component for transactions that are “in doubt.” The component responds to the request by sending a list of “in doubt” transactions to the transaction coordinator. The transaction coordinator then parses the list to identify any transactions that are related to the transaction coordinator. This is advantageous, since the transaction coordinator can control when recovery occurs.
However, during component recovery, and especially when coordinator functionality is distributed across a number of specialized transaction coordinators, the recovering component can connect to a new transaction manager that is different from a prior transaction coordinator the component was connected to (e.g., before a failure occurred). In any event, the component may have transactions from the prior transaction coordinator that are “in doubt.” However, since the component is no longer using the prior transaction coordinator, the prior transaction coordinator will never query the component for a list of “in doubt” transactions. Further, since the new transaction manager is unaware of transactions related to the prior transaction manager, the new transaction manager will ignore any “in doubt” transactions related to the prior transaction coordinator. Thus, in environments where transaction management configurations change, a component can be prevented from obtaining a correct result for transactions related to a prior transaction coordinator.
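The gap described above can be made concrete with a sketch of coordinator-driven recovery. The `Component` and `Coordinator` classes, the mapping from transaction to originating coordinator, and all identifiers such as `tc-old` and `tc-new` are hypothetical and chosen only for illustration.

```python
class Component:
    def __init__(self, in_doubt):
        # in_doubt maps txn_id -> id of the coordinator that prepared it.
        self.in_doubt = dict(in_doubt)

    def list_in_doubt(self):
        # Answer a coordinator's query with a copy of the in-doubt list.
        return dict(self.in_doubt)

    def resolve(self, txn_id):
        self.in_doubt.pop(txn_id)


class Coordinator:
    def __init__(self, coordinator_id, outcomes):
        self.coordinator_id = coordinator_id
        self.outcomes = outcomes  # txn_id -> "commit" or "abort"

    def drive_recovery(self, component):
        resolved = {}
        for txn_id, owner in component.list_in_doubt().items():
            if owner != self.coordinator_id:
                continue  # related to another coordinator, so ignored
            outcome = self.outcomes.get(txn_id)
            if outcome is not None:
                component.resolve(txn_id)
                resolved[txn_id] = outcome
        return resolved
```

If a component holding in-doubt transactions from `tc-old` reconnects only to `tc-new`, then `tc-new` resolves its own transactions and skips the rest, while `tc-old` never issues a query, so those transactions stay in doubt indefinitely.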
Therefore, systems, methods, and computer program products for maintaining correct transaction results when transaction management configurations change would be advantageous.