A software system is a set of software instructions that cause a computer system to perform a group of related tasks. Sometimes the tasks include one or more atomic transactions. An atomic transaction is a set of operations that change the state of data or equipment controlled by the software system in such a way that either the transaction completes successfully and produces a final changed state, or, if the transaction does not complete successfully, the data or equipment is returned to the original state it had before the transaction started. If the power fails or a component fails during the transaction, the transaction is not completed, and the interim changes associated with the transaction have to be undone, also said to be “rolled back.” To provide this “undo” or “rollback” functionality, data indicating one or more reverse operations for each operation involved in the atomic transaction are accumulated in a special “undo” data structure.
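The undo data structure described above can be illustrated with a minimal sketch. The names UndoLog and run_atomic, the dictionary used as the controlled state, and the fail_at parameter that simulates a failure are all assumptions made for illustration; they are not taken from any particular software system.

```python
class UndoLog:
    """Accumulates one reverse operation for each forward operation."""

    def __init__(self):
        self._entries = []

    def record(self, reverse_op):
        # reverse_op is a zero-argument callable that undoes one operation.
        self._entries.append(reverse_op)

    def rollback(self):
        # Apply the reverse operations newest-first to restore the original state.
        while self._entries:
            self._entries.pop()()


def run_atomic(state, updates, fail_at=None):
    """Apply (key, value) updates to `state` as an all-or-nothing transaction.

    fail_at (hypothetical) simulates a failure just before the operation
    at that index. Returns True on success, False after a rollback.
    """
    log = UndoLog()
    try:
        for i, (key, value) in enumerate(updates):
            if i == fail_at:
                raise RuntimeError("simulated failure")
            if key in state:
                old = state[key]
                # Reverse operation: put the previous value back.
                log.record(lambda k=key, v=old: state.__setitem__(k, v))
            else:
                # Reverse operation: remove the key that is about to be added.
                log.record(lambda k=key: state.pop(k))
            state[key] = value
    except RuntimeError:
        log.rollback()
        return False
    return True
```

In this sketch, a call such as run_atomic({"a": 1}, [("a", 2), ("b", 3), ("c", 4)], fail_at=2) performs two operations, fails before the third, and then rolls both back, leaving the state as it was before the transaction started.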
As computer systems are employed to tackle ever-larger tasks, the number of simple operations involved in an atomic transaction and the time required to perform the atomic transaction can become substantial and economically significant. For example, an atomic transaction to update the software controlling a facility that involves thousands of components, each having thousands of attributes or functions, can involve tens of millions of operations and take hours to complete. As another example, an atomic transaction to add sales data from thousands of outlets of a major retailer to a central database at the end of each business week can involve tens of millions of database table update operations that take hours to complete.
If such a large transaction does not complete successfully, hours of time are lost. Not only are the hours of operations performed up to the point of failure lost, but a commensurate number of additional hours are also lost as the interim changes are undone. For example, the weekly update for the retailer's central database may take four hours, and the transaction may fail after three and a half hours because the power fails. Then the problem has to be resolved, e.g., the power is restored; the interim changes are undone over the next three and a half hours; and then the transaction is restarted and runs for its full four hours. Even if the power is restored in a few minutes and the transaction completes successfully the second time, the example weekly update is not completed until more than eleven hours after the start. An extra seven hours are consumed due to a power failure of a few minutes.
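The arithmetic of this example can be checked with a short sketch. The function name hours_to_complete and the undo_rate parameter are assumptions made for illustration; undo_rate=1.0 encodes the example's premise that undoing the interim changes takes as long as making them.

```python
def hours_to_complete(full_run, failed_after, repair, undo_rate=1.0):
    """Elapsed hours from the first start until the transaction finally completes.

    full_run     -- hours for one uninterrupted run of the transaction
    failed_after -- hours of work done before the failure
    repair       -- hours needed to resolve the problem (e.g., restore power)
    undo_rate    -- assumed hours of undo per hour of completed work
    """
    undo = failed_after * undo_rate
    return failed_after + repair + undo + full_run


# The weekly update example: a 4-hour job that fails after 3.5 hours,
# with power restored in 5 minutes.
total = hours_to_complete(full_run=4.0, failed_after=3.5, repair=5 / 60)
extra = total - 4.0
```

With these inputs the total is slightly more than eleven hours and the overhead is slightly more than seven hours, almost all of it spent undoing and then repeating work rather than resolving the failure itself.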
Sometimes the errors that cause the transaction to fail before completion involve less dramatic problems, such as temporary resource shortages, that often are not even apparent to a user of the system. For example, in some database software systems, a database administrator must configure the software system to reserve a total amount of storage space on the computer system for the data indicating the undo operations, and to reserve individual segments of storage space for each concurrent transaction. If the database administrator has underestimated the number of transactions attempted concurrently, or the number of operations involved in each transaction, or both, then the reserved storage space might become insufficient to carry all the concurrent transactions through to completion, and a “disk full” error is encountered. One or more of the transactions may fail before completion, even though adequate undo storage space becomes available a few minutes later, when one of the other transactions completes. The failed transactions are then undone, and subsequently restarted, such as when more undo storage space becomes available. Hours of time are consumed to perform the undo operations and then repeat the transaction operations already performed, even if the additional undo storage space becomes available in a few minutes.
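The undo storage shortage described above can be illustrated with a small simulation. Everything here is a simplifying assumption made for illustration: the names UndoPool, UndoSpaceFull, and simulate are hypothetical, each operation is assumed to consume one unit of a shared undo pool, and two transactions labeled "A" and "B" are assumed to take turns one operation at a time. It does not model any particular database product.

```python
class UndoSpaceFull(Exception):
    """Raised when the reserved undo storage is exhausted (a "disk full" error)."""


class UndoPool:
    def __init__(self, capacity):
        self.capacity = capacity  # total undo units reserved by the administrator
        self.used = {}            # transaction id -> undo units currently held

    def reserve(self, txn_id):
        if sum(self.used.values()) >= self.capacity:
            raise UndoSpaceFull(txn_id)
        self.used[txn_id] = self.used.get(txn_id, 0) + 1

    def release(self, txn_id):
        # On completion (or after rollback) a transaction's undo units are freed.
        self.used.pop(txn_id, None)


def simulate(capacity, ops_a, ops_b):
    """Interleave transactions A and B; report the first failure, if any."""
    pool = UndoPool(capacity)
    need = {"A": ops_a, "B": ops_b}
    done = {"A": 0, "B": 0}
    active = ["A", "B"]
    while active:
        for txn in list(active):
            try:
                pool.reserve(txn)
            except UndoSpaceFull:
                other = "B" if txn == "A" else "A"
                remaining = need[other] - done[other] if other in active else 0
                return {"failed": txn,
                        "wasted_ops": done[txn],
                        "other_ops_remaining": remaining}
            done[txn] += 1
            if done[txn] == need[txn]:
                pool.release(txn)  # completion frees this transaction's undo space
                active.remove(txn)
    return {"failed": None, "wasted_ops": 0, "other_ops_remaining": 0}
```

Under these assumptions, simulate(11, 7, 6) reports that B fails with five completed operations to undo while A is one operation away from completing and freeing its space, whereas simulate(12, 7, 6) completes both transactions. A shortfall of a single unit that would have cleared moments later is enough to discard all of B's work.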
Based on the foregoing, techniques are clearly needed for providing atomic transactions that can be resumed after resolving an error without undoing all the operations accomplished at the time the error occurred.