The invention disclosed herein relates generally to database recovery systems. More particularly, the present invention relates to a method and system for defining transactions based on a database log, keeping track externally to the log of transactions in progress at any one time, and recovering from system failures by at least undoing any actions performed on the database as part of transactions which were in progress at the time of system failure.
Database recovery is an important function of a database system. Data in a database can be lost or damaged due to various types of failures, including physical disasters (fires, floods, etc.), computer system crashes, software or human error, and physical failures in the media upon which the database is stored. To prepare for such failures, a backup copy of the database is usually stored on a secondary media, and the backup copy is periodically updated to match the database. Also, a log or journal is maintained which keeps track of changes made to the database, and the log is used in the event of loss or damage to the database to reproduce the changes made to the database since an earlier backup.
Another important feature of database systems is transaction protection. A transaction is generally a logical unit of work which comprises a set of actions forming a single logical operation. As a typical example, a transfer of money from one account to another account is a transaction which involves several discrete actions in a database--deleting the amount of money from a file representing the first account, adding the amount to another file representing the second account, and possibly updating other files such as files representing bank accounting records regarding the amounts of total funds in the bank's various types of accounts. If any one of these actions were performed without the others, the database would be inconsistent and would contain erroneous data. Therefore, the database system must have the capability to perform either all the actions or none of them, thus protecting the integrity of the transaction and the database.
A mechanism used to protect transactions is embedded commitment control. An application program or agent inserts special command lines in the database log which identify transactions and indicate the start and end of each transaction. For example, in certain database applications, such as applications based on Structured Query Language ("SQL"), a "START" statement or the like is inserted in the log at the beginning of a transaction and either a "COMMIT" or similar statement or "ABORT" (or "ROLLBACK") statement is inserted in the log at the end of the transaction, depending upon whether the transaction is to be committed (performed in the database) or aborted (not performed). The START, COMMIT and ABORT statements may be part of an SQL program embedded in an application program or may be inserted by the database manager system under certain conditions, for example, when a program completes successfully (COMMIT) or aborts prior to completion (ROLLBACK). Database management systems vary in the extent to which explicit transaction-related statements may be included in SQL programs or in which only external transaction control is available through external agents or transaction managers.
During database recovery, the recovery program scrolls through the log looking for records having the START and COMIT/ABORT statements. For each completed transaction having START and COMMIT statements, the program retrieves the records in-between associated with the transaction and performs a redo or roll forward by updating the database with the actions in the transaction if such actions have not already been performed in the database. For incomplete transactions having a START statement with no COMMIT, or for transactions ending with an ABORT, the actions in the transaction are undone or rolled back if they had already been performed in the database. The use of checkpoints in the log helps identify which actions have been performed in the database.
A simplified example of a portion of one type of existing database log having embedded commitment control is shown in FIG. 1. This example is used to illustrate the general operation of and problems associated with existing systems and does not necessarily represent an actual database log.
Among other possible data stored in the log, such as times and user login information, each record in the log stores a log sequence number ("LSN") which uniquely identifies each record, a Transaction ID which uniquely identifies each transaction, a Job ID which uniquely identifies the source of each job performed on the database (e.g., teller number, ATM number, etc.), and an Action to be performed, such as FETCH, UPDATE, INSERT, DELETE, etc. Some of these simplified fields may actually consist of a number of fields, e.g., the simplified Job ID field may actually consist of Job Name, User Name, and Job Number fields. Actions also include START, COMMIT, and ABORT statements which provide transaction protection as explained above and are used by the transaction management system. The START, COMMIT and ABORT statements may be inserted in the database and log by the application program or may be provided by the database management program. For Actions other than those related to transaction protection, additional information is stored including a filename of the file affected by the action and before and/or after images of the data representing the change to the data in the database.
If the log section shown in FIG. 1 is available after a system failure, the recovery program can identify that transaction 1 was completed because it has START and COMMIT statements (at LSN=001 and LSN=007), that transaction 2 was aborted, and that transaction 3 is incomplete because it has no COMMIT or ABORT statement. Thus, if all these actions were performed on the database before system failure, the recovery program allows the actions in transaction 1 to stand and rolls back the actions performed in transactions 2 and 3.
Another aspect of transaction protection is locking. Once a transaction has started, no other action should be performed on the files involved in the transaction until the transaction is completed or aborted, or else the database could become inconsistent. START, COMMIT, and ABORT statements can be used to determine when and which files are locked. For example, in the log section shown in FIG. 1, filename F1 would be locked until transaction 1 is committed or aborted.
A problem with existing transaction protection schemes is that an embedded agent or application program is required in the database managing program to dictate to the database the boundaries of a transaction and to insert the START and COMMIT/ABORT statements in the database log. However, if an application program does not contain such embedded logic, the database will not have fault tolerant transaction processing. For example, the simplified log section in FIG. 2 is similar to the log section in FIG. 1 except for the absence of a Transaction ID field and any statements defining the start or end of transactions. Alternatively, the Transaction ID may be present but filled with null or zero entries because of the absence of transaction control.
As a result, a recovery program could not by itself determine which actions constitute transactions and thus could not protect the transactions. For example, in FIG. 2 the recovery program would be unable to determine whether the log record identified by LSN=4 forms part of the same transaction as the record identified by LSN=3, forms part of a new transaction, or does not form part of a transaction at all.
Furthermore, application programs lacking embedded commitment control may also be unable to lock database files properly, resulting in the use and storage of inconsistent or incorrect data, as explained above.
Because the boundaries of transactions vary depending upon the application, the development of embedded commitment control requires particular knowledge of the type of work performed by the application and the nature of the business in which the application program is used. The development of commitment control can thus be expensive and labor intensive. In addition, a change in the boundaries of a transaction due, for example, to a change in the way work is performed, would require a change in the commitment control logic.
Another drawback of existing transaction protection schemes is that many of them require the recovery program to scroll through an entire log, which may be quite time consuming, in order to find the transactions and either redo the completed transactions or undo the incomplete or aborted transactions.
There is thus a need for a method for defining transactions in the absence of commitment control and for a method for protecting transactions during database recovery using logs generated by programs lacking embedded logic for setting transaction boundaries. The present invention provides these and other advantageous functions.