1. Technical Field
This invention relates to main-memory transaction processing systems. More specifically, the present invention relates to a logging method and system for recovery of a main-memory database in a transaction processing system.
2. Description of Related Art
A transaction processing system must process transactions in such a manner that “consistency” and “durability” of data are maintained even in the event of a failure such as a system crash. Consistency of data is preserved when transactions are performed in an atomic and consistent manner, so that data initially in a consistent state is transformed into another consistent state. Durability of data is preserved when changes made to the data by “committed” transactions (transactions completed successfully) survive system failures. The data on which transactions are performed is often referred to as a database.
To achieve consistency and durability of a database, most transaction processing systems perform a process called logging, in which updates are recorded as log records in a log file. In case of a failure, these log records are used to undo the changes made by incomplete transactions, thereby restoring the database to a consistent state. The log records are also used to redo the changes made by committed transactions, thereby keeping the database durable.
Recovering a consistent and durable database after a system crash using only the log records would require a huge volume of log data, because all the log records generated since the creation of the database would have to be saved. Therefore, a process called “checkpointing” is often used, in which the database is copied to a disk at regular intervals so that only the log records created since the last checkpoint need to be stored. In practice, each page in the database has a “dirty flag” indicating whether it has been modified by a transaction, so that only the pages modified since the last checkpoint are copied to the disk.
FIG. 1 shows a conventional recovery architecture of a main-memory database management system (DBMS), where two backups (101 and 102) are maintained with a log (103). A single checkpointing process updates only one of the backups, and successive checkpointing processes alternate between them. To coordinate the alternation, each in-memory database page has two dirty flags. When a transaction modifies a page, it sets both flags to indicate the modification. When a checkpointing process flushes the page to the first backup, the first flag is reset to indicate that no further checkpointing needs to be done to the first backup. Similarly, when a successive checkpointing process flushes the page to the second backup, the second flag is reset.
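The alternation between the two backups can be illustrated with a minimal Python sketch. The names Page, Database, write, and checkpoint are hypothetical and do not come from the conventional systems described; the backups are modeled as in-memory dictionaries rather than disk files.

```python
class Page:
    def __init__(self, data=b""):
        self.data = data
        # One dirty flag per backup (assumed layout: index 0 for the
        # first backup, index 1 for the second backup).
        self.dirty = [False, False]


class Database:
    def __init__(self, n_pages):
        self.pages = [Page() for _ in range(n_pages)]
        self.backups = [{}, {}]   # stand-ins for the two on-disk backups
        self.next_backup = 0      # backup the next checkpoint will update

    def write(self, page_no, data):
        """A transaction modifies a page and sets both dirty flags."""
        page = self.pages[page_no]
        page.data = data
        page.dirty = [True, True]

    def checkpoint(self):
        """Flush the pages that are dirty with respect to the target backup,
        reset only that backup's flag, then switch to the other backup."""
        target = self.next_backup
        flushed = []
        for page_no, page in enumerate(self.pages):
            if page.dirty[target]:
                self.backups[target][page_no] = page.data
                page.dirty[target] = False
                flushed.append(page_no)
        self.next_backup = 1 - target
        return target, flushed


db = Database(3)
db.write(0, b"a")
first = db.checkpoint()    # flushes page 0 to the first backup
db.write(1, b"b")
second = db.checkpoint()   # second backup still needs page 0, plus page 1
third = db.checkpoint()    # first backup now needs only page 1
```

In this sketch a page modified once is flushed twice, once to each backup, which is why a single dirty flag would not suffice.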
FIG. 2 shows a conventional restart process using a backup of the database and the log in main-memory DBMSs. The restart process comprises the following four steps. First, the most recent backup is read into main memory (BR for “backup read”) (201). Second, the database is restored to the state that existed at the time the backup was made (BP for “backup play”) (202). Third, the log records are read into main memory (LR for “log read”) (203). Fourth, the database is restored to the most recent consistent state using the log records (LP for “log play”) (204).
FIGS. 3a and 3b are flow charts of the conventional two-pass log-play process. To restore the database to the most recent consistent state, the log records generated by all the committed transactions need to be played, but the log records generated by so-called “loser transactions” that were active at the time of the system crash have to be skipped. (A transaction is said to be a loser when there is a matching transaction start log record but no transaction end record.) For this purpose, all the log records encountered while scanning the log from the checkpointing start log record to the end of the log are played (307). Then, the changes made by the log records of the loser transactions are rolled back (308).
To identify loser transactions, a loser transaction table (LTT) is maintained, which has two fields, TID and Last LSN. This table is initialized with the active transactions recorded in the checkpointing start log record (301). When a transaction start record is encountered (302), a matching entry is created in the LTT (305). When a transaction end (either commit or abort) record is encountered (303), the matching entry is removed from the LTT (306). Otherwise (304), the LSN of the current log record is recorded in the Last LSN field of the matching LTT entry (307). When the end of the log is reached, the transactions that still have matching entries in the LTT are losers. The most recent record of a loser transaction can be located using the Last LSN field of the matching LTT entry, and other records of the transaction can be located by chasing the Last LSN field of accessed log records backward.
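The two-pass log play with a loser transaction table can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the flow charts of FIGS. 3a and 3b: the LogRecord layout (including the prev_lsn back-pointer and the before/after images), the dictionary used as the database, and the function name two_pass_log_play are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    tid: int
    kind: str                       # "start", "update", "commit", or "abort"
    prev_lsn: Optional[int] = None  # previous record of the same transaction
    key: Optional[str] = None
    before: Optional[int] = None    # before-image, used for undo
    after: Optional[int] = None     # after-image, used for redo


def two_pass_log_play(db, log, active_at_checkpoint):
    """Play every log record in order, then roll back loser transactions.

    active_at_checkpoint maps TID -> Last LSN for the transactions recorded
    in the checkpointing start log record (assumed representation).
    """
    by_lsn = {rec.lsn: rec for rec in log}
    ltt = dict(active_at_checkpoint)        # TID -> Last LSN

    # Pass 1: scan forward, play updates, and maintain the LTT.
    for rec in log:
        if rec.kind == "start":
            ltt[rec.tid] = rec.lsn
        elif rec.kind in ("commit", "abort"):
            ltt.pop(rec.tid, None)
        else:
            db[rec.key] = rec.after
            ltt[rec.tid] = rec.lsn

    # Pass 2: chase each loser's records backward and undo its updates.
    to_undo = []
    for last_lsn in ltt.values():
        lsn = last_lsn
        while lsn is not None and lsn in by_lsn:
            rec = by_lsn[lsn]
            if rec.kind == "update":
                to_undo.append(rec)
            lsn = rec.prev_lsn
    for rec in sorted(to_undo, key=lambda r: r.lsn, reverse=True):
        db[rec.key] = rec.before
    return set(ltt)                         # TIDs of the loser transactions


db = {"x": 0, "y": 0}
log = [
    LogRecord(1, 1, "start"),
    LogRecord(2, 1, "update", prev_lsn=1, key="x", before=0, after=5),
    LogRecord(3, 2, "start"),
    LogRecord(4, 2, "update", prev_lsn=3, key="y", before=0, after=7),
    LogRecord(5, 1, "commit", prev_lsn=2),
]
losers = two_pass_log_play(db, log, {})
```

Here transaction 1 commits and transaction 2 has no end record, so the update to "y" is played in the first pass and rolled back in the second.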
When the physical logging method of the conventional art is used, log records must be applied in the order of log-record creation during the LP process. That is, log records created earlier must be applied first when redoing updates. The conventional logging method imposes this sequential ordering because the undo and redo operations are neither commutative nor associative. The sequential ordering requirement imposes many constraints on the system design.
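The ordering requirement can be seen in a small Python sketch. It assumes, for illustration only, that a physical redo record carries the after-image of a data item and that redo simply overwrites the item with that image; the names redo and replay are hypothetical.

```python
def redo(db, record):
    """Apply one physical log record by overwriting with its after-image."""
    key, after_image = record
    db[key] = after_image


def replay(records):
    """Replay records in the given order against an empty database."""
    db = {}
    for record in records:
        redo(db, record)
    return db


# Two updates to the same item, listed in creation order.
log = [("x", 1), ("x", 2)]

in_creation_order = replay(log)
out_of_order = replay(list(reversed(log)))
```

Replaying in creation order leaves the item at its latest value, whereas replaying in the reverse order leaves a stale value, so the two redo operations do not commute.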
In a main-memory DBMS, for example, where the entire database is kept in main memory, disk access for logging acts as a bottleneck in system performance. To reduce this bottleneck, multiple persistent log storage devices might be employed to distribute the processing. The use of multiple persistent log storage devices, however, is not readily compatible with the conventional logging method, because the log records must be merged in the order of creation during the LP step, which necessarily incurs overhead.
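The merging step can be sketched in Python as follows, assuming that every log record carries a globally ordered sequence number as its first field and that each device's log is already sorted by that number. The record layout and the function name merge_logs are hypothetical.

```python
import heapq


def merge_logs(*device_logs):
    """Merge per-device logs into one stream ordered by sequence number.

    Each input must already be sorted by its first field (assumed to be a
    global sequence number assigned at log-record creation).
    """
    return list(heapq.merge(*device_logs, key=lambda rec: rec[0]))


# Records are (sequence number, data item, after-image).
device_a = [(1, "x", 10), (4, "x", 40)]
device_b = [(2, "y", 20), (3, "x", 30)]

merged = merge_logs(device_a, device_b)
```

Log play cannot proceed on a device's records independently in this scheme; each record must wait for its turn in the merged stream, which is the overhead noted above.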
Therefore, there is a need for an efficient logging system that is compatible with massively parallel operations in distributed processing.