With log-based processing, work needs to be performed based on a description of the work in a set of records that are stored in a log. An example of log-based processing is system recovery processing. In log-based recovery, the log records represent a sequence of work items that are ordered operations on a set of objects. Specifically, the log records may be redo records that represent changes made to data items in a database prior to a system failure. Generally, recovering the system based on the log entails repeating the processing of the logged work items on the objects.
One context in which log-based processing may be performed is for recovery of a database system after a failure or inadvertent termination within the system. In the context of database recovery, the log is a redo log that records changes made during transactions on a set of objects. Some of the changes recorded in the redo log have been committed but not yet flushed to disk at the time of the failure. The set of objects are database objects, such as tables, rows, views, indexes, and the like. Thus, recovering the database system based on the redo log entails reapplying, to the database objects, changes reflected in the work items. Another context for log-based processing is recovery after media loss or persistent (disk) data corruption. This type of recovery typically involves restoring a backup of the data and then applying the log to replay all the changes since the time at which the backup was taken.
Use of redo logs for system recovery is described in U.S. Pat. No. 5,832,516 to Bamford et al., entitled “Caching data in recoverable objects”; U.S. Pat. No. 6,507,853 to Bamford et al., entitled “Recovering data from a failed cache using recovery logs of caches that updated the data”; U.S. Pat. No. 6,609,136 to Bamford et al., entitled “Recovering data from a failed cache using a surviving cache”; U.S. Pat. No. 6,507,853 to Bamford et al., entitled “Recovering data from a failed cache using recovery logs of caches that updated the data”; the contents of all of which are incorporated by reference in their entirety for all purposes as if fully set forth herein.
Log-based processing is not always in the context of system recovery. Rather, log-based processing may also be performed to repeat logged work on another system. For example, log-based processing may be performed to construct and maintain a standby database system. Approaches to constructing standby databases and processing redo records are described in U.S. patent application Ser. No. 10/308,851 filed on Dec. 2, 2002 by Subramaniam, entitled “Replicating DDL Changes Using Streams”; U.S. patent application Ser. No. 10/308,879 filed on Dec. 2, 2002 by Arora et al., entitled “In Memory Streaming With Disk Backup and Recovery of Messages Captured From a Database Redo Stream”; U.S. patent application Ser. No. 10/308,924 filed on Dec. 2, 2002 by Souder et al., entitled “Asynchronous Information Sharing System”; U.S. patent application Ser. No. 10/443,206 filed on May 21, 2003 by Jain et al., entitled “Buffered Message Queue Architecture for Database Management Systems”; U.S. patent application Ser. No. 10/449,873 filed on May 30, 2003 by Lu et al., entitled “Utilizing Rules in a Distributed Information Sharing System”; the contents of all of which are incorporated by this reference in their entirety for all purposes as if fully set forth herein.
Typical approaches to log-based processing fall into two main categories. The first category involves serial schemes. With serial schemes, a single recovery process reads through the sequence of work items in the log and performs the work on the objects, one work item at a time. In large-scale systems with abundant resources, such a scheme does not take advantage of the available resources and leads to under-utilization of the system resources. For example, when there are multiple CPUs in the system, the recovery process runs in only one of the CPUs and the other CPUs are not utilized. Furthermore, serial schemes are not able to effectively overlap the CPU and I/O components of recovery processing.
The second category of log-based processing involves parallel schemes. With parallel schemes, multiple processes work together in parallel to perform log-based recovery. However, such schemes typically allocate specific tasks to named processes, thus limiting the flexibility of the entire architecture. In particular, a single process acts as the Coordinator for the log processing session. The Coordinator is assigned the task of reading through the entire sequence of work items and assigning the work to be performed to other processes known as Worker processes. Because there are no ordering constraints with respect to work processing that need to be honored across any two different objects, the entire work represented in the log is partitioned by the Coordinator, based on the objects on which the work needs to be performed, prior to assigning partitions of work to the Worker processes.
In situations in which the number of objects is much larger than the number of Worker processes (typically the case in many systems), each Worker process can be assigned a subset of the objects to work on. The Coordinator process directs all the work corresponding to an object to the Worker process that handles the subset of objects in which this object belongs. The Worker process can then process work on its objects in the order in which it receives work items from the Coordinator, thus honoring a total-ordering constraint for work processing on any given object. However, even though the work processing is handled by a set of processes in parallel, there is significant under-utilization of system resources. For example, the Coordinator process often becomes the bottleneck as it struggles to identify and extract work from the log and to assign the work to a large number of relatively idle Worker processes. Furthermore, parallel schemes typically utilize specialized Worker processes that either perform only CPU-based operations or only IO operations.
The Coordinator process is responsible for synchronization tasks, including the need to periodically “checkpoint” the work being performed. During log-based processing, the processing of work items needs to be periodically checkpointed in order to minimize lost work upon resumption of processing after a failure of the original processing session. Processing is checkpointed by identifying and storing a common point, in the processing of the log, which all processes have reached. With such synchronization checkpoints, the Coordinator process identifies a common point in the set of work items for the various objects, and ensures that all Worker processes complete work up to that point. That is, all log processing is completed for all the work items up to that point in the set of work items, and no work is performed on any work items beyond that point in the set of work items.
Once all the processes reach the checkpoint, the Coordinator process takes appropriate action, such as saving the state of the objects, and resumes the processing of the work item via the Worker processes. This approach to handling points of synchronization leads to significant resource under-utilization because the Worker processes that are finished with their work ahead of other Worker processes, i.e., the processes that reach the juncture before the other processes, cannot continue processing more work items until every process has reached the point of synchronization.
One approach to using checkpoints in managing shared resources is described in U.S. Pat. No. 6,567,827 to Bamford et al., entitled “Using a checkpoint to manage data that is shared by a plurality of nodes”; the contents of which is incorporated by reference in its entirety for all purposes as if fully set forth herein.
Parallel schemes for log-based recovery are unable to fully utilize global system resources (particularly in configurations involving distributed clusters of CPUs and memory units) because critical-path coordination work remains centralized in a single Coordinator process and, consequently, in a single node of the distributed cluster.
Based on the foregoing, there is room for improvement in the performance characteristics of log-based processing.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.