1. Field of the Invention
The present invention relates to maintaining recovery data for a computer system. More specifically, the present invention relates to systems, methods, and computer program products for identifying appropriate undo data from a forward pass through a log.
2. Background and Relevant Art
Computer systems and related technology affect many aspects of society. Indeed, the computer system's ability to process information has transformed the way we live and work. Computer systems now commonly perform a host of tasks (e.g., database management, scheduling, and word processing) that prior to the advent of the computer system were typically performed manually. Many tasks performed at a computer system include the manipulation of files or other large objects within transaction stores such as databases and transactional message queuing systems. For example, a user can transfer commands to a software component at the computer system (e.g., by using a keyboard or mouse) to cause the computer system to create, delete, or modify a file.
At times, performance of a single task can involve a number of operations. In some cases, these operations may be related, such that it is important that all the related operations be performed together. For example, when transferring funds between bank accounts, it may be desirable to perform a first operation, which reduces (i.e., debits) a value in a first file associated with a first bank account, together with a second operation, which increases (i.e., credits) a value in a second file associated with a second bank account.
Unfortunately, when performing a task involving a number of related operations, there is always some possibility that the task might be interrupted (e.g., a user may halt performance of the task or a computer system fault may occur). This can result in a computer system performing some of the related operations, while other related operations are not performed. When only a portion of related operations are performed, data affected by the related operations is often referred to as being in an “inconsistent state.”
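The funds-transfer example above can be illustrated with a minimal sketch. The names `transfer`, `Interrupted`, and `fail_after_debit`, as well as the account balances, are hypothetical and serve only to simulate an interruption between the two related operations.

```python
class Interrupted(Exception):
    """Stands in for a user halt or a system fault between operations."""


def transfer(accounts, src, dst, amount, fail_after_debit=False):
    # First operation: debit the source account.
    accounts[src] -= amount
    if fail_after_debit:
        # The task is interrupted before the related credit is performed.
        raise Interrupted("fault between debit and credit")
    # Second operation: credit the destination account.
    accounts[dst] += amount


accounts = {"A": 100, "B": 50}
total_before = sum(accounts.values())

try:
    transfer(accounts, "A", "B", 30, fail_after_debit=True)
except Interrupted:
    pass

# Only the debit was applied, so the combined balance no longer matches.
inconsistent = sum(accounts.values()) != total_before
```

Because only the first of the two related operations was performed, the balances are left in an inconsistent state.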
In some environments, tasks involving a number of operations may be performed in a distributed system where modules connected to a common network interoperate and communicate with one another in a manner that may be transparent to a user. These modules may perform operations by communicating in the background to transfer user commands, program responses, and data between different computer systems. Due to increased complexity, including the possibility of multiple points of failure, the chance of a distributed system being placed in an inconsistent state can be substantially greater than that of a stand-alone computer system. Also, due to the background interoperability between modules, a user may be unaware that the distributed system is in an inconsistent state.
Identifying the cause of an inconsistent state often requires a level of technical expertise beyond that of the average user. Further, even if the cause of an inconsistent state is identified, it may require a significant amount of time to transition a computer system out of the inconsistent state (e.g., by entering user commands to reverse the effects of previously performed operations). To reduce the chance that a computer system will have to be transitioned out of an inconsistent state by a user, computer systems are often backed-up (e.g., to tape media) at regular intervals (e.g., once a day, once a week, etc.).
A back-up preserves the state of a computer system as of the time the back-up is performed. If after a successful back-up (e.g., backing-up a system that is known to be in a consistent state), a computer system subsequently transitions into an inconsistent state, the computer system can easily be returned to a consistent state by restoring data from back-up. However, depending on the back-up interval, a significant amount of data may be lost when restoring from back-up. For example, if a back-up is performed every day at 11:00 PM and a computer system transitions into an inconsistent state at 10:00 PM, twenty-three hours of data may be lost if the computer system is restored from the last back-up.
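The twenty-three hour figure follows from simple date arithmetic, as the sketch below shows. The function name `data_loss_window` and the calendar dates are assumptions chosen only for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup, failure):
    """Time span whose modifications are lost when restoring from back-up alone."""
    return failure - last_backup


# Back-up at 11:00 PM; inconsistent state at 10:00 PM the following day.
last_backup = datetime(2024, 1, 1, 23, 0)
failure = datetime(2024, 1, 2, 22, 0)

lost = data_loss_window(last_backup, failure)
lost_hours = lost / timedelta(hours=1)
```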
To reduce the loss of data when transitioning out of an inconsistent state, computer systems (both stand-alone and distributed systems) can utilize transactional systems, such as, for example, transactional file systems, transactional databases, or transactional message systems. A transactional system can treat a number of related operations as a single atomic unit (commonly referred to as a “transaction”). That is, either all the related operations are performed or none of the related operations are performed. To help achieve this atomicity, an entry for each related operation and data associated with each operation can be written to a log when the operation is successfully completed. Thus, a log can be utilized to maintain a record of all the data modified by operations that occur between back-up intervals.
When all the related operations associated with a transaction are successfully completed, an entry can be included in a log indicating the transaction was “committed.” If data modifications resulting from operations of a committed transaction are subsequently lost (e.g., due to a computer system failure), log entries for the operations can be processed to “redo” the data modifications (commonly referred to as a “roll-forward”). On the other hand, when not all of the related operations associated with a transaction complete, an entry can be included in the log indicating the transaction was “aborted.” When a transaction is aborted, log entries associated with any operations that were performed can be processed to “undo” resulting data modifications (commonly referred to as a “roll-back”). Thus, a log helps ensure that data can be transitioned out of an inconsistent state with minimal loss of data.
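A minimal sketch of redo and undo processing follows. The log entry layout (dictionaries with hypothetical `kind`, `txn`, `key`, `undo`, and `redo` fields) and the function names are assumptions made for illustration, not a format prescribed by any particular transactional system.

```python
def roll_forward(store, log):
    """Redo the modifications of committed transactions, in log order."""
    committed = {e["txn"] for e in log if e["kind"] == "commit"}
    for e in log:
        if e["kind"] == "op" and e["txn"] in committed:
            store[e["key"]] = e["redo"]
    return store


def roll_back(store, log, txn):
    """Undo the modifications of one transaction, newest entry first."""
    for e in reversed(log):
        if e["kind"] == "op" and e["txn"] == txn:
            store[e["key"]] = e["undo"]
    return store


log = [
    {"kind": "op", "txn": "T1", "key": "a", "undo": 100, "redo": 70},
    {"kind": "op", "txn": "T1", "key": "b", "undo": 50, "redo": 80},
    {"kind": "commit", "txn": "T1"},
    {"kind": "op", "txn": "T2", "key": "a", "undo": 70, "redo": 60},
    {"kind": "abort", "txn": "T2"},
]

# Recovery from a back-up: only committed transaction T1 is redone.
recovered = roll_forward({"a": 100, "b": 50}, log)

# Live system at abort time: T2's debit of "a" is undone.
live = roll_back({"a": 60, "b": 80}, log, "T2")
```

In this sketch both paths arrive at the same consistent state, in which T1's modifications are present and T2's are absent.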
To successfully redo and undo operations, a log must maintain sufficient redo and undo data. For some types of operations, the amount of redo data and undo data can be minimal. For example, redo data and undo data for an object creation operation might include only an object name. On the other hand, for other types of operations, the amount of redo and undo data can be quite large. For example, redo data and/or undo data for an object modification operation or object deletion might include significant portions of the contents of an object.
When a portion of an object is to be modified, the log must store both a pre-modified version of the portion of the object (undo data) and a post-modified version of the portion of the object (redo data). For example, when the first ten bytes of a file are to be modified, the pre-modified version of the first ten bytes may be copied to an undo log entry and the post-modified version of the first ten bytes may be copied to a redo log entry. Thus, for each modification to a file, contents of the file may be copied to a log at least twice.
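The ten-byte example can be sketched as follows. The function `modify_with_logging` and the entry fields are hypothetical; the point is only that one modification produces two copies of file contents in the log.

```python
def modify_with_logging(data, offset, new_bytes, log):
    """Modify data in place, logging pre- and post-modified versions."""
    end = offset + len(new_bytes)
    # First copy: the pre-modified bytes become the undo data.
    log.append({"kind": "undo", "offset": offset, "data": bytes(data[offset:end])})
    data[offset:end] = new_bytes
    # Second copy: the post-modified bytes become the redo data.
    log.append({"kind": "redo", "offset": offset, "data": bytes(new_bytes)})


file_contents = bytearray(b"0123456789ABCDEF")
log = []
modify_with_logging(file_contents, 0, b"abcdefghij", log)

# Ten bytes were modified, but twenty bytes were copied to the log.
bytes_logged = sum(len(entry["data"]) for entry in log)
```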
In a computer system where objects are frequently modified, this can result in a large number of copies being performed. If errors occur during these copies, for example, due to a system fault, a check can be performed to determine whether correct undo data was written to the log. Typically, to transition a computer system out of an inconsistent state, the most recent back-up is loaded onto the computer system and subsequently the log is rolled-forward to the desired recovery time. If it is determined that one or more log entries contain incorrect undo data, it may be difficult to transition a computer system out of an inconsistent state during a roll-forward recovery. Thus, a log entry that contains incorrect undo data may need to be corrected during a roll-forward recovery.
Some transactional systems are utilized in an environment where one computer system operates as a primary and another computer system acts as a secondary (often referred to as a “hot-spare”). In this environment, the primary may receive user and/or application program commands causing a number of operations to be performed as part of a transaction. The primary also maintains a log with log entries for performed operations. At specified intervals or as a result of some event, log entries from the primary are transferred to the secondary. Thus, as the primary performs operations and log entries are added to the log, these log entries are eventually transferred to the secondary.
The secondary rolls-forward through these log entries as they are received to continually approach the state of the primary. If at some time the primary suffers from a system fault, functionality can be shifted to the secondary. Shifting functionality can include causing transactions, whether they be new transactions or transactions that were open on the primary at the time of the system fault, to be received at the secondary.
However, due to inevitable delays in receiving log entries at the secondary, the state of the secondary can lag behind the state of the primary. One of these inevitable delays is transmission time. That is, transferring log entries across a network connection will consume some amount of time. Further, as the amount of data in a log entry increases, the transmission time can also increase. Thus, log entries that contain increased amounts of data, such as, for example, redo log entries containing redo data and undo log entries containing undo data, may take longer to transfer than other log entries. If the contents of objects are frequently modified at the primary, this increased transmission time may persist as each modification to the contents of an object causes both a redo entry and an undo entry to be transferred to the secondary.
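The effect of entry size on transmission time can be estimated with a rough sketch. The 32-byte header, the 4096-byte modification, and the link rate are hypothetical values chosen only to show the proportions involved.

```python
def entry_size(entry, header_bytes=32):
    """Bytes on the wire for one log entry: fixed header plus any payload."""
    return header_bytes + len(entry.get("data", b""))


def transfer_seconds(entries, bytes_per_second):
    """Idealized transmission time, ignoring latency and protocol overhead."""
    return sum(entry_size(e) for e in entries) / bytes_per_second


payload = bytes(4096)  # modified portion of an object (assumed size)
entries = [
    {"kind": "redo", "data": payload},  # post-modified contents
    {"kind": "undo", "data": payload},  # pre-modified contents
    {"kind": "commit"},                 # carries no object contents
]

total_bytes = sum(entry_size(e) for e in entries)
undo_bytes = entry_size(entries[1])
seconds = transfer_seconds(entries, bytes_per_second=1_000_000)
```

Under these assumptions the undo entry accounts for roughly half of the bytes transferred for the modification, while the commit entry is negligible by comparison.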
Any delay in receiving log entries at the secondary can cause a corresponding delay in the secondary's ability to process outstanding transactions. If open transactions are shifted to the secondary at a time when the secondary is lagging behind the primary, the secondary may be unaware of how to process the transactions. The secondary may be required to postpone processing these open transactions until appropriate log entries associated with the open transactions are received. Delaying the processing of transactions is an inconvenience to users and may result in lost revenue to entities associated with the delayed transactions.
Therefore, systems, methods, and computer program products for increasing the chances of a roll-forward recovery placing a computer system in an appropriate state would be advantageous. Systems, methods, and computer program products for transferring undo data between computer systems in a manner that conserves bandwidth would also be advantageous.