1. Field of the Invention
The present invention relates to repetitive data protection for data stored in a block oriented data object comprising several indexed segments. This technology allows the data contents of block oriented data objects to be restored to the state they had before given timestamps, by using so-called undo-log information.
2. Description of the Related Art
Continuous Data Protection (“CDP”) is an emerging backup and recovery technology for block oriented data objects comprising several indexed segments. As this technology has been developed for protecting large amounts of coherent data, prime candidates for applying CDP are database applications. By means of the CDP technology both backup and recovery times can be reduced to seconds, wherein the density of recovery points is high.
According to CDP, every modification of data stored in the segments of a data object is recorded by copying and writing the old data contents together with the corresponding segment index and the time of modification to an undo-log journal before writing new data to a segment. Typically, undo-log journals are not located on the same volume as the data object to be protected.
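The recording step described above can be sketched as follows. This is a minimal illustration only, assuming the data object is held as an in-memory list of segments; the names `UndoRecord`, `JournaledObject` and `clock` are hypothetical and do not come from any particular CDP product.

```python
import time
from dataclasses import dataclass


@dataclass
class UndoRecord:
    timestamp: float  # time of modification
    index: int        # index of the modified segment
    old_data: bytes   # segment contents before the write


class JournaledObject:
    """Hypothetical block oriented data object with an undo-log journal."""

    def __init__(self, segments, clock=time.time):
        self.segments = list(segments)
        self.journal = []   # in practice kept on a different volume
        self.clock = clock  # injectable so the sketch can be tested

    def write(self, index, new_data):
        # Copy the old contents to the journal BEFORE overwriting the segment.
        self.journal.append(UndoRecord(self.clock(), index, self.segments[index]))
        self.segments[index] = new_data
```

Every call to `write` appends exactly one record, so the journal grows with the number of modifications rather than with the size of the data object.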
If, at some point in time, corrupted data has been written to the data object, the undo-log information can be used to recover from this failure. To this end, a point in time previous to the write of corrupted data is chosen. Then, all modifications recorded in the undo-log journal from this point in time up to the current time are extracted from the undo-log journal and are written back to the corresponding segments of the data object. Via this operation, any modification that has happened after the chosen point in time is in effect undone, so that afterwards the data contents of the data object are identical to its data contents at the chosen point in time. The mechanism by which previous points in time are restored depends on the concrete implementation of the CDP solution. Today, many CDP solutions keep their data repository on disk and avoid sequential storage media, such as tapes.
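One possible form of such a restore is sketched below. It assumes that the journal is a chronologically ordered list of `(timestamp, index, old_data)` tuples and that records are written back newest first, so that for a segment modified several times the oldest saved contents are applied last. The function name `restore` and the tuple layout are illustrative assumptions, not a description of a specific CDP implementation.

```python
def restore(segments, journal, restore_time):
    """Undo every journaled write made after restore_time, newest first."""
    for timestamp, index, old_data in reversed(journal):
        if timestamp <= restore_time:
            break  # journal is chronological; older records are not needed
        segments[index] = old_data
    return segments


# Usage: segment 0 was overwritten at t=1 and t=2, segment 1 at t=3.
journal = [(1, 0, b"a"), (2, 0, b"x"), (3, 1, b"b")]
current = [b"y", b"z"]
restored = restore(list(current), journal, restore_time=1)
```

In this usage example `restored` holds the contents as they were directly after the write at t=1, while `current` is left unchanged because a copy is passed in.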
As described above, the undo-log information generated by CDP allows restoration of the data contents of a data object for any arbitrary previous point in time. Correspondingly, the amount of undo-log data to be stored is high. As the amount of data that can be stored on a storage medium is limited, a reduction of the number of possible recovery points has been proposed. Instead of creating a continuous undo-log journal, i.e. an undo-log journal containing every single data modification, an undo-log journal is created such that only certain points in time can be recovered, e.g. hourly or event triggered recovery points. In the context of the present invention this approach is called repetitive data protection with coarse graining. For this purpose, only the first data modification of a segment after a defined recovery point has to be recorded.
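The effect of coarse graining on the journal can be illustrated with a short sketch. It assumes that recovery points are numbered consecutively and that a set of already-logged segment indices is reset at each recovery point; the class `CoarseJournal` and its method names are hypothetical.

```python
class CoarseJournal:
    """Hypothetical undo-log that records only the first write per segment
    after each recovery point."""

    def __init__(self, segments):
        self.segments = list(segments)
        self.journal = []    # (recovery_point, index, old_data)
        self.point = 0       # point 0 is the initial state
        self.logged = set()  # segments already journaled since self.point

    def set_recovery_point(self):
        self.point += 1
        self.logged.clear()
        return self.point

    def write(self, index, new_data):
        if index not in self.logged:
            # First modification since the recovery point: save old contents.
            self.journal.append((self.point, index, self.segments[index]))
            self.logged.add(index)
        self.segments[index] = new_data

    def state_at(self, point):
        """Reconstruct the contents as they were at the given recovery point."""
        state = list(self.segments)
        for p, index, old_data in reversed(self.journal):
            if p < point:
                break
            state[index] = old_data
        return state
```

Repeated writes to the same segment between two recovery points add only one record, which is what bounds the journal size; the price is that intermediate states between recovery points cannot be reconstructed.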
By means of CDP and repetitive data protection it is possible to optimize the time needed to restore corrupted application data by undoing data modifications instead of overwriting data with an image representing a previous point in time. If, at restore time, the amount of application data is large compared to the number of modifications that happened after corruption of data, this technology provides significantly faster recovery times as long as the point in time to be restored resides in the “near past”.
Because the size of an undo-log journal grows over time, there is a point in time beyond which traditional restore technologies become faster than CDP or repetitive data protection. This critical point in time can be pushed further into the past by storing the undo-log journal on a random access storage medium. Then, all data modifications irrelevant for data restore to a given restore time can easily be skipped. However, this is not possible if the undo-log journal is located on a sequential storage medium.
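How skipping might look on a random access medium is sketched below, under several assumptions: the journal is a chronologically ordered, indexable list of `(timestamp, index, old_data)` tuples, the `timestamps` list stands in for an index kept alongside the journal, and a record is treated as irrelevant if it is older than the restore time or if an earlier record after the restore time already covers the same segment. The function name `records_needed` is hypothetical.

```python
from bisect import bisect_right


def records_needed(journal, restore_time):
    """Return {segment index: old_data} for the records a restore must apply."""
    # Stand-in for a timestamp index stored with the journal.
    timestamps = [ts for ts, _, _ in journal]
    # Jump directly past all records at or before the restore time.
    start = bisect_right(timestamps, restore_time)
    needed = {}
    for _, index, old_data in journal[start:]:
        # Only the first record per segment after restore_time is relevant;
        # later records for the same segment are skipped.
        needed.setdefault(index, old_data)
    return needed
```

On a sequential medium the jump to `start` is not available, so every record from the beginning of the journal would have to be read even if most of them are discarded.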
Another problem with recovery mechanisms based on undo-log information arises from the fact that these technologies are sensitive to any data corruption that happens to the undo-log journal. A single corrupted undo-log block invalidates all points in time that could be restored using the corresponding journal. For this reason it has been proposed to write the undo-log information to multiple journals and, thus, keep redundant copies of the undo-log information.
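Keeping redundant copies can be sketched as follows. The per-record CRC32 checksum used to detect a corrupted copy is an assumption added for illustration and is not part of the proposal described above; the function names `append_redundant` and `read_record` are likewise hypothetical.

```python
import zlib


def append_redundant(journals, index, old_data):
    """Write the same undo-log record to every journal copy."""
    record = (index, old_data, zlib.crc32(old_data))
    for journal in journals:
        journal.append(record)


def read_record(journals, position):
    """Return the first copy of a record whose checksum still verifies."""
    for journal in journals:
        index, old_data, checksum = journal[position]
        if zlib.crc32(old_data) == checksum:
            return index, old_data
    raise ValueError("record is corrupted in all journal copies")
```

A record is lost only if it is damaged in every copy, at the cost of multiplying the storage needed for undo-log information by the number of journals.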