1. Field of the Invention
This invention relates to the field of computer data backup and recovery, and particularly to an improved computer data backup and recovery system that reduces the time and trouble involved in data backup by storing in a secondary system each change made to the data on a primary system at the time the change is made, and that reduces the time and trouble required for data recovery by using that backup data.
2. Description of Related Art
In order to avoid the loss of and disruption to data due to hardware failure, software failure or disaster, data backup is in general a mandatory aspect of database architecture and management using computers.
Several methods of backing up data have been developed in the field of computers. These methods are commonly contingencies for the problems foreseen by their developers.
The first of several such data backup methods in the prior art is to acquire copies of complete files periodically. (By source data, we refer here and below to the data directly acted on, or processed by, the computer system.) In this data backup method, updates to the source data performed after the data is copied are not reflected in the backup files. This method of data backup therefore entails the danger that large volumes of updated data may be lost, although that volume will vary with the backup interval.
A second data backup method is to make copies of complete files periodically and, when files are updated between backups, to store the update data in logfiles. This method is performed primarily in on-line processing. In addition to periodically acquiring copies of complete files on magnetic tape or other media, it involves acquiring logfiles on a magnetic disk device, magnetic tape device or similar equipment whenever files are updated during the intervals between the periodic copies. Specific applications of this backup method differ in some respects.
In brief, copies of complete files are acquired to provide for the possibility of destruction of the files containing the data. The copies are acquired at daily or weekly frequencies, although the specific cycle would be determined by the particular application. One variation of this method is to segment the files containing the source data rather than copying them in toto.
In traditional applications of this second data backup method, updates are suspended when the copies are acquired in order to maintain data integrity.
In such applications of this method, logfiles of updates are acquired when the source data is updated after the file copies have been obtained. The logged data is of three types: transaction logs (referred to below as “T logs”) of the update data itself, pre-update image logs (referred to below as “B logs”) of the data updated and post-update image logs (referred to below as “A logs”) of the data updated.
As a simple illustration of these logs, consider a bank account with a balance of 100,000 yen. A withdrawal of 10,000 yen results in a balance of 90,000 yen. In this case, the B log records the initial balance of 100,000 yen, the T log records the withdrawal of 10,000 yen and the A log records the resulting balance of 90,000 yen.
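By way of illustration only, the three log types in the bank account example might be represented as follows. This is a minimal sketch; the names LogRecord and withdraw and the account identifier "acct-001" are hypothetical and are not taken from any particular prior-art system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    kind: str     # "B" (pre-update image), "T" (transaction) or "A" (post-update image)
    account: str  # hypothetical account identifier
    amount: int   # yen: a balance for B and A logs, the withdrawal amount for T logs


def withdraw(balances, account, amount):
    """Apply a withdrawal and return the B, T and A log records it generates."""
    before = balances[account]
    after = before - amount
    balances[account] = after
    return [
        LogRecord("B", account, before),  # balance before the update
        LogRecord("T", account, amount),  # the update data itself
        LogRecord("A", account, after),   # balance after the update
    ]


balances = {"acct-001": 100_000}
logs = withdraw(balances, "acct-001", 10_000)
```

After the call, logs holds one record of each type, with amounts of 100,000, 10,000 and 90,000 yen respectively.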
If for some reason the source data were destroyed in a computer failure, the most recently acquired complete copies, or the part of those copies corresponding to the destroyed data, would be used to restore the data to its state at the time the source data was copied. The log data generated since the copies were obtained would then be used to restore the data to its state immediately prior to the destruction of the files. Traditional applications of this second data backup method have suffered from the drawback that, as the volume of the source data expands, considerable time is required to acquire the file copies and to restore the data when files are destroyed. A further drawback is the difficulty of 24-hour operation, since data updates must be suspended while file copies are acquired.
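The two-stage recovery just described, restoring from the complete copy and then applying the subsequent log data, can be sketched as follows. The sketch assumes that the log data consists of A logs held as (key, post-update value) pairs in update order; the function name restore and the sample data are hypothetical.

```python
def restore(full_copy, a_logs):
    """Rebuild the source data from the latest complete copy plus later A logs."""
    data = dict(full_copy)           # state at the time the copy was acquired
    for key, post_image in a_logs:   # replay in the order the updates occurred
        data[key] = post_image
    return data


full_copy = {"acct-001": 100_000, "acct-002": 50_000}
a_logs = [("acct-001", 90_000), ("acct-002", 55_000), ("acct-001", 85_000)]
recovered = restore(full_copy, a_logs)
```

The replay loop visits every log record acquired since the copy, which is why recovery time under this method grows with the volume of updates.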
The first and second data backup methods are contingencies for file and device damage.
A third data backup method provides for such situations as abnormal termination of a program and transaction cancellation. This method stores the pre-update data (B log data) for data updated by a transaction (a set of processing activities) over the period from the start of the transaction to its conclusion. If an executing program suffers an abnormal termination or the transaction is canceled, the pre-update data is used to restore the data updated by the transaction to its state prior to the transaction.
A system deadlock requires the same manner of processing as a transaction cancellation. Traditional applications of this third data backup method have suffered from inefficiency in that they must always store copies of the pre-update source data to provide for the infrequent contingencies of abnormal program termination and transaction cancellation.
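The back-out performed under the third data backup method can be sketched as follows. This is an illustrative sketch only; the class Transaction, its methods and the sentinel _MISSING are hypothetical names, and an in-memory dictionary stands in for the source data.

```python
_MISSING = object()  # marks a key that did not exist before the transaction


class Transaction:
    def __init__(self, data):
        self.data = data
        self.b_log = []  # (key, pre-update value) pairs, in update order

    def update(self, key, value):
        # Store the pre-update image before overwriting the source data.
        self.b_log.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def commit(self):
        # The transaction concluded normally; the B log is no longer needed.
        self.b_log.clear()

    def rollback(self):
        # Undo the updates in reverse order using the pre-update images.
        for key, old in reversed(self.b_log):
            if old is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old
        self.b_log.clear()


data = {"acct-001": 100_000}
tx = Transaction(data)
tx.update("acct-001", 90_000)
tx.update("acct-003", 10_000)
tx.rollback()  # e.g. after a cancellation or abnormal termination
```

Note that the B log is written on every update even though rollback is rarely invoked, which corresponds to the inefficiency noted above.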
A fourth data backup method provides for data updating errors caused by program errors, and comes into play when a program is incorrect. Let us suppose, for example, that 10,000 yen is withdrawn from a bank account with a balance of 100,000 yen but that the resulting balance following the withdrawal is reported as 110,000 yen. In such a case, this fourth data backup method would restore the data to the state of the source data immediately prior to the operation performed by the incorrect program and then repair the data on the basis of the T log by running a correct program.
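The repair procedure of the fourth data backup method can be sketched using the same figures. The sketch assumes that the T log holds (account, withdrawal amount) pairs; the function names buggy_withdraw, correct_withdraw and repair are hypothetical.

```python
def buggy_withdraw(balance, amount):
    return balance + amount  # the program error: adds instead of subtracting


def correct_withdraw(balance, amount):
    return balance - amount


def repair(state_before_error, t_log, program):
    """Re-run the logged transactions with the given program from the restored state."""
    data = dict(state_before_error)
    for account, amount in t_log:
        data[account] = program(data[account], amount)
    return data


restored = {"acct-001": 100_000}  # state immediately prior to the faulty operation
t_log = [("acct-001", 10_000)]    # the withdrawal recorded in the T log

damaged = repair(restored, t_log, buggy_withdraw)     # reproduces the erroneous balance
repaired = repair(restored, t_log, correct_withdraw)  # balance after the correct program
```

Here damaged holds the erroneous balance of 110,000 yen and repaired holds the correct balance of 90,000 yen.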
A fifth data backup method provides for coping with disasters. By “disaster” here, we refer to fires, floods, earthquakes and the like. A traditional application of this fifth data backup method would be to make copies of backup files and logfiles and store them in a fireproof safe in order to prevent the loss of files in a disaster of this sort.
A more rigorous application of this method would be to make copies of backup files and logfiles and transfer them to a remote location as a contingency against loss of the operational files.
However, this method suffers from the drawback that, because the acquired files must be copied and forwarded to the remote location, the remote location cannot hold data completely identical to the copies and logs acquired for the operational files. If the operational files are lost, the update data for a certain period of time will be unavailable.
A sixth data backup method provides for the destruction of file storage media. This method is a backup technology known as Redundant Array of Inexpensive Disks (RAID).
Applications of this sixth data backup method include storing entirely identical duplicates of files, writing file content in segments to multiple storage devices, and segmenting data and writing it to storage devices together with generated parity bits.
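The parity-based application can be illustrated with a short sketch in which a bytewise exclusive-OR serves as the parity computation, as is common in parity RAID levels. The function names stripe and rebuild, the zero padding and the three-device layout are assumptions made for illustration only.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def stripe(data, n_devices):
    """Split data into n_devices equal segments (zero-padded) and compute parity."""
    seg_len = -(-len(data) // n_devices)  # ceiling division
    padded = data.ljust(seg_len * n_devices, b"\0")
    segments = [padded[i * seg_len:(i + 1) * seg_len] for i in range(n_devices)]
    return segments, xor_blocks(segments)


def rebuild(segments, parity, lost):
    """Reconstruct the segment at index `lost` from the survivors and the parity."""
    survivors = [s for i, s in enumerate(segments) if i != lost]
    return xor_blocks(survivors + [parity])


segments, parity = stripe(b"balance=90000", 3)
recovered = rebuild(segments, parity, lost=1)  # equals segments[1]
```

The sketch recovers from the loss of any single segment, but it holds no earlier versions of the data and so cannot return the data to a prior point in time.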
From the CPU and software point of view, this data backup method appears to be writing to a single disk device, and the operational files and the backup files are stored on the same device. Therefore, this method suffers the drawback of not providing at all for disasters.
This sixth data backup method suffers from the further drawbacks of being incapable of backing out to handle an on-line abnormal termination and being incapable of restoring data to its state at some earlier point in time.
This data backup method also suffers from the flaw of requiring more time than ordinary write operations. It is capable of recovery from the destruction of data only in units of disk volumes and has the disadvantage of taking long periods of time to restore data. In addition, RAID structures must be composed of devices having equivalent performance characteristics.
A more advanced form of this sixth data backup method employs disk mirroring and allows installation of backup devices at remote locations. When data on the operational disk is updated, the addresses at which the updated data is stored and the updated data itself are transmitted to the backup device. Some implementations are equipped with functions that, if required, stop updating the backup system at a given time and later apply the updated data collected on the backup device until its data content is identical to that of the operational device.
This has the advantage of performing real-time backups, but suffers from the following drawbacks. Since mirroring between the operational and backup devices uses hardware addresses on the disk devices, the operational and backup devices must have exactly identical performance and functions. And since it involves the use of hardware addresses, the user is unable to specify whether or not to perform mirroring file by file. Nor is it capable of restoring data to its state at some given time in the past to recover from an error made at that point and update it correctly beyond that point.
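Address-based mirroring of this kind, including the suspend and resynchronize function, can be sketched as follows. The class MirroredDisk and its methods are hypothetical, and lists of blocks held in a single object stand in for the separate operational and backup devices.

```python
class MirroredDisk:
    def __init__(self, n_blocks):
        self.primary = [b""] * n_blocks  # operational device
        self.backup = [b""] * n_blocks   # backup device, possibly remote
        self.pending = []                # (address, data) updates not yet applied
        self.suspended = False

    def write(self, address, data):
        # Every update is forwarded as a hardware address plus the updated data.
        self.primary[address] = data
        self.pending.append((address, data))
        if not self.suspended:
            self._drain()

    def suspend(self):
        # Stop updating the backup; updates continue to be collected.
        self.suspended = True

    def resume(self):
        # Apply the collected updates until the backup matches the primary.
        self.suspended = False
        self._drain()

    def _drain(self):
        for address, data in self.pending:
            self.backup[address] = data
        self.pending.clear()


disk = MirroredDisk(4)
disk.write(0, b"a")
disk.suspend()
disk.write(1, b"b")  # held in pending; the backup is not yet updated
disk.resume()        # backup now matches the primary
```

Because the updates are keyed by block address rather than by file, the sketch has no means of mirroring only selected files, which reflects the limitation noted above.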
A seventh data backup method is to acquire data backups whenever source data is updated. An application of this method first acquires a copy of the entire source data and thereafter basically acquires A logs. Some applications also acquire B logs and T logs. If A logs are merely acquired and stored as the source data is successively updated, they grow in volume with each update operation, and it takes an extremely long time to restore the data to its original state if the source data is destroyed. To avoid this inconvenience, the A logs are periodically merged with the copies initially acquired, the result being effectively identical to acquiring a complete copy of the source data at that point in time. However, since this is in principle no different from the periodic acquisition of a complete copy of the data, the method still requires considerable time: when the source data is destroyed, the data must first be restored from the most recent complete copy and then brought to its most recent state with the A logs.
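The periodic merging of A logs into the initial copy can be sketched as follows. The class ALogBackup and its methods are hypothetical names, and a dictionary of key and value pairs stands in for the source data.

```python
class ALogBackup:
    def __init__(self, source):
        self.base = dict(source)  # initial complete copy of the source data
        self.a_logs = []          # (key, post-update value) pairs since the last merge

    def record(self, key, post_image):
        # Acquire an A log each time the source data is updated.
        self.a_logs.append((key, post_image))

    def merge(self):
        # Fold the accumulated A logs into the base copy and discard them.
        for key, post_image in self.a_logs:
            self.base[key] = post_image
        self.a_logs.clear()

    def restore(self):
        # Restore from the base copy, then apply any A logs not yet merged.
        data = dict(self.base)
        for key, post_image in self.a_logs:
            data[key] = post_image
        return data


backup = ALogBackup({"acct-001": 100_000})
backup.record("acct-001", 90_000)
backup.record("acct-001", 85_000)
backup.merge()                     # base copy now reflects both updates
backup.record("acct-001", 80_000)  # remains in the A logs until the next merge
latest = backup.restore()
```

Even with merging, restore performs the same two steps as recovery from a periodic complete copy, which is the drawback identified above.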
Drawbacks common to the first through seventh data backup methods, with the exception of the sixth, are the difficulty of creating indexed backups and the time they take to implement. A method called databasing has come into on-line use with traditional file systems. Because these databases have complex formats in which, for example, multiple indices across several levels may be updated, most such file systems are not subjected to backup. The exceptions either use backup systems that write index updates to A logs to enable recovery or are complete mirroring implementations.