1. Field of the Invention
This invention relates to mass storage systems for digital computers, and in particular to a method for providing a static snapshot or image of a mass storage system.
2. Description of Related Art
It is desirable during the operation of a computer system with a mass storage system, such as a magnetic disk, to periodically make a backup copy of the data stored on the mass storage system to allow for recovery in the event of a failure of the mass storage system. This is commonly done by reading the data stored on the mass storage system and writing it to a magnetic tape.
However, if the data stored on the mass storage system is being updated by other programs while the backup copy is being made, the image of the data written to tape may be inconsistent. This is because normal backup techniques either copy the blocks from the mass storage system sequentially to the linear-access tape, or walk the file system stored on the mass storage system, starting with the first block of the first file in the first directory and proceeding in order to the last block of the last file of the last directory. The backup program is not aware of updates made to a block of the mass storage system after that block has been written to tape.
This problem of inconsistent data being written to tape is particularly likely to occur if the mass storage system is being used by a database management system, where a single update may involve changing information stored on different parts of the mass storage system. If a database update is made while the backup tape is being written, the image of the database written to tape will have the old values for any data already written to tape at the time of the database update, and the new values for any data written to tape following the database update. A restoration based on the tape image of the database would yield an inconsistent database.
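The inconsistency described above can be illustrated with a minimal sketch. The block layout, the balances, and the names `backup_with_concurrent_update` and `transfer` are hypothetical and chosen only for illustration; the sketch assumes a sequential block-by-block copy and a single update that touches two blocks.

```python
def backup_with_concurrent_update(disk, update_after_block, update):
    """Copy blocks to 'tape' in order, applying `update` to the disk
    once `update_after_block` blocks have already been copied."""
    tape = []
    for index in range(len(disk)):
        if index == update_after_block:
            update(disk)  # another program updates the disk mid-backup
        tape.append(disk[index])
    return tape


def transfer(disk):
    """Hypothetical database update touching two separate blocks."""
    disk[0] -= 100  # block 0 has already been written to tape
    disk[3] += 100  # block 3 has not yet been written to tape


disk = [500, 0, 0, 200]
tape = backup_with_concurrent_update(disk, update_after_block=1, update=transfer)

print(tape)  # [500, 0, 0, 300]
print(disk)  # [400, 0, 0, 300]
```

The tape holds the old value of block 0 and the new value of block 3, so it matches neither the state before the update nor the state after it.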
Horton et al., U.S. Pat. No. 5,089,958, which is hereby incorporated by reference in its entirety for the material disclosed therein, discloses a technique for producing an image of a mass storage system at any point in time after the technique is started. This is done by establishing a base image of the mass storage system at the start of the technique and a log indicating each change made to the mass storage system. An image at any point in time can then be produced by starting with the base image and making all the changes indicated in the log up to that point in time. To improve performance, the Horton system also provides for differential images so that the compilation of changes to form an image does not have to start with the base image.
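The general idea of producing an image from a base image and a change log can be sketched as follows. This is a simplified illustration and not the implementation disclosed in Horton; the function name `image_at`, the log entry layout `(timestamp, block, data)`, and the assumption that the log is ordered by timestamp are all hypothetical.

```python
def image_at(base_image, log, point_in_time):
    """Rebuild the image as of `point_in_time` by replaying the log
    over a copy of the base image. Assumes `log` is ordered by timestamp."""
    image = list(base_image)
    for timestamp, block, data in log:
        if timestamp > point_in_time:
            break  # later changes are not part of the requested image
        image[block] = data
    return image


base = ["a0", "b0", "c0"]
log = [(1, 0, "a1"), (2, 2, "c1"), (3, 0, "a2")]

print(image_at(base, log, 2))  # ['a1', 'b0', 'c1']
print(image_at(base, log, 3))  # ['a2', 'b0', 'c1']
```

In this sketch every requested image requires a replay of the log from the base image, which is the compilation cost that differential images are meant to reduce.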
There are two difficulties with using the technique of Horton to provide an image for backup operations. First, the technique is not designed to provide a static snapshot or image of the mass storage system, but to allow an image from any point in time to be created at some later time. This increases the complexity of the technique and requires the compilation of changes whenever a virtual image is desired.
The second difficulty with using the technique of Horton is that the log must store a copy of each change made to the mass storage system in order to produce an image of the mass storage system as it was at a specified time. This means that the size of the log can grow without bound, eventually exhausting the space available for its storage. At this point, updates to the mass storage system are not possible without compromising the ability to produce an image from any previous point in time.
With many database systems or file systems, certain key blocks (such as master directory blocks) are frequently updated, perhaps with every update to any other block. A copy of these blocks must be written to the log each time they are changed. This will, of course, result in a very large log file, with many of the entries being copies of the key blocks as they changed over time.
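The effect of frequently updated key blocks on log size can be illustrated with a short sketch. It assumes, hypothetically, that every update to a data block also rewrites a single master directory block (block number 0 here) and that the log keeps a copy of every changed block; the names `MASTER_DIRECTORY` and `log_after_updates` are illustrative only.

```python
MASTER_DIRECTORY = 0  # hypothetical block number of the key block


def log_after_updates(data_blocks_updated):
    """Return the log produced when each data block update also
    rewrites the master directory block."""
    log = []
    for version, block in enumerate(data_blocks_updated, start=1):
        log.append((block, f"data-v{version}"))
        log.append((MASTER_DIRECTORY, f"directory-v{version}"))
    return log


log = log_after_updates([5, 9, 5, 7])
key_block_entries = sum(1 for block, _ in log if block == MASTER_DIRECTORY)

print(len(log))           # 8
print(key_block_entries)  # 4
```

Under these assumptions half of the log entries are successive copies of the same key block.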
Another approach to creating a static image of a mass storage system is possible if the mass storage system has the ability to produce a mirror, or identical copy, of one disk's data on a second disk. At the time the static image is needed, mirroring of data is stopped and the mirror disk is used as the static image. When the static image is no longer necessary (for example, when the tape backup has been completed), the two disks are resynchronized, by copying any changes made during the time mirroring was not active to the mirror disk, and mirroring is resumed.
This approach also has problems. Unless three or more disks mirror the information on the main disk, stopping mirroring to produce the static image removes the redundancy that the mirrored disk or disks provide, and updates can be lost if a disk fails. Furthermore, the approach requires an entire disk to be devoted to the storage of the static image.
But the major disadvantage of this mirror disk approach is the time necessary to restart mirroring after the static image is no longer needed. Restarting requires updating the mirror disk with all the changes that have been made since mirroring was stopped. If a log of these changes is not available, all the data must be copied to the mirror disk from the disk that has been updated. For large disks such as would be found on a database system, this could take many hours.
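The mirror disk approach and its resynchronization cost can be sketched as follows. The class `MirroredPair` and its methods are hypothetical and model disks as in-memory lists; the sketch assumes that, when a log of changes is available, it takes the form of a set of block numbers written while mirroring was stopped.

```python
class MirroredPair:
    """Hypothetical model of a main disk and one mirror disk."""

    def __init__(self, blocks):
        self.primary = list(blocks)
        self.mirror = list(blocks)
        self.mirroring = True
        self.changed = set()  # blocks written while mirroring is stopped

    def write(self, block, data):
        self.primary[block] = data
        if self.mirroring:
            self.mirror[block] = data
        else:
            self.changed.add(block)

    def split(self):
        """Stop mirroring; the mirror disk becomes the static image."""
        self.mirroring = False
        return self.mirror

    def resync(self, have_change_log):
        """Resume mirroring and return the number of blocks copied."""
        if have_change_log:
            blocks = sorted(self.changed)
        else:
            blocks = range(len(self.primary))  # must copy the whole disk
        copied = 0
        for block in blocks:
            self.mirror[block] = self.primary[block]
            copied += 1
        self.changed.clear()
        self.mirroring = True
        return copied


pair = MirroredPair(["a", "b", "c", "d"])
static_image = list(pair.split())
pair.write(1, "B")  # update made while the static image is in use
copied_with_log = pair.resync(have_change_log=True)

pair.split()
pair.write(2, "C")
copied_without_log = pair.resync(have_change_log=False)

print(static_image)        # ['a', 'b', 'c', 'd']
print(copied_with_log)     # 1
print(copied_without_log)  # 4
```

With a log of changes only the changed block is copied, while without one every block on the disk is copied, which is why resynchronizing a large disk can take so long.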