Data of a computer system generally is archived on a periodic basis, such as at the end of each day; at the end of each week; at the end of each month; and/or at the end of each year. Data may also be archived before or after certain events or actions. When archived, the data is logically consistent, i.e., all of the data subjected to the archiving process at any point in time is maintained in the state as it existed at that particular point in time.
The archived data provides a means for restoring a computer system to a previous, known state, which may be necessary when performing disaster recovery such as occurs when data in a primary storage system is lost or corrupted. Data may be lost or corrupted if the primary storage system, such as a hard disk drive or other mass storage system, is physically damaged, if the operating system of the primary storage system crashes, or if files of the primary storage system are infected by a computer virus. By archiving the data on a periodic basis, the computer system always can be restored to its state as it existed at the most recent backup time, thereby minimizing any permanent data loss should disaster recovery actually be performed. The restoration may be of one or more files of the computer system or of the entire computer system itself.
There are numerous types of methods for archiving data. One type includes the copying of the data subject to the archive to a backup storage system. Typically, the backup storage system includes backup media, such as magnetic computer tapes or optical disks, used to store backup copies of the large amounts of data often associated with computer systems. Furthermore, each backup tape or optical disk can be maintained in storage indefinitely by sending it offsite. In order to minimize costs, such tapes and disks also can be reused on a rolling basis if such backup media are rewritable, or destroyed if not rewritable and physical storage space for the backups is limited. In this latter scenario, the “first in-first out” methodology is utilized in which the tape or disk having the oldest recording date is destroyed first.
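The “first in-first out” rotation described above can be illustrated with a minimal sketch. The function name `record_backup`, the pool size of three media, and the dates are hypothetical and chosen only for illustration; the sketch also assumes that backups are recorded in chronological order, so that the medium at the front of the queue is the one having the oldest recording date.

```python
from collections import deque
from datetime import date


def record_backup(pool, max_media, recorded_on):
    """Add a backup medium to the pool.

    Returns the recording date of the medium retired (reused or destroyed)
    to make room, or None if the pool still had space.
    """
    retired = None
    if len(pool) >= max_media:
        # The front of the queue holds the oldest recording date.
        retired = pool.popleft()
    pool.append(recorded_on)
    return retired


# Hypothetical example: five daily backups with room for only three media.
pool = deque()
retired = [record_backup(pool, 3, date(2024, 1, day)) for day in range(1, 6)]
```

In this sketch the first three backups fill the pool, and each later backup retires the oldest remaining medium.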
One disadvantage to archiving data by making backups is that the data subject to the archiving process is copied in totality onto the backup medium. Thus, if 250 gigabytes of data is to be archived, then 250 gigabytes of storage capacity is required. If a terabyte of data is to be backed up, then a terabyte of storage capacity is required. A related disadvantage is that as the amount of data to be archived increases, the period of time required to perform the backup increases as well. Indeed, it may take weeks to archive a terabyte of data onto tape. Likewise, it may take weeks if it becomes necessary to restore such an amount of data.
Yet another disadvantage arises when an “incremental” backup is made, wherein only the new data that has been written since the last backup is actually copied to the backup medium. This is in contrast to the “complete” backup of the data, wherein all the data subject to the archiving process is copied whether or not it is new. Restoring archived data from complete and incremental backups requires copying from a complete backup and then copying, in order, from each of the incremental backups made between the time point of the complete backup and the time point of the restoration. A fourth and obvious disadvantage is that when the backup medium in the archiving process is stored offline, the archived data must be physically retrieved and mounted for access and, thus, is not readily available on demand.
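The restoration sequence described above can be sketched as follows. This is a simplified illustration only: each backup is modeled as a hypothetical mapping from file name to file contents, the function name `restore` is an assumption, and deletions between backups are not modeled.

```python
from typing import Dict, List


def restore(complete: Dict[str, bytes],
            incrementals: List[Dict[str, bytes]]) -> Dict[str, bytes]:
    """Rebuild the data by starting from the complete backup and then
    applying every incremental backup in the order in which it was made."""
    state = dict(complete)          # start from the complete backup
    for incremental in incrementals:
        # Data in a later incremental backup supersedes older data.
        state.update(incremental)
    return state


# Hypothetical example: one complete backup followed by two incrementals.
complete = {"a.txt": b"v1", "b.txt": b"v1"}
first_incremental = {"a.txt": b"v2"}     # a.txt was rewritten
second_incremental = {"c.txt": b"v1"}    # c.txt was newly written
restored = restore(complete, [first_incremental, second_incremental])
```

Every incremental backup in the chain must be read, which is why restoring even a single file in this way may require several backup media to be retrieved.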
In view of the foregoing, it will be apparent that it is extremely inefficient to utilize backups for restoring data when, for example, only a particular user file or some other limited subset of the backup is required. To address this concern, a snapshot can be taken of data whereby an image of the data at the particular snapshot moment can later be accessed. The object of the snapshot for which the image is provided may be a file, a group of files, a volume or logical partition, or an entire storage system. The snapshot may also be of a computer-readable medium, or portion thereof, and the snapshot may be implemented at the file level or at the storage system block level. In either case, the data of the snapshot is maintained for later access by (1) saving snapshot data before replacement thereof by new data in a “copy-on-write operation,” and (2) keeping track of all the snapshot data, including the snapshot data still residing in the original location at the snapshot moment as well as the snapshot data that has been saved elsewhere in the copy-on-write operation. Typically, the snapshot data that is saved in the copy-on-write operation is stored in a specially allocated area on the same storage medium as the object of the snapshot. This area typically is a finite data storage of fixed capacity.
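The copy-on-write operation and the tracking of snapshot data described above can be illustrated with a minimal block-level sketch. The class names `Volume` and `Snapshot` and their methods are hypothetical, only a single snapshot is supported, and the specially allocated area is modeled as an in-memory dictionary without a capacity limit.

```python
class Snapshot:
    def __init__(self, volume):
        self.volume = volume
        # Tracks snapshot data saved by copy-on-write:
        # block index -> contents as of the snapshot moment.
        self.saved = {}

    def preserve(self, index, old_data):
        # Only the first overwrite of a block needs to be saved; later
        # overwrites replace data that is newer than the snapshot.
        if index not in self.saved:
            self.saved[index] = old_data

    def read(self, index):
        # Saved blocks come from the snapshot area; unchanged blocks
        # still reside in their original location on the volume.
        if index in self.saved:
            return self.saved[index]
        return self.volume.blocks[index]


class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshot = None

    def take_snapshot(self):
        # Nothing is copied at the snapshot moment itself.
        self.snapshot = Snapshot(self)
        return self.snapshot

    def write(self, index, data):
        if self.snapshot is not None:
            # Copy-on-write: save the old block before replacing it.
            self.snapshot.preserve(index, self.blocks[index])
        self.blocks[index] = data


# Hypothetical example: a three-block volume written to after a snapshot.
volume = Volume([b"A", b"B", b"C"])
snapshot = volume.take_snapshot()
volume.write(1, b"X")
volume.write(1, b"Y")
image = [snapshot.read(i) for i in range(3)]
```

In this sketch the snapshot image still reflects the volume as it existed at the snapshot moment, and only the one block that was overwritten occupies space in the snapshot area.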
The use of snapshots has advantages over the archiving process because a backup medium separate and apart from a primary storage medium is not required, and the snapshot data is stored online and, thus, is readily accessible. A snapshot also requires storage capacity equal only to the amount of data that is subjected to the copy-on-write operation; thus, the snapshot data need not all be saved to the specially allocated data storage area unless all of the snapshot data is to be replaced. The taking of a snapshot also is nearly instantaneous.
Advantageously, a snapshot may also be utilized in creating a backup copy of a primary storage medium onto a backup medium, such as a tape. As disclosed, for example, in Ohran U.S. Pat. No. 5,649,152, a snapshot can be taken of a base “volume” (a/k/a a “logical drive”), and then a tape backup can be made by reading from and copying the snapshot onto tape. During this archive process, reads and writes to the base volume can continue without waiting for completion of the archive process because the snapshot itself is a non-changing image of the data of the base volume as it existed at the snapshot moment. The snapshot in this instance thus provides a means by which data can continue to be read from and written to the primary storage medium while the backup process concurrently runs. Once the backup is created, the snapshot is released and the resources that were used for taking and maintaining the snapshot are made available for other uses by the computer system.
A disadvantage to utilizing snapshots is that a snapshot is not a physical duplication of the data of the object of the snapshot onto a backup medium. A snapshot is not a backup. Furthermore, if the storage medium on which the original object of the snapshot resides is physically damaged, then both the object and the snapshot can be lost. A snapshot, therefore, does not provide protection against physical damage of the storage medium itself.
A snapshot also requires significant storage capacity if it is to be maintained over an extended period of time, since snapshot data is saved before being replaced and, over the course of an extended period of time, much of the snapshot data may need saving. The storage capacity required to maintain the snapshot also dramatically increases as multiple snapshots are taken and maintained. Each snapshot may require the saving of overlapping snapshot data, which accelerates consumption of the storage capacity allocated for snapshot data. In an extreme case, each snapshot ultimately will require a storage capacity equal to the amount of data of its respective object. This is problematic as the storage capacity of any particular storage medium is finite and, generally, the finite data storage will not have sufficient capacity to accommodate this, leading to failure of the snapshot system.
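The extreme case described above can be worked through with a small sketch. It assumes, for illustration only, that all snapshots are taken before any writes occur, that every block of the object is subsequently overwritten, and that each snapshot saves its own copy of each overwritten block. The names `SnapshotAreaFull` and `blocks_saved` and the figures used are hypothetical.

```python
class SnapshotAreaFull(Exception):
    """Raised when the fixed-capacity snapshot area is exhausted."""


def blocks_saved(num_blocks, num_snapshots, capacity):
    """Overwrite every block once and count the blocks that must be
    saved to the snapshot area by copy-on-write."""
    used = 0
    saved = [set() for _ in range(num_snapshots)]
    for block in range(num_blocks):
        for snapshot_blocks in saved:
            if block not in snapshot_blocks:
                if used >= capacity:
                    raise SnapshotAreaFull(
                        f"snapshot area full after {used} blocks")
                # Assumption: each snapshot keeps its own copy.
                snapshot_blocks.add(block)
                used += 1
    return used


# Hypothetical figures: a 100-block object with three snapshots.
used = blocks_saved(num_blocks=100, num_snapshots=3, capacity=1000)
```

Under these assumptions the three snapshots together consume 300 blocks, three times the size of the object, and a snapshot area of fewer than 300 blocks would be exhausted before all writes complete.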
Accordingly, snapshots generally are used solely for transient applications, wherein, after the intended purpose for which the snapshot is taken has been achieved, the snapshot is released and system resources are freed, perhaps for the provision of a subsequent snapshot. Furthermore, because snapshots are needed only for temporary purposes, the means for tracking the snapshot data is usually stored in RAM of a computer and is lost upon the powering down or loss of power of the computer, and, consequently, the snapshot is lost. In contrast thereto, backups are used for permanent data archiving.
Accordingly, a need exists for an improved system and method that, but for protection against physical damage to the storage medium itself, provides the combined benefits of both snapshots and backups without the time and storage capacity constraints associated with snapshots and backups. One or more embodiments of the present invention meet this and other needs, as will become apparent from the detailed description thereof below and consideration of the computer source code incorporated herein by reference and disclosed in the incorporated provisional U.S. patent application.