1. Field of the Invention
This invention relates to data backup in a computer system. Particularly, this invention relates to managing storage space for snapshot backups.
2. Description of the Related Art
Making backups of computer data is often a critical function for a business, providing for the safe recovery of data that may be compromised or destroyed through accidents or deliberate acts. Conventional backups to tape (or other media) involve performing a complete copy of specified data. However, it is often not practical to perform a full backup at each backup interval. It is therefore typical to make incremental backups, in which only the data that has changed since the last full backup is backed up. Thus, when a restore of the backup data is necessary, the last full backup as well as all incremental backups made since must be utilized. Depending upon the number of incremental backups, the task can quickly become extremely time-consuming.
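By way of illustration only, the restore procedure described above can be sketched as follows. The sketch assumes each backup is modeled as a mapping from a file name to its contents and that deletions are not tracked; the function name `restore` and the sample data are hypothetical and are not drawn from any particular backup product.

```python
def restore(full_backup, incrementals):
    """Rebuild the latest state from one full backup plus its incrementals.

    `incrementals` must be ordered oldest first. Every incremental has to be
    read, which is why restore time grows with the number of incrementals.
    """
    state = dict(full_backup)          # start from the last full backup
    for increment in incrementals:     # replay each set of changes in order
        state.update(increment)        # later changes overwrite earlier ones
    return state


# Hypothetical data: one full backup followed by three incremental backups.
full = {"a.txt": "v1", "b.txt": "v1"}
incs = [{"b.txt": "v2"}, {"c.txt": "v1"}, {"a.txt": "v2"}]
restored = restore(full, incs)
```

In this simplified model the restore must touch the full backup and every incremental, so the work is proportional to the length of the incremental chain.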
Known snapshot backups operate somewhat differently than simply making a full copy of specified data and subsequent incremental backups in a traditional manner. Through special handling of the data, a snapshot backup comprises a virtually perfect copy of the data at a specific point in time, a “picture” of the data taken at a specified instant, typically without regard to the amount of data being backed up. Effectively, a snapshot backup operates by only backing up the changes that have occurred. In addition, only the differences in the data are transferred across the backup connection, greatly reducing the overhead required to operate a snapshot backup compared to a traditional backup. In one sense, snapshot backups completely reverse a traditional backup process by functioning with only changed data increments. Snapshot backup technology has continued to develop over recent years.
U.S. Patent Application 20050182910 by Stager et al., published Aug. 18, 2005, discloses a method for adding redundancy to a continuous data protection system beginning by taking a snapshot of a primary volume at a specific point in time, in accordance with a retention policy. The snapshot is stored on a secondary volume, and the snapshot is cloned and stored on a third volume. The cloned snapshot is eventually expired according to a cloning policy.
U.S. Pat. No. 6,073,222 by Ohran, issued Jun. 6, 2000, discloses a system and method for using a virtual device established at a computer system to access data as it existed at a selected moment in a mass storage system associated with the computer system, regardless of whether new data has been written to the mass storage system. When an original data block is to be overwritten in the mass storage system with a new data block, the original data block is first preserved in a preservation memory associated with the computer system. The preservation memory thereby preserves the original data block as it existed at the selected moment. A virtual device established at the computer system provides access to data as it existed at the selected moment. This data may include original data blocks preserved in the preservation memory and other original data blocks that remain in the mass storage device, and which have not been overwritten with new data. In order to provide access to the data, the virtual device accesses the preservation memory to obtain those original data blocks that have been preserved therein and also accesses the mass storage device to obtain those original data blocks that remain in the mass storage device.
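The copy-on-write behavior summarized above may be illustrated with a minimal sketch. It assumes the mass storage system is modeled as an in-memory list of blocks and the preservation memory as a dictionary; the class and method names are hypothetical and are not taken from the cited patent.

```python
class CopyOnWriteSnapshot:
    """Point-in-time view of a block store using a preservation memory."""

    def __init__(self, storage):
        self.storage = storage   # live mass storage, modeled as a list of blocks
        self.preserved = {}      # preservation memory: block index -> original block

    def write(self, index, data):
        # Preserve the original block only the first time it is overwritten,
        # so the preserved copy reflects the selected moment.
        if index not in self.preserved:
            self.preserved[index] = self.storage[index]
        self.storage[index] = data

    def read_snapshot(self, index):
        # Virtual device: use the preserved original if one exists, otherwise
        # the block still held, unmodified, in mass storage.
        if index in self.preserved:
            return self.preserved[index]
        return self.storage[index]


storage = ["a0", "b0", "c0"]
snap = CopyOnWriteSnapshot(storage)
snap.write(1, "b1")
snap.write(1, "b2")   # second overwrite does not disturb the preserved original
snapshot_view = [snap.read_snapshot(i) for i in range(len(storage))]
```

Note that the snapshot view in this sketch depends on the unmodified blocks remaining intact in the live storage, so the snapshot is only as valid as the source data it reads from.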
U.S. Pat. No. 6,081,875 by Clifton et al., issued Jun. 27, 2000, discloses a backup system and method that provides for creation of a reconciled snapshot backup image of a database while the database, residing on a disk array system, is in use by users. A backup computer running a commercial backup utility is connected between the array system and a tape storage system. While the backup is underway, write requests to the database are suspended until the data currently in those data blocks is copied and stored in an original data cache. The disk system address of the copied block and a pointer to the location of the block in the cache are stored in a map. The backup utility incrementally reads portions of the database from the disk system and forwards those portions to the tape system. Prior to each portion being forwarded to the tape system, all data blocks in the portion which have an address that corresponds to the address of a block in the cache are discarded and replaced with the data from the cache for that address.
Cox et al., “Pastiche: Making Backup Cheap and Easy”, USENIX Association, 5th Symposium on Operating Systems Design and Implementation, 2003, pp. 285-98, discloses Pastiche, a simple and inexpensive backup system. Pastiche exploits excess disk capacity to perform peer-to-peer backup with no administrative costs. Each node minimizes storage overhead by selecting peers that share a significant amount of data. It is easy for common installations to find suitable peers, and peers with high overlap can be identified with only hundreds of bytes. Pastiche provides mechanisms for confidentiality, integrity, and detection of failed or malicious peers. A Pastiche prototype suffers only 7.4% overhead for a modified Andrew Benchmark, and restore performance is comparable to cross-machine copy.
Riedel, “Storage Systems—Not Just a Bunch of Disks Anymore,” QUEUE, ACM, June 2003, pp. 32-42, discusses the larger storage systems that are typically detached from the server hosts—the specialized appliances that form the core of data centers everywhere. Riedel introduces the layers of protocols and translations that occur as bits make their way from the magnetic domains on the disk drives and interfaces to desktop computers.
Cooper et al., “Peer-to-Peer Data Trading to Preserve Information,” ACM Transactions on Information Systems, Vol. 20, No. 2, April 2002, pp. 133-170, discusses how a network of autonomous archiving sites can trade data to achieve the most reliable replication. A series of binary trades among sites produces a peer-to-peer archiving network. Two trading algorithms are examined, one based on trading collections (even if they are different sizes) and another based on trading equal sized blocks of space (which can then store collections). The concept of deeds is introduced; deeds track the blocks of space owned by one site at another. Policies for tuning these algorithms to provide the highest reliability, for example by changing the order in which sites are contacted and offered trades, are discussed. Finally, simulation results are presented that reveal which policies are best. The experiments indicate that a digital archive can achieve the best reliability by trading blocks of space (deeds), and that following certain policies will allow that site to maximize its reliability.
The use of snapshot technology in advanced data protection solutions has been growing rapidly, providing capabilities such as near-instant backup, near-instant restore, and multiple snapshot-based backups that serve as multiple fast recovery points. The snapshot technology provider in such an environment may be any layer in the storage stack, such as the file system, volume manager, or storage subsystem. A data protection solution in this environment, such as Tivoli Data Protection (TDP) for hardware for DB2, mySAP, and Oracle, must manage the storage space used for creating the snapshot backups in accordance with the policy. However, conventional implementations are limited in how they manage snapshot storage space that is shared between snapshot backups and tape backups.
Users implementing these advanced data protection solutions may frequently perform snapshot-based backups, e.g. every two hours, and maintain multiple versions of these snapshot backups to provide increased recoverability. Typically, these snapshot backups exist on the same storage as the data being protected. In some cases, the validity of a backup may even depend on the validity of the source data (e.g. with software-based copy-on-write snapshots). Therefore, snapshot-based backups may provide only limited data protection, and a complete data protection solution also requires traditional tape-based backups at a lesser frequency, e.g. nightly.
Tape backups may also utilize snapshot technology to create a point-in-time copy of the data, which is then used to move the data to the tape. The data movement process can be performed from an alternate system to reduce backup impact on a production system. However, the snapshot must stay active for the duration of the tape backup. In addition, the storage used for this snapshot cannot be reused for any other purposes (such as generating a new snapshot backup or another tape backup) until the tape backup is complete, a process which can take hours.
The foregoing factors present problems in managing the snapshot storage space in the context of policy-based operation, which must ensure that the snapshot used by a tape backup remains valid for the duration of the backup. Also, if a snapshot needs to be retained as a valid backup, it must be subjected to policy enforcement, under which it could become eligible for reuse per the policy.
Existing data protection for hardware products support snapshot and tape backups, but are limited to only one snapshot. When a tape-based backup is being performed, another backup (e.g. the next backup) cannot be performed until the current backup is completed. Depending on the backup duration, this limits backup frequency.
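The single-snapshot limitation described above may be illustrated with a minimal sketch. It assumes one reusable snapshot storage slot that is held for the duration of a tape backup; the class `SnapshotSlot`, its methods, and the snapshot identifiers are hypothetical and do not represent the interface of any existing product.

```python
class SnapshotSlot:
    """A single snapshot storage slot shared by snapshot and tape backups."""

    def __init__(self):
        self.snapshot_id = None
        self.held_by_tape_backup = False

    def create_snapshot(self, snapshot_id):
        # The slot cannot be reused while a tape backup is still reading from it.
        if self.held_by_tape_backup:
            raise RuntimeError("slot is held by a running tape backup")
        self.snapshot_id = snapshot_id

    def begin_tape_backup(self):
        if self.snapshot_id is None:
            raise RuntimeError("no snapshot available to back up")
        self.held_by_tape_backup = True    # snapshot must stay valid from here on

    def end_tape_backup(self):
        self.held_by_tape_backup = False   # slot becomes reusable again


slot = SnapshotSlot()
slot.create_snapshot("snap-1")
slot.begin_tape_backup()

try:
    slot.create_snapshot("snap-2")   # next backup attempted during the tape backup
    blocked = False
except RuntimeError:
    blocked = True

slot.end_tape_backup()
slot.create_snapshot("snap-2")       # succeeds only after the tape backup ends
```

With only one slot, the interval between backups in this model can be no shorter than the duration of the tape backup.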
In view of the foregoing, there is a need in the art for systems and methods for snapshot backups that manage a snapshot storage space in the context of policy based operation while ensuring that the snapshot used by a tape backup remains valid for the backup duration. Further, there is a need for such systems and methods to provide a snapshot backup while subjected to policy enforcement where the snapshot can become eligible for reuse under the policy. There is also a need for such systems and methods to manage more than one simultaneous snapshot backup. As detailed hereafter, these and other needs are met by the present invention.