1. Field of the Invention
This invention relates to computer systems and, more particularly, to data backup and restoration within computer systems.
2. Description of the Related Art
Many business organizations and governmental entities rely upon mission-critical applications that access large amounts of data, often exceeding many terabytes. Numerous different types of storage devices, potentially from multiple storage vendors and with varying functionality, performance and availability characteristics, may be employed in such environments.
Any of a variety of factors, such as system crashes, hardware storage device failures, software defects, or user errors (e.g., an inadvertent deletion of a file), may lead to data corruption or to a loss of critical data in such environments. In order to recover from such failures, various kinds of backup techniques may be employed. Traditionally, for example, backup images of critical data have been created periodically (e.g., once a day) and stored on tape devices. However, a single backup version of production data may not be sufficient to meet the availability requirements of modern mission-critical applications. For example, for disaster recovery, it may be advisable to back up the data of a production application at a remote site; but in order to be able to quickly restore the data in the event of a system crash or other error unrelated to a large-scale disaster, it may also be advisable to store a backup version near the production system. As a consequence, in some storage environments, multiple stages of backup devices or hosts may be employed. A first backup version of a collection of production files may be maintained in a file system at a secondary host, for example, and additional backup versions may be created periodically at tertiary hosts from the secondary host file system. The use of multiple stages may also help to reduce the impact of backup operations on production application performance. In some environments, multiple layers of additional backup versions may be generated to further enhance availability: for example, production data may be copied from a production host or server to a first-layer backup host, from the first layer to a second layer, from the second layer to a third layer, and so on.
Hosts or servers at several of the layers may also be susceptible to the same kinds of errors or faults as the production hosts, and hence may also need some level of backup support for their own data, as well as for the backup versions of the production hosts' data.
Traditionally, the ability to initiate restore operations has often been restricted to backup administrators or other backup experts, and end users have usually not been allowed to restore data objects. However, requiring administrators to support restore operations needed as a result of common errors (such as inadvertent deletions of user files) may lead to unnecessary delays and reduced productivity. Techniques that allow end users to perform restore operations as needed (e.g., on objects to which the end users have access permissions, such as a file owned by an end user and inadvertently overwritten by the end user), without requiring the end users to understand the details of backup layers or to know where backup versions are physically stored, may thus help reduce administrative costs and improve overall organizational efficiency.
Traditional backup techniques may also result in data duplication in some cases. For example, in some environments, snapshot facilities (e.g., provided by an operating system) may be used to create point-in-time images of data that is to be backed up at one or more layers of a backup hierarchy. For each snapshot of a collection of data, some traditional snapshot techniques may store a "path" for the original or source version of the data, and may be capable of restoring the data of the snapshot only to the path associated with the snapshot. Thus, for example, if the data of two production directories A and B were backed up in a secondary host directory C, and a snapshot of C (with an associated path to C) were created at a tertiary host using such a snapshot technique, the typical way to restore A from the snapshot would be to first restore C to the secondary host, and then copy A from the secondary host to the production host. If a direct restoration from the tertiary host to the production system were desired, additional snapshots associated with the paths to A and B would be needed. Creating such additional snapshots may, however, result in duplication of data, because the contents of A and B would also be stored within the snapshot of C. The cost of duplicating data in this manner may quickly become unsustainable, especially in environments where hundreds of images may at least partly duplicate data stored in other images. If, on the other hand, only snapshots of A and B were stored on the tertiary host in the example described above in an effort to minimize the storage used for snapshots, and no snapshot of C were stored, the ability to restore C (which may also have contained data other than the copies of A and B) from the tertiary host may be lost.
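The trade-off described above can be illustrated with a minimal sketch, assuming a hypothetical `PathSnapshot` model in which each snapshot records exactly one source path and can be restored only to that path. The class, the `stored_bytes` and `two_step_restore_a` helpers, the host-qualified path strings, and the file contents are all illustrative assumptions, not the interface of any particular snapshot facility. The sketch shows that restoring A from the snapshot of C requires recreating C first, and that adding path-bound snapshots of A and B stores the contents of A and B a second time.

```python
from dataclasses import dataclass


@dataclass
class PathSnapshot:
    """Hypothetical point-in-time image bound to a single source path."""
    source_path: str
    files: dict[str, bytes]  # relative file path -> file contents

    def restore(self) -> tuple[str, dict[str, bytes]]:
        # A path-bound snapshot can only be restored to the path it was taken from.
        return self.source_path, dict(self.files)


def stored_bytes(snapshots: list[PathSnapshot]) -> int:
    # Total storage consumed by a set of snapshots (no deduplication).
    return sum(len(data) for s in snapshots for data in s.files.values())


# Production directories A and B (illustrative contents).
a = {"report.txt": b"quarterly numbers"}
b = {"notes.txt": b"meeting notes"}

# Secondary directory C holds copies of A and B plus data of its own.
c = (
    {f"A/{name}": data for name, data in a.items()}
    | {f"B/{name}": data for name, data in b.items()}
    | {"local.cfg": b"secondary-only data"}
)
snap_c = PathSnapshot("secondary:/C", c)


def two_step_restore_a(snapshot: PathSnapshot, prefix: str = "A/") -> dict[str, bytes]:
    # Step 1: restore the whole of C to its own path on the secondary host.
    _path, restored_c = snapshot.restore()
    # Step 2: copy the A subtree from the secondary host to the production host.
    return {k[len(prefix):]: v for k, v in restored_c.items() if k.startswith(prefix)}


# Direct restores of A and B need their own path-bound snapshots,
# which store the contents of A and B a second time.
snap_a = PathSnapshot("production:/A", a)
snap_b = PathSnapshot("production:/B", b)

print(stored_bytes([snap_c]))                  # C alone
print(stored_bytes([snap_c, snap_a, snap_b]))  # C plus duplicated A and B
```

With these sample contents, the snapshot of C alone occupies 49 bytes, while adding the snapshots of A and B raises the total to 79 bytes; the extra 30 bytes are exactly the duplicated contents of A and B. Dropping `snap_c` instead would eliminate the duplication but lose `local.cfg`, the data unique to C.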