1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to data backups of file systems.
2. Description of the Related Art
Conventional backups of file systems may take a considerable amount of time and backup media. In many file systems, a significant portion of the data (e.g., files) is not changed after creation or after an initial period of access. As a result, much of the data copied in a full backup is typically the same data that was copied in the last full backup, or even in earlier full backups.
The conventional mechanism to back up data is to periodically perform a full backup of everything in the file system, for example once a week or once a month, and to perform incremental backups between full backups, for example every day. FIG. 1 illustrates a typical backup pattern using a conventional backup mechanism. Using the conventional mechanism, full backups are performed periodically, and each full backup makes a copy of 100% of the data in the file system, even though a large percentage (e.g., 90%) of that data may not have changed since the previous full backup. Therefore, using the conventional backup mechanism, data for which one or more copies may exist on previous full backups 108 are backed up on each current full backup 104.
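For illustration only, the arithmetic behind this pattern can be sketched as follows. The function names, the sizes, and the weekly schedule are hypothetical values chosen for the example and are not taken from FIG. 1.

```python
def full_backup_split(total_gb: int, unchanged_percent: int) -> tuple:
    """Split one full backup into (redundant_gb, new_gb).

    redundant_gb is data that already exists on a previous full backup.
    Integer arithmetic is used so the example stays exact.
    """
    redundant_gb = total_gb * unchanged_percent // 100
    return redundant_gb, total_gb - redundant_gb


def media_used_gb(total_gb: int, incremental_gb: int,
                  days_per_cycle: int, cycles: int) -> int:
    """Media consumed by a conventional schedule (assumed model).

    Each cycle holds one full backup plus one incremental backup
    on every remaining day of the cycle.
    """
    per_cycle = total_gb + incremental_gb * (days_per_cycle - 1)
    return per_cycle * cycles


# Hypothetical 1000 GB file system in which 90% of the data is unchanged.
redundant, new = full_backup_split(1000, 90)
# Four weekly cycles with 20 GB incrementals on the six other days.
total_media = media_used_gb(1000, 20, 7, 4)
```

Under these assumed figures, 900 GB of every 1000 GB full backup repeats data already held on an earlier full backup.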
To perform a restore from conventional backups, a current full backup 104 is typically restored, and then any changed data are restored from the incremental backups 106. Typically, the file system cannot be brought back online and made operational until all the data have been restored.
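A minimal sketch of this restore sequence is given below, assuming each backup can be modeled as a mapping from file path to file contents. The `restore` function and the sample data are hypothetical, and the sketch does not model file deletions recorded between backups.

```python
from __future__ import annotations


def restore(full_backup: dict[str, bytes],
            incrementals: list[dict[str, bytes]]) -> dict[str, bytes]:
    """Rebuild file system contents from one full backup plus incrementals.

    incrementals must be ordered oldest first, so that a later change
    to a file overwrites an earlier one.
    """
    file_system = dict(full_backup)      # step 1: restore the full backup
    for incremental in incrementals:     # step 2: apply changed data in order
        file_system.update(incremental)
    return file_system                   # usable only after every step finishes


full = {"/a": b"v1", "/b": b"v1"}
daily = [{"/b": b"v2"}, {"/c": b"v1"}]
restored = restore(full, daily)
```

The function returns only after the full backup and every incremental backup have been applied, mirroring the fact that the file system typically cannot be brought back online until all the data have been restored.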
HSM (Hierarchical Storage Management) systems may be installed in some file systems to move file data from (expensive) online storage to (cheaper) offline media, typically, but not necessarily, tape. The file metadata (inode, directory entry) is left online to provide transparency for applications using the file system. Typically, only when an application attempts to use data that have been moved offline will the HSM copy the data back to disk.
An HSM system and a conventional backup mechanism may be used together to reduce the time and media needed to make backup copies, as illustrated in FIG. 2. The HSM system may sweep through a file system looking for “old” data, that is, data that have not changed recently. The HSM system may make copies of the data in HSM-specific pools or volumes. Once the required HSM copies have been made, the file is called “migrated”. The backup mechanism, if it is able to recognize data that have been migrated by the HSM system, may not back up the data for a migrated file; only metadata (e.g., the directory entry and inode metadata) may be backed up. For example, when 90% of the data in a file system is old (unchanging), eventually all of that data will have been migrated by HSM. Then, a typical full backup of the file system will copy only 10% of the data, and all of the file system metadata.
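The interaction described above may be sketched as follows. This is an illustration under stated assumptions, not the interface of any particular HSM product: the `FileEntry` record, the age threshold, and both function names are hypothetical, and the making of the HSM copies themselves is assumed rather than modeled.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    path: str
    size: int                 # bytes of file data
    days_since_change: int
    migrated: bool = False    # True once the required HSM copies exist


def hsm_sweep(files, age_threshold_days):
    """Mark "old" files as migrated (the HSM copies are assumed to be made)."""
    for entry in files:
        if entry.days_since_change >= age_threshold_days:
            entry.migrated = True


def full_backup_data_bytes(files):
    """File data copied by a full backup that recognizes migrated files.

    Metadata is backed up for every file and is not counted here.
    """
    return sum(entry.size for entry in files if not entry.migrated)


files = [FileEntry("/old.dat", 900, 400), FileEntry("/new.dat", 100, 1)]
hsm_sweep(files, age_threshold_days=30)
copied = full_backup_data_bytes(files)
```

With the hypothetical sizes above, 900 of the 1000 bytes are migrated, so the full backup copies only the remaining 100 bytes of file data plus the metadata for both files.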
Thus, HSM may be used to identify unchanging data and make backup copies of that data to special pools not used by the conventional full and incremental backup processes. Note that the benefit of HSM to conventional backups may be realized regardless of whether the customer actually uses HSM to remove some of the data from the file system. The benefit may be realized even if the data is left online.
However, there are several problems with using HSM in combination with a backup mechanism to improve the performance of conventional backups. For one thing, this solution requires the configuration and administration of two different mechanisms—the HSM system and the backup mechanism. HSM is complex, and it may take considerable administrative effort to set up and maintain an HSM system. HSM may also have scalability issues when dealing with file systems containing more than a few million files. An HSM system may have its own proprietary databases for keeping track of offline volumes and migrated data. These databases may be different from standard backup mechanism databases and catalogs. The backup mechanism must be able to recognize data that have been migrated by the HSM system and for which there are sufficient copies made by the HSM system. In addition, not all file systems have the infrastructure (e.g., a DMAPI implementation) required to support HSM systems, so there are file systems that cannot benefit from the improvements that HSM may offer in conventional backups. Further, data stored on the HSM storage media may be in a different storage format than data stored on the backup media. Backup utilities typically have standard functions that work with the backup format; the HSM format may not be usable by backup utility functions.
Another alternative for improving the performance of backups is the “synthetic full backup”. Synthetic full backups are synthesized from existing full backups. In a synthetic backup, instead of performing a full backup of the file system from scratch, a copy of a previous full backup is used; data that have been deleted from the file system are subtracted from the full backup, and data that are new or have changed on the file system are added to it. From that, a new “synthetic” full backup is generated. However, the synthetic backup still ends up copying unchanging data, since generating a synthetic full backup requires rewriting the older data every time.
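As a rough sketch of the synthesis step, a backup may again be modeled as a mapping from file path to file contents. The `synthesize_full` function and its parameters are hypothetical and stand in for whatever catalog a real backup product would use.

```python
from __future__ import annotations


def synthesize_full(previous_full: dict[str, bytes],
                    deleted_paths: set[str],
                    new_or_changed: dict[str, bytes]) -> dict[str, bytes]:
    """Build a synthetic full backup from a previous full backup.

    Every surviving entry of previous_full is copied into the result,
    which is why unchanging data are rewritten on each synthesis.
    """
    synthetic = {path: data for path, data in previous_full.items()
                 if path not in deleted_paths}     # subtract deleted data
    synthetic.update(new_or_changed)               # add new and changed data
    return synthetic


previous = {"/a": b"old", "/b": b"old", "/c": b"old"}
synthetic = synthesize_full(previous, {"/c"}, {"/b": b"new", "/d": b"new"})
```

In this example the unchanged file `/a` is carried over, and therefore rewritten, even though it was already present on the previous full backup.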