1. Field of the Invention
Exemplary embodiments of the present invention relate to storage management systems, and more particularly, to backup and archival operations in storage management systems that employ a hierarchical storage manager provided by a third party.
2. Description of Background
Because of factors such as growing complexity within storage infrastructure, pressure to reduce backup and recovery windows, and constant changes that threaten application availability, conducting efficient and effective storage management for file systems has become increasingly difficult. Remote backup and recovery services such as IBM's Tivoli Storage Manager (TSM) provide users with a network-based system to protect data from hardware failures, errors, and unforeseen disasters by storing backup and archive copies on offline and offsite storage. TSM software provides centralized, Web-based storage administration through policy-based automation of a variety of tasks to enable users to back up, restore, archive, and retrieve data using a hierarchy of data areas (for example, disk, optical, and tape-based media).
Backup procedures are designed to provide the ability to back up successive copies or versions of files to offline storage so that, should an online storage device fail, a data error occur, someone accidentally delete a file, or the data become inaccessible for any other reason, the chosen version of the data can be restored by placing the backup copy back into a designated system. Archive procedures are designed to provide the ability to create a copy of a file or a set of files representing an end point of a process into the hierarchy of storage for long-term retention over a specified amount of time. Archived files can either remain on the local storage media or be deleted. The retrieval process locates the copies within the archival storage and places them back into a designated system.
Because higher-speed storage devices (such as hard disk drive arrays) are more expensive than slower devices (such as optical discs and magnetic tape drives), some larger file systems employ a Hierarchical Storage Manager (HSM) to automatically move data between high-cost and low-cost storage media. In a file system managed by an HSM (for example, IBM's TSM HSM for Windows and TSM for Space Management), most of the file system data is stored on slower offline devices and copied to faster online disk drives as needed. The HSM monitors the use of data in a file system, identifies which files in a file system have not been accessed for long periods of time, and migrates all or some of their data to slower storage devices. This frees space in the faster online storage, thereby allowing additional files and more data to be stored. In effect, an HSM provides an economical solution to storing large amounts of data by turning faster disk drives into caches for the slower mass storage devices.
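By way of illustration only, the candidate selection described above can be sketched as follows. The names FileInfo and select_migration_candidates, the idle-time threshold, and the space target are hypothetical and are not taken from any TSM interface; the sketch merely assumes that an HSM ranks files by last access time and migrates the longest-idle files first.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size: int           # bytes occupied on fast online storage
    last_access: float  # seconds since the epoch


def select_migration_candidates(files, now, min_idle_seconds, bytes_to_free):
    """Pick the longest-idle files until enough space would be freed.

    Hypothetical policy: a file qualifies only if it has not been
    accessed for at least min_idle_seconds.
    """
    idle = [f for f in files if now - f.last_access >= min_idle_seconds]
    idle.sort(key=lambda f: f.last_access)  # oldest access first

    chosen, freed = [], 0
    for f in idle:
        if freed >= bytes_to_free:
            break
        chosen.append(f)
        freed += f.size
    return chosen
```

Under these assumptions, recently accessed files are never selected, so the fast storage continues to act as a cache for the data most likely to be needed.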
IBM's TSM HSM for Windows provides an HSM client for Windows NTFS file systems operating under Windows 2003. Using this HSM client, individual files from directories or complete NTFS file systems can be migrated to HSM storage according to automated policies based on data longevity, access speed, and cost needs. A migrated file leaves a small piece of the file, called a stub file, on the local file system that contains the necessary metadata to recall the migrated file so that the file appears to reside locally. The migration of files is transparent: Windows users and applications can see and access migrated files like any other file present on the file system, and when a user accesses a migrated file, it is dynamically and transparently restored to client storage. When original files are replaced by stub files, the stub files themselves are backed up to backup storage (for example, by the TSM backup-archive client) when a full backup or an incremental backup is initiated because the files have changed by becoming stub files.
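The stub and recall behavior described above may be illustrated with a minimal in-memory model. The classes HsmStore and LocalFileSystem are hypothetical and do not represent the TSM HSM client or NTFS; the sketch assumes only that a stub holds a reference sufficient to locate the migrated contents and that a read of a stub triggers a recall.

```python
class HsmStore:
    """Hypothetical stand-in for the slower HSM storage."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> int:
        object_id = len(self._objects)
        self._objects[object_id] = data
        return object_id

    def get(self, object_id: int) -> bytes:
        return self._objects[object_id]


class LocalFileSystem:
    """Hypothetical local file system whose entries are data or stubs."""

    def __init__(self, store: HsmStore):
        self._store = store
        self._entries = {}  # path -> ("data", bytes) or ("stub", object_id)

    def write(self, path: str, data: bytes) -> None:
        self._entries[path] = ("data", data)

    def migrate(self, path: str) -> None:
        kind, value = self._entries[path]
        if kind == "data":
            # Move contents to the HSM store and leave only a stub behind.
            self._entries[path] = ("stub", self._store.put(value))

    def is_stub(self, path: str) -> bool:
        return self._entries[path][0] == "stub"

    def read(self, path: str) -> bytes:
        kind, value = self._entries[path]
        if kind == "stub":
            # Transparent recall: the caller never sees the stub itself.
            value = self._store.get(value)
            self._entries[path] = ("data", value)
        return value
```

In this model a caller of read cannot tell whether a file was resident or migrated, which corresponds to the transparency described above.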
Traditionally, backup operations in file systems managed by an HSM have been accomplished either by recalling all migrated files to back them up or simply by backing up only the HSM stubs without the corresponding file contents. Restore operations are performed by recreating whatever was backed up. A problem with backing up the file contents of an HSM-managed file system is that all files must be recalled. This can overwhelm the local file system by exceeding its ability to store all the recalled files and cause a volume-full condition that results in backup failure. A problem with backing up HSM stubs only is that the actual file contents are not preserved in the backup data store. As a result, if for some reason the HSM store is lost, the stubs are useless as they now reference lost records.
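The two traditional approaches and their respective failure modes can be sketched as follows. This is a simplified model under stated assumptions: the local volume is represented only by a byte capacity, the HSM store by a dictionary, and all function names and the VolumeFull exception are hypothetical.

```python
class VolumeFull(Exception):
    """Raised when recalled data no longer fits on the local volume."""


def backup_with_recall(resident, migrated, capacity):
    """Approach 1: recall every migrated file, then back up full contents.

    resident: path -> bytes already on the local volume
    migrated: path -> bytes held in the HSM store
    capacity: local volume size in bytes (hypothetical parameter)
    """
    used = sum(len(data) for data in resident.values())
    backup = dict(resident)
    for path, data in migrated.items():
        if used + len(data) > capacity:
            raise VolumeFull(path)  # the backup fails part way through
        used += len(data)
        backup[path] = data
    return backup


def backup_stubs_only(resident, stubs):
    """Approach 2: back up resident contents and stub references only."""
    backup = {path: ("data", data) for path, data in resident.items()}
    backup.update({path: ("stub", ref) for path, ref in stubs.items()})
    return backup


def restore_from_stub_backup(backup, path, hsm_store):
    """Return file contents, or None if the stub references a lost record."""
    kind, value = backup[path]
    if kind == "data":
        return value
    return hsm_store.get(value)  # dict lookup; None when the object is gone
```

In the first function the failure depends on the local capacity; in the second approach the failure appears only at restore time, when the HSM store no longer holds the referenced object.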
The current solution to these problems in HSM-managed file systems is to back up only the HSM stubs while documenting that the HSM data store should never expire. As a result, however, the size of the associated store monotonically increases because file contents are preserved regardless of whether they are still referenced by stubs. Moreover, if for some reason there is corruption of the HSM store, the file contents are lost and cannot be recreated.
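The growth of a never-expiring store can be illustrated with a small simulation. The event format and the function store_size_over_time are hypothetical; the sketch assumes that each migration adds a new object to the store and that deleting a stub never removes the object it referenced.

```python
def store_size_over_time(events):
    """Track total store size and still-referenced size after each event.

    events: list of ("migrate", path, size) or ("delete", path) tuples.
    Returns a list of (total_bytes_in_store, bytes_referenced_by_stubs).
    """
    store = {}  # object_id -> size; objects are never expired
    stubs = {}  # path -> object_id currently referenced by a stub
    history = []
    for event in events:
        if event[0] == "migrate":
            _, path, size = event
            object_id = len(store)
            store[object_id] = size
            stubs[path] = object_id  # any earlier object becomes orphaned
        elif event[0] == "delete":
            stubs.pop(event[1], None)  # stub removed; store object kept
        referenced = sum(store[i] for i in set(stubs.values()))
        history.append((sum(store.values()), referenced))
    return history
```

Under these assumptions the first value in each pair never decreases, while the second can fall as stubs are deleted or superseded, so the gap between the two represents contents retained without any remaining reference.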