1. Field of the Invention
The present invention relates to data storage. More particularly, the invention is directed to tiered data storage environments in which data storage devices are arranged in a tiered hierarchy and data is stored therein according to policy-aware data placement algorithms. Still more particularly, the invention is concerned with the protection of tiered storage data using improved file backup and restore techniques.
2. Description of the Prior Art
By way of background, the cost of data storage may vary considerably according to the nature and capabilities of the underlying data storage device(s). Exemplary storage cost determinants include the basic storage technology employed (e.g., disk or tape), and device operational characteristics such as access speed, transfer rate, data redundancy, fault tolerance, etc. In a tiered storage system, a collection of storage devices is divided into hierarchically defined storage tiers based on relative device cost (and associated capabilities). This arrangement allows a data owner to leverage its total data storage investment by placing lower value data on less-costly, lower tier storage devices, and reserving high cost, upper tier storage devices for higher value data.
Information Lifecycle Management (ILM) involves the assessment of data “value” and the corresponding assignment of such data to tiered storage. Using policy-based data placement algorithms that classify data according to defined parameters, and which take into account differentiating factors such as access speed requirements, anticipated access frequency, anticipated concurrency level, etc., a data set (e.g., a file, a set of files, a directory, a logical volume, etc.) can be assigned to the storage tier that reflects the best utilization of data storage resources. ILM also contemplates that a data set created in one storage tier may need to be moved to other storage tiers during its lifetime according to changes in the data set's perceived value.
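By way of illustration only, a policy-based placement rule of the kind described above might be sketched as follows. The tier names, the profile fields, and the numeric thresholds are all hypothetical and are not taken from any particular ILM product; the sketch merely shows a data set being classified on access speed requirements, anticipated access frequency, and anticipated concurrency level, and being re-evaluated when its perceived value changes.

```python
from dataclasses import dataclass, replace

# Hypothetical tier names, ordered from most to least costly.
TIERS = ("tier1_fast_disk", "tier2_capacity_disk", "tier3_tape")


@dataclass(frozen=True)
class DataSetProfile:
    """Hypothetical differentiating factors for one data set."""
    max_latency_ms: float      # access speed requirement
    accesses_per_day: float    # anticipated access frequency
    concurrent_readers: int    # anticipated concurrency level


def assign_tier(profile: DataSetProfile) -> str:
    """Return the storage tier for a data set under illustrative rules."""
    # Strict latency or heavy concurrency justifies the costly upper tier.
    if profile.max_latency_ms <= 10 or profile.concurrent_readers >= 50:
        return TIERS[0]
    # Data touched at least daily stays on disk, but on a cheaper tier.
    if profile.accesses_per_day >= 1:
        return TIERS[1]
    # Everything else is treated as lower value data.
    return TIERS[2]


# A data set is placed when created, then re-evaluated as its value changes.
new_report = DataSetProfile(max_latency_ms=5, accesses_per_day=200,
                            concurrent_readers=10)
aged_report = replace(new_report, max_latency_ms=5000, accesses_per_day=0.01)
initial_tier = assign_tier(new_report)
later_tier = assign_tier(aged_report)
```

In this sketch the same data set is first assigned to the upper tier and later becomes a candidate for movement to the lowest tier, which corresponds to the lifetime movement between tiers that ILM contemplates.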
In order to provide application transparency relative to the tiered storage system and its underlying ILM transactions, there is typically a single file system that provides a global namespace for all of the data stored in the various tiers. Applications can thus access their data in conventional fashion (e.g., via file and pathname lookups) without having to be aware of how the data is assigned to particular storage devices within the file system. An application's access to its data will likewise be unaffected by the movement of data between tiers.
A present disadvantage of ILM and policy-based data placement within the context of a single file system is the difficulty of implementing traditional data backup/restore protection. Consider, for example, a data backup/restore sequence in which data maintained by the tiered storage file system is periodically copied to a backup file system on a backup storage resource, and then subsequently restored to the original file system. A conventional backup/restore product will back up a data file's contents and its standard file metadata (e.g., ownership and authorization identifiers, timestamps, etc.) to the backup storage. However, conventional backup/restore products have little or no understanding of the kind of extended ILM metadata that may be used by the tiered storage file system to maintain a file in an ILM environment (e.g., storage tier identifiers, service class identifiers, etc.). Nor is such information readily available through conventional file system interfaces. As a result, the subsequent restore operation cannot guarantee that a file's contents will be placed in the policy-determined storage tier. In all likelihood, the file will not be placed in the correct storage tier during the restore operation. The result will be suboptimal storage utilization and application performance. Storage tiers may also fill prematurely, which can cause application outages. Application outage time is often very expensive to an enterprise.
Although it may be possible to implement policy placement rules that assign data based on standard file metadata, it is not practical to use the metadata of a backed-up file during a conventional restore operation. This is because the full set of a file's standard attributes is typically not communicated to the target file system until after the contents of the file have been restored. Restoring the metadata before the file data has been restored would render the file accessible but incomplete, and any attempt by an application to access the file could lead to serious application errors.
A typical procedure for restoring a file previously backed up to tape (assuming the file systems are POSIX-compliant) would involve the following steps using conventional file system calls in the tiered storage file system:
1) Restore application issues an open( ) call to the tiered storage file system using the O_CREAT flag to create the file to be restored (the restore file);
2) Restore application copies data blocks from backup file buffers to restore file buffers;
3) Restore application sets owner, group, timestamps and permissions for the restore file from the backed-up metadata; and
4) Restore application issues a close( ) call to the tiered storage file system to close the restore file.
In this example, the restored file can easily be assigned to the wrong storage tier. This happens because the allocation decision must be made before the first file block is written, not after the data is written. Here, the allocation information (i.e., the backed-up file metadata) is not known to the target file system until the end of the restore operation.
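The four-step restore sequence above can be sketched with Python's os module, whose os.open, os.write, os.chown, os.chmod, os.utime and os.close functions wrap the corresponding POSIX calls. This is a minimal sketch under stated assumptions, not the code of any actual backup/restore product: the BackupRecord type, the conventional_restore function, the block_size parameter, and the returned event log are all hypothetical names introduced for illustration. The event log simply records the order in which the target file system sees each operation.

```python
import os
from dataclasses import dataclass


@dataclass
class BackupRecord:
    """Hypothetical backup record: contents plus standard metadata only."""
    data: bytes
    uid: int
    gid: int
    mode: int
    atime_ns: int
    mtime_ns: int


def conventional_restore(record: BackupRecord, restore_path: str,
                         block_size: int = 4096) -> list:
    """Restore one file and return the ordered list of operations issued."""
    events = []
    # Step 1: create the restore file. At this point the target file system
    # has seen only a path name and none of the backed-up metadata.
    fd = os.open(restore_path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
    events.append("open")
    try:
        # Step 2: copy the data blocks. Block allocation, and therefore any
        # tier decision, has to happen no later than the first write.
        view = memoryview(record.data)
        offset = 0
        while offset < len(view):
            offset += os.write(fd, view[offset:offset + block_size])
            events.append("write")
        # Step 3: only now are owner, group, permissions and timestamps set.
        os.chown(restore_path, record.uid, record.gid)
        os.chmod(restore_path, record.mode)
        os.utime(restore_path, ns=(record.atime_ns, record.mtime_ns))
        events.append("set_metadata")
    finally:
        # Step 4: close the restore file.
        os.close(fd)
        events.append("close")
    return events
```

In every run of this sketch the "set_metadata" event follows all "write" events, which is the ordering problem described above: a placement policy keyed on file metadata has nothing to evaluate at the time the first block is allocated.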
It is to improvements in the backup and restoration of files in a tiered data storage environment that the present invention is directed. In particular, what is needed is a technique for handling extended file metadata during backup and restore operations and for correctly identifying a file's proper tiered location whenever the file is restored from a backup storage file system to the tiered storage file system.