1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to data storage systems.
2. Description of the Related Art
In data storage environments such as corporate LANs, total storage needs are increasing and storage costs are an increasing part of the IT budget. More storage devices, higher-capacity storage devices, or both may be added, but this solution is expensive and difficult to manage, and does not address the root of the problem. There is a limit to storage capacity no matter how much storage capacity is added. This solution tends to provide a constant cost per byte for storage, as it tends not to take advantage of lower cost-per-byte storage devices. A high percentage of data stored in a storage environment may be infrequently accessed, or never accessed at all after a certain time. Lower cost per byte for storage may be realized using methods that move at least some of this infrequently accessed data off more expensive storage devices and onto less expensive storage devices.
Hierarchical Storage Management (HSM) is a data storage solution that provides access to vast amounts of storage space while reducing the administrative and storage costs associated with data storage. HSM systems may move files along a hierarchy of storage devices that may be ranked in terms of cost per megabyte of storage, speed of storage and retrieval, and overall capacity limits. Files are migrated along the hierarchy to less expensive forms of storage based on rules tied to the frequency of data access.
In HSM systems, data access response time and storage costs typically determine the appropriate combination of storage devices used. A typical three-tier HSM architecture may include hard drives as primary storage, rewritable optical media as secondary storage, and tape as tertiary storage. Alternatively, hard drives may be used for secondary storage, and WORM (Write Once, Read Many) optical media may be used as tertiary storage.
Rather than making copies of files as in a backup system, HSM migrates files to other forms of storage, freeing hard disk space. Events such as crossing a storage threshold and/or reaching a certain file “age” may activate the migration process. As files are migrated off primary storage, HSM leaves stubs to the files on the hard drive(s). These stubs point to the location of the file on the alternative storage, and are used in automatic file retrieval and user access. The stub remains within the file system of the primary storage, but the file itself is migrated “offline” out of the file system onto the alternative storage (e.g. tape).
In HSM, when an application accesses a file that has been migrated to a lower rank of storage, such as tape, the stub may be used to retrieve and restore the file from the lower rank of storage. The file appears to be accessed by the application from its initial storage location, and demigration of the file back into the file system is performed automatically by the HSM system using the stub. While on the surface this demigration may appear transparent to the user, in practice the process of accessing and restoring the file from offline storage (e.g. tape) may introduce a noticeable time delay (seconds, minutes, or even hours) to the user when compared to accessing files stored on primary storage. Thus, accessing offloaded data in an HSM system is typically non-transparent to the application or user because of the difference in access time. In addition, since HSM introduces a substantial time lag to access offloaded data, HSM systems typically offload only low-access (essentially no-access) data.
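The migration and demigration behavior described above can be illustrated with a minimal in-memory sketch. It is not the implementation of any particular HSM product. The names `TinyHSM`, `Stub`, `migrate`, and `read` are hypothetical, the two tiers are modeled as plain dictionaries, and an age threshold is assumed to be the only migration trigger.

```python
from dataclasses import dataclass, field


@dataclass
class Stub:
    """Placeholder left on primary storage; records where the data went."""
    tier: str
    key: str


@dataclass
class TinyHSM:
    """Hypothetical two-tier store: 'primary' (disk) and 'tape' (offline)."""
    primary: dict = field(default_factory=dict)      # path -> bytes or Stub
    tape: dict = field(default_factory=dict)         # key -> bytes
    last_access: dict = field(default_factory=dict)  # path -> timestamp

    def write(self, path, data, now):
        self.primary[path] = data
        self.last_access[path] = now

    def migrate(self, now, max_age):
        """Move files idle longer than max_age to tape, leaving stubs."""
        migrated = []
        for path, entry in list(self.primary.items()):
            if isinstance(entry, Stub):
                continue  # already migrated
            if now - self.last_access[path] > max_age:
                key = f"tape:{path}"
                self.tape[key] = entry
                self.primary[path] = Stub(tier="tape", key=key)
                migrated.append(path)
        return migrated

    def read(self, path, now):
        """Return file data, demigrating it first if only a stub is present."""
        entry = self.primary[path]
        if isinstance(entry, Stub):
            # In a real system this step is where the tape delay occurs.
            entry = self.tape.pop(entry.key)
            self.primary[path] = entry
        self.last_access[path] = now
        return entry
```

In this sketch the caller of `read` always uses the original path, which mirrors how the stub keeps the file visible in the primary file system. The access delay the text describes would occur at the point where the stub is resolved.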
A file system may be defined as a collection of files and file system metadata (e.g., directories and inodes) that, when set into a logical hierarchy, make up an organized, structured set of information. File systems may be mounted from a local system or remote system. File system software may include the system or application-level software that may be used to create, manage, and access file systems.
File system metadata may be defined as information file system software maintains on files stored in the file system. File system metadata may include definitions and descriptions of the data it references. Generally, file system metadata for a file includes path information for the file as seen from the application side and corresponding file system location information (e.g. device:block number(s)). File system metadata may itself be stored on a logical or physical device within a file system.
File systems may use data structures such as inodes to store file system metadata. An inode may be defined as a data structure holding information about files in a file system (e.g. a Unix file system). There is an inode for each file, and a file is uniquely identified by the file system on which it resides and its inode number on that system. An inode may include at least some of, but is not limited to, the following information: the device where the inode resides, locking information, mode and type of file, the number of links to the file, the owner's user and group IDs, the number of bytes in the file, access and modification times, the time the inode itself was last modified and the addresses of the file's blocks on disk (and/or pointers to indirect blocks that reference the file blocks).
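As an illustration, several of the inode fields listed above can be inspected on a Unix-like system through Python's standard `os.stat` call. The helper name `inode_summary` and its dictionary keys are hypothetical and chosen only for this sketch. Block addresses are not exposed through this interface and are therefore omitted.

```python
import os
import stat
import tempfile


def inode_summary(path):
    """Return a few inode fields for `path` as reported by os.stat()."""
    st = os.stat(path)
    return {
        "device": st.st_dev,                  # device holding the inode
        "inode": st.st_ino,                   # inode number on that device
        "is_regular_file": stat.S_ISREG(st.st_mode),
        "permissions": stat.filemode(st.st_mode),
        "links": st.st_nlink,                 # number of hard links
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "size_bytes": st.st_size,
        "accessed": st.st_atime,
        "modified": st.st_mtime,
        "inode_changed": st.st_ctime,         # on Unix: last metadata change
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "data.txt")
        with open(original, "wb") as f:
            f.write(b"hello")
        # A hard link is a second name for the same inode.
        alias = os.path.join(tmp, "alias.txt")
        os.link(original, alias)
        for name, value in inode_summary(alias).items():
            print(f"{name}: {value}")
```

Because a hard link is only a second directory entry for the same inode, both names in the demo report the same device and inode number, and the link count rises to two.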
As data increases and disks get cheaper and faster than tape, the cost of tape media and drives is becoming larger relative to the cost of online storage. Also, since data volumes tend to grow faster than tape drive speeds, backup windows are being squeezed. Conventional backup mechanisms may perform incremental backups and then apply those incremental backups offline to the last full backup to make a synthetic full backup. This may help reduce the backup window but does not reduce the total storage media or number of tape drives needed.
Conventional restore mechanisms typically restore a full backup, then one or more incremental backups. In backup systems using cumulative incremental backups (all data is backed up in a “full” backup, then data that has changed since the full backup is backed up in incremental backups), the incremental backups grow over time. This may consume considerable amounts of storage media, as well as a considerable amount of time in performing backups and restores. Incremental backups are conventionally performed as file-level incremental backups: when a file or any portion of a file is modified, the entire file is backed up during an incremental backup.
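The growth of cumulative, file-level incremental backups can be seen in a small sketch. The function names and the `path -> (mtime, data)` representation are assumptions made for illustration only. File deletions are not modeled.

```python
def full_backup(files):
    """Copy every file. `files` maps path -> (mtime, data)."""
    return dict(files)


def cumulative_incremental(files, full_time):
    """Copy, in its entirety, every file modified since the last full backup."""
    return {path: (mtime, data)
            for path, (mtime, data) in files.items()
            if mtime > full_time}


def restore(full, incremental):
    """Restore the full backup, then overlay the latest cumulative incremental."""
    restored = dict(full)
    restored.update(incremental)
    return restored


if __name__ == "__main__":
    files = {"a": (1, b"A" * 1000), "b": (1, b"B" * 1000)}
    full = full_backup(files)            # full backup taken at time 10

    files["a"] = (11, b"A" * 999 + b"x")  # one byte of "a" changes
    inc1 = cumulative_incremental(files, full_time=10)

    files["b"] = (12, b"B" * 999 + b"y")  # later, one byte of "b" changes
    inc2 = cumulative_incremental(files, full_time=10)

    print(sorted(inc1), sorted(inc2))
    print(restore(full, inc2) == files)
```

In the demo a single changed byte causes the whole file to be copied, and the second incremental contains both changed files because it is measured against the full backup rather than against the previous incremental.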