1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to Hierarchical Storage Management (HSM) systems.
2. Description of the Related Art
In data storage environments such as corporate LANs, total storage needs are increasing and storage costs are a growing part of the IT budget. Hierarchical Storage Management (HSM) is a data storage solution that provides access to vast amounts of storage space while reducing the administrative and storage costs associated with data storage. Rather than making copies of files as in a backup system, HSM migrates files to other forms of storage, freeing hard disk space. HSM systems may migrate files to less expensive forms of storage based on rules tied to the frequency of data access. A typical two-tier HSM architecture may include hard drives as primary storage and rewritable optical media or tape as secondary, or offline, storage. Events such as crossing a storage threshold and/or reaching a certain file “age” may activate the migration process. As files are migrated off primary storage, HSM leaves stubs for the files on the hard drive(s). These stubs point to the location of the file on the alternative storage, and are used in automatic file retrieval and user access. The stub remains within the file system of the primary storage, but the file itself is migrated “offline” out of the file system onto the alternative storage (e.g. tape).
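The threshold and age triggers described above may be illustrated with a minimal sketch. It assumes a simple policy in which migration starts only when primary storage usage exceeds a threshold, and files are then selected least recently accessed first, provided they have reached a minimum age. The names FileInfo and select_for_migration, and all parameters, are hypothetical and are not taken from any particular HSM product; stub sizes are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileInfo:
    path: str
    size: int           # bytes occupied on primary storage
    last_access: float  # seconds since the epoch


def select_for_migration(files: List[FileInfo], capacity: int,
                         threshold: float, min_age: float,
                         now: float) -> List[FileInfo]:
    """Pick files to migrate until usage falls to threshold * capacity.

    Assumed policy: nothing is migrated while usage is at or below the
    threshold; otherwise the least recently accessed files that are at
    least min_age seconds old are selected first.
    """
    used = sum(f.size for f in files)
    limit = threshold * capacity
    selected: List[FileInfo] = []
    for f in sorted(files, key=lambda f: f.last_access):
        if used <= limit:
            break  # enough space has been freed (or none was needed)
        if now - f.last_access >= min_age:
            selected.append(f)
            used -= f.size  # the stub left behind is treated as negligible
    return selected
```

In this sketch a file that is old enough but resides on a volume below the threshold is left in place; a policy that migrates on age alone would simply drop the usage check.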
In file systems, files may include parts that are actively accessed, and other parts that are not actively accessed. For example, in data-warehousing environments and other database environments, database files may be partially inactive, with only initial headers or other parts being updated while the rest of the file is inactive. HSM applications typically decide to migrate inactive files from disk to tape by looking at timestamps at the file level, typically maintained in inodes for the files. However, file-level timestamps are inadequate for the HSM application to detect partially inactive files such as those found in data-warehousing setups. Therefore, partially inactive files remain entirely online in the file system, consuming disk space. Manual methods have been used to migrate such files out and then migrate them back onto disk when required.
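The limitation of file-level timestamps may be seen in a minimal sketch of a file-level inactivity check. The function name is_migration_candidate and its parameters are hypothetical; the sketch assumes the check uses only the access and modification times returned by os.stat, which apply to the file as a whole.

```python
import os
import time
from typing import Optional


def is_migration_candidate(path: str, max_idle_seconds: float,
                           now: Optional[float] = None) -> bool:
    """Return True if the whole file has been idle for max_idle_seconds.

    Only one access time and one modification time exist per file, so a
    write to any part of the file (for example, its header) makes the
    entire file appear active.
    """
    st = os.stat(path)
    current = time.time() if now is None else now
    last_activity = max(st.st_atime, st.st_mtime)
    return current - last_activity >= max_idle_seconds
```

Under this check, a large database file whose header is updated regularly is never reported as a candidate, even if the remainder of the file has not been touched for a long time.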
Mechanisms have been proposed to allow an HSM application to partially migrate files by establishing a file system clone and looking at the file in the clone to determine which parts of the file have changed. However, this approach cannot detect read activity on the file, and may therefore cause unnecessary migrations of file parts that are still being read.
A file system may be defined as a collection of files and file system metadata (e.g., directories and inodes) that, when set into a logical hierarchy, make up an organized, structured set of information. File systems may be mounted from a local system or remote system. File system software may include the system or application-level software that may be used to create, manage, and access file systems.
File systems may use data structures such as inodes to store file system metadata. An inode may be defined as a data structure holding information about a file in a file system (e.g. a Unix file system). There is an inode for each file, and a file is uniquely identified by the file system on which it resides and its inode number on that system. An inode may include at least some of, but is not limited to, the following information: the device where the inode resides, locking information, mode and type of file, the number of links to the file, the owner's user and group IDs, the number of bytes in the file, access and modification times, the time the inode itself was last modified, and the addresses of the file's blocks on disk (and/or pointers to indirect blocks that reference the file blocks).
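Much of the inode information listed above may be read from user space. The following is a minimal sketch using os.stat from the Python standard library; the function name inode_summary and the dictionary keys are hypothetical. Locking information and block addresses are not exposed through this interface and are therefore omitted, and on non-Unix platforms some fields (for example, st_ctime) may carry a different meaning.

```python
import os
import stat
from typing import Any, Dict


def inode_summary(path: str) -> Dict[str, Any]:
    """Collect inode-style metadata for path using os.stat."""
    st = os.stat(path)
    return {
        "device": st.st_dev,                    # file system holding the inode
        "inode": st.st_ino,                     # inode number on that device
        "is_regular": stat.S_ISREG(st.st_mode), # type of file
        "permissions": stat.filemode(st.st_mode),
        "links": st.st_nlink,                   # number of links to the file
        "uid": st.st_uid,                       # owner's user ID
        "gid": st.st_gid,                       # owner's group ID
        "size": st.st_size,                     # number of bytes in the file
        "atime": st.st_atime,                   # last access
        "mtime": st.st_mtime,                   # last data modification
        "ctime": st.st_ctime,                   # last inode change (on Unix)
    }
```

The pair of "device" and "inode" values corresponds to the identification of a file by its file system and inode number described above.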