1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to data storage systems.
2. Description of the Related Art
Regulatory compliance is a major concern of business entities such as corporations. The financial industry and public companies are subject to increasing scrutiny of how they perform financial transactions. As part of this scrutiny, business entities may be required to conform to regulations regarding data retention. For example, regulations may specify that any data pertaining to certain types of work that a company does, or to work in which certain officers of the company participate, must be kept for a specified period of time, and that during that period the data must not be altered; in other words, the data must be immutable.
The regulations requiring immutability do not specify how immutability of data should be implemented. Vendors spend time and resources developing solutions that guarantee that certain data is not altered. One solution is to store the data offline, for example on a WORM (Write Once, Read Many) device such as a CD-R, or on some other immutable data container. An immutable data container may be one or more independent devices or a collection of devices represented as one virtual device; one such device is the EMC Centerra. Once data is written to an immutable data container, it cannot be changed. However, conventional data retention solutions typically require business entities to use non-standard interfaces and to customize both their applications and their storage infrastructure to meet the new regulatory requirements.
File Systems
A file system may be defined as a collection of files and file system metadata (e.g., directories and inodes) that, when set into a logical hierarchy, make up an organized, structured set of information. File systems organize and manage information stored in a computer system. File systems may support the organization of user data by providing and tracking organizational structures such as files, folders, and directories. A file system may interpret and access information stored in a variety of storage media, abstracting the complexities associated with locating, retrieving, and writing data to the storage media. File systems may be mounted from a local system or a remote system. File system software may include the system-level or application-level software used to create, manage, and access file systems.
File system metadata may be defined as information that file system software maintains about files stored in the file system. File system metadata may include, but is not limited to, definitions and descriptions of the data it references. File system metadata may include one or more of, but is not limited to, inodes, directories, mapping information in the form of indirect blocks, and superblocks. Generally, file system metadata for a file includes path information for the file as seen from the application side and the corresponding file system location information (e.g., device:block number(s)). File system metadata may itself be stored on a logical or physical device within a file system.
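To make the path-to-location mapping concrete, the following Python sketch models a directory that maps application-visible path names to inode numbers, and inodes that record the device:block locations of a file's data. The class and method names (`Inode`, `FileSystemMetadata`, `create`, `locate`) are hypothetical, and the model is deliberately simplified: it omits indirect blocks, superblocks, permissions, and on-disk persistence.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Inode:
    """Hypothetical, simplified per-file metadata record."""
    number: int
    size: int
    # Physical location of each data block as (device, block number).
    blocks: list[tuple[str, int]] = field(default_factory=list)


class FileSystemMetadata:
    """Toy model: a directory maps application-visible paths to inodes."""

    def __init__(self) -> None:
        self.directory: dict[str, int] = {}  # path -> inode number
        self.inodes: dict[int, Inode] = {}   # inode number -> inode
        self._next_inode = 1

    def create(self, path: str, size: int, blocks: list[tuple[str, int]]) -> Inode:
        """Allocate an inode for a new file and link it into the directory."""
        inode = Inode(self._next_inode, size, list(blocks))
        self._next_inode += 1
        self.inodes[inode.number] = inode
        self.directory[path] = inode.number
        return inode

    def locate(self, path: str) -> list[tuple[str, int]]:
        """Translate an application path into device:block locations."""
        return self.inodes[self.directory[path]].blocks
```

For example, after `create("/finance/q3.xls", 8192, [("sda", 1024), ("sda", 1025)])`, calling `locate("/finance/q3.xls")` returns `[("sda", 1024), ("sda", 1025)]`: the application sees only the path, while the metadata supplies the physical location.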
Tiered Storage Hierarchies
Some file systems may be implemented as tiered storage hierarchies. In a tiered storage hierarchy, two or more tiers of storage are implemented, data may be stored on one or more of the tiers, and stored data may be moved, or migrated, between the tiers. Storage and migration of data may be controlled by a set of rules or a policy. Hierarchical Storage Management (HSM) is an exemplary data storage solution for implementing tiered storage hierarchies. FIG. 1 illustrates file system software 100 implementing a generic tiered storage hierarchy 102 with three storage tiers 104. Tiered storage hierarchy 102 is a file system. File system software 100 may include the system-level or application-level software used to create, manage, and access the file system. Note that each storage tier 104 may include one or more physical storage devices or, alternatively, may be implemented as volumes or virtual devices allocated across one or more physical storage devices.
In conventional tiered storage hierarchies, all data stored within the file system is mutable, or modifiable. To make a set of data immutable, the data must be moved out of the primary file system, or “offline”, to tertiary storage.
Hierarchical Storage Management (HSM)
Hierarchical Storage Management (HSM) is a data storage solution that provides access to vast amounts of storage space while reducing the administrative and storage costs associated with data storage. HSM systems may move files along a hierarchy of storage devices that may be ranked in terms of cost per megabyte of storage, speed of storage and retrieval, and overall capacity limits. Files are migrated along the hierarchy to less expensive forms of storage based on rules that may be tied to the frequency of data access. In HSM file systems, data access response time and storage costs typically determine the appropriate combination of storage devices used. A typical three-tier HSM architecture may include hard drives as primary storage, rewritable optical media as secondary storage, and tape as tertiary storage. Alternatively, hard drives may be used as secondary storage and WORM optical media as tertiary storage.
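The rule-based migration along a cost-ranked hierarchy can be sketched as a tier-selection function. This is a minimal illustration under stated assumptions, not a description of any particular HSM product: the `Tier` type, its `cost_per_mb` and `max_idle_days` fields, and the use of days since last access as a stand-in for access frequency are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_mb: float    # relative cost, used only to rank the hierarchy
    max_idle_days: float  # files idle longer than this belong on a cheaper tier


def choose_tier(tiers: list[Tier], idle_days: float) -> Tier:
    """Return the tier a file should occupy under an access-age rule.

    Tiers are ranked from most to least expensive. A file stays on the most
    expensive tier whose idle limit it has not exceeded; the cheapest tier
    accepts everything else.
    """
    ranked = sorted(tiers, key=lambda t: t.cost_per_mb, reverse=True)
    for tier in ranked[:-1]:
        if idle_days <= tier.max_idle_days:
            return tier
    return ranked[-1]
```

With a three-tier hierarchy of disk (cost 1.0, 30 days), rewritable optical (0.2, 365 days), and tape (0.01, unlimited), a file idle for 10 days stays on disk, one idle for 90 days maps to optical, and one idle for 1000 days maps to tape. Ranking by cost rather than by list order mirrors the idea that tiers are ordered by cost per megabyte; a real policy might also weigh retrieval speed or capacity.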
Rather than making copies of files as a backup system does, HSM migrates files to other forms of storage, freeing hard disk space. Events such as crossing a storage threshold and/or a file reaching a certain “age” may activate the migration process. As files are migrated off primary storage, HSM leaves stubs for the files on the hard drive(s). These stubs point to the location of each file on the alternative storage and are used for automatic file retrieval and user access. The stub remains within the file system on primary storage, but the file itself is migrated “offline”, out of the file system, onto the alternative or tertiary storage (e.g., tape).
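The stub mechanism and the two migration triggers can be modeled in a few lines. The sketch below is a toy, in-memory illustration; `ToyHSM`, `Stub`, the `high_water` fraction, the `max_age` limit, and the `tape:` key prefix are hypothetical. It shows the behavior described above: migration frees primary space by replacing a file's data with a stub, the stub remains at the original path, and reading a stubbed path transparently recalls the data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stub:
    """Placeholder left on primary storage after its file is migrated."""
    tertiary_key: str  # location of the file's data on tertiary storage


class ToyHSM:
    """In-memory model of HSM migration with stubs (hypothetical API)."""

    def __init__(self, capacity: int, high_water: float, max_age: float) -> None:
        self.capacity = capacity      # primary storage size in bytes
        self.high_water = high_water  # usage fraction that triggers migration
        self.max_age = max_age        # files idle longer than this are migrated
        self.primary: dict[str, bytes | Stub] = {}
        self.last_access: dict[str, float] = {}
        self.tertiary: dict[str, bytes] = {}  # stands in for tape or WORM media

    def used(self) -> int:
        """Bytes consumed on primary storage; stubs are treated as taking no space."""
        return sum(len(v) for v in self.primary.values() if isinstance(v, bytes))

    def write(self, path: str, data: bytes, now: float) -> None:
        self.primary[path] = data
        self.last_access[path] = now

    def run_migration(self, now: float) -> list[str]:
        """Migrate idle files, and the oldest files while over the threshold."""
        resident = sorted(
            (p for p, v in self.primary.items() if isinstance(v, bytes)),
            key=lambda p: self.last_access[p],
        )
        migrated = []
        for path in resident:
            too_old = now - self.last_access[path] > self.max_age
            over_threshold = self.used() > self.high_water * self.capacity
            if not (too_old or over_threshold):
                continue
            key = f"tape:{path}"
            self.tertiary[key] = self.primary[path]  # data goes offline
            self.primary[path] = Stub(key)           # stub stays in the file system
            migrated.append(path)
        return migrated

    def read(self, path: str, now: float) -> bytes:
        """Return file data, recalling it through the stub if it was migrated."""
        entry = self.primary[path]
        if isinstance(entry, Stub):
            entry = self.tertiary[entry.tertiary_key]
            self.primary[path] = entry  # recall restores the data to primary
        self.last_access[path] = now
        return entry
```

For example, with a 100-byte primary store, a 0.5 high-water mark, and a 30-unit age limit, writing 40 bytes at time 0 and 30 bytes at time 10 and then running migration at time 20 moves only the older file offline, because doing so brings usage under the threshold; a later read of that path recalls it. Migrating oldest-first once the threshold is crossed is only one possible heuristic.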