1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to data storage systems.
2. Description of the Related Art
In data storage environments such as corporate LANs, total storage needs are growing and storage costs account for an increasing share of the IT budget. More and/or higher capacity storage devices may be added, but this solution is expensive and difficult to manage, and does not address the root of the problem. However much storage is added, capacity remains limited. Adding devices also tends to hold the cost per byte of storage constant, because it does not take advantage of storage devices with a lower cost per byte. A high percentage of data stored in a storage environment may be infrequently accessed, or never accessed at all after a certain time. A lower cost per byte may be realized using methods that move at least some of this infrequently accessed data off more expensive storage devices and onto less expensive storage devices.
Hierarchical Storage Management (HSM) is a data storage solution that provides access to vast amounts of storage space while reducing the administrative and storage costs associated with data storage. HSM systems may move files along a hierarchy of storage devices that may be ranked in terms of cost per megabyte of storage, speed of storage and retrieval, and overall capacity limits. Files are migrated along the hierarchy to less expensive forms of storage based on rules tied to the frequency of data access.
In HSM systems, data access response time and storage costs typically determine the appropriate combination of storage devices used. A typical three-tier HSM architecture may include hard drives as primary storage, rewritable optical media as secondary storage, and tape as tertiary storage. Alternatively, hard drives may be used for secondary storage, and WORM (Write Once, Read Many) optical media may be used as tertiary storage.
Rather than making copies of files as in a backup system, HSM migrates files to other forms of storage, freeing hard disk space. Events such as crossing a storage threshold and/or reaching a certain file “age” may activate the migration process. As files are migrated off primary storage, HSM leaves stubs to the files on the hard drive(s). These stubs point to the location of the file on the alternative storage, and are used in automatic file retrieval and user access. The stub remains within the file system of the primary storage, but the file itself is migrated “offline” out of the file system onto the alternative storage (e.g. tape).
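The migration triggers described above (crossing a storage threshold and/or reaching a certain file age) can be illustrated with a minimal Python sketch. It is not taken from any particular HSM product. The names FileEntry, Stub, select_for_migration, and migrate, the 90% threshold, and the 90-day age limit are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    path: str
    size: int           # bytes occupied on primary storage
    last_access: float  # seconds since the epoch


@dataclass
class Stub:
    path: str      # the name stays visible in the primary file system
    location: str  # where the file data now lives on alternative storage


def select_for_migration(files, now, capacity, high_water=0.9,
                         max_age=90 * 86400):
    """Pick files to migrate, least recently accessed first.

    A file is chosen if it is older than max_age (assumed age rule), or
    if primary storage use is still above high_water * capacity
    (assumed threshold rule).
    """
    used = sum(f.size for f in files)
    chosen = []
    for f in sorted(files, key=lambda f: f.last_access):
        aged_out = now - f.last_access > max_age
        over_threshold = used > high_water * capacity
        if aged_out or over_threshold:
            chosen.append(f)
            used -= f.size  # space is freed once the file is migrated
    return chosen


def migrate(f, tier="tape"):
    """Return the stub left behind; the data copy itself is not modeled."""
    return Stub(path=f.path, location=f"{tier}:{f.path}")
```

The sketch models only the selection and the stub. A real system would also copy the data to the alternative storage before releasing the space on primary storage.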
In HSM, when a file that has been migrated to a lower rank of storage, such as tape, is accessed by an application, the stub may be used to retrieve and restore the file from the lower rank of storage. The file appears to be accessed by the application from its initial storage location, and demigration of the file back into the file system is performed automatically by the HSM system using the stub. While on the surface this demigration may appear transparent to the user, in practice the process of accessing and restoring the file from offline storage (e.g. tape) may introduce a noticeable time delay (seconds, minutes, or even hours) when compared to accessing files stored on primary storage. Thus, accessing offloaded data in an HSM system is typically non-transparent to the application or user because of the difference in access time. In addition, since HSM introduces a substantial time lag to access offloaded data, HSM systems typically offload only low-access (essentially, no-access) data.
A file system may be defined as a collection of files and file system metadata (e.g., directories and inodes) that, when set into a logical hierarchy, make up an organized, structured set of information. File systems may be mounted from a local system or remote system. File system software may include the system or application-level software that may be used to create, manage, and access file systems.
File system metadata may be defined as information that file system software maintains on files stored in the file system. File system metadata may include, but is not limited to, definitions and descriptions of the data it references. File system metadata may include one or more of, but is not limited to, inodes, directories, mapping information in the form of indirect blocks, superblocks, etc. Generally, file system metadata for a file includes path information for the file as seen from the application side and corresponding file system location information (e.g. device:block number(s)). File system metadata may itself be stored on a logical or physical device within a file system.
File systems may use data structures such as inodes to store file system metadata. An inode may be defined as a data structure holding information about files in a file system (e.g. a Unix file system). There is an inode for each file, and a file is uniquely identified by the file system on which it resides and its inode number on that system. An inode may include at least some of, but is not limited to, the following information: the device where the inode resides, locking information, mode and type of file, the number of links to the file, the owner's user and group IDs, the number of bytes in the file, access and modification times, the time the inode itself was last modified and the addresses of the file's blocks on disk (and/or pointers to indirect blocks that reference the file blocks).
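On a Unix-like system, most of the inode fields listed above can be inspected through the standard stat interface. The following Python sketch uses os.stat from the standard library; the helper name inode_summary and the dictionary keys are illustrative choices, not part of any file system's API. Block addresses are not exposed through this interface, so they are omitted.

```python
import os


def inode_summary(path):
    """Return a subset of the inode information for path via os.stat()."""
    st = os.stat(path)
    return {
        "device": st.st_dev,   # device on which the inode resides
        "inode": st.st_ino,    # inode number, unique within that device
        "mode": st.st_mode,    # file type and permission bits
        "links": st.st_nlink,  # number of hard links to the file
        "uid": st.st_uid,      # owner's user ID
        "gid": st.st_gid,      # owner's group ID
        "size": st.st_size,    # number of bytes in the file
        "atime": st.st_atime,  # time of last access
        "mtime": st.st_mtime,  # time of last data modification
        "ctime": st.st_ctime,  # on Unix, time the inode itself last changed
    }
```

The pair (device, inode) corresponds to the statement that a file is uniquely identified by the file system on which it resides and its inode number on that system.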
Many databases provided by DBMS vendors (e.g. Oracle, IBM's DB2, etc.) offer a feature called data partitioning, which enables users to store data from a single table in different partitions in a file based on certain criteria. In a partitioned database, a partitioning key or keys may be used to partition table data across a set of database partitions. A database partition may include its own user data, indexes, configuration files, and transaction logs. An exemplary criterion that may be used as a key is time; for example, each month's sales data may be stored in a separate partition in one file. An enterprise may want to have some data in partitions online at all times, but some partitions may not be frequently accessed. Users still want to access the older partitions fairly easily, even if infrequently.
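The time-based example above can be sketched in Python as follows. This is a toy model of key-based partitioning, not the mechanism of any specific DBMS; the functions partition_key and partition_rows and the row layout (a dictionary with a "date" field) are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def partition_key(sale_date):
    """Derive the partitioning key (year and month) from a sale date."""
    return f"{sale_date.year:04d}-{sale_date.month:02d}"


def partition_rows(rows):
    """Group table rows into partitions, one per month of sales data."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row["date"])].append(row)
    return dict(partitions)
```

With this layout, rows for older months end up in partitions that may rarely be read, which is the situation the following paragraphs discuss.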
Database systems that support data partitioning may allow users to manually move older database partitions offline. However, once the database partitions are moved offline, they cannot be accessed unless they are brought back manually by the database administrator (DBA). Therefore, the process is not transparent. For older or inactive database partitions, typically only the headers of the files are accessed, yet the rest of each file is kept on expensive storage unnecessarily.
Database users typically want to have their most frequently accessed data on the best quality storage. Typically, only a small region of a database file is heavily accessed, while other regions of the file are accessed infrequently, if at all. As access patterns and workloads change, the data may be manually migrated between higher quality, more costly storage and lower quality, less costly storage. It is not easy for users to constantly monitor usage and move data accordingly. Even if they could, the movement of data is not a transparent process and may cause database downtime. In at least some databases, including databases that support data partitioning, headers may be continuously updated, and thus the database file may appear to be always current. In database systems that support data partitioning, even though the content of a partition may not have been accessed for a long time, the database engine may update a header block of the partition frequently, for example once every second. As a result, even for a partition whose data has not been accessed for a long time, the timestamp of the partition is always current. Thus, traditional HSM-like solutions may not work well for databases because they rely on file-level timestamps, which may always be current for database files.
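The mismatch between a file-level timestamp and the actual access pattern within the file can be illustrated with a small Python model. The class RegionTracker, its methods, and the region size are hypothetical and are not described in the text above; the sketch only shows that a frequently rewritten header keeps the file-level timestamp current while most regions of the file remain untouched.

```python
class RegionTracker:
    """Toy model recording write times per fixed-size region of one file."""

    def __init__(self, region_size):
        self.region_size = region_size
        self.last_access = {}  # region index -> time of last write
        self.file_mtime = 0.0  # file-level timestamp, as HSM would see it

    def record_write(self, offset, length, now):
        first = offset // self.region_size
        last = (offset + length - 1) // self.region_size
        for region in range(first, last + 1):
            self.last_access[region] = now
        self.file_mtime = now  # any write refreshes the file timestamp

    def cold_regions(self, total_regions, now, max_age):
        # Regions never written are treated as last touched at time 0.
        return [r for r in range(total_regions)
                if now - self.last_access.get(r, 0.0) > max_age]
```

If a header in region 0 is rewritten every second, file_mtime is always recent, so a policy based only on that timestamp would never select the file, even though cold_regions would report nearly every other region as idle.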
Some file systems may be extent-based, rather than block-based. An extent may be defined as one or more adjacent, or contiguous, blocks of storage within a file system. Extents, unlike blocks, are not of fixed size. Extent-based file systems allocate disk blocks in groups (extents), rather than one at a time, which results in sequential allocation of blocks. As a file is written, one or more extents may be allocated, after which writes can occur at the extent level. One or more extents may be allocated for one file. Using extents may make it less likely to have a fragmented file system.
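The relationship between blocks and extents can be shown with a short Python sketch that collapses a list of block numbers into (start, length) extents. The function name blocks_to_extents and the tuple representation are illustrative assumptions; real extent-based file systems use their own on-disk extent descriptors.

```python
def blocks_to_extents(blocks):
    """Collapse block numbers into (start_block, length) extents.

    Each extent covers a run of contiguous blocks, so extents vary in
    size while the underlying blocks are all the same size.
    """
    extents = []
    for block in sorted(set(blocks)):
        if extents and block == extents[-1][0] + extents[-1][1]:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # extend the current run
        else:
            extents.append((block, 1))         # start a new extent
    return extents
```

For example, a file occupying blocks 100 through 103, 200, 201, and 500 needs seven block addresses but only three extents, which illustrates why extent-based allocation keeps mapping information compact when blocks are allocated sequentially.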