An application that contains a large amount of data may have some fraction of that data that is heavily used and some fraction that is infrequently used. Furthermore, legal requirements, such as the Sarbanes-Oxley Act, have increased the need for applications to retain data long after it has been accessed. Thus, there is a need for managing storage of data with different access patterns in the most cost-effective manner.
There are numerous commercial solutions that currently address the need for managing storage of data with different access patterns. These solutions are commonly referred to as “Hierarchical Storage Management” (“HSM”). Recently, some vendors have started marketing their solutions by referring to them as “Information LifeCycle Management” (“ILM”). All the current solutions have drawbacks, as described hereafter.
Various approaches to HSM allow administrators to specify criteria for migrating infrequently used data to secondary or tertiary storage (e.g., tape) that offers lower performance than primary storage. In all these systems, the granularity of migration is an OS file. A forwarding pointer is left in the file system, allowing the HSM system to semi-transparently recall the data. The recall is only semi-transparent because the user is likely to notice the delay and an administrator may need to load the secondary media. However, in the context of a database, for example, there may be relatively few large files, and a single file may contain some data that is frequently accessed and other data that is infrequently accessed.
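The file-granularity migration and forwarding-pointer recall described above can be illustrated with a minimal in-memory sketch. It is not taken from any particular HSM product; the names `ToyHSM`, `Stub`, `migrate_idle`, and the `tape/` key prefix are hypothetical, and logical integer timestamps stand in for real access times.

```python
from dataclasses import dataclass, field


@dataclass
class Stub:
    """Forwarding pointer left in primary storage after a file is migrated."""
    secondary_key: str


@dataclass
class ToyHSM:
    """Hypothetical model: whole files move between two in-memory stores."""
    primary: dict = field(default_factory=dict)      # file name -> bytes or Stub
    secondary: dict = field(default_factory=dict)    # secondary key -> bytes
    last_access: dict = field(default_factory=dict)  # file name -> logical time

    def write(self, name: str, data: bytes, now: int) -> None:
        self.primary[name] = data
        self.last_access[name] = now

    def migrate_idle(self, now: int, max_idle: int) -> list:
        """Move every file idle for more than max_idle; return their names."""
        moved = []
        for name, content in list(self.primary.items()):
            if isinstance(content, Stub):
                continue  # already migrated
            if now - self.last_access[name] > max_idle:
                key = f"tape/{name}"
                self.secondary[key] = content
                self.primary[name] = Stub(key)  # leave the forwarding pointer
                moved.append(name)
        return moved

    def read(self, name: str, now: int) -> bytes:
        content = self.primary[name]
        if isinstance(content, Stub):
            # Recall: the entire file returns to primary storage,
            # even if the caller needs only a small part of it.
            content = self.secondary.pop(content.secondary_key)
            self.primary[name] = content
        self.last_access[name] = now
        return content
```

Because the unit of migration in this sketch is the whole file, a large database file holding both frequently and infrequently accessed records is either kept entirely on primary storage or migrated entirely, which is the limitation noted above.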
Existing approaches to HSM also do not allow segregation of data within primary storage devices and, therefore, do not provide direct access to data stored on secondary or tertiary storage devices. That is, data on the secondary or tertiary storage devices must be accessed via the forwarding pointers, in the primary storage, to the data stored on the lower-performing storage devices. For example, data in secondary storage may be accessible only via an NFS (Network File System) mount from the primary storage. One example of this type of approach allows for use of compressed disks (i.e., secondary storage), whereby an administrator is able to specify that infrequently used data be moved from primary storage disks to the secondary compressed disks. However, the data stored on the compressed disks is accessible only via primary storage. Furthermore, some data on the compressed disks may eventually be moved to tertiary storage, such as tape. Another approach transparently moves infrequently used data from RAID-1 mirroring to RAID-5 protection, i.e., different disk systems with different storage characteristics. However, neither of these approaches provides for data storage on storage devices with disparate performance capabilities that logically operate as a single primary storage system and that, therefore, provide for direct access to all the data.
One approach to application data management software allows users to relocate a set of business transactional data. The granularity of relocation is not an OS file or database object, but a collection of records. As with HSM, the application's administrator defines the retention policies of a business transaction. This approach does not manage a system of disks, within a primary storage system, with different performance capabilities (it does not interface at the database or storage layer but at the SQL application layer), but the user can specify a slower storage configuration as the target for the relocation of data. Determining the relocation and retention policies for business transactional data requires careful analysis and planning. Hence, this approach is currently used with a small set of applications and databases and is not a general HSM solution. Further, with this approach, the data management software is schema-specific and must be configured specifically for each application.
As described in “Implementing ILM in Oracle Database 10g,” available from Oracle Corporation, the partitioning functionality that some databases provide to manually divide the database schema into partitions based on the value of certain data can be used to store the data on different storage devices based on the partitions. For example, a common partitioning column would be a date column and a database record will be stored in a particular partition based on the value of the date column. Another common partitioning column is a status column. A record whose status has been changed to, for example, “PROCESSED”, will be moved to a different partition than, for example, an “ACTIVE” partition.
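As a rough illustration of partitioning on a status column, the following sketch routes each record to a partition by its status value and moves the record when that value changes. It models the idea only and does not use any database's actual partitioning syntax; the class `PartitionedTable`, the partition names `p_active` and `p_processed`, and the record layout are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical mapping from a status value to a partition name. In practice
# each partition would be placed on a storage device chosen by an administrator.
PARTITION_BY_STATUS = {"ACTIVE": "p_active", "PROCESSED": "p_processed"}


class PartitionedTable:
    """Toy table whose rows live in the partition matching their status."""

    def __init__(self):
        # partition name -> {record id: record}
        self.partitions = defaultdict(dict)

    def insert(self, rec_id, record):
        partition = PARTITION_BY_STATUS[record["status"]]
        self.partitions[partition][rec_id] = record

    def update_status(self, rec_id, new_status):
        # Find and remove the record from whichever partition holds it.
        for rows in self.partitions.values():
            if rec_id in rows:
                record = rows.pop(rec_id)
                break
        else:
            raise KeyError(rec_id)
        # Re-insert so the record lands in the partition for its new status.
        record["status"] = new_status
        self.insert(rec_id, record)
```

For instance, inserting a record with status "ACTIVE" and then calling `update_status` with "PROCESSED" relocates it from `p_active` to `p_processed`, regardless of how often the record is actually accessed.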
The drawbacks of the foregoing approach are that it is applicable only to data stored in an RDBMS, the RDBMS schema must have a natural partitioning criterion (such as a date column or a status column), the schema has to be manually partitioned, and the partitions must be manually created on different disks based on expected usage. Furthermore, this approach is not completely automatic because it is state-driven. That is, because the partitioning is performed based on specific values in a column (e.g., put records with year value “2005” in one partition and records with all other year values in a different partition), certain uncontrollable changes may require updating the application (e.g., when the calendar year changes). This is not a general solution and may not be applicable in many scenarios.
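The state-driven drawback can be made concrete with a small sketch. Assume a hypothetical routing rule in which records dated in a hard-coded "hot" year go to fast storage and all others go to slow storage; the names `HOT_YEAR`, `partition_for`, `fast_disk`, and `slow_disk` are illustrative only.

```python
from datetime import date

# Hard-coded partition boundary (hypothetical). It encodes a specific column
# value rather than any observed access pattern.
HOT_YEAR = 2005


def partition_for(record_date: date, hot_year: int = HOT_YEAR) -> str:
    """Route a record to a partition purely by the year in its date column."""
    return "fast_disk" if record_date.year == hot_year else "slow_disk"


# While the calendar year is 2005, new records land on fast storage.
in_2005 = partition_for(date(2005, 6, 1))

# Once the year rolls over, the newest records are routed to slow storage
# until someone updates the rule by hand.
in_2006_stale_rule = partition_for(date(2006, 1, 2))
in_2006_updated_rule = partition_for(date(2006, 1, 2), hot_year=2006)
```

Under these assumptions the rule keeps working only as long as an administrator or application change updates the boundary value each time the calendar year changes.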
Another approach is referred to as “tiered storage,” where data from a particular database or particular application is stored on storage devices in one class of storage devices, whereas data from a different database or application is stored on storage devices in a different class of storage devices. For example, a database used daily for transactional data operations may be stored on relatively fast storage devices of one storage class, whereas a separate database of historical data, which is accessed only quarterly or annually, may be stored on relatively slow storage devices of another storage class. One drawback to this approach is that the storage system is effectively segregated on a per-database or per-application basis. Consequently, data within a particular database or a particular application cannot be spread across storage devices of different storage classes.
The foregoing description provides an overview of just some of the many approaches and solutions for managing the storage of data. The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.