1. Technical Field
The present invention relates generally to storage management, and more specifically relates to a system and method for providing content-based anticipative storage management.
2. Related Art
Hierarchical Storage Management (HSM) is now considered a mandatory minimum capability for virtually all archival systems. Current HSM systems are driven by explicitly stated, rule-based policies derived from low-level attributes such as age, size, frequency of use, or a user-assigned file priority.
The simplistic rules that typically govern the movement of data from expensive media such as disk to cheaper, slower media such as tape do not support intelligent, proactive data migration based on a comprehensive consideration of data attributes, content, and interrelationships. Consequently, hierarchical storage space is managed sub-optimally and system performance is diminished.
Current HSM systems address archival and space management by providing the ability to set explicit policies based on low-level file attributes. TSM HSM™ and LEGATO DiskExtender™ are examples of systems that offer these capabilities. These systems use very simplistic means to determine which data are candidates for archival. Likewise, data recall operations are typically triggered by specific user requests.
The simplistic attributes employed by current state-of-the-art hierarchical storage management tools do not address today's sophisticated requirements for data migration (archival and recall) across a hierarchically arranged set of storage systems. Ironically, archival operations generally ignore readily available and important information describing the relationships among data objects submitted to archives. Likewise, recall operations do not anticipate recalls of data objects that users are likely to need. Consequently, data migrations (recalls and secondary archival) do not perform as well as they could with more sophisticated rules, policies, and information.
Current HSM systems are based on low-level attributes such as file size, age, and frequency of use. Hence, HSM policies are constrained to a limited set of attributes. This is of limited use in complicated storage scenarios in which users want to migrate files based on their content rather than merely on file size or similar attributes. Users do not have the flexibility to set higher-level policies for migration across the storage hierarchy, such as: “Migrate all the files related to drug trials conducted before 1998 and which mention compounds X, Y and Z to tape storage,” or “Migrate all files which refer to project number 1S23 to cheaper SATA disks,” or “Migrate all medical records and related documents for patients who have been discharged.” Similarly, current HSM systems cannot handle more complex policies such as: “Migrate all files that satisfy X, to storage media Y,” where X can be a standard SQL predicate or condition and Y is a type of storage media with a defined cost and performance, perhaps as part of a storage pool.
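To make the contrast concrete, the following Python sketch expresses a migration policy as a predicate X over a file record paired with a target storage class Y. It is illustrative only and assumes hypothetical content-derived tags and attributes (for example, `drug_trial`, `compound:X`, `trial_year`, `project:1S23`) that some separate indexing step has already attached to each file; none of these names, nor the tier labels `tape` and `sata_disk`, come from TSM HSM™, LEGATO DiskExtender™, or any other product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List, Set


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    last_access: date
    # Hypothetical content-derived metadata, assumed to be produced by a
    # separate content-indexing step; not part of any existing HSM product.
    tags: Set[str] = field(default_factory=set)
    attrs: Dict[str, int] = field(default_factory=dict)


@dataclass
class Policy:
    name: str
    predicate: Callable[[FileRecord], bool]  # "X": which files the policy selects
    target_tier: str                         # "Y": destination storage class (hypothetical labels)


def plan_migrations(files: List[FileRecord], policies: List[Policy]) -> Dict[str, str]:
    """Map each file path to the target tier of the first policy it satisfies."""
    plan: Dict[str, str] = {}
    for f in files:
        for p in policies:
            if p.predicate(f):
                plan[f.path] = p.target_tier
                break
    return plan


def age_size_rule(today: date) -> Policy:
    # Conventional rule: only low-level attributes (days since last access, size).
    return Policy(
        "old-and-large",
        lambda f: (today - f.last_access).days > 365 and f.size_bytes > 10**9,
        "tape",
    )


# Content-based rules modeled on the examples in the text (tag names are assumptions).
drug_trial_rule = Policy(
    "pre-1998-drug-trials",
    lambda f: "drug_trial" in f.tags
    and f.attrs.get("trial_year", 9999) < 1998
    and {"compound:X", "compound:Y", "compound:Z"} <= f.tags,
    "tape",
)
project_rule = Policy("project-1S23", lambda f: "project:1S23" in f.tags, "sata_disk")

today = date(2024, 1, 1)
files = [
    FileRecord("/data/trial_1996.pdf", 2_000, date(2023, 12, 1),
               {"drug_trial", "compound:X", "compound:Y", "compound:Z"}, {"trial_year": 1996}),
    FileRecord("/data/spec_1S23.doc", 5_000, date(2023, 12, 20), {"project:1S23"}),
    FileRecord("/data/old_dump.bin", 5 * 10**9, date(2021, 1, 1)),
]
plan = plan_migrations(files, [drug_trial_rule, project_rule, age_size_rule(today)])
print(plan)
```

In this sketch only the age-and-size rule selects `/data/old_dump.bin`, and it would never select the small, recently accessed drug-trial file, which is the kind of file the content-based policies quoted above are meant to reach.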
These types of policies cannot be supported by existing HSM systems because such systems are severely limited by the number and nature of their policy attributes. In addition, the storage attributes are relatively limited, particularly with regard to the performance and cost characteristics of the storage media and, therefore, the potential service level afforded by the media in question.
Current HSM systems are driven by explicitly defined rules of the form “If X, then Y,” which depend on the policy attributes (data object and storage media). However, current storage scenarios are more complex, and users cannot always define all the possible rules. Ideally, an HSM system should also be governed by a set of implicit rules. For example, users may always migrate a particular set of hospital bills and medical records at the same time. It is quite possible that there is an implicit relationship between the object classes involving hospital bills and medical records; e.g., they might belong to a particular patient who has been discharged. Thus, if there is an explicit HSM policy to transfer all hospital bills of a patient who has been discharged, it makes sense to migrate that patient's medical records as well.
Similarly, if a patient's medical records are being pulled up from tape storage, it may be a good idea to also pull up the hospital bills, since an insurance agent who is looking at the medical records may also want to check the hospital bills. Instead of issuing two separate, explicit data migration commands, it would be preferable for both sets of records to be pulled up automatically by a single command, thereby reducing tape latency. These types of rules cannot be implemented with current HSM systems.
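The implicit relationship described above could be approximated, as one possible technique, by mining past migration history for object classes that are usually moved together. The following Python sketch uses simple pairwise association rules (support and confidence); this is an illustrative assumption rather than a method stated here, and the class names, the grouping key `patient-1042`, and the thresholds `min_support` and `min_confidence` are hypothetical. When a recall of one class is requested, its learned companions for the same key are bundled into a single request.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Set, Tuple


def learn_implicit_rules(sessions: List[Set[str]],
                         min_support: int = 2,
                         min_confidence: float = 0.8) -> Dict[str, Set[str]]:
    """Infer 'if class A is migrated, also migrate class B' from past sessions.

    Each session is the set of object classes migrated together (for example,
    in one request for one patient). A rule A -> B is kept when A and B were
    migrated together at least min_support times and
    confidence = count(A with B) / count(A) >= min_confidence.
    """
    single: Counter = Counter()
    pair: Counter = Counter()
    for s in sessions:
        single.update(s)
        pair.update(permutations(s, 2))  # ordered pairs (A, B) with A != B
    rules: Dict[str, Set[str]] = {}
    for (a, b), n_ab in pair.items():
        if n_ab >= min_support and n_ab / single[a] >= min_confidence:
            rules.setdefault(a, set()).add(b)
    return rules


def expand_request(requested_class: str, key: str,
                   rules: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Bundle the requested class with its learned companions for the same key."""
    companions = sorted(rules.get(requested_class, set()))
    return [(requested_class, key)] + [(c, key) for c in companions]


# Hypothetical migration history: each set was migrated in one operation.
history = [
    {"medical_record", "hospital_bill"},
    {"medical_record", "hospital_bill"},
    {"medical_record", "hospital_bill", "lab_image"},
    {"lab_image"},
]
rules = learn_implicit_rules(history)
# A recall of one patient's medical records becomes a single batched request.
batch = expand_request("medical_record", "patient-1042", rules)
print(batch)
```

Because `lab_image` appears together with the other classes only once, it falls below the assumed `min_support` of 2 and is not bundled; raising or lowering the thresholds trades missed companions against unnecessary recalls.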