1. The Field of the Invention
The present invention relates to systems and methods for the hierarchical storage of data. More specifically, the present invention allows a system implementing hierarchical storage of data to migrate data from local storage to remote storage and to recall data from remote storage to local storage. Even more specifically, this invention teaches migration of data in a manner that enables rapid freeing of local storage space and teaches recall of data in a necessary-only manner to maintain the local storage space in an unencumbered fashion.
2. The Prior State of the Art
Many advances have been made in computer hardware and software, but some general principles have remained constant. For example, there continues to be a difference in the cost of storing data as a function of the medium used to store the data and the accessibility thereto. This is true despite recent advances that have lowered the costs of memory/data storage and increased the overall storage capacity of individual computing devices. In general, it is more expensive to store a data word in cache memory than in system RAM. System RAM, in turn, is more expensive per storage word than magnetic disk storage. Similarly, magnetic disk storage is more expensive per storage word than archival storage. It is thus apparent, regardless of recent advances, that motivation exists to transfer unused or less frequently used data to less expensive storage devices, provided that adequate access, access speeds, etc., are available in retrieving the data to make the transfer cost-effective.
In order to achieve cost-effectiveness, hierarchical data storage systems have been developed that are generally modeled on a mainframe-computing paradigm and include a separate, non-integrated hierarchical storage system. The hierarchical storage system typically administers the placement of units of storage, called datasets, into a hierarchy of storage devices. The hierarchy of storage devices may include a wide range of devices such as high end, high throughput magnetic disks, collections of normal disks, jukeboxes of optical disks, tape silos, and collections of tapes that are stored off-line in either local or remote storage. When deciding where these various datasets should be stored, hierarchical storage systems typically balance various considerations, such as the cost of storing the data, the time of retrieval, the frequency of access, and so forth. Typically, the most important factors are the length of time since the data was last accessed and the size of the data.
Files typically have various components such as a data portion where a user or software entity can store data, a name portion, and a flag portion that may be used for such things as controlling access to the file and for identifying various properties of the data. In prior art systems, hierarchical storage systems sometimes remove files from primary local storage and migrate them to remote storage and leave a "stub file" in their place at the local storage. Stub files typically contain information that allows the hierarchical storage system to determine where and at what time the data in the file was migrated. In general, the process of migrating data from local storage to remote storage involves identifying files that have met particular migration criteria, migrating the data from local to remote storage, deleting the data from local storage, and replacing the deleted data in the local storage with an appropriate stub file. Such prior art approaches, however, have several problems.
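The migration steps just described (identify eligible files, copy the data to remote storage, delete the local data, and leave a stub) can be illustrated with a minimal sketch. It is not taken from any particular prior art system: local and remote storage are modeled as in-memory dictionaries, and the names `Stub`, `LocalFile`, `migrate_idle_files`, and `max_idle` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Stub:
    """Placeholder left in local storage after migration (hypothetical layout)."""
    remote_key: str     # where the data now lives on remote storage
    migrated_at: float  # when the data was migrated


@dataclass
class LocalFile:
    data: bytes
    last_access: float


def migrate_idle_files(
    local: Dict[str, Union[LocalFile, Stub]],
    remote: Dict[str, bytes],
    now: float,
    max_idle: float,
) -> List[str]:
    """Migrate every file not accessed within max_idle; return the names migrated."""
    migrated = []
    for name, entry in list(local.items()):
        if isinstance(entry, Stub):
            continue  # already migrated
        if now - entry.last_access <= max_idle:
            continue  # migration criterion not met, so the file stays local
        remote[name] = entry.data  # copy the data to remote storage
        # Dropping the LocalFile deletes the local data; the stub takes its place.
        local[name] = Stub(remote_key=name, migrated_at=now)
        migrated.append(name)
    return migrated
```

In this sketch a file that fails the idle-time criterion is never migrated, however badly local space is needed.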
For example, one of the primary motivating factors for employing a hierarchical storage system is to remove data that is accessed less frequently and place it onto more cost-effective storage in order to free local storage for other data that is accessed more frequently. When removing this data, however, traditional hierarchical storage systems generally have a fixed migration policy that will migrate data from local storage to remote storage only when certain migration criteria are met. As an example, one criterion of the migration policy might indicate that if data is not accessed within a specified time frame, then the data is moved to remote storage and replaced with a stub file as previously described. In contrast, if these criteria are not met, then no migration can occur. The process of moving data from local storage to remote storage, however, can take a significant amount of time depending on the access speed of the remote storage medium and the amount of data to be moved. Typically, access speeds of the remote storage medium are several orders of magnitude slower than access speeds of the local storage medium.
Another problem with migrating data only when certain criteria are met arises when local storage space is needed to store new incoming data. Hierarchical storage systems of this nature cannot free local storage fast enough to capture the incoming data, because the data on the local storage is not yet eligible to migrate to remote storage under the policy. In this scenario, it is even likely that some incoming data would be lost. Consequently, the time it takes to migrate data from local storage to remote storage and free local storage is too long.
Thus, if traditional hierarchical storage systems are to be used to maintain sufficient free local storage to accommodate any incoming data, the migration and freeing of data must be accomplished before the local storage space is needed. As a result, some hierarchical storage systems begin to migrate data to remote storage once the percentage of free local storage drops below a defined threshold. In this manner, a hierarchical storage system can help maintain an amount of free local storage that is anticipated to be sufficient for any local storage needs. This approach, however, creates two fundamental problems. First, it requires an individual to estimate how much local storage may be needed at any given instant in time. This, however, commonly causes storage thresholds to be set at a level larger than any anticipated need. In turn, systems are created that constantly maintain a significant amount of free storage space above that which is required. Ultimately, carrying more local storage than necessary increases expense. Second, although such estimates can sometimes be made reliably, many systems do not lend themselves to such estimates.
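A threshold-driven policy of this kind might be sketched as follows. This is only an illustration under stated assumptions: the function name `select_for_migration`, the `(size, last_access)` tuple layout, and the choice to migrate the least recently accessed files first are all hypothetical and are not drawn from any specific system.

```python
from typing import Dict, List, Tuple


def select_for_migration(
    files: Dict[str, Tuple[int, float]],  # name -> (size, last_access)
    capacity: int,
    min_free_fraction: float,
) -> List[str]:
    """Pick files to migrate, least recently accessed first, until the
    free fraction of local storage is back at or above the threshold."""
    used = sum(size for size, _ in files.values())
    target_used = capacity * (1.0 - min_free_fraction)
    chosen = []
    for name, (size, _) in sorted(files.items(), key=lambda kv: kv[1][1]):
        if used <= target_used:
            break  # enough free space is (or would be) available
        chosen.append(name)
        used -= size
    return chosen
```

The `min_free_fraction` parameter is exactly the estimate that the administrator must supply in advance. Setting it high keeps more space free than is needed, while setting it low risks running out of space before migration completes.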
In order to reduce the time necessary to recover local storage space, some systems attempt to "pre-migrate" data to remote storage. Pre-migration of data entails migrating data to remote storage but not deleting (truncating) the data from the files stored locally. Sometimes, the pre-migrated data files may be marked to indicate their existence on both local and remote storage. Then, when local storage space is required, the pre-migrated data is truncated from the local files to recover local storage space.
While the pre-migration of data from local files allows fairly rapid recovery of local storage space, problems still exist. Before local files can be truncated, checks must be performed to identify whether the local file has been changed since the pre-migration occurred because this might serve to invalidate the pre-migration. Such checks can be relatively costly in terms of time. Moreover, even if such checks pass, truncation can be a fairly complex procedure that is limited to normal file system speeds. Locating pre-migrated files to truncate can also be costly in time.
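The pre-migration and truncation sequence, together with the validity check that precedes truncation, can be sketched as shown below. The sketch assumes that a change is detected by comparing a recorded modification time; that detection mechanism, and the names `PreMigratedFile`, `premigrate`, and `truncate_if_valid`, are hypothetical and serve only to illustrate the order of operations.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PreMigratedFile:
    data: Optional[bytes]                      # None once truncated
    mtime: float                               # last modification time
    premigrated_mtime: Optional[float] = None  # mtime recorded at pre-migration


def premigrate(f: PreMigratedFile, remote: Dict[str, bytes], key: str) -> None:
    """Copy the data to remote storage but keep the local copy intact."""
    remote[key] = f.data
    f.premigrated_mtime = f.mtime  # mark the file as pre-migrated


def truncate_if_valid(f: PreMigratedFile) -> bool:
    """Free the local data only if the file is unchanged since pre-migration."""
    if f.data is None or f.premigrated_mtime is None:
        return False  # nothing to truncate, or never pre-migrated
    if f.mtime != f.premigrated_mtime:
        return False  # changed since pre-migration; the remote copy is stale
    f.data = None     # truncate: local space is recovered
    return True
```

Every candidate file must pass the check in `truncate_if_valid` before any space is recovered, which is the per-file cost the passage above describes.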
What is needed to overcome the problems in the prior art is a hierarchical storage system that can free local storage with a speed that is not limited by the access speed of the remote storage medium. Furthermore, the system should reduce or eliminate the time spent in locating files to truncate and the time spent checking whether changes have occurred that preclude truncation of the file without "re-migrating" the file to remote storage. Such a system would allow much less local storage to be reserved since local storage could be freed as fast as it is needed. There does not currently exist a system for hierarchical storage that possesses these features.
Another problem with existing hierarchical storage systems is the inability to determine when changes have been made to a file that has been pre-migrated. Most hierarchical storage systems are implemented using technology that attempts to intercept file accesses that may cause the pre-migration of a file to be invalidated. Unfortunately, in many systems such intercepts are easily circumvented, either intentionally or unintentionally by a variety of mechanisms. Thus, in many instances one cannot absolutely guarantee that a file that has been pre-migrated has not been changed in a way that would invalidate the pre-migration. It would be an advance to have a hierarchical storage manager that is able to identify, with certainty, when files have been modified.
Regardless of which prior art implementation has been used to achieve migration of data from local to remote storage, for various reasons it is often necessary at a later time to recall the data to the local storage from the remote storage. The problem, however, is that when a request is received involving a particular file, prior art systems typically return the entire file, including all data and every associated file property, from the remote storage to the local storage. This is despite the fact that many times the request might only be concerned with information contained in the stub file stored on the local storage. As a result, conventional systems squander valuable time when responding to requests for information relating to files because of the time required to locate the data on the remote storage and transfer or return it to the local storage.
Another problem is encountered when users merely desire to read the data of a file without writing to the file. Conventional systems will recall the entire contents of the data stored remotely and leave a copy in the local storage. This unnecessarily encumbers the local storage space with data that is simply being read.
Accordingly, it would be an advance to circumvent the recall of data to local storage from remote storage if the data of the file was not necessarily required in local storage.
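The distinction between requests that need a recall and requests that do not can be illustrated with a short sketch. It is an assumption-laden illustration rather than a description of any actual system: the request kinds `"metadata"`, `"read"`, and `"write"`, the `StubEntry` layout, and the function `handle_request` are all hypothetical, as is the premise that only a write requires the data to be restored locally.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class StubEntry:
    name: str
    size: int
    remote_key: str
    local_data: Optional[bytes] = None  # populated only after a recall


def handle_request(
    stub: StubEntry, remote: Dict[str, bytes], kind: str
) -> Union[dict, bytes]:
    """Serve a request, recalling data to local storage only when required."""
    if kind == "metadata":
        # Answered entirely from the stub; remote storage is never touched.
        return {"name": stub.name, "size": stub.size}
    if kind == "read":
        if stub.local_data is not None:
            return stub.local_data
        # Pass the data through to the reader without keeping a local copy.
        return remote[stub.remote_key]
    if kind == "write":
        if stub.local_data is None:
            # Assumed here: a write is the case that needs the data locally.
            stub.local_data = remote[stub.remote_key]
        return stub.local_data
    raise ValueError(f"unknown request kind: {kind}")
```

Under these assumptions a metadata query costs no remote access at all, and a read leaves local storage unencumbered.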