1. The Field of the Invention
The present invention relates to systems and methods for providing support for remote storage in a file system. More specifically, the present invention allows a file system to provide native support for remote storage so that the file system can intrinsically identify and process I/O requests involving files or other entities where a portion of the files or entities is stored remotely.
2. The Prior State of the Art
Many advances have been made in computer hardware and software, but some general principles have remained constant. Although the cost of memory and data storage has continued to decrease, and the storage capacity of devices of a given size has continued to increase, there continues to be a difference in the cost of storing data depending on several factors such as the medium used to store the data, and the accessibility of the data. For example, it is generally more expensive to store a data word in cache memory than in system RAM. System RAM, in turn, is more expensive per storage word than magnetic disk storage. Magnetic disk storage is more expensive per storage word than archival storage. Thus, there continues to be motivation to move unused or less frequently used data to less expensive storage. In addition, the desire to access an ever increasing amount of data provides motivation to store data in a cost-effective manner while, simultaneously, providing adequate access speed to the desired data.
Prior art attempts at developing and implementing remote storage of data are based on a mainframe computing model with a separate, non-integrated hierarchical storage system. The hierarchical storage system administers the placement of units of storage, called datasets, in a hierarchy of storage devices. The hierarchy of storage devices may include a wide range of devices such as high end, high throughput magnetic disks, collections of normal disks, jukeboxes of optical disks, tape silos, and collections of tapes that are stored off-line. When deciding where various datasets should be stored, hierarchical storage systems typically balance various considerations, such as the cost of storage, the time of retrieval, the frequency of access, and so forth.
Files typically have various components such as a data portion where a user or other software entity can store data, a name portion, and various flags that may be used for such things as controlling access to the file. In prior art systems, files that are removed from primary storage and migrated to remote storage are often replaced with a "stub file," which contains information that allows the hierarchical storage system to determine where the data in the file has been stored. Such an approach, however, has several problems.
A stub file stores information describing the location of the remotely stored file in the data portion of the file. Traditional file systems are not generally set up to allow the file system to determine the contents of the data portion of a file. Therefore, prior art systems relied on a non-integrated hierarchical storage manager to read the data portion of stub files and determine where a remotely stored file is located. Such a non-integrated approach requires that the hierarchical storage system intercept any I/O operations that are directed to files that have the same appearance as a stub file. In other words, it is impossible to tell from looking at a file whether it is a stub file or a non-stub file that simply happens to have the same appearance as a stub file. For example, stub files often have a fixed length. Beyond this fixed length, however, there is nothing external to distinguish a stub file from a normal file that just happens to have the identical length of a stub file. In order to identify all stub files, a hierarchical storage manager is typically set to intercept all calls directed to files that have the same length as a stub file. Once a call is intercepted, the file can then be examined to determine whether it is indeed a stub file or a normal file that just happens to be of the same length.
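The length-based interception described above can be illustrated with a minimal sketch. The fixed stub length, the marker bytes, and all names below (STUB_LENGTH, STUB_MARKER, FileEntry, handle_io) are hypothetical and chosen only for illustration; the marker check in particular is an assumed stand-in for whatever examination a given prior art manager performs on the data portion.

```python
from dataclasses import dataclass

STUB_LENGTH = 512          # hypothetical fixed stub size, in bytes
STUB_MARKER = b"HSMSTUB1"  # hypothetical marker at the start of a stub's data portion


@dataclass
class FileEntry:
    name: str
    data: bytes  # the data portion, opaque to the file system itself


def must_intercept(f: FileEntry) -> bool:
    # Externally, only the length is visible, so every file of stub
    # length has to be intercepted, whether or not it is a stub.
    return len(f.data) == STUB_LENGTH


def is_stub(f: FileEntry) -> bool:
    # Only after interception can the data portion be examined.
    return f.data.startswith(STUB_MARKER)


def handle_io(f: FileEntry) -> str:
    if not must_intercept(f):
        return "passed through"
    if is_stub(f):
        return "recalled from remote storage"
    # A normal file that merely has the stub length still paid
    # the cost of being examined.
    return "examined, then passed through"


stub = FileEntry("report.dat", STUB_MARKER.ljust(STUB_LENGTH, b"\0"))
lookalike = FileEntry("image.raw", b"x" * STUB_LENGTH)
ordinary = FileEntry("notes.txt", b"hello")
```

In this sketch, the look-alike file follows the third path: it is not a stub, yet it is intercepted and examined solely because its length matches.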
It is apparent from the above discussion that there is a certain probability that a non-stub file will be examined by a hierarchical storage manager. This result is undesirable since it slows access to normal files and causes additional unnecessary processing overhead. Prior art systems have attempted to eliminate this overhead by employing different methods to differentiate a stub file from a user file that has the same number of data bytes, yet is a normal data file. These various approaches can reduce the probability of error, but cannot totally eliminate it. It would, therefore, be an advancement in the art to provide a hierarchical storage manager that can positively differentiate between normal data files and data files with remotely stored data. It would also be an advancement in the art to have a hierarchical storage manager that incurred the additional overhead associated with remotely stored files only when such remotely stored files were actually involved in an I/O operation processed by an I/O system.
One advantage of prior art hierarchical storage managers is that the non-integrated nature of the hierarchical storage manager allows hierarchical storage to be implemented in a system with little or no impact on the existing file system. Such a hierarchical storage manager can examine each call to determine if it involves a stub file. If the call involves a stub file, then the hierarchical storage manager can intercept the call and handle the call. If, however, the call does not involve a stub file, then the hierarchical storage manager can pass the call along to the regular file system. Thus, the file system does not need to know that a hierarchical storage manager exists. Unfortunately, such an approach imposes additional overhead on each call that is made, even if the call does not involve a stub file. This is because each call must be examined by a hierarchical storage manager. If a system employs multiple hierarchical storage managers, the overhead can rapidly compound. It would, therefore, be desirable to provide a hierarchical storage manager which maintains the benefits of causing little or no change to the existing file system while, simultaneously, minimizing or eliminating any overhead for files without remotely stored data. In other words, it would be very advantageous to have an approach to hierarchical storage that maintained existing access speeds for files without remotely stored data and only incurred additional overhead for files with remotely stored data. It would be extremely advantageous to maintain all these properties even when a plurality of hierarchical storage managers were used in a single system.
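The compounding overhead described above can be illustrated with a short sketch in which every call passes through every non-integrated manager before reaching the file system. The class and function names (StorageManager, dispatch) and the use of a name set to represent stub files are hypothetical simplifications, not a description of any particular prior art product.

```python
class StorageManager:
    """A non-integrated manager that must examine every call it sees."""

    def __init__(self, name, stub_names):
        self.name = name
        self.stub_names = set(stub_names)  # files this manager migrated
        self.examined = 0                  # calls inspected so far

    def try_handle(self, filename):
        # The examination cost is paid whether or not the file is a stub.
        self.examined += 1
        return filename in self.stub_names


def dispatch(managers, filename):
    # Each call is offered to every manager in turn; only if none
    # claims it does the call reach the regular file system.
    for manager in managers:
        if manager.try_handle(filename):
            return manager.name
    return "file system"


tape = StorageManager("tape manager", ["archive.dat"])
optical = StorageManager("optical manager", ["scan.img"])
managers = [tape, optical]

# Three calls that involve only ordinary files.
results = [dispatch(managers, n) for n in ("a.txt", "b.txt", "c.txt")]
```

Under these assumptions, three calls to ordinary files cause six examinations in total, three per manager, although no stub file was ever involved.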
Another disadvantage of prior art methods of hierarchical storage management is that the model upon which they are based does not readily allow for the incorporation of, or adaptation to, new storage requirements. For example, prior art methods of storing data remotely involved replacement of a normal file with a stub file. Such a stub file replaces virtually all the components of a normal file with those of the stub file. Therefore, when any operation involves the normal file, the file typically has to be retrieved from remote storage in order to fulfill the request. It would be very advantageous to allow a greater degree of flexibility in determining what information associated with a particular file is stored remotely so that operations that are likely to be performed with greater frequency may be handled without recalling the entirety of the file from remote storage.