1. Field of the Invention
The present invention is directed to file system operations in a computer network. More particularly, the invention concerns the sharing of files maintained by one network host with another network host. Still more particularly, the invention relates to automated file caching methods for transferring files between a source host and a destination host in anticipation of subsequent file accesses or disconnected operation of the destination host.
2. Description of the Prior Art
By way of background, existing network file systems allow data files maintained on one network host (server) to be accessed on another network host (client) over a network. Some network file systems additionally have the capability of caching a local copy of a server file on the client when the file is first accessed from the client. Subsequent file accesses will then be satisfied from the cached copy. File caching increases the speed of subsequent file access and also facilitates disconnected operation of the client because the cached file is available even when the client no longer communicates with the server. File caching additionally entails the ability to synchronize the original file on the server with updates made to the cached local copy at the client. The foregoing file caching features are useful for persons who have a need to work with local files rather than remote files due to network delay, reliability, or connectivity issues. One such scenario is when a worker whose files are stored on a desktop computer or network server at the office desires to work on some of their files at home on a different computer, such as a portable laptop computer.
In order for disconnected operation to be a productive exercise, it is important that all necessary files be transferred to the client prior to disconnection from the server. One way that this can be handled is for a worker to manually select and transfer each required file. However, transferring files individually takes time and may interfere with the person's other duties. Moreover, some files may be inadvertently missed, requiring a subsequent copy operation provided the server is still accessible. If the server is no longer accessible, the absence of a crucial file may require termination of the remote work session. Another option would be to manually copy entire directories or volumes. However, such transfers take time, particularly if the directory or volume is large. Thus, the worker may be delayed while waiting for the transfer to complete.
There are existing network file system mechanisms known as “hoarding” wherein selected files are automatically prefetched from a server and cached onto a client's disk before the client disconnects from the server. The prefetched files will then be available on the client during the disconnection period and local updates to such files can be propagated back to the server upon reconnection. The selection of files that are to be prefetched may be determined automatically using a predictive hoarding algorithm. The predictive hoarding technique attempts to automatically predict the set of files that should be hoarded by monitoring user file access. Predictive hoarding algorithms range from simple LRU (Least Recently Used) caching to techniques that are considerably more complex.
LRU caching algorithms are exemplified by the “Coda” distributed file system developed at Carnegie Mellon University. Such algorithms maintain the most recently accessed files in the client's cache. Older files are removed from the cache in favor of newer files. This technique can be effective provided users work on the same files for the bulk of the file access monitoring period. If the user has a tendency to jump around from task to task, the effectiveness of the LRU technique will be diminished. To address this problem, users may specify files that should remain “sticky” in the client's disk cache by boosting their priority so that the LRU caching algorithm treats them as most recently used files. To reduce the burden on users, a spy operation may be performed that identifies files accessed over a given time period and creates a suggested list of files that will receive the priority boost. Users may be given an opportunity to edit this list to override the suggestions if they so desire. However, if for some reason a file is not specified, and if that file is not retained in the client's disk cache prior to disconnection, the file will not be available following disconnection.
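By way of illustration only, the LRU behavior with “sticky” files described above might be sketched as follows. This is a simplified sketch and not the Coda implementation: the class name `LruHoard`, its methods, and the capacity measured in number of files are all hypothetical, and sticky files are modeled as a pinned set that is never evicted rather than as a numeric priority boost.

```python
from collections import OrderedDict


class LruHoard:
    """Toy LRU hoard: keeps recently accessed files plus pinned ("sticky") ones."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of files held (assumed unit)
        self._recent = OrderedDict()  # path -> None, ordered oldest first
        self._sticky = set()          # paths exempt from eviction

    def pin(self, path):
        """Mark a file as sticky so that it is never evicted."""
        self._sticky.add(path)
        self._recent.pop(path, None)
        self._evict()

    def access(self, path):
        """Record a file access, making the file the most recently used."""
        if path in self._sticky:
            return
        self._recent.pop(path, None)
        self._recent[path] = None
        self._evict()

    def _evict(self):
        # Drop the least recently used non-sticky files while over capacity.
        while self._recent and len(self._recent) + len(self._sticky) > self.capacity:
            self._recent.popitem(last=False)

    def contents(self):
        """Return the set of files currently held in the hoard."""
        return self._sticky | set(self._recent)


hoard = LruHoard(capacity=3)
hoard.pin("a")
for name in ("b", "c", "d"):
    hoard.access(name)
snapshot = hoard.contents()  # "b" was least recently used and has been evicted
```

In this sketch a file that is neither pinned nor recently accessed drops out of the hoard, which corresponds to the shortcoming noted above: such a file would be unavailable after disconnection.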
An example of a complex hoarding algorithm is one described by G. Kuenning et al. in their paper entitled “Automated Hoarding for Mobile Computing,” 16th ACM Symposium on Operating System Principles (1997). This hoarding algorithm is based on a file clustering technique that attempts to assign files to projects, prioritize the projects, and cache files for the highest priority project(s). More particularly, an observer process collects user accesses and feeds them to a correlator. The correlator evaluates the file references and calculates “semantic distances” among various files based on the reference sequence of an individual process. These semantic distances drive a clustering algorithm that assigns each file to one or more projects. When new hoard contents are to be chosen, the correlator examines the projects to find those that are currently active, and selects the highest-priority projects until a maximum hoard size is reached. Only complete projects are hoarded based on the assumption that partial projects are not sufficient to make progress.
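The final selection step described above, in which complete projects are chosen in priority order until the maximum hoard size is reached, might be sketched as follows. This is an illustrative sketch only and does not reproduce the cited paper's implementation: the function `choose_hoard`, the dictionary layout of a project, and the decision to stop at the first project that does not fit are all assumptions. The semantic distance calculation and the clustering are not shown.

```python
def choose_hoard(projects, file_sizes, max_bytes):
    """Pick whole projects, highest priority first, within a size budget.

    projects   -- list of dicts with keys "priority", "active", "files" (a set)
    file_sizes -- mapping of file path to size in bytes
    max_bytes  -- maximum total size of the hoard
    """
    hoard = set()
    used = 0
    active = [p for p in projects if p["active"]]
    for project in sorted(active, key=lambda p: p["priority"], reverse=True):
        # A file may belong to several projects; count its size only once.
        new_files = project["files"] - hoard
        extra = sum(file_sizes[f] for f in new_files)
        if used + extra > max_bytes:
            break  # assumed policy: never hoard a partial project
        hoard |= new_files
        used += extra
    return hoard


sizes = {"a": 10, "b": 20, "c": 30, "d": 100, "z": 1}
projects = [
    {"priority": 3, "active": True, "files": {"a", "b"}},
    {"priority": 2, "active": True, "files": {"b", "c"}},
    {"priority": 5, "active": False, "files": {"z"}},
    {"priority": 1, "active": True, "files": {"d"}},
]
selected = choose_hoard(projects, sizes, max_bytes=70)
```

Here the inactive project is ignored despite its high priority, and the lowest-priority project is left out because adding all of its files would exceed the budget.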
Notwithstanding the sophistication of the foregoing complex hoarding technique, subsequent simulation studies by Kuenning et al. have shown that it is no more effective than a modified form of LRU hoarding in which heuristics are applied to increase performance. G. Kuenning et al., “Simplifying Automated Hoarding Methods,” ACM MSWiM (2002). These heuristics include identifying and ignoring certain programs that access files for scanning purposes only, forcing the hoarding of critically important files such as shared libraries, forcing the hoarding of files that are rarely accessed but necessary for system initialization, and always hoarding non-files such as directories, symbolic links and device files.
It is to improvements in network file caching technology that the present invention is directed. What is needed is a simple yet effective technique by which files maintained on one network host can be automatically transferred to another network host in anticipation of subsequent file accesses or subsequent disconnection, thereby allowing the recipient system to be used for continued file access irrespective of network conditions or the originating system's continued availability for access. What is particularly required is a technique that allows such files to be intelligently copied from one host to another with minimal human intervention and with a high likelihood that all required files will be transferred.