Accessing files across a distributed computer network environment presents several competing problems. One problem involves the time and resources required to transmit data across the network for successive reads and writes. This cost can be reduced by storing data at the client node, which cuts network traffic, but such local storage creates other problems. For example, if another of the several nodes is also writing to the file, a client node reading its locally stored copy may not see the update that has just been written. As a result, file integrity is lost.
It is known to provide a distributed file system (DFS) to help alleviate the problems associated with accessing files across the network. In systems such as the DCE Distributed File System, a local cache exists at every node in the distributed network. Remote processes executing at a client node access a file on a server node through a two-step caching scheme. The server node gets blocks of the file from its disk and stores them in the server cache. If necessary, the client node goes out over the network, gets blocks of the file from the server cache, and stores them in the client cache. When an application on the client node seeks data from any block of the file, the client cache is accessed instead of going across the network for each access. Using the client cache to access and store a remote file significantly improves performance: when frequently referenced data is cached and locally available when needed (a so-called "cache hit"), the application does not need to use file server resources. Network traffic is thereby decreased, allowing greater scalability of the network and reduced overhead.
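The two-step caching scheme described above can be sketched as follows. This is a minimal, hypothetical model, not the DCE DFS implementation: the class names `ServerNode` and `ClientNode`, the `(file, block)` keys, and the counters are assumptions introduced only to show that repeated reads of a cached block cost neither a disk read nor a network fetch.

```python
from dataclasses import dataclass, field


@dataclass
class ServerNode:
    """Hypothetical server: file blocks on 'disk' plus a server cache."""
    disk: dict
    cache: dict = field(default_factory=dict)
    disk_reads: int = 0

    def get_block(self, key):
        # Step 1: fill the server cache from disk only on a server-cache miss.
        if key not in self.cache:
            self.disk_reads += 1
            self.cache[key] = self.disk[key]
        return self.cache[key]


@dataclass
class ClientNode:
    """Hypothetical client with its own local block cache."""
    server: ServerNode
    cache: dict = field(default_factory=dict)
    network_fetches: int = 0

    def read_block(self, name, index):
        key = (name, index)
        # Cache hit: served locally, no server resources used.
        if key in self.cache:
            return self.cache[key]
        # Step 2: cache miss, so go over the "network" to the server cache.
        self.network_fetches += 1
        data = self.server.get_block(key)
        self.cache[key] = data
        return data


# Usage: three reads of the same block trigger only one fetch and one disk read.
server = ServerNode(disk={("report", 0): b"header", ("report", 1): b"body"})
client = ClientNode(server)
for _ in range(3):
    client.read_block("report", 0)
```

After the three reads, `client.network_fetches` and `server.disk_reads` are both 1, which is the saving the cache-hit argument relies on.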
Although distributed file system implementations provide extensive caching of file data at the client system, either on disk or in memory, such systems lack policies that enforce equitable use of the cache among the files whose data it holds. As a result, a single file's data may consume the entire cache, forcing out data from other files that may be referenced more frequently. The problem is exacerbated when the application requires file I/O that is not conducive to caching, such as sequential I/O on a file that is larger than the entire cache. In this case, caching the entire file causes all previously cached data to be flushed to make room for portions of the new file. This lack of fairness reduces the scalability and performance of the distributed file system.
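The flushing effect of a large sequential read can be illustrated with a small simulation. This sketch assumes a single least-recently-used (LRU) cache shared by all files with no per-file limit; LRU is an assumption chosen for illustration, and the class `LRUBlockCache`, the file names, and the capacity of 8 blocks are all hypothetical.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Hypothetical client cache: one LRU list shared by all files, no per-file quota."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # (file, block) -> data, least recent first

    def access(self, file, block):
        key = (file, block)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return True  # cache hit
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[key] = b""  # placeholder for the block's data
        return False  # cache miss

    def files_cached(self):
        return {f for f, _ in self.blocks}


cache = LRUBlockCache(capacity=8)
for b in range(4):
    cache.access("hot.db", b)  # small, frequently referenced file
for b in range(4):
    cache.access("config", b)  # another small file
for b in range(20):
    cache.access("big.log", b)  # sequential read of a file larger than the cache

after_scan = cache.files_cached()  # only "big.log" blocks remain
hot_hit = cache.access("hot.db", 0)  # the hot file must be fetched again
```

Because nothing limits how much of the cache one file may occupy, the 20-block sequential read evicts every block of `hot.db` and `config`, and the next access to the hot file is a miss.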