1. Field of the Invention
This invention relates to the field of distributed file systems, where one or more client machines are communicating with a server machine of a particular file system over a communication network. Specifically, this invention deals with maintaining the consistency of a cached file system directory when one or more tasks on a client machine are updating the directory.
2. Description of the Related Art
As a preliminary to discussing the problem to which the present invention is directed, it will be useful to discuss some basic notions relating to operating systems, file systems in general and distributed file systems in particular.
Operating systems are the software components that perform basic system services for application programs running on a computer system. Among other things, operating systems manage the use by application programs of various system resources such as data files, executable program files, hardware resources such as processors and memory, and the like. An important subset of operating systems is that of UNIX-based operating systems, so called because they conform in varying degrees to a set of standards established by the original operating system of that name created at AT&T Bell Laboratories. UNIX-based operating systems include the Linux operating system as well as the IBM AIX operating system and the UNIX System Services component of the IBM z/OS operating system. UNIX-based operating systems are discussed in more detail in such publications as K. Christian, The UNIX Operating System (1988), and A. S. Tanenbaum, Modern Operating Systems (1992), especially at pages 265-314, both of which publications are incorporated herein by reference.
Operating systems use file systems to organize data and program files so that they may be accessed by applications. File systems generally are discussed in the above-identified reference of Tanenbaum (1992) at pages 145-204; UNIX file systems in particular are discussed in the above-identified reference of Christian (1988) at pages 49-62, as well as in Tanenbaum at pages 287-290. Typically, such a file system takes the form of a hierarchical file system (HFS), in which files are logically contained in directories, each of which may be either a root directory or a subdirectory contained in a parent directory.
Distributed file systems, like file systems generally, are well known in the art. As defined in the IBM Dictionary of Computing (1994) at page 209, a distributed file system (DFS) is a file system “composed of files or directories that physically reside on more than one computer in a communication network”. There are a number of commonly available distributed file system products known in the art. Some of these distributed file systems cache the contents of a file system directory and some do not. (Here the term “directory” means a directory of file names that is part of a standard UNIX file system tree.)
Most UNIX file system directories contain the names of all the subfiles and subdirectories in that directory. Each object in a UNIX file system is also identified by a number called an inode number and a unique generation number. (Because many UNIX file systems keep a table of objects, with the inode number serving as an index into the table, the generation number is used to detect whether a reference to a file still refers to the file currently described by that table slot.) Thus a directory keeps a list of name/inode/generation triples, one for each object contained in that directory. However, in all of those distributed file systems in which the client caches the contents of a directory, the client invalidates the cached directory contents if a user task on the same client machine makes an update, such as the insertion of a name into the directory for the creation of a file. This results in the loss of the cached data, and a future read request to the directory has to send messages to the server to reestablish the cached contents of the directory on the client.
Some examples of prior-art distributed file systems include the following:
Server Message Block/Common Internet File System (SMB/CIFS)—This distributed file system does not cache directory contents, and all directory reads are sent to the server. Thus this protocol is much less efficient than protocols that keep cached directory data on client machines.
Distributed File Services/Distributed Computing Environment (DFS/DCE)—This distributed file system uses a token management scheme to allow clients to cache the contents of directories. However, any time a task on the client machine tries to update the directory contents (such as creating a file in the directory), the DFS/DCE client tosses the directory contents out of memory. This means that a future readdir( ) (read directory) operation needs to call the server.
The DFS/DCE client does have a name-to-inode lookup cache for directories. Basically, for each directory, the client keeps a hash table of name-to-inode pairs, which allows the inode number to be determined during a lookup( ) operation. This hash table also keeps track of negative searches; thus, it would also remember that a particular name did not exist in the directory. This scheme still results in unnecessary messages being sent to the server when the DFS/DCE client removes the buffers that contain the contents of directories from its cache.
One of the most common file system functions is this lookup( ) operation, which is used to determine whether a particular file name exists in a directory and, if it does, to provide the address of the data structure that represents the sub-object (normally called a vnode). Normally, when creating a file, a lookup is first made to determine whether the name exists (and normally the name does not exist, since the user is attempting to create a new file with the given name); if it does not exist, a create call is then made to create the new file.
Distributed file system clients typically uncache the contents of directory pages when a user on the client machine requests an update to a directory. This results in increased server traffic, because the next lookup or readdir operation suffers a cache miss. More particularly, since any update to a directory (such as a create call) invalidates the cached directory contents, if a user is attempting to create more than one file, any lookup that occurs after the first create must be sent to the server: the directory buffers are no longer in the cache, and the name lookup cache has no entry (yet) for the new name. Because any update request immediately invalidates the client's cached buffers for the directory, any future readdir( ) or lookup( ) request for a name not in the lookup cache requires a call to the server. If the user is removing or creating many files at one time, this can more than double the number of calls to the server: one call for the create or remove, and one call for the intervening lookup.
The Network File System (NFS) version 4 client behaves very similarly to the DFS/DCE client, except that it does not cache negative lookups, so it is slightly less advanced than DFS/DCE in this respect.
To summarize their shortcomings in this respect, if a user on a DFS/DCE or NFS file system client were creating 100 files, there would be at a minimum 200 calls to the server. More particularly, for each such file, there would be one call for a preceding lookup to check for file existence (which normally indicates that the file does not yet exist), sent to the server because the directory buffers were uncached, and one call for the create itself.