a. Field of the Invention
This invention relates to distributed file systems for use in a network of computer systems having access to shared files.
b. Related Art
In a distributed computer network, multiple data processing nodes, each under control of its own operating system, are joined by a communications network and share files stored on disks. Access to the shared files is provided by distributed file system software, which is implemented in each of the respective systems.
Some of the component functions of a distributed file system include locking, remote file access, and recovery from failure of a data processing node.
Locking is one method of synchronizing accesses to shared files to ensure that conflicts do not occur between tasks. By this method, a requesting task first obtains access to a data structure known as a lock and then indicates the type of access that is desired in order to either read or modify data in the file, database or other data object that is protected by the lock. Other tasks are then prevented from accessing the protected data, or are given only limited access to it (e.g. read only), until the requesting task changes the indication of the type of access desired and releases the lock so that other tasks can access it.
Commonly, a global lock manager is provided to resolve lock requests among tasks running on different processors and to maintain queues of tasks awaiting access to particular lock entities. Computer networks having a global lock manager are described, for example, in U.S. Pat. No. 5,161,227 to Dias et al. and U.S. Pat. No. 5,226,143 to Baird et al.
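By way of illustration only, the role of a lock manager that resolves requests and queues waiting tasks can be sketched as follows. This is a minimal, single-process sketch and is not the mechanism of the cited patents; the class name `LockManager`, the two access modes, and the first-in, first-out queueing policy are all assumptions made for the example.

```python
from collections import deque

SHARED, EXCLUSIVE = "shared", "exclusive"  # hypothetical access modes


class LockManager:
    """Toy lock manager: one holder table and one FIFO wait queue per lock name."""

    def __init__(self):
        self.holders = {}  # lock name -> {task: mode}
        self.waiting = {}  # lock name -> deque of (task, mode)

    def _compatible(self, name, mode):
        held = self.holders.get(name, {})
        # A shared request may join other shared holders;
        # an exclusive request needs the lock to be free.
        return not held or (mode == SHARED and all(m == SHARED for m in held.values()))

    def acquire(self, name, task, mode):
        """Return True if granted at once, False if the task was queued."""
        queue = self.waiting.setdefault(name, deque())
        # Grant only when nobody is already waiting, so queue order is respected.
        if not queue and self._compatible(name, mode):
            self.holders.setdefault(name, {})[task] = mode
            return True
        queue.append((task, mode))
        return False

    def release(self, name, task):
        """Release the lock and return the tasks granted access as a result."""
        self.holders.get(name, {}).pop(task, None)
        granted = []
        queue = self.waiting.get(name, deque())
        while queue and self._compatible(name, queue[0][1]):
            next_task, mode = queue.popleft()
            self.holders.setdefault(name, {})[next_task] = mode
            granted.append(next_task)
        return granted
```

In this sketch, a task holding the lock in exclusive mode blocks all later requests, and releasing it admits the queued shared requests together.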
Remote file access is a software mechanism for a data processing node on a network to fetch data stored on disks attached to another data processing node on the network. The node with the disk attached is called a server, and the node using the data is called a client. Remote file access can also be used for writing data on disks of a data processing node. Parts of the software that enables this function run on both the client and the server. When a client node needs to look up the contents of a directory, for example, it sends a directory request message to the server, and the server responds with the directory contents. Similarly, when a client wants to create a file in the directory, the client sends a request to that effect, and the server carries out the task and responds with an acknowledgement.
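The request and response exchange described above can be sketched, under simplifying assumptions, as a client object that builds request messages and a server object that answers them. The message fields (`op`, `dir`, `name`, `status`), the class names, and the use of a direct function call in place of a network are hypothetical choices for the illustration.

```python
class FileServer:
    """Server side: owns the directories and answers request messages."""

    def __init__(self):
        self.directories = {"/": set()}  # directory path -> set of file names

    def handle(self, request):
        directory = request["dir"]
        if directory not in self.directories:
            return {"status": "no_such_directory"}
        entries = self.directories[directory]
        if request["op"] == "lookup":
            # Directory request: respond with the directory contents.
            return {"status": "ok", "entries": sorted(entries)}
        if request["op"] == "create":
            if request["name"] in entries:
                # Notify the client that the file already existed.
                return {"status": "exists"}
            entries.add(request["name"])
            return {"status": "ok"}  # acknowledgement of the create
        return {"status": "bad_request"}


class Client:
    """Client side: turns file operations into request messages."""

    def __init__(self, send):
        self.send = send  # callable standing in for the network transport

    def list_directory(self, directory):
        return self.send({"op": "lookup", "dir": directory})

    def create_file(self, directory, name):
        return self.send({"op": "create", "dir": directory, "name": name})


server = FileServer()
client = Client(server.handle)  # in-process call replaces the network here
first_create = client.create_file("/", "a.txt")
listing = client.list_directory("/")
second_create = client.create_file("/", "a.txt")
```

Here the second create attempt is answered with a notification that the file already exists rather than with an acknowledgement.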
Recovery from a failure of a data processing node is a critical function in a computer network, especially when nodes are related to each other as clients and servers as described above. Data processing nodes may fail or crash because of hardware errors or, increasingly, because of software errors. In order to restore access to the disks connected to the failed node, either the failed node is restarted or another node is set up in such a way that it can take over control of the disk and continue to provide access to it. The latter method requires an independent power source and multiple access ports for the disk. However, restarting the node, or having a second node take over control of the disk, is only part of failure recovery. The software state of the server should also be reconstructed for proper and uninterrupted functioning of the distributed file system.
For example, if the server fails while handling a request to create a file in a directory, the system should determine whether the create operation actually succeeded. The presence of the name in the directory is not enough information to decide this, as the file may have existed prior to the creation attempt. In that event, if the server had not failed, the remote create operation would have returned a notification indicating the preexistence of the file.
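The ambiguity in this example can be made concrete with a small sketch. It models a directory as a set of names and a server that fails after processing a create request but before replying; the function name, the file name, and the reply strings are hypothetical, and the lost reply is returned only so that the example can show what the client never received.

```python
def create_then_crash(directory, name):
    """Process a create request on `directory`, then fail before replying."""
    if name in directory:
        intended_reply = "exists"   # file was already there
    else:
        directory.add(name)
        intended_reply = "created"  # create actually succeeded
    # The simulated failure occurs here: the client never sees intended_reply.
    return intended_reply


# Case A: the file existed before the create attempt.
dir_a = {"report.txt"}
lost_reply_a = create_then_crash(dir_a, "report.txt")

# Case B: the file did not exist and the create succeeded before the failure.
dir_b = set()
lost_reply_b = create_then_crash(dir_b, "report.txt")

# After restart, both directories look the same, yet the lost replies differ.
same_directory_state = dir_a == dir_b
replies_differ = lost_reply_a != lost_reply_b
```

Both cases leave the same directory contents behind, so inspecting the directory after restart cannot reveal which reply the client should have received.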