a. Field of the Invention
This invention relates to non-disruptive recovery from file server failures in clustered (distributed) computing environments.
b. Related Art
In a loosely coupled system, multiple data processing nodes, each under the control of its own operating system, are joined by a communications network and share files stored on disks. Access to the shared files is coordinated by a set of protocols implemented in each system's control program (operating system). The entity that provides access to a file is known as a file server, or a data server.
One concern in loosely coupled systems is that if a file server fails, the shared data serviced by that file server becomes unavailable. Several techniques have been proposed for recovering from file server failures. Some schemes maintain replicated copies of the file system state. Other schemes restart the failed server after a crash. In the former case, each operation on the file system must be performed on all copies of the file system, resulting in a significant loss of performance for file system operations. In the latter case, any operation attempted on the file system while the file server is unavailable will fail until the file server is restarted.