1. Field of the Invention
The present invention relates in general to distributed file systems, and more specifically, to a lease based safety protocol for distributed systems with multiple networks.
2. Description of Related Art
Distributed file systems have increasingly become the principal means for data sharing in distributed applications. A distributed file system presents a local file system interface to remote and shared storage, sheltering the programmer from problems that arise in synchronizing access to data and ensuring the consistency of data across many computers. The Andrew File System (AFS) and its successor the Decorum File System (DFS) expanded the client-server file system concept to distributed file systems by enhancing security and scalability as well as providing a uniform global name space for all clients. As programmers understand local file system semantics well, the distributed file system is easy to use when compared to message passing or network programming models.
In addition to data sharing, distributed file systems commonly centralize storage, which improves both the scalability and manageability of storage systems. If each computer has its own local storage, resources are fragmented across an installation. Some computers may underutilize storage while others have full disks. However, if all storage resources are pooled, they can be shared at a fine granularity. This centralized deployment also allows an administrator to easily replace devices, perform centralized backup, and manage heterogeneous storage hierarchically, e.g. a pool of disks that overflow onto a tape library.
Recent research has focused on achieving scalability and performance for distributed file systems by removing the bottleneck of a centralized file server. One approach is to remove the server bottleneck by removing the server. File systems for parallel computers take this approach as does the xFS serverless file system and JetFile system for networks of workstations.
One way of enabling clients to efficiently share storage without centralizing it behind a server is by using a storage area network (SAN). In a SAN, many computers share access to network attached storage devices. For distributed file systems built on a SAN, clients and servers can share access to all file system data. Examples of SANs include Fibre Channel (FC) networks and IBM's Serial Storage Architecture. SANs that make network attached storage devices available on general purpose data networks like Gigabit Ethernet have also been contemplated. A distributed file system built on a SAN can remove the server bottleneck for I/O requests by giving clients a direct data path to disks.
For exactly the same reason that distributed file systems are easy to use, they are difficult to implement. The distributed file system takes responsibility for providing synchronized access and consistent views of shared data, shielding the application from these tasks but moving the complexity into the file system. One such problem arises when a computer that holds locks to access data becomes isolated. An isolated computer is either in a network partition or has crashed, and the locks held by the isolated computer are in an unknown state. The difficulty is that the remainder of the distributed system cannot differentiate between a crash and a network partition.
A crashed computer is no longer able to use its locks, and the locks can safely be given to other clients. However, unlike a crashed computer, a computer in a network partition may still have access to network attached storage devices and could potentially use its locks. This occurs in particular when message traffic and data traffic are carried on different networks, for example message traffic on an IP network and data traffic on the SAN. Computers in a partition could still be writing data while the server gives other clients new locks on this data. This sacrifices data consistency and the structural integrity of an object storage system. One known solution to this problem is to fence the isolated computer, that is, to instruct the storage device to no longer take requests from the isolated computer, and to steal the computer's locks, that is, to assume them invalid and redistribute them to other computers.
While fencing is a generally accepted solution, it can be inadequate for data sharing environments. Computers cache data, and any unwritten data in the cache must reach a storage device to guarantee sequential consistency within a distributed system that shares storage devices. Fencing an isolated computer prevents cache contents from reaching disk, but does not prevent a client computer from continuing to operate on cached data without correctly reporting errors locally.
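The shortcoming of fencing described above can be illustrated with a minimal sketch. The class names, method names, and the in-memory "disk" below are hypothetical and serve only as an illustration; they are not taken from any particular storage device interface.

```python
class StorageDevice:
    """Hypothetical network attached disk that supports fencing."""

    def __init__(self):
        self.blocks = {}      # block number -> data on stable storage
        self.fenced = set()   # clients whose requests are refused

    def fence(self, client_id):
        # Assumption: the device keeps a simple deny list of client ids.
        self.fenced.add(client_id)

    def write(self, client_id, block, data):
        if client_id in self.fenced:
            raise PermissionError(f"{client_id} is fenced")
        self.blocks[block] = data


class CachingClient:
    """Hypothetical client with a write-back cache."""

    def __init__(self, client_id, device):
        self.client_id = client_id
        self.device = device
        self.cache = {}
        self.dirty = set()

    def write(self, block, data):
        # The write is absorbed by the cache and reaches disk only on flush.
        self.cache[block] = data
        self.dirty.add(block)

    def read(self, block):
        # Cached data is served without contacting the device.
        if block in self.cache:
            return self.cache[block]
        return self.device.blocks.get(block)

    def flush(self):
        for block in sorted(self.dirty):
            self.device.write(self.client_id, block, self.cache[block])
        self.dirty.clear()


def demo_fencing():
    disk = StorageDevice()
    disk.blocks[0] = b"old"
    client = CachingClient("c1", disk)
    client.write(0, b"new")   # dirty data exists only in the client cache
    disk.fence("c1")          # the rest of the system fences the client
    try:
        client.flush()
        flushed = True
    except PermissionError:
        flushed = False
    # The dirty data never reaches disk, yet the client still reads it locally.
    return flushed, client.read(0), disk.blocks[0]
```

In this sketch the fenced client cannot flush, so the disk keeps the old contents while the client continues to see its own unwritten data.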
The present invention provides a lease-based timeout scheme that addresses fencing's shortcomings. Unlike fencing, this scheme (or protocol) enables an isolated computer to realize it is disconnected from the distributed system and write its dirty data out to storage before its locks are stolen. In accordance with the invention, data consistency during a partition in a distributed system is ensured by establishing a lease-based protocol in the distributed system wherein a client can hold a lease with a server. The lease represents a contract between a client and a server in which the server promises to respect the client for a period of time. The server respects the contract even when it detects a partition between the client and itself.
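The lease contract summarized above can be sketched as follows. This is an illustration under stated assumptions, not the protocol of the invention: the names LeaseServer, LeaseClient, flush_margin, and the policy of flushing a fixed margin before expiry are hypothetical, and both parties read a single shared clock value for simplicity.

```python
class LeaseServer:
    """Hypothetical server that honors a lease until it expires."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.expiry = {}   # client id -> time at which the lease ends

    def renew(self, client_id, now):
        self.expiry[client_id] = now + self.lease_seconds
        return self.lease_seconds

    def may_steal_locks(self, client_id, now):
        # The server respects the contract even during a partition:
        # locks are stolen only after the lease has run out.
        return now >= self.expiry.get(client_id, 0.0)


class LeaseClient:
    """Hypothetical client that flushes dirty data before its lease ends."""

    def __init__(self, client_id, flush_margin):
        self.client_id = client_id
        self.flush_margin = flush_margin   # assumed safety margin in seconds
        self.lease_end = 0.0
        self.dirty = {}     # unwritten cached data
        self.flushed = {}   # stands in for data written to shared storage

    def renew(self, server, now, reachable=True):
        if not reachable:
            return False    # partitioned: the renewal message is lost
        self.lease_end = now + server.renew(self.client_id, now)
        return True

    def tick(self, now):
        # Close to expiry without a renewal: write dirty data out.
        if self.dirty and now >= self.lease_end - self.flush_margin:
            self.flushed.update(self.dirty)
            self.dirty.clear()
        # Locks are usable only while the lease is still valid.
        return now < self.lease_end


def demo_lease():
    server = LeaseServer(lease_seconds=30)
    client = LeaseClient("c1", flush_margin=10)
    client.renew(server, now=0)
    client.dirty[7] = b"x"
    renewed = client.renew(server, now=15, reachable=False)  # partition begins
    usable_at_15 = client.tick(15)
    steal_at_15 = server.may_steal_locks("c1", 15)
    client.tick(25)                     # inside the margin: data is flushed
    flushed_before_expiry = client.flushed.get(7)
    usable_at_30 = client.tick(30)
    steal_at_30 = server.may_steal_locks("c1", 30)
    return (renewed, usable_at_15, steal_at_15,
            flushed_before_expiry, usable_at_30, steal_at_30)
```

In this sketch the isolated client writes its dirty data at time 25 and stops using its locks at time 30, the same moment at which the server first considers the locks available for redistribution.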