1. Field of the Invention
This invention is related to the field of filesystems and, more particularly, to handling record locks implemented by network file systems in a highly available environment.
2. Description of the Related Art
One approach to providing access to a filesystem for networked computer systems is a network filesystem. In a network filesystem, a server exports a filesystem for use by clients. A filesystem may be referred to as “exported” if the filesystem has been made available, according to the network filesystem protocol, for clients to access. The clients import the filesystem, and present the filesystem to the user in a fashion similar to the local filesystem on the clients. Examples of network filesystems include the Network File System (NFS), the Andrew File System (AFS), and the Common Internet File System (CIFS).
An issue that may arise with network filesystems is the support for record locks. A record lock is a lock initiated by a client on a region of a file, which may comprise a portion of the file, the entire file, and sometimes a region of storage beyond the end of the file. The region (or “record”) may be any contiguous set of bytes in the file (or beyond the end of the file, in some cases). Record locks may be of various types (e.g. a shared lock that permits other clients to read the file but prevents write access, or an exclusive lock that prevents read or write access by other clients). NFS record locks may be unmonitored or monitored locks. Unmonitored locks are not monitored for crashes/reboots of the server of the filesystem on which they were created nor for crashes/reboots of the client that created them. Monitored locks are monitored for such crashes/reboots. Monitored locks may be recovered (that is, reestablished prior to granting new locks) when crashes/reboots are detected. Record locks may generally be used to synchronize the access of unrelated processes to a particular file (or record). Record locks may be more succinctly referred to herein as “locks”. Additionally, a lock may be referred to as “on a filesystem” if the record locked by the lock is on a file stored in the filesystem.
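The notion of a record as a contiguous byte range, with shared and exclusive lock types, can be illustrated with a minimal sketch. The names `RecordLock`, `overlaps`, and `conflicts` are hypothetical and are not part of NFS or NLM; the rule that locks held by the same owner never conflict with one another is an assumption made for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordLock:
    """Hypothetical model of a record lock on a contiguous byte range."""
    owner: str       # identifier of the client holding the lock (assumed form)
    start: int       # offset of the first locked byte
    length: int      # number of locked bytes; may reach past the end of the file
    exclusive: bool  # True for an exclusive lock, False for a shared lock

    @property
    def end(self) -> int:
        # Offset one past the last locked byte.
        return self.start + self.length


def overlaps(a: RecordLock, b: RecordLock) -> bool:
    """Return True if the two records share at least one byte."""
    return a.start < b.end and b.start < a.end


def conflicts(held: RecordLock, requested: RecordLock) -> bool:
    """Return True if `requested` cannot be granted while `held` exists.

    Assumption: locks from the same owner never conflict with each other.
    Two shared locks may coexist; any overlap involving an exclusive
    lock is a conflict.
    """
    if held.owner == requested.owner:
        return False
    if not overlaps(held, requested):
        return False
    return held.exclusive or requested.exclusive
```

Under this sketch, two shared locks on overlapping records coexist, while an exclusive request that overlaps a record held by another client is refused.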
NFS-based systems implement record locking using two protocols in addition to the NFS protocol: the Network Lock Manager (NLM) protocol and the Network Status Monitor (NSM) protocol. The NLM protocol specifies a locking model, and the NSM protocol is used to notify clients and servers of the loss of lock state (e.g. due to a crash or reboot of a client or server). Thus, both NLM and NSM are used for monitored locks.
Generally, the server maintains the details of the locks granted by the server to various clients in volatile memory (for performance reasons). Accordingly, when a server crashes or reboots, the details of the locks are lost (e.g. the record that is locked, the type of lock, the client holding the lock, etc.). However, the server also maintains a list, in nonvolatile memory, of which clients have locks in any of the filesystems served by that server. If the server crashes or reboots, the client list is read after the server is brought back up and the clients are notified of the server crash/reboot. The clients are given a period of time (a “grace period”) to reestablish (reclaim) their previously existing locks before new locks may be granted.
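The division between volatile lock details and the nonvolatile client list, together with the grace period, can be sketched as follows. This is a simplified model under stated assumptions, not the NLM/NSM implementation: the class `LockServer`, the JSON file used as nonvolatile storage, the single-holder lock table, and the `reclaim` flag are all hypothetical names and choices introduced for illustration.

```python
import json
import os
import tempfile
import time


class LockServer:
    """Hypothetical model of a lock server with a grace period."""

    def __init__(self, client_list_path, grace_seconds=45.0, now=time.monotonic):
        self.client_list_path = client_list_path  # stands in for nonvolatile memory
        self.grace_seconds = grace_seconds        # assumed length of the grace period
        self.now = now                            # injectable clock, for testing
        self.locks = {}                           # volatile: record -> holding client
        self.grace_until = 0.0

    def _load_clients(self):
        if not os.path.exists(self.client_list_path):
            return []
        with open(self.client_list_path) as f:
            return json.load(f)

    def _save_clients(self, clients):
        with open(self.client_list_path, "w") as f:
            json.dump(sorted(clients), f)

    def restart(self):
        """Simulate a crash/reboot: lock details are lost, the client list survives."""
        self.locks = {}
        self.grace_until = self.now() + self.grace_seconds
        return self._load_clients()  # the clients that must be notified

    def lock(self, client, record, reclaim=False):
        """Grant a lock on `record`; during the grace period only reclaims succeed."""
        if self.now() < self.grace_until and not reclaim:
            return False
        holder = self.locks.get(record)
        if holder is not None and holder != client:
            return False
        self.locks[record] = client
        self._save_clients(set(self._load_clients()) | {client})
        return True


def demo():
    """Walk through a crash/reboot using a fake clock (all values illustrative)."""
    clock = [100.0]
    path = os.path.join(tempfile.mkdtemp(), "clients.json")
    srv = LockServer(path, grace_seconds=10.0, now=lambda: clock[0])
    rec = ("file1", 0, 100)
    results = [srv.lock("clientA", rec)]                    # granted before the crash
    to_notify = srv.restart()                               # lock details are lost
    results.append(srv.lock("clientB", rec))                # new lock refused in grace period
    results.append(srv.lock("clientA", rec, reclaim=True))  # previous holder reclaims
    clock[0] += 11.0                                        # grace period expires
    results.append(srv.lock("clientB", ("file2", 0, 10)))   # new locks granted again
    return to_notify, results
```

In `demo()`, the restarted server reports `clientA` as the client to notify, refuses a new lock from `clientB` while the grace period is running, and accepts the reclaim from `clientA`.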
In highly available (HA) environments, a cluster of computer systems (nodes) is used to provide the HA services. If one of the nodes crashes, if the node or a service executing on the node experiences some other sort of failure, or even if load simply needs to be balanced among the nodes, services may be “failed over” to another node. That is, the service may be stopped on the original node (which stoppage may already have occurred if the original node has crashed) and started on another node with the service's state at or near the state existing on the original node at the time of the failure.
If the network filesystem is to be HA, each exported filesystem needs to be part of an HA service that can be failed over from one node to another. Thus, the record locks for the failing-over filesystem need to be recovered on the node that the filesystem fails over to. However, recovery is typically initiated by the network filesystem software when it is restarted as part of booting the crashed/rebooted server. For example, in NFS, two software daemons (lockd and statd) are implemented as part of the NLM and NSM protocols. The lockd and statd daemons register and provide remote procedure call (RPC) locking services. The lockd daemon is part of the NLM implementation, while the statd daemon is part of the NSM implementation. When these daemons are restarted, recovery of the locks on the server is initiated using the list of clients in that server's nonvolatile memory. When a filesystem is failed over to a “failed to” node, the lockd and statd daemons are already operating on the “failed to” node. Additionally, the list of clients that had locks on the filesystem is on the server from which the filesystem is failing away. Furthermore, the “failed to” node may be serving other filesystems for which the lock recovery is not needed. Currently, when NFS is operating in an HA environment, the statd and lockd daemons are restarted on all nodes when any filesystem fails over. Thus, all locks are reclaimed (even those that were not lost in the fail over). Furthermore, the lock services on the “failed to” node and the node from which the filesystem is failing away are interrupted for other filesystems being served on those nodes.
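The cost of restarting the daemons on all nodes can be made concrete with a small sketch. The function `locks_reclaimed_on_failover`, the cluster layout, and the lock labels are hypothetical; the sketch only separates the locks that actually need recovery (those on the failing-over filesystem) from the locks that are reclaimed anyway because every node restarts its lock services.

```python
def locks_reclaimed_on_failover(cluster, failing_fs):
    """Split all locks in the cluster into needed and unnecessary reclaims.

    cluster: mapping of node name -> {filesystem name: [lock, ...]}
    failing_fs: name of the filesystem being failed over

    Models the approach in which lock services restart on every node,
    so every lock in the cluster must be reclaimed.
    """
    needed = []
    unnecessary = []
    for filesystems in cluster.values():
        for fs, locks in filesystems.items():
            if fs == failing_fs:
                needed.extend(locks)       # lock state really was lost
            else:
                unnecessary.extend(locks)  # reclaimed only because of the restart
    return needed, unnecessary


# Illustrative cluster: two nodes serving three filesystems.
example_cluster = {
    "node1": {"/export/a": ["A1", "A2"], "/export/b": ["B1"]},
    "node2": {"/export/c": ["C1"]},
}
needed, unnecessary = locks_reclaimed_on_failover(example_cluster, "/export/a")
```

In this example only the two locks on `/export/a` were lost in the fail over, yet the locks on `/export/b` and `/export/c` are reclaimed as well.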