Computer clusters are an increasingly popular alternative to more traditional computer architectures. A computer cluster is a collection of individual computers (known as nodes) that are interconnected to provide a single computing system. The use of a collection of nodes has a number of advantages over more traditional computer architectures. One easily appreciated advantage is the fact that nodes within a computer cluster may fail individually. As a result, in the event of a node failure, the majority of nodes within a computer cluster may survive in an operational state. This has made the use of computer clusters especially popular in environments where continuous availability is required.
Single system image (SSI) clusters are a special type of computer cluster. SSI clusters are configured to provide programs (and programmers) with a unified environment in which the individual nodes cooperate to present a single computer system. Resources, such as filesystems, are made transparently available to all of the nodes included in an SSI cluster. As a result, programs in SSI clusters are provided with the same execution environment regardless of their physical location within the computer cluster. SSI clusters increase the effectiveness of computer clusters by allowing programs (and programmers) to ignore many of the details of cluster operation. Compared to other types of computer clusters, SSI clusters offer superior scalability (the ability to incrementally increase the power of the computing system) and manageability (the ability to easily configure and control the computing system). At the same time, SSI clusters retain the high availability of more traditional computer cluster types.
As the size of a computer cluster increases, so does the chance for failure among the cluster's nodes. Failure of a node has several undesirable effects. One easily appreciated effect is the performance degradation that results when the work previously performed by a failed node is redistributed to surviving nodes. Another undesirable effect is the potential loss of a resource, such as a filesystem, that is associated with a failed node.
Node loss can be especially serious in SSI clusters. This follows because resources can be transparently shared within SSI clusters. Sharing of resources means that a single resource may be used by a large number of processes spread throughout an SSI cluster. If node failure causes the resource to become unavailable, each of these processes may be negatively impacted. Thus, a single node failure may impact many processes. Resource sharing also increases the likelihood that a process will access resources located on a number of different nodes. In so doing, the process becomes vulnerable to the failure of any of these nodes.
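The reasoning above can be illustrated with a minimal sketch. It assumes a hypothetical, hand-written mapping of processes to the resources they use and of resources to the nodes that host them; the names (`uses`, `location`, `impacted_processes`, `exposure`) are illustrative only and do not correspond to any particular cluster implementation.

```python
def impacted_processes(uses, location, failed_node):
    """Return the processes that use at least one resource hosted on failed_node."""
    return sorted(
        proc
        for proc, resources in uses.items()
        if any(location[res] == failed_node for res in resources)
    )


def exposure(uses, location):
    """Map each process to the set of nodes whose failure would affect it."""
    return {
        proc: {location[res] for res in resources}
        for proc, resources in uses.items()
    }


# Hypothetical cluster: each resource (here, a filesystem) lives on one node.
location = {"fs_home": "node1", "fs_data": "node2", "fs_tmp": "node3"}

# Hypothetical processes and the shared resources each one uses.
uses = {
    "p1": {"fs_home"},
    "p2": {"fs_home", "fs_data"},
    "p3": {"fs_home", "fs_tmp"},
    "p4": {"fs_data"},
}

# A single failure of node1 affects every process that uses fs_home.
affected = impacted_processes(uses, location, "node1")

# p2 uses resources on two nodes, so it is vulnerable to either one failing.
vulnerable_to = exposure(uses, location)["p2"]
```

In this sketch, the failure of one node affects three of the four processes, and a process that uses resources on two nodes is exposed to the failure of either.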
In SSI clusters, filesystems are one of the most commonly shared resources. Thus, filesystem reliability and fault tolerance are especially important to the operation of SSI clusters. Filesystem performance is also important to the operation of SSI clusters. To increase filesystem performance, it is often necessary to aggressively cache filesystem data at each node where the filesystem is used. Caching at each node is complicated by the need to keep the filesystem and the filesystem caches at each node consistent with one another. This is especially true in the event of node failure.
Distributed caching is further complicated by the possibility that files may be shared, or opened for modification on more than a single node. Obviously, shared files must be seen consistently at each node within the cluster. One method for providing this type of consistency is to prevent shared files from being cached. This means that shared files are always directly accessed as part of the filesystem where they are located. This technique provides the required consistency, but limits filesystem performance.
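The cache-bypass method described above can be sketched as follows. This is a simplified, single-process model under stated assumptions, not an implementation of any real cluster filesystem: the `FileServer` and `Node` classes and all of their methods are hypothetical, a file counts as shared once more than one node has opened it, and closing files and node failure are deliberately omitted.

```python
class FileServer:
    """Hypothetical stand-in for the node on which the filesystem is located."""

    def __init__(self):
        self.data = {}      # path -> file contents (bytes)
        self.openers = {}   # path -> set of node names that have the file open
        self.reads = 0      # count of reads served directly by the filesystem

    def open(self, path, node_name):
        self.openers.setdefault(path, set()).add(node_name)

    def is_shared(self, path):
        # Assumption: "shared" means open on more than a single node.
        return len(self.openers.get(path, ())) > 1

    def read(self, path):
        self.reads += 1
        return self.data.get(path, b"")

    def write(self, path, content):
        self.data[path] = content


class Node:
    """Hypothetical cluster node with a local filesystem cache."""

    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.cache = {}     # path -> cached contents

    def open(self, path):
        self.server.open(path, self.name)

    def read(self, path):
        if self.server.is_shared(path):
            # Shared files are never cached: drop any copy cached earlier
            # and access the filesystem directly.
            self.cache.pop(path, None)
            return self.server.read(path)
        if path not in self.cache:
            self.cache[path] = self.server.read(path)
        return self.cache[path]

    def write(self, path, content):
        # Writes always go through to the filesystem in this sketch.
        self.server.write(path, content)
        if self.server.is_shared(path):
            self.cache.pop(path, None)
        else:
            self.cache[path] = content


server = FileServer()
server.write("/f", b"v1")
a, b = Node("a", server), Node("b", server)

a.open("/f")
a.read("/f")
a.read("/f")                      # second read is served from a's cache
reads_while_unshared = server.reads

b.open("/f")                      # the file is now shared
b.write("/f", b"v2")
seen_by_a = a.read("/f")          # a bypasses its cache and sees b's write
a.read("/f")
reads_while_shared = server.reads - reads_while_unshared
```

The sketch shows both sides of the trade-off: while the file is open on one node, repeated reads cost a single filesystem access, but once it is shared every read goes to the filesystem, which keeps the nodes consistent at the expense of performance.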
Based on the foregoing, it is clear that there is a need for techniques that balance the need to achieve high-performance filesystem operation and the need to provide fault tolerance. This is especially true in the case of files that are shared or opened for modification on multiple nodes.