In a clustered network (e.g., a server cluster), multiple entities (e.g., computer systems and the applications running on those systems) may each access the same storage volume. Some of those applications, referred to herein as primary applications, are to be made highly available. The remaining applications are referred to herein as secondary applications.
For any of a variety of reasons, a primary application may become unavailable. For example, the server on which the application is executing may become unavailable, in which case the application is also unavailable until the server is restored to service. To resolve this situation, the application is restarted on a different server as quickly as possible, a process referred to as failover.
To transfer execution of the primary application from one server to another, the storage volume used by that application is dismounted (essentially, the storage volume is taken offline), then remounted (essentially, it is brought online and again made available to the application, now executing on the second server). To dismount the storage volume, an exclusive lock on it must first be acquired. However, the exclusive lock may not be obtainable if one or more of the secondary applications continue to access the storage volume.
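The dependency between dismounting and the exclusive lock can be illustrated with a small in-memory model. This is a minimal sketch, not the method described here: the class `StorageVolume`, the function `fail_over`, and the rule that an exclusive lock is granted only when no application holds an open handle are all hypothetical simplifications. The primary application is stopped before the dismount, but a secondary application that still has files open blocks the lock, and therefore the failover.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ExclusiveLockError(Exception):
    """Raised when an exclusive lock cannot be granted because handles are open."""


@dataclass
class StorageVolume:
    # Hypothetical model: maps application name -> number of open file handles.
    name: str
    mounted_on: str | None = None
    open_handles: dict[str, int] = field(default_factory=dict)
    locked: bool = False

    def open_file(self, app: str) -> None:
        if self.locked or self.mounted_on is None:
            raise RuntimeError(f"{self.name} is not available")
        self.open_handles[app] = self.open_handles.get(app, 0) + 1

    def close_all(self, app: str) -> None:
        # Models an application being stopped: all of its handles are released.
        self.open_handles.pop(app, None)

    def acquire_exclusive_lock(self) -> None:
        # Assumed rule: the lock is granted only if no application has an open handle.
        holders = sorted(a for a, n in self.open_handles.items() if n > 0)
        if holders:
            raise ExclusiveLockError(f"volume in use by: {', '.join(holders)}")
        self.locked = True

    def dismount(self) -> None:
        self.acquire_exclusive_lock()
        self.mounted_on = None  # volume taken offline

    def remount(self, server: str) -> None:
        self.locked = False
        self.mounted_on = server  # volume back online, now served from the new server


def fail_over(volume: StorageVolume, primary: str, target: str) -> bool:
    """Hypothetical failover: stop the primary, dismount, remount on target."""
    volume.close_all(primary)  # the primary application is stopped first
    try:
        volume.dismount()
    except ExclusiveLockError as err:
        print(f"dismount blocked: {err}")
        return False
    volume.remount(target)
    return True


if __name__ == "__main__":
    vol = StorageVolume("V1", mounted_on="server-a")
    vol.open_file("primary-app")
    vol.open_file("secondary-app")  # not stopped by the failover logic
    print(fail_over(vol, "primary-app", "server-b"))  # blocked by secondary-app
    vol.close_all("secondary-app")
    print(fail_over(vol, "primary-app", "server-b"), vol.mounted_on)
```

In this model the failover succeeds only after the secondary application releases its handles, which mirrors the situation described above in which the exclusive lock cannot be acquired.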
Conventionally, the shared storage volume is dismounted forcefully under such circumstances. This often leaves the file system with inconsistent data. As a result, a program such as "chkdsk" must be run, either manually by an administrator or automatically, to identify and repair errors. If chkdsk is unable to repair the storage volume, the storage volume is not remounted, which increases downtime and hence decreases the availability of the primary application.
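The conventional recovery path can be summarized as a short control flow. This is a minimal sketch under stated assumptions: `Volume`, `force_dismount`, and `recover_and_remount` are hypothetical names, in-flight writes are simply modeled as lost (marking the volume dirty), and `check_and_repair` is a caller-supplied stand-in for a tool such as chkdsk rather than any real API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Volume:
    # Hypothetical model: per-application count of writes not yet flushed to disk.
    name: str
    pending_writes: dict[str, int] = field(default_factory=dict)
    dirty: bool = False
    online: bool = True


def force_dismount(vol: Volume) -> None:
    """Take the volume offline without waiting for applications to close files."""
    if any(n > 0 for n in vol.pending_writes.values()):
        # In-flight writes are modeled as lost, so the file system may be inconsistent.
        vol.dirty = True
    vol.pending_writes.clear()
    vol.online = False


def recover_and_remount(vol: Volume, check_and_repair: Callable[[Volume], bool]) -> bool:
    """Remount only if the volume is clean or the checker (chkdsk stand-in) repairs it."""
    if vol.dirty:
        if not check_and_repair(vol):
            return False  # volume stays offline: extra downtime for the primary app
        vol.dirty = False
    vol.online = True
    return True
```

A volume forced offline while a secondary application still had writes outstanding comes back only if the checker reports a successful repair; otherwise `recover_and_remount` returns `False` and the volume remains unavailable.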
In some clustered network implementations, cluster software can be used to control the starting and stopping of a primary application so that the primary application does not interfere with an attempt to acquire an exclusive lock. However, the secondary applications may not be under the control of the cluster software and so may still have files open on the storage volume. Under those circumstances, the attempt to dismount the storage volume might fail, or the dismount may be done forcefully, which may result in write errors that in turn might leave the file system inconsistent.