A storage server is a computer system that stores and retrieves data on behalf of one or more clients on a network. It stores and manages the data in a set of mass storage devices, such as magnetic or optical storage-based disks or tapes. In conventional network storage systems, the mass storage devices may be organized into one or more groups of drives (e.g., a redundant array of inexpensive disks (RAID)).
A storage server may be configured to service file-level requests from clients, as in the case of file servers used in a Network Attached Storage (NAS) environment. Alternatively, a storage server may be configured to service block-level requests from clients, as done by storage servers used in a Storage Area Network (SAN) environment. Further, some storage servers are capable of servicing both file-level and block-level requests, as done by certain storage servers made by NetApp®, Inc. of Sunnyvale, Calif.
A storage server typically provides various types of storage services to networked clients. One useful feature is the ability to back up or mirror a primary storage server to one or more secondary storage servers, so that data stored by the primary storage server is replicated to the secondary storage servers. When a system failure or a disaster prevents data access to the primary storage server, a secondary storage server not only helps to preserve data, but also may act as a substitute for the primary storage server, thus minimizing interruption to data requests.
However, switching data access from the primary storage server to the secondary storage server generally involves multiple actions, each of which must be performed successfully before the switching operation is deemed a success. When a disaster strikes and the actions are performed hastily by a user (e.g., a system administrator), it is often hard to ensure that each of the switching actions is properly and successfully executed. Without a mechanism for verifying each action, a user may not be confident that all the necessary data have been replicated, that the data sources are in a consistent and useful state before the switching operation, and that a business application will be able to resume operation after the switching operation.
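The all-or-nothing nature of such a switching operation can be illustrated with a minimal sketch. It assumes each action is modeled as a callable that reports success or failure; the `Action` type, the `run_switchover` function, and the action names in the usage example are hypothetical placeholders chosen for illustration, not part of any particular storage product.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    """One step of a switching operation (hypothetical model)."""
    name: str
    run: Callable[[], bool]  # returns True on success, False on failure


def run_switchover(actions: List[Action]) -> Tuple[bool, List[str]]:
    """Run the actions in order and stop at the first failure.

    Returns (succeeded, names_of_completed_actions). The operation is
    deemed a success only if every action succeeded.
    """
    completed: List[str] = []
    for action in actions:
        try:
            ok = action.run()
        except Exception:
            # An action that raises is treated the same as one that fails.
            ok = False
        if not ok:
            return False, completed
        completed.append(action.name)
    return True, completed


if __name__ == "__main__":
    # Placeholder actions; real steps would talk to the storage servers.
    steps = [
        Action("verify_replication", lambda: True),
        Action("make_secondary_writable", lambda: False),  # simulated failure
        Action("redirect_clients", lambda: True),
    ]
    succeeded, done = run_switchover(steps)
    print(succeeded, done)
```

In this sketch a failure in the second action leaves the third unexecuted, and the caller learns which actions had already completed. When the steps are run by hand, that is the information a user lacks.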
To further complicate matters, some of the actions may fail to start, or may end in error before completion. In a catastrophic situation, another user might inadvertently retry the failed actions without realizing the consequences of doing so, or multiple users might try to initiate the same switching operation at the same time. All of these scenarios can cause further confusion and delay in the recovery of the data.