1. Field of the Invention
The invention relates to storage systems.
2. Related Art
Computer storage systems are used to record and retrieve data. In some computer systems, storage systems communicate with a set of client devices, and provide services for recording and retrieving data to those client devices. Because data storage is important to many applications, it is desirable for the services and data provided by the storage system to remain available to the greatest degree possible. It is therefore desirable to provide storage systems that can remain available for service even in the face of component failures in the storage system.
One known technique for providing storage systems that can remain available for service is to provide a plurality of redundant storage elements, with the property that when a first storage element fails, a second storage element is available to provide the services and the data otherwise provided by the first. Transfer of the function of providing services from the first to the second storage element is called “failover.” The second storage element maintains a copy of the data maintained by the first, so that failover can proceed without substantial interruption.
A first known technique for achieving failover is to cause the second storage element to copy all the operations of the first. Thus, each storage operation completed by the first storage element is also completed by the second. This first known technique is subject to drawbacks: (1) It consumes a substantial amount of processing power at the second storage element in duplicating the work of the first, and most of that duplicated work is wasted because it is needed only if the first storage element fails. (2) It slows the first storage element in confirming completion of operations, because the first storage element waits for the second to also complete the same operations.
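The first known technique can be illustrated with a minimal sketch. The class names StorageElement and MirroredPair, the in-memory dictionary standing in for disk blocks, and the "confirmed" return value are assumptions made for illustration only; they do not describe any particular product.

```python
class StorageElement:
    """Hypothetical in-memory stand-in for one storage element."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}        # block id -> data
        self.failed = False
        self.ops_completed = 0  # counts the work done by this element

    def write(self, block_id, data):
        if self.failed:
            raise RuntimeError(f"{self.name} has failed")
        self.blocks[block_id] = data
        self.ops_completed += 1

    def read(self, block_id):
        if self.failed:
            raise RuntimeError(f"{self.name} has failed")
        return self.blocks[block_id]


class MirroredPair:
    """The second element repeats every operation of the first."""

    def __init__(self, first, second):
        self.first = first
        self.second = second
        self.active = first

    def write(self, block_id, data):
        # The write is confirmed only after every surviving element has
        # completed it, so the first element waits for the second.
        for element in (self.first, self.second):
            if not element.failed:
                element.write(block_id, data)
        return "confirmed"

    def read(self, block_id):
        return self.active.read(block_id)

    def fail_over(self):
        # Simulate failure of the first element; the second already holds
        # a full copy and takes over service.
        self.first.failed = True
        self.active = self.second
```

In this sketch the second element's ops_completed counter grows in step with the first's, which reflects drawback (1), and write returns only after both elements finish, which reflects drawback (2).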
A second known technique for achieving failover is to identify a sequence of checkpoints at which the first storage element is at a consistent and known state. On failover, the second storage element can continue operation from the most recent checkpoint. For example, version 2 of the NFS (Network File System) protocol requires all write operations to be committed to stable storage before they are confirmed, so that confirmation of a write operation indicates a stable file system configuration. This second known technique is subject to drawbacks: (1) It slows the first storage element in performing write operations, because the first storage element waits for write operations to be completely stored to disk. (2) It slows recovery on failover, because the second storage element must resolve any inconsistencies left by failure of the first between identified checkpoints.
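The second known technique can be sketched in a similar way. The class CheckpointingElement, the function take_over, and the use of a copied dictionary as the checkpoint are hypothetical simplifications, assumed here only to show that the second storage element resumes from the most recent checkpoint.

```python
import copy


class CheckpointingElement:
    """Hypothetical storage element that records consistent checkpoints."""

    def __init__(self):
        self.state = {}       # working state, lost if the element fails
        self.checkpoint = {}  # last consistent state, kept on stable storage

    def write(self, key, value):
        self.state[key] = value

    def take_checkpoint(self):
        # Record a consistent and known state.
        self.checkpoint = copy.deepcopy(self.state)


def take_over(failed_element):
    """Build a successor that continues from the most recent checkpoint."""
    successor = CheckpointingElement()
    successor.state = copy.deepcopy(failed_element.checkpoint)
    successor.checkpoint = copy.deepcopy(failed_element.checkpoint)
    return successor
```

Under these assumptions, any write made after the last call to take_checkpoint is absent from the successor's state, which is the gap the second storage element has to address during recovery.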
Accordingly, it would be advantageous to provide a storage system, and a method for operating a storage system, that efficiently uses all storage system elements, quickly completes and confirms operations, and quickly recovers from failure of any storage element. This advantage is achieved in an embodiment of the invention in which the storage system implements frequent and rapid checkpoints, and in which the storage system rapidly distributes, among its storage elements, duplicate commands for the operations performed between checkpoints.
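The combination of checkpoints with duplicate commands distributed between checkpoints can be sketched as follows. This is an illustrative assumption about how such a scheme might look, not a description of the embodiment itself: the names StableStorage, FirstElement, and SecondElement, the list used to hold pending commands, and the replay step in take_over are all hypothetical.

```python
class StableStorage:
    """Hypothetical storage that survives failure of either element."""

    def __init__(self):
        self.checkpoint = {}


class SecondElement:
    """Holds duplicate commands received since the last checkpoint."""

    def __init__(self, storage):
        self.storage = storage
        self.pending = []

    def receive(self, command):
        # Recording a command is cheap; the operation itself is not repeated.
        self.pending.append(command)

    def checkpoint_taken(self):
        # Commands covered by the new checkpoint are no longer needed.
        self.pending.clear()

    def take_over(self):
        # Start from the most recent checkpoint, then replay the duplicates.
        state = dict(self.storage.checkpoint)
        for key, value in self.pending:
            state[key] = value
        return state


class FirstElement:
    def __init__(self, storage, second):
        self.storage = storage
        self.second = second
        self.state = dict(storage.checkpoint)

    def write(self, key, value):
        self.state[key] = value
        # Distribute a duplicate command, then confirm without waiting
        # for the data to reach stable storage.
        self.second.receive((key, value))
        return "confirmed"

    def take_checkpoint(self):
        self.storage.checkpoint = dict(self.state)
        self.second.checkpoint_taken()
```

In this sketch, a takeover after writes that follow the last checkpoint still yields the complete state, because the second element replays only the short list of pending commands rather than repeating every operation or reconciling an unknown gap.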