A typical network-based storage server provides storage services to one or more clients coupled to the storage server over a network. For example, referring to FIG. 1, storage server 10 services client-initiated read and write requests by reading and writing data to the mass storage devices 12 in the storage subsystem 14 on behalf of a client 16. The mass storage devices (e.g., disks) may be organized into redundant arrays of independent disks, generally referred to as RAID groups. In RAID-based storage systems, system performance is often limited by the processing power of the storage server 10 itself. That is, the storage subsystem 14 is often capable of throughput rates that exceed the throughput rate of the storage server 10. Consequently, in an environment such as that illustrated in FIG. 1, the performance bottleneck is often the storage server 10.
Given that a storage server tends to be the bottleneck, one way to improve throughput performance is to connect more than one storage server to a mass storage subsystem (e.g., a set of disks), such that each storage server is capable of accessing the mass storage devices of the storage subsystem on behalf of clients. Distributed data storage systems in which one or more disks are owned by more than one storage server are known in the art. However, in such systems, complex data locking mechanisms are required to ensure data consistency, e.g., to prevent one storage server from overwriting data that is being accessed by another storage server. Not only is such a locking mechanism difficult to implement, but the processing overhead associated with it also generally has a negative impact on overall performance, thereby countering the intended effect of the additional storage server (i.e., improved throughput of the overall system).
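By way of illustration only, the shared-ownership arrangement described above may be sketched as follows. The sketch is a minimal, in-memory model and not an implementation of any particular system: the names SharedDisk, serve_writes, server_a, and server_b are hypothetical, and a per-block lock is assumed as the simplest form of data locking. Each read or write must first acquire a lock, which is the source of the processing overhead noted above.

```python
import threading


class SharedDisk:
    """Hypothetical disk owned by more than one storage server (in-memory model)."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        # Assumption: one lock per block; every access must acquire it first.
        self.locks = [threading.Lock() for _ in range(num_blocks)]
        self.lock_acquisitions = 0
        self._counter_lock = threading.Lock()

    def _count(self):
        # Track how many lock acquisitions the workload required.
        with self._counter_lock:
            self.lock_acquisitions += 1

    def write(self, block_no, data):
        # The lock prevents one server from overwriting a block in use by another.
        with self.locks[block_no]:
            self._count()
            self.blocks[block_no] = data

    def read(self, block_no):
        with self.locks[block_no]:
            self._count()
            return self.blocks[block_no]


def serve_writes(disk, server_name, count):
    """Simulate one storage server issuing `count` writes on behalf of clients."""
    for i in range(count):
        disk.write(i % len(disk.blocks), f"{server_name}:{i}".encode())


disk = SharedDisk(num_blocks=4)
servers = [
    threading.Thread(target=serve_writes, args=(disk, name, 100))
    for name in ("server_a", "server_b")
]
for t in servers:
    t.start()
for t in servers:
    t.join()
```

In this sketch every one of the 200 writes pays for a lock acquisition, regardless of whether the two servers ever actually contend for the same block.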
Also known in the art are cluster-failover storage system configurations. In such a configuration, two or more storage servers, including a primary storage server and a secondary storage server, are connected to a particular set of disks, and the secondary storage server can take over for the primary storage server if the primary storage server fails (i.e., a “failover”). However, the secondary storage server does not have access to the set of disks under normal circumstances (i.e., while the primary storage server is operating normally).
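By way of illustration only, the cluster-failover behavior described above may be sketched as follows. The class names StorageServer and FailoverCluster, the healthy flag, and the check_and_failover method are hypothetical and are assumed solely for the purpose of the example. The sketch shows that the secondary storage server is denied access to the set of disks until the primary storage server fails.

```python
class StorageServer:
    """Hypothetical storage server with a simple health flag."""

    def __init__(self, name):
        self.name = name
        self.healthy = True


class FailoverCluster:
    """Hypothetical cluster in which only the current owner may access the disks."""

    def __init__(self, primary, secondary, disks):
        self.primary = primary
        self.secondary = secondary
        self.disks = disks
        self.owner = primary  # Under normal circumstances the primary owns the disks.

    def check_and_failover(self):
        # Transfer ownership to the secondary only if the primary has failed.
        if self.owner is self.primary and not self.primary.healthy:
            self.owner = self.secondary
        return self.owner

    def read(self, server, disk_id):
        # A server that does not own the disks has no access to them.
        if server is not self.owner:
            raise PermissionError(f"{server.name} does not own the disks")
        return self.disks[disk_id]


primary = StorageServer("primary")
secondary = StorageServer("secondary")
cluster = FailoverCluster(primary, secondary, disks=["disk-0-data", "disk-1-data"])

# While the primary operates normally, the secondary is denied access.
try:
    cluster.read(secondary, 0)
    denied_before_failover = False
except PermissionError:
    denied_before_failover = True

# Simulate a failure of the primary; the secondary then takes over.
primary.healthy = False
cluster.check_and_failover()
data_after_failover = cluster.read(secondary, 0)
```

Note that in this model the secondary storage server contributes no throughput during normal operation; it serves requests only after the failover.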