There is an increasing need for higher resilience and availability in storage systems. One common solution combines RAID with synchronous replication between two or more storage systems. The RAID logic at the primary and secondary storage systems provides protection against disk failures, and the replication between the primary and secondary storage systems protects against a total failure of the primary storage system. While such an arrangement is very popular, it has several drawbacks. First, duplicate copies of both the RAID hardware and the physical disks are required. Second, data center costs, including footprint and energy costs, are increased.
Accordingly, the deployment of dual redundant storage servers is becoming very attractive. Here two controllers are typically housed in the same RAID enclosure and share the same set of physical disks. Thus, the RAID offers redundancy against disk failures and the duplicate set of controllers protects against loss of availability should one of the two controllers fail.
However, while such a system solves many of the problems described above, it has its own drawbacks. For example, in such systems one controller is typically the primary controller and the other is the secondary controller. The primary controller must serve all of the I/Os while the secondary controller is only used in the event of a controller failure. Thus, the secondary controller is wasted while the primary controller is overworked.
To provide better usage of both controllers, one solution is to partition the physical disks of the RAID into different volumes or volume sets. The primary controller serves a first set of volumes and the secondary controller serves a second set of volumes. In another solution, the volumes of the physical disks are used to create a virtual volume where each controller serves a different subset of the I/Os received for the virtual volume. In the event of a controller failure, the surviving controller serves the entire virtual volume.
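The volume-partitioning approach can be illustrated with a minimal Python sketch. It assumes a static volume-to-controller ownership map and a simple failover rule in which the surviving controller takes over the failed controller's volumes. The class name DualControllerRouter, the controller labels "A" and "B", and the volume names are hypothetical and chosen only for illustration; they do not describe any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DualControllerRouter:
    """Hypothetical router: maps each volume to a preferred controller."""
    owner: Dict[str, str]  # volume name -> preferred controller label
    healthy: Set[str] = field(default_factory=lambda: {"A", "B"})

    def fail(self, controller: str) -> None:
        # Mark a controller as failed so it no longer receives I/Os.
        self.healthy.discard(controller)

    def route(self, volume: str) -> str:
        # Normal case: the volume's preferred controller serves the I/O.
        preferred = self.owner[volume]
        if preferred in self.healthy:
            return preferred
        # Failover case: the surviving controller serves the I/O instead.
        if not self.healthy:
            raise RuntimeError("no controller available")
        return sorted(self.healthy)[0]


# Assumed example layout: controller A owns vol1, controller B owns vol2.
router = DualControllerRouter(owner={"vol1": "A", "vol2": "B"})
before_failure = (router.route("vol1"), router.route("vol2"))
router.fail("A")
after_failure = (router.route("vol1"), router.route("vol2"))
```

In this sketch both controllers serve I/Os during normal operation, and after controller A fails all volumes are routed to controller B.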
However, in such implementations, even though both controllers are active, each controller still only works on one independent set of RAID disks at a given time. For example, a first controller may work on a 7-disk RAID-5 with one hot spare, and a second controller may work on another 7-disk RAID-5 with another hot spare. As a result, there is a significant waste of disk space. Continuing the example above, each 7-disk RAID-5 gives up one disk of capacity to parity, so the 16 physical disks provide only 12 disks of storage capacity. An optimal solution using the 16 disks would be a 15-disk RAID-5 with a single hot spare, which would provide 14 disks of storage capacity. However, such a solution requires distributed locking and clustering, which is very difficult to implement using two controllers.
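The capacity figures in the example above can be checked with a short Python sketch. It assumes equal-sized disks and the standard RAID-5 property that each array gives up one disk of capacity to parity. The function names are hypothetical and used only for illustration.

```python
from typing import List, Tuple


def raid5_usable_disks(disks_in_array: int) -> int:
    """Usable capacity of one RAID-5 array, counted in disks."""
    if disks_in_array < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    # One disk's worth of capacity per array is consumed by parity.
    return disks_in_array - 1


def layout_summary(arrays: List[int], hot_spares: int) -> Tuple[int, int]:
    """Return (total physical disks, usable disks) for a layout."""
    total = sum(arrays) + hot_spares
    usable = sum(raid5_usable_disks(n) for n in arrays)
    return total, usable


# Two independent 7-disk RAID-5 arrays, each with its own hot spare.
split_layout = layout_summary([7, 7], hot_spares=2)
# One 15-disk RAID-5 array with a single shared hot spare.
single_layout = layout_summary([15], hot_spares=1)
```

Under these assumptions, both layouts use 16 physical disks, but the split layout yields 12 disks of usable capacity while the single-array layout yields 14.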