The present invention relates to data storage, and more specifically, this invention relates to data storage systems having improved dual controller configurations.
Enterprise storage products are typically subjected to extremely high reliability and data integrity standards. Current industry standards call for a mean time between failures (MTBF) which corresponds to conventional storage products being in uptime between 99.999% and 99.9999% of the time. Another important factor is how the storage products behave in fatal failure scenarios which the system was not designed to cope with, such as multiple software node failures or concurrent server failures. Conventional enterprise storage products with capacities in the petabyte (PB) range would take days, or even weeks, to recover from back-up following a fatal failure scenario.
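By way of illustration only, the uptime percentages mentioned above may be converted into an approximate permitted downtime per year. The following is a minimal sketch under stated assumptions: the function name `downtime_seconds_per_year` is hypothetical, a 365-day year is assumed, and the calculation is simple arithmetic rather than any standardized availability metric.

```python
# Illustrative sketch: convert an uptime percentage into allowed downtime per year.
# Assumption: a 365-day (non-leap) year; the helper name is hypothetical.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(uptime_percent: float) -> float:
    """Return the downtime, in seconds per year, implied by an uptime percentage."""
    return SECONDS_PER_YEAR * (1.0 - uptime_percent / 100.0)


if __name__ == "__main__":
    for uptime in (99.999, 99.9999):
        seconds = downtime_seconds_per_year(uptime)
        print(f"{uptime}% uptime allows about {seconds:.1f} s "
              f"({seconds / 60:.2f} min) of downtime per year")
```

Under these assumptions, 99.999% uptime corresponds to roughly five minutes of downtime per year and 99.9999% to roughly half a minute, which is why a recovery from back-up lasting days or weeks is far outside such standards.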
Therefore, attempts have been made to utilize recovery tools to repair a storage product and avoid recovering from back-ups. However, repairing a storage product using recovery tools is undesirable as well, as doing so typically results in substantial data and metadata loss. Efforts to overcome this loss by continually backing up the data and metadata to persistent storage have a severe performance impact on the system, and in most cases are not even a viable option.
Enterprise storage products are also typically expected to provide a customer with a system that is able to achieve high performance, high resiliency and high availability at a low price point relative to the achieved throughput. In order to meet such standards, a dual controller arrangement may be implemented which is able to provide high performance as long as both controllers are functioning. However, when either one of the controllers fails, the system is unable to fully maintain its performance using only the single remaining controller. Accordingly, a common consideration is whether, after the failure of one of the controllers, the system should continue to serve inputs/outputs (I/Os), given that the system operates without redundancy or backup functionality while only one of the controllers is operational.
It follows that conventional products leave the customer with a tough choice of weighing product downtime during recovery against data loss and reduced performance.