1. Field of the Invention
The invention relates generally to high availability storage systems and more specifically relates to use of shared solid-state drives (SSDs) as cache memory for each of multiple storage controllers in a high availability storage system.
2. Discussion of Related Art
High availability storage systems, such as RAID (Redundant Array of Independent Drives) storage systems, typically include multiple storage controllers acting in roles such that each controller may assume control from another controller that has failed. All the storage controllers are coupled with a plurality of storage devices (e.g., magnetic, optical, and solid-state storage devices) for persistent storage of user data. Typically the user data is stored in a fashion that provides redundancy information to allow continued operation of the storage system in the event of a failure of one or more of the storage devices (as well as one or more of the storage controllers).
To maintain high performance levels in such storage systems, each storage controller includes a cache memory used by the processor of the storage controller to temporarily store user data until the data is eventually posted or flushed to the persistent storage devices of the system. Write requests received by a storage controller from an attached host system are generally processed by storing the user's data (from the write request) into the cache memory. The write request from the host may be completed quickly after the data is cached. The storage controller may then later post or flush the cached user data to the persistent storage of the system.
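The write-back behavior described above can be illustrated with a minimal Python sketch. The names `WriteBackCache`, `handle_write`, and `flush`, and the use of a dictionary to stand in for the persistent storage devices, are hypothetical and chosen only for illustration; they do not describe any particular controller's implementation.

```python
class WriteBackCache:
    """Toy model of a storage controller's write-back cache (illustrative only)."""

    def __init__(self, backing_store):
        # backing_store is a dict standing in for the persistent storage devices.
        self.backing_store = backing_store
        # Dirty blocks: cached user data not yet flushed to persistent storage.
        self.dirty = {}

    def handle_write(self, block_address, data):
        # Store the user's data in cache; the host request completes right away,
        # before anything is written to persistent storage.
        self.dirty[block_address] = data
        return "completed"

    def flush(self):
        # Later, post the cached data to persistent storage and empty the cache.
        flushed = len(self.dirty)
        self.backing_store.update(self.dirty)
        self.dirty.clear()
        return flushed


disks = {}
cache = WriteBackCache(disks)
status = cache.handle_write(0x10, b"user data")
```

In this sketch the write request is reported as completed while `disks` is still empty; the data reaches persistent storage only when `flush` is called.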
In high availability storage systems where each controller may serve as a substitute for a failed controller, the contents of the cache memory of each controller must be available to the other controllers to permit another controller to assume control over the processing of the failed controller. In other words, the cache memory of the various storage controllers must be “synchronized” such that each controller is in possession of the same cached data in case one controller assumes control over the operations of another failed controller. In present high availability storage systems, the cache memory contents may be synchronized among the storage controllers by either of two general approaches. In one present practice, the host systems may generate the same write request to multiple storage systems so that each of the multiple storage systems has the same information available. In another present practice, the storage controllers communicate with one another to synchronize cache memory contents so that another controller may assume control from a failed controller.
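The second practice, in which controllers communicate with one another to synchronize cache contents, can be sketched as follows. This is a simplified model under stated assumptions: every write is mirrored to every peer controller, and one message is counted per mirrored write. The `Controller` class, its methods, and `build_cluster` are hypothetical names introduced only for this illustration.

```python
class Controller:
    """Toy storage controller with a private cache mirrored to its peers."""

    def __init__(self, name):
        self.name = name
        self.cache = {}          # private cache memory of this controller
        self.peers = []          # other controllers in the storage system
        self.messages_sent = 0   # count of inter-controller mirror messages

    def handle_write(self, block_address, data):
        # Cache the user data locally.
        self.cache[block_address] = data
        # Assumption: mirror the write to every peer so all caches stay identical.
        for peer in self.peers:
            peer.receive_mirror(block_address, data)
            self.messages_sent += 1
        return "completed"

    def receive_mirror(self, block_address, data):
        # Apply a peer's cached write to this controller's own cache.
        self.cache[block_address] = data


def build_cluster(count):
    # Create `count` controllers and make each one a peer of all the others.
    controllers = [Controller(f"ctrl{i}") for i in range(count)]
    for controller in controllers:
        controller.peers = [p for p in controllers if p is not controller]
    return controllers


cluster = build_cluster(4)
cluster[0].handle_write(7, b"abc")
```

Under these assumptions, a single host write to one of four controllers produces three mirror messages, and a system of N controllers produces N - 1 messages per write.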
In all present solutions, the inter-controller communications to synchronize cache memory contents can generate a significant volume of communication overhead. Where a storage system consists of only two controllers (a redundant pair operating either in a dual-active or an active-passive mode), this overhead may be tolerable. However, where a storage system scales up to more than two controllers, the overhead processing and communications to maintain cache content synchronization can be onerous. In addition, where multiple redundant controllers each have private cache memories and they communicate to maintain synchronization, additional problems are presented to determine which cache has the correct data when a failed controller is restored to full operation (e.g., by swapping out the controller, etc.). The communications to update the cache of a replacement controller and to flush data from the private cache memories of the controllers to the persistent storage of the storage devices further over-utilize the available bandwidth of the switched fabric communications. Still further, other management functions performed by storage controllers in a clustered environment may require similar inter-controller communications and thus add still further to the burden of inter-controller communications.
Thus it is an ongoing challenge to provide for cache content synchronization and other cluster management functions among a plurality of storage controllers in a high availability storage system while reducing overhead processing and communications associated therewith in the storage controllers.