In the field of network data storage, a clustered data storage system links multiple controllers to provide redundancy of storage access. FIG. 1 shows an example of a clustered data storage system. As shown, the system includes multiple controllers 1. Each controller 1 is coupled locally to a storage facility 2, which is managed by each of the controllers 1. The storage facility 2 may be, for example, one or more conventional magnetic disks, optical disks such as CD-ROM- or DVD-based storage, magneto-optical (MO) storage, or any other type of non-volatile storage device suitable for storing large quantities of data. The storage facility 2 can be organized as one or more Redundant Array of Independent Disks (RAID) groups, in which case each controller 1 accesses the storage facility 2 using an appropriate RAID method.
A controller 1 receives and responds to various read and write requests from a host (not shown in FIG. 1), relating to volumes, Logical Unit Numbers (LUNs), files, and/or other logical containers of data stored in (or to be stored in) the storage facility 2. If one of the controllers 1 fails, another controller 1 can take over for the failed controller 1 without sacrificing performance.
Each of the controllers 1 is also coupled to a quorum device 4 via a network 3, which may operate based on a conventional protocol such as InfiniBand, Fibre Channel, Internet Protocol (IP), or other protocol(s). A quorum device, such as quorum device 4 in FIG. 1, is a device that stores state information regarding the cluster and each controller 1 in the cluster, including the identities of the nodes in the cluster, which nodes are active (or should be active), which nodes are not active, etc. In a conventional technique, the quorum device 4 is implemented with a disk-based storage device ("disk"). Data stored on the disk is frequently updated to reflect the current status of each controller 1. However, each access to a quorum disk is time-consuming, so these frequent disk accesses reduce the clustered storage system's throughput and performance.
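To make the kind of state a quorum device holds more concrete, the following is a minimal sketch under stated assumptions. The class names `QuorumRecord` and `DiskQuorumDevice`, the node names, and the access counter are all hypothetical and are not taken from any particular product. The sketch only illustrates the point made above: with a disk-based quorum device, every change in a controller's status results in another disk access.

```python
from dataclasses import dataclass, field


@dataclass
class QuorumRecord:
    """Hypothetical layout of the cluster state held by a quorum device."""
    cluster_nodes: set = field(default_factory=set)  # identities of all nodes in the cluster
    active_nodes: set = field(default_factory=set)   # nodes that are (or should be) active

    def inactive_nodes(self) -> set:
        # Nodes that belong to the cluster but are not currently active.
        return self.cluster_nodes - self.active_nodes


class DiskQuorumDevice:
    """Hypothetical stand-in for a disk-based quorum device."""

    def __init__(self, nodes):
        self.record = QuorumRecord(cluster_nodes=set(nodes), active_nodes=set(nodes))
        self.disk_accesses = 0

    def _persist(self):
        # A real quorum disk would perform a slow, synchronous write here;
        # this sketch only counts the access.
        self.disk_accesses += 1

    def mark_active(self, node):
        self.record.cluster_nodes.add(node)
        self.record.active_nodes.add(node)
        self._persist()

    def mark_inactive(self, node):
        self.record.active_nodes.discard(node)
        self._persist()


# Example: controller "ctrl-b" fails and the cluster records that it is no longer active.
quorum = DiskQuorumDevice(["ctrl-a", "ctrl-b"])
quorum.mark_inactive("ctrl-b")
print(quorum.record.inactive_nodes())  # {'ctrl-b'}
print(quorum.disk_accesses)            # 1
```

Every call to `mark_active` or `mark_inactive` goes through `_persist`, so the number of disk accesses grows with the number of status updates. This is why a frequently updated disk-based quorum device can become a throughput bottleneck.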
In addition, in one known technique, a clustered storage system such as the one shown in FIG. 1 uses non-volatile random access memory (NVRAM) to ensure data integrity in the event that one or more controllers 1 fail. Each controller 1 has an NVRAM that stores a log of each write request, and the associated write data, that the controller 1 receives from a host. This log data, sometimes called "NVLog", is also transmitted by the controller 1 that creates it to one or more other controllers 1 in the cluster. Thus, each controller 1 has a local copy of the NVLog of another controller 1. Consequently, if one of the controllers 1 fails before completing a particular write request, a different controller 1 can use its local copy of the failed controller's NVLog to re-submit the write request and thereby preserve data integrity.
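The NVLog mirroring and takeover sequence described above can be sketched as follows. This is a simplified model built on several assumptions. The `Controller` and `WriteRequest` classes, the per-controller sequence numbers, and the use of a Python dict as the storage facility are all hypothetical. Mirroring is shown as direct method calls rather than network messages, and an entry is retired as soon as its write completes.

```python
from dataclasses import dataclass


@dataclass
class WriteRequest:
    seq: int      # per-controller sequence number (hypothetical)
    target: str   # hypothetical identifier of a LUN, file, or other container
    data: bytes


class Controller:
    """Hypothetical controller that mirrors its NVLog to peer controllers."""

    def __init__(self, name):
        self.name = name
        self.nvlog = {}        # seq -> WriteRequest; stands in for local NVRAM
        self.peer_nvlogs = {}  # peer name -> {seq -> WriteRequest}; mirrored copies
        self.peers = []

    def connect(self, peer):
        # This controller will mirror its NVLog entries to `peer`.
        self.peers.append(peer)
        peer.peer_nvlogs.setdefault(self.name, {})

    def receive_mirror(self, origin, req):
        self.peer_nvlogs.setdefault(origin, {})[req.seq] = req

    def receive_completion(self, origin, seq):
        self.peer_nvlogs.get(origin, {}).pop(seq, None)

    def handle_write(self, req):
        # Log the request locally, then send a copy to every peer.
        self.nvlog[req.seq] = req
        for peer in self.peers:
            peer.receive_mirror(self.name, req)

    def complete_write(self, seq, storage):
        # Commit the data to storage, then retire the log entry locally and on peers.
        req = self.nvlog.pop(seq)
        storage[req.target] = req.data
        for peer in self.peers:
            peer.receive_completion(self.name, seq)

    def take_over(self, failed_name, storage):
        # Re-submit every write the failed controller logged but never completed,
        # in the order it was received.
        pending = self.peer_nvlogs.pop(failed_name, {})
        for seq in sorted(pending):
            req = pending[seq]
            storage[req.target] = req.data
        return len(pending)


storage = {}  # stand-in for the storage facility
a, b = Controller("a"), Controller("b")
a.connect(b)
a.handle_write(WriteRequest(1, "lun0", b"first"))
a.handle_write(WriteRequest(2, "lun1", b"second"))
a.complete_write(1, storage)
# Controller "a" fails here, before completing write 2.
replayed = b.take_over("a", storage)
print(replayed, storage)  # 1 {'lun0': b'first', 'lun1': b'second'}
```

Controller "b" re-submits only write 2 because write 1 was already retired from both logs. Note that every call to `handle_write` on "a" also sends a message to each peer, which is the source of the traffic discussed next.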
However, transmitting each write request between the controllers 1 in order to share NVLogs generates substantial network traffic, which reduces the clustered storage system's throughput and performance. Even in a very simple clustered system with only two controllers, having a first controller send each write request and its data to the other controller doubles the first controller's load for that write request. In a larger cluster, the load grows further with each additional controller to which the first controller must send its received write requests and data.
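A rough cost model makes the scaling concrete. This is a back-of-the-envelope sketch that assumes, hypothetically, that forwarding a write to one peer costs about as much as handling it locally, and that every controller forwards every write to every other controller. Under these assumptions, the receiving controller's per-write load in an N-controller cluster is 1 + (N - 1) = N units, and the cluster-wide mirroring traffic for one write per controller is N(N - 1) messages. For N = 2 the model reproduces the doubling described above. The function names are illustrative only.

```python
def per_write_load(num_controllers: int) -> int:
    """Relative work on the receiving controller for one write request.

    Hypothetical cost model: 1 unit for local handling plus 1 unit per peer
    to which the write and its data are forwarded.
    """
    if num_controllers < 1:
        raise ValueError("a cluster needs at least one controller")
    return 1 + (num_controllers - 1)


def cluster_mirror_messages(num_controllers: int, writes_per_controller: int = 1) -> int:
    """Total mirror messages if every controller forwards each write to every peer."""
    return num_controllers * (num_controllers - 1) * writes_per_controller


for n in (2, 3, 4, 8):
    print(n, per_write_load(n), cluster_mirror_messages(n))
# 2 2 2
# 3 3 6
# 4 4 12
# 8 8 56
```

In this model, the load on any single controller grows linearly with cluster size, while the total mirroring traffic across the cluster grows roughly with the square of the number of controllers.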