Data storage systems are arrangements of hardware and software that typically include one or more storage processors coupled to arrays of non-volatile data storage devices, such as magnetic disk drives, electronic flash drives, and/or optical drives. The storage processors service host I/O operations received from host machines. The received I/O operations specify one or more storage objects (e.g., logical disks or “LUNs”) that are to be written, read, created, or deleted. The storage processors run software that manages incoming I/O operations and performs various data processing tasks to organize and secure the host data that is received from the host machines and stored on the non-volatile data storage devices.
In an active/active data storage system, two separate nodes concurrently receive and process host I/O operations that are directed to a single storage object. Load balancing may advantageously be performed to spread host I/O operations evenly between the two nodes. Host machines are able to access the storage object through two different access paths, and high availability is provided because one node can continue to receive and process host I/O operations directed to the storage object even when the other node has failed or become unreachable. Host write I/O operations may be mirrored between the two nodes, such that any write I/O operation received and performed by one of the nodes is also mirrored to the other node.
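The active/active behavior described above can be illustrated with a minimal sketch. It is an in-memory model only, under stated assumptions: the names `Node`, `Host`, and `make_pair`, the round-robin path selection, and the dictionary used as a stand-in for a storage object are all hypothetical and are not taken from any particular product or from the system described here.

```python
class Node:
    """Hypothetical model of one node in an active/active pair."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blocks = {}   # block address -> data (stand-in for the storage object)
        self.peer = None   # the other node of the pair

    def write(self, lba, data, mirrored=False):
        if not self.online:
            raise ConnectionError(f"node {self.name} is unreachable")
        self.blocks[lba] = data
        # Mirror the write to the peer, unless this call is itself the mirror copy.
        if not mirrored and self.peer is not None and self.peer.online:
            self.peer.write(lba, data, mirrored=True)

    def read(self, lba):
        if not self.online:
            raise ConnectionError(f"node {self.name} is unreachable")
        return self.blocks.get(lba)


class Host:
    """Hypothetical host that alternates between the two access paths."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._next = 0

    def _pick(self):
        # Round-robin load balancing (an assumed policy) that skips failed nodes.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node.online:
                return node
        raise ConnectionError("no access path to the storage object is available")

    def write(self, lba, data):
        node = self._pick()
        node.write(lba, data)
        return node.name   # report which path serviced the write

    def read(self, lba):
        return self._pick().read(lba)


def make_pair():
    """Create two nodes that mirror writes to each other."""
    a, b = Node("A"), Node("B")
    a.peer, b.peer = b, a
    return a, b
```

In this sketch, writes issued through either path leave both nodes holding the same data, so a host can continue reading and writing through the surviving node after the other node is marked offline. Resynchronizing a node that returns to service is outside the scope of the sketch.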
During operation of an active/active data storage system, many types of events occur whose handling must be synchronized between the two nodes. Such events include, for example, failure of a data storage device. When the failure of a data storage device is detected, examples of actions that may need to be synchronized between the two nodes include i) stopping subsequent host I/O operations from being received and processed, ii) allocating storage on one or more of the remaining data storage devices to replace the storage on the failed data storage device, iii) updating the contents of one or more mapping tables to indicate the newly allocated storage, iv) restarting receipt and processing of host I/O operations, and v) rebuilding the data from the failed device onto the newly allocated storage.
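The sequence of actions listed above can be sketched as a lock-step procedure in which both nodes complete each action before either proceeds to the next. This is only one possible way to keep the two nodes synchronized and is an assumption of the sketch, not a description of any specific system. The names `STEPS`, `Node.apply`, and `handle_drive_failure`, the extent-to-device mapping table, and the trivial "first spare" allocation policy are all hypothetical.

```python
# Ordered actions for handling a failed data storage device (names are illustrative).
STEPS = ["quiesce_io", "allocate_replacement", "update_mapping", "resume_io", "rebuild"]


class Node:
    """Hypothetical node holding its own copy of a mapping table."""

    def __init__(self, name, mapping):
        self.name = name
        self.io_enabled = True
        self.mapping = dict(mapping)   # extent id -> device name
        self.replacement = None
        self.remapped = []             # extents moved off the failed device
        self.rebuilding = []           # extents whose data must be rebuilt
        self.log = []                  # steps this node has completed, in order

    def apply(self, step, failed, replacement):
        """Perform one step locally and acknowledge it by returning True."""
        if step == "quiesce_io":
            self.io_enabled = False
        elif step == "allocate_replacement":
            self.replacement = replacement
        elif step == "update_mapping":
            for extent, device in list(self.mapping.items()):
                if device == failed:
                    self.mapping[extent] = replacement
                    self.remapped.append(extent)
        elif step == "resume_io":
            self.io_enabled = True
        elif step == "rebuild":
            # A real system would reconstruct data here; the sketch only records the work.
            self.rebuilding = list(self.remapped)
        else:
            return False
        self.log.append(step)
        return True


def handle_drive_failure(nodes, failed, spares):
    """Run every step on all nodes before moving on to the next step."""
    replacement = spares[0]   # assumed allocation policy: take the first spare
    for step in STEPS:
        acks = [node.apply(step, failed, replacement) for node in nodes]
        if not all(acks):
            raise RuntimeError(f"step {step!r} was not acknowledged by every node")
    return replacement
```

Because no node begins a step until every node has acknowledged the previous one, both mapping tables reflect the newly allocated storage before host I/O operations are restarted on either node.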