To eliminate single points of failure, data storage systems often include redundant components. Redundant components allow a system to continue operating despite a component failure. For example, storage arrays often include multiple array controllers, each of which can be configured to automatically take over the operations of a failed array controller.
When a storage system includes redundant components, the redundant components often present several different paths to a storage device. For example, if a storage array has multiple controllers, each controller can be associated with a path (or a set of paths, if multiple ports are connected to each controller) to a storage volume implemented within the storage array. Dynamic Multipathing (DMP) techniques (typically implemented in software) allow a host to detect and use these different paths when accessing the storage device.
The manner in which redundant paths to a storage device can be used varies depending on whether the devices that are associated with those paths are configured to allow active/active access or active/passive access. If active/active access is provided, Input and/or Output (I/O) operations to the storage device are allowed via paths associated with different devices simultaneously (e.g., paths associated with different storage array controllers can be active at the same time). If active/passive access is provided, I/O operations to the storage device are only allowed via one controller at a time (i.e., at any given time, one or more paths associated with one controller are active while paths associated with all other controllers are passive). If active/active access is allowed, DMP software executing on a host can distribute the host's I/O operations to the storage device over multiple paths. In both active/active and active/passive mode, DMP software can detect when errors are encountered on a path and retry failed I/O operations on the remaining paths.
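The path-selection behavior described above can be sketched as follows. This is a minimal illustration, not the implementation of any actual DMP product: the `Path` and `Multipath` classes, the `send` and `submit` methods, the mode strings, and the round-robin distribution policy are all hypothetical names and choices introduced here for illustration.

```python
from dataclasses import dataclass


class PathError(Exception):
    """Raised when an I/O operation fails on a path (hypothetical error type)."""


@dataclass
class Path:
    name: str
    controller: str
    healthy: bool = True

    def send(self, io):
        # Stand-in for a real transport: an unhealthy path reports an error.
        if not self.healthy:
            raise PathError(self.name)
        return f"{io} via {self.name}"


class Multipath:
    """Toy model of host-side multipathing for one storage device."""

    def __init__(self, paths, mode, active_controller):
        self.paths = list(paths)
        self.mode = mode  # "active/active" or "active/passive" (assumed labels)
        self.active_controller = active_controller
        self._next = 0  # round-robin cursor, used only in active/active mode

    def _ordered_paths(self):
        if self.mode == "active/active":
            # Distribute I/O: rotate the starting path on every request.
            start = self._next % len(self.paths)
            self._next += 1
            return self.paths[start:] + self.paths[:start]
        # Active/passive: paths of the active controller come first;
        # passive paths are tried only after an error.
        active = [p for p in self.paths if p.controller == self.active_controller]
        passive = [p for p in self.paths if p.controller != self.active_controller]
        return active + passive

    def submit(self, io):
        # Retry the failed operation on each remaining path in turn.
        for path in self._ordered_paths():
            try:
                return path.send(io)
            except PathError:
                continue
        raise PathError("all paths failed")
```

In this sketch, an active/active device alternates between its paths on successive requests, while an active/passive device keeps using the active controller's path until that path reports an error.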
In active/passive mode, the DMP software on a host will retry an I/O operation on a passive path if an error is detected on the active path. For the I/O operation to be performed via the passive path, either the devices associated with the paths need to provide auto-trespass functionality or the DMP software needs to know the vendor-specific failover command necessary to initiate a failover from the active path to a passive path. Since DMP software is often designed for use in heterogeneous environments with a variety of different vendors' products, the latter solution is often undesirable because it introduces a substantial amount of implementation dependence into what is intended to be implementation-agnostic DMP software.
When active/passive access is provided, devices (e.g., hardware and/or software components) that are associated with different paths to the storage device often include auto-trespass functionality. Auto-trespass functionality gives these devices the ability to automatically fail over from a device associated with the active path to a device associated with a passive path in response to a host sending an I/O command to the storage device via the passive path. For example, suppose that two storage array controllers are each associated with a respective path to a storage device, and that the two storage array controllers coordinate to provide active/passive access to the storage device via their respective paths. At any given time, one storage array controller is active and the other is passive. If a host sends an I/O command via the passive path associated with the passive storage array controller, the passive storage array controller will automatically switch roles with the active storage array controller.
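The two-controller example above can be modeled in a few lines. This is a sketch under stated assumptions: the `AutoTrespassArray` class, its `io` method, and the `failovers` counter are hypothetical names introduced here, and real arrays implement the role switch in vendor-specific ways.

```python
class AutoTrespassArray:
    """Toy model of two controllers providing active/passive access."""

    def __init__(self, controllers=("A", "B"), active="A"):
        self.controllers = tuple(controllers)
        self.active = active
        self.failovers = 0  # number of role switches so far

    def io(self, controller, command):
        if controller not in self.controllers:
            raise ValueError(f"unknown controller: {controller}")
        if controller != self.active:
            # Auto-trespass: an I/O command arriving on the passive path
            # makes the receiving controller take over the active role.
            self.active = controller
            self.failovers += 1
        return f"{command} served by {controller}"
```

Sending a command to controller "A" while "A" is active leaves the roles unchanged; sending the next command to "B" makes "B" active and increments `failovers` by one.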
Occasionally, problems can arise due to the use of auto-trespass mode in systems having multiple hosts. In particular, if one host detects an error on the active path and retries the I/O operation on the passive path, the controller associated with the passive path will initiate a failover in order to become the active controller. If the hosts do not coordinate with one another, another host may continue to assume that the formerly-active path is still active, and that host may then initiate an I/O operation by sending an I/O command via that path, which is now passive. This may lead to another failover, back to the originally-active path. This pattern can continue, such that each time a different host accesses the storage device, another failover between paths is initiated, causing the active role to be transferred in a “ping-pong” manner between the controllers that are associated with the paths. Each failover may take a significant amount of time, and thus several successive failovers may have a detrimental effect on performance.
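The ping-pong pattern can be reproduced with a small simulation. This is an illustrative sketch only: the `Array` class, the `simulate` function, and the assumption that each host holds a fixed, uncoordinated belief about which controller is active are all hypothetical simplifications introduced here.

```python
class Array:
    """Minimal auto-trespass array: I/O on the passive controller swaps roles."""

    def __init__(self, active="A"):
        self.active = active
        self.failovers = 0

    def io(self, controller):
        if controller != self.active:
            self.active = controller
            self.failovers += 1


def simulate(host_views, rounds):
    """Count failovers when hosts take turns issuing one I/O per round.

    host_views lists, per host, the controller that host believes is active.
    The hosts never update or share these beliefs (the uncoordinated case).
    """
    array = Array(active="A")
    for _ in range(rounds):
        for believed_active in host_views:
            array.io(believed_active)
    return array.failovers
```

With the array starting on controller "A", `simulate(["B", "A"], 3)` models one host that has switched to the path through "B" after an error and a second host that still uses the path through "A": every I/O lands on the currently passive controller, giving six failovers in three rounds. When both hosts agree, as in `simulate(["B", "B"], 3)`, only the initial failover occurs.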