Conventional RAID storage devices commonly function either as a target device or as an initiator (primary device). Remote Volume Mirroring (RVM) is one example where the RAID storage array is used as an initiator. RVM is a feature used in the field to protect information stored on the storage array using real-time data replication to an off-site storage system. RVM can support both synchronous and asynchronous data transfers. Synchronous mirroring provides continuous mirroring between primary and remote volumes to ensure absolute synchronization. A write command sent by the initiator receives a completion status only after the data has been successfully written on both the primary and the remote storage array. Asynchronous mirroring updates data between a primary device and a target device at periodic intervals.
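The difference between the two mirroring modes can be illustrated with a minimal sketch. The class names (`Volume`, `SyncMirror`, `AsyncMirror`) and the in-memory block map are hypothetical stand-ins chosen for illustration, not part of any RVM implementation. The sketch only shows when the write status is returned relative to the remote update.

```python
class Volume:
    """Hypothetical in-memory stand-in for a volume on a storage array."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba, data):
        self.blocks[lba] = data
        return True


class SyncMirror:
    """Status is returned only after both primary and remote writes succeed."""

    def __init__(self, primary, remote):
        self.primary = primary
        self.remote = remote

    def write(self, lba, data):
        ok_primary = self.primary.write(lba, data)
        ok_remote = self.remote.write(lba, data)  # initiator waits on this too
        return ok_primary and ok_remote


class AsyncMirror:
    """Status is returned after the primary write; remote is updated later."""

    def __init__(self, primary, remote):
        self.primary = primary
        self.remote = remote
        self.pending = {}  # blocks changed since the last periodic update

    def write(self, lba, data):
        ok = self.primary.write(lba, data)
        if ok:
            self.pending[lba] = data
        return ok

    def sync_interval(self):
        """Assumed to be called once per mirroring interval."""
        for lba, data in self.pending.items():
            self.remote.write(lba, data)
        flushed = len(self.pending)
        self.pending.clear()
        return flushed
```

In the synchronous case, a slow remote `write` directly delays the status seen by the initiator, which is the coupling the following paragraphs are concerned with.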
Another example where a RAID storage array can be used as an initiator is where multiple RAID storage arrays are clustered together. Each storage array in such a configuration can receive an input/output (IO) request and automatically forwards the IO request to the storage array that owns the target volume.
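The forwarding behavior in a clustered configuration might be sketched as follows. The `StorageArray` class, its `peers` list, and the ownership lookup are assumptions made for illustration. A real cluster would forward the request over a host or inter-array channel rather than through a direct method call.

```python
class StorageArray:
    """Hypothetical clustered array that forwards IO to the owning array."""

    def __init__(self, name, owned_volumes):
        self.name = name
        self.owned = set(owned_volumes)
        self.peers = []  # other arrays in the cluster

    def handle(self, volume, request):
        # Process locally when this array owns the volume.
        if volume in self.owned:
            return f"{self.name} processed {request} on {volume}"
        # Otherwise act as an initiator toward the owning array.
        for peer in self.peers:
            if volume in peer.owned:
                return peer.handle(volume, request)
        raise LookupError(f"no array owns volume {volume}")


array_a = StorageArray("A", ["vol1"])
array_b = StorageArray("B", ["vol2"])
array_a.peers = [array_b]
array_b.peers = [array_a]
result = array_a.handle("vol2", "write")
```

Here array A receives the request but array B processes it, so the time A takes to answer its own initiator depends on how quickly B responds.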
Three scenarios commonly occur when using a clustered configuration. First, in the event that a user uses a slower-performing remote storage array, the IO requests can be held up on the primary storage array for several minutes until the remote storage array successfully processes the IO requests. Second, in the event that one storage array reboots, the IO requests will stall at the primary storage array. The initiators talking to the primary storage array are not informed that the remote storage array is rebooting, so the initiators will continue to send IO requests during the reboot. Third, in the event that a user starts an event on the remote storage array that causes the array to perform more slowly, the IO requests will stall at the primary storage array until the event finishes.
Depending on the particular operating system used, and the different IO drivers, the initiator allows a command timeout of between 30 and 300 seconds. A typical timeout value is 60 seconds. When an IO request times out, the initiator will have to perform an appropriate cleanup operation. A cleanup operation involves aborting the IO request, then retrying the aborted IO request. Since the IO request is already queued in the storage array RAID engine, the aborted IO request is marked aborted, but will still wait in the queue to be processed. When the initiator retries the IO request, the retried IO request will have to wait again on the storage array. Over time, the situation gets worse and the storage array will eventually come to a halt. When this happens, the application running on the initiator will encounter an IO error due to the IO timing out.
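The abort-and-retry buildup described above can be illustrated with a small discrete-time simulation. This is a sketch under simplifying assumptions that are not taken from the text: a single FIFO queue, a constant per-entry service time, a fixed batch of requests issued at time zero, and aborted entries that still consume service time. The function name `simulate` and its parameters are hypothetical.

```python
from collections import deque


def simulate(service_time, timeout, outstanding, duration):
    """Return (useful completions, queue depth after each simulated second).

    service_time: seconds the array needs per queued entry (assumed constant)
    timeout:      initiator command timeout in seconds
    outstanding:  number of IO requests issued at time zero
    duration:     number of seconds to simulate
    """
    queue = deque({"issued": 0, "aborted": False} for _ in range(outstanding))
    remaining = service_time  # seconds of work left on the head entry
    completed = 0
    depth_history = []
    for now in range(1, duration + 1):
        # Array side: spend one second on the entry at the head of the queue.
        if queue:
            remaining -= 1
            if remaining == 0:
                entry = queue.popleft()
                if not entry["aborted"]:
                    completed += 1  # aborted entries do no useful work
                remaining = service_time
        # Initiator side: abort timed-out requests and queue a retry for each.
        retries = []
        for entry in queue:
            if not entry["aborted"] and now - entry["issued"] >= timeout:
                entry["aborted"] = True  # stays queued, marked aborted
                retries.append({"issued": now, "aborted": False})
        queue.extend(retries)
        depth_history.append(len(queue))
    return completed, depth_history


completed, depth = simulate(service_time=10, timeout=30, outstanding=10, duration=90)
```

With these example values, only the first three requests complete before the timeout. Every retry then waits behind the aborted entries and times out in turn, so the queue depth keeps growing while no further useful work completes.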
There are several factors affecting the performance of a storage array. Some of these factors include (i) the total number of volumes implemented, (ii) the amount and type of IO requests being sent, (iii) the type of drives (e.g., SAS, SATA, FC, solid state, etc.), (iv) the storage array controller cache size, (v) the host and drive channel speed, and/or (vi) the background activities (e.g., data scrubbing, dynamic segment sizing, volume copy, etc.).
Existing solutions to such problems include (i) increasing the command timeout at the initiator IO driver and application, (ii) making sure that the primary and remote storage arrays have the same performance capabilities and have the same configuration, and/or (iii) reducing the queue depth on the initiator.
There are several disadvantages to the existing solutions. For example, most host applications will not be able to tolerate a higher command timeout, and command timeouts are not tunable for some applications and/or IO drivers. Requiring the primary and the remote storage arrays to have the same performance capabilities and configuration does not provide a flexible storage solution to the end user.
Furthermore, there are several disadvantages of reducing the queue depth on the initiator. For example, it is almost impossible for storage array vendors to recommend a queue depth to the end user because so many factors contribute to the inter-array performance issue. There is no one number to provide to the user. Even assuming that there is such a number, when a user decides to add an additional initiator to the storage area network (SAN), the user will have to re-tune the queue depth on every initiator to accommodate the new addition.
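The re-tuning burden can be made concrete with a simple calculation. The sketch below assumes, purely for illustration, that the storage array tolerates a fixed total number of outstanding requests and that this budget is split evenly across initiators. Neither the budget value nor the even split comes from the text, and `per_initiator_queue_depth` is a hypothetical helper.

```python
def per_initiator_queue_depth(array_outstanding_limit, initiator_count):
    """Evenly divide an assumed array-wide request budget among initiators."""
    if initiator_count < 1:
        raise ValueError("need at least one initiator")
    # Integer division; every initiator gets at least a depth of 1.
    return max(1, array_outstanding_limit // initiator_count)


depth_with_four = per_initiator_queue_depth(256, 4)  # 64
depth_with_five = per_initiator_queue_depth(256, 5)  # 51
```

Under these assumptions, adding a fifth initiator changes the value for all four existing initiators, which is why the setting has to be revisited on every host.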
It would be desirable to implement a system for handling input/output requests between storage arrays with different performance capabilities.