1. Field of the Invention
The present invention relates generally to redundant storage subsystems, and in particular to portable and fast restart methods operable in redundant I/O path controllers such as are often employed in control of storage subsystems.
2. Background of the Invention
Modern mass storage systems are growing to provide increasing storage capacities to fulfill increasing user demands from host computer system applications. Due to this critical reliance on large capacity mass storage, demands for enhanced reliability are also high. A popular solution to the need for increased reliability is redundancy of component level subsystems. Redundancy is typically applied at many or all levels of the components involved in the total subsystem operation. For example in storage subsystems, redundant host systems may each be connected via redundant I/O paths to each of redundant storage controllers which in turn each may be connected through redundant I/O paths to redundant storage devices (e.g., disk drives).
Redundant I/O paths can take any of a number of forms, including but not limited to SCSI buses, host adapters, or RAID controllers. In a system with redundant I/O paths connecting a storage controller to the storage device(s), there is a control subsystem which manages the redundant paths, referred to herein as "Redundant Dual-Active Control" (RDAC). An RDAC control subsystem is often a layer of software in a hierarchical layering of control software which provides the interface between the host systems and storage subsystems.
One skilled in the art will recognize that the RDAC layer is a logical component, typically embodied as a software module. The RDAC layer typically operates within the host system (as part of the operating system), but may also be operable within intelligent I/O adapters in the host or within embedded storage controllers in the storage subsystem. The physical components on which the RDAC layer is operable are not particularly relevant to the layered architecture of which the RDAC layer is a component. It is generally desirable that the RDAC layer operate at a higher level, thus enabling it to encompass control of a larger number of I/O path elements in its failure recovery techniques.
Further, one skilled in the art will recognize that the RDAC may be generalized to multiple active controllers rather than merely two or dual active controllers. Additional redundancy and scalability of performance may be achieved through use of multiple active controllers. As used herein, RDAC represents both dual-active and multi-active redundant control systems.
The RDAC layer sends I/O requests to a preferred path of the redundant I/O paths which connect it to the storage devices. Typically, the RDAC layer sends its requests to another lower layer of the software referred to herein as the low level disk driver or disk driver. Once sent to the disk driver, the RDAC layer is free to process other requests while awaiting completion of the first request.
It is frequently the case that the low level disk driver uses a queue structure (referred to herein as a dispatch queue) to provide a buffered interface to the higher level (e.g., RDAC) software layers. The low level disk driver's performance is gated by the speed of the disk drives themselves, so it operates substantially slower than the higher level software components. A dispatch queue is therefore associated with each I/O path within the low level disk driver to buffer its slower operation as compared to the RDAC layer. The RDAC layer transfers requests to the low level disk driver layer, which in turn queues the generated I/O requests on the dispatch queue for the desired I/O path if the low level disk driver is not prepared to process the request immediately. The RDAC layer therefore does not have direct access to the dispatch queue. Rather, the dispatch queue is a common mechanism within the low level disk driver used to buffer requests from the higher level software layers. These dispatch queues can become quite long. It is possible that there may be thousands of I/O requests waiting in the dispatch queue for processing by the low level disk driver.
A variety of failures could occur such that the RDAC layer might not be able to access the storage device via one of the redundant I/O paths (e.g., via the preferred I/O path). A software failure in the low level disk driver module is one example of such a failure. As another example, a hardware failure might occur in the physical connection from the disk driver module to the disk array. In general, all such failures which render the I/O path unusable to the RDAC layer will be identified herein as I/O path failures. An I/O path which has failed is also referred to herein as a bad path or failed I/O path, while an operational I/O path is also referred to herein as a good path or operational I/O path. In general, when the RDAC layer becomes aware of such a failure in an I/O path (the bad path), failed I/O requests are redirected (retried) on the other I/O path (the good path).
The low level disk driver processes the I/O requests and notifies the RDAC of success or failure of each I/O request in due course as each request is processed. In the case of a failed I/O request, the low level disk driver may attempt to process the request several times before sending that I/O request back to the RDAC as a failure. The low level disk driver will then move on to the next I/O request in the I/O path's associated dispatch queue and will attempt to process it before sending it back to the RDAC as another failure. Since the cause of the failures is the I/O path itself, the entire dispatch queue of I/O requests is destined to fail. However, each request must wait in the dispatch queue within the low level disk driver for its turn to fail, potentially including a number of retries, and only then is it returned by the low level disk driver as a failure to the RDAC layer.
One method for returning the failure status of processed requests is to provide a failed I/O queue filled by the disk driver layer for consumption by the RDAC layer. A failure return status for each processed request which failed to complete is placed in the failed I/O queue by the disk driver. Asynchronously with respect to the disk driver, the RDAC layer processes the failed requests in the failed I/O queue by sending them back to the disk driver via the dispatch queue of the redundant good path. The disk driver eventually processes the requeued request on the redundant good path to complete processing of the request.
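The known method described above can be illustrated with a minimal Python sketch. It is a toy model only, not an actual driver: the class `DiskDriver`, the function `rdac_requeue_failures`, the path names "A" and "B", and the retry count `MAX_RETRIES = 3` are all hypothetical names and values chosen for illustration (the text states only that the driver may retry several times).

```python
from collections import deque

MAX_RETRIES = 3  # assumed value; the text only says "several times"


class DiskDriver:
    """Toy low level disk driver with one dispatch queue per I/O path."""

    def __init__(self, path_ok):
        # path_ok maps a path name to True (good path) or False (bad path).
        self.path_ok = dict(path_ok)
        self.dispatch = {path: deque() for path in self.path_ok}
        self.failed_io = deque()  # filled by the driver, drained by the RDAC
        self.completed = []       # requests that finished successfully
        self.attempts = 0         # total I/O attempts, including retries

    def submit(self, path, request):
        """Queue a request on the dispatch queue of the given path."""
        self.dispatch[path].append(request)

    def process_one(self, path):
        """Process the request at the head of one path's dispatch queue."""
        request = self.dispatch[path].popleft()
        for _ in range(MAX_RETRIES):
            self.attempts += 1
            if self.path_ok[path]:
                self.completed.append(request)
                return
        # Every attempt failed: report the failure to the RDAC layer.
        self.failed_io.append(request)

    def drain(self, path):
        """Process every request currently queued on one path, in order."""
        while self.dispatch[path]:
            self.process_one(path)


def rdac_requeue_failures(driver, good_path):
    """RDAC side: resend each failed request on the good path."""
    requeued = 0
    while driver.failed_io:
        driver.submit(good_path, driver.failed_io.popleft())
        requeued += 1
    return requeued


def run_demo(n_requests=5):
    """Queue requests on bad path A, fail them one by one, retry on B."""
    driver = DiskDriver({"A": False, "B": True})
    for i in range(n_requests):
        driver.submit("A", i)
    driver.drain("A")                          # each request fails in turn
    requeued = rdac_requeue_failures(driver, "B")
    driver.drain("B")                          # requests complete on good path
    return driver, requeued
```

In this sketch every request queued on the bad path consumes all of its retries before it reaches the failed I/O queue, which is the per-request cost that the known method incurs. The sketch runs the two layers one after the other for simplicity, whereas the text describes the RDAC layer as operating asynchronously with respect to the disk driver.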
Under this known method, each failed I/O request must make a potentially time consuming "round trip" through the layered system. First, a request is generated in the RDAC layer and transferred to the low level disk driver level. In the low level disk driver level, the request is placed in the associated dispatch queue for the first of two redundant I/O paths, which eventually fails. The queued request must wait for all the I/O requests (potentially thousands) ahead of it to be individually processed and failed by the disk driver. In a situation where there is a significant backlog of I/O requests in the first I/O path dispatch queue, the low level disk driver may require a significant amount of time to complete processing of all failed I/O requests. For each request, detection of a failure may require a number of retries of the I/O request. In the case of certain types of failures of the I/O path, each retry may require a significant period of time before the failure is sensed (e.g., lengthy timeouts). When an I/O request finally fails, it may then wait in the failed I/O queue until the RDAC can reprocess the failed request by sending it to the low level disk driver's alternate I/O path (the good path). The cumulative processing and delay for reprocessing all failed I/O requests can therefore be quite significant. Thus the restart after failover from a bad I/O path to a good I/O path (the redirection of all failed I/O requests from the bad path to the good path) is slowed considerably. The time necessary for the restart to finish detecting each failure and reprocessing it on an alternate path is simply the time for one failure to be detected multiplied by the number of requests in the low level disk driver's dispatch queue when the I/O path is first detected to have failed.
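The multiplication described above can be made concrete with a short Python sketch. The function name `restart_time_seconds` and all of the numbers are hypothetical and chosen only for illustration. The sketch additionally assumes that the time to detect one failure equals the number of attempts multiplied by a fixed time per attempt, which is a simplification not stated in the text.

```python
def restart_time_seconds(queued_requests, attempts_per_request, seconds_per_attempt):
    """Estimate the restart time under the known method.

    Assumption: detecting one failure takes attempts_per_request attempts,
    each lasting seconds_per_attempt (for example, a timeout).
    """
    time_per_failure = attempts_per_request * seconds_per_attempt
    # Restart time = time for one failure x number of queued requests.
    return queued_requests * time_per_failure


# Hypothetical backlog: 2000 queued requests, 3 attempts each, 0.5 s per attempt.
estimate = restart_time_seconds(2000, 3, 0.5)
print(estimate)  # 3000.0 seconds, i.e. 50 minutes
```

Even with these modest assumed values, the estimate grows linearly with the length of the dispatch queue at the moment the I/O path fails.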
Some prior solutions for reducing this requeueing time take a host system based approach, such as customizing the low level disk driver to provide special failover features. The host system may, for example, flush the dispatch queue at the first failure using a special access function (API function) within a customized low level disk driver. The flushed I/O requests are re-routed at the host system to another data path. This approach depends upon the host system's low level disk driver having a unique ability to, for example, flush pending I/O requests. The solution is thus specific to each host system and therefore not portable between various host systems.
It is clear from the above discussion that a need exists for an improved method for fast restart of failover of I/O requests from a failed (bad) I/O path to an alternate, operational (good) I/O path. In addition, it is desirable that such a fast failover method be portable so as to be easily implemented within any host system.