The primary function of a storage subsystem is to provide storage to an application running on a server. In the simplest arrangement the server has only one means of accessing the storage, and the storage device contains no redundancy. This is generally unsatisfactory because, if the access method or a part of the storage device fails, the server can no longer access the stored data.
A better solution achieves higher availability by using a SAN-based storage subsystem that presents data from the storage device at a number of ports on the SAN. This is generally achieved by having multiple controllers within the subsystem, each of which can present the data from the storage device to the SAN.
In another improvement, the application host uses a multi-path processor to manage different paths to the same storage device. Multiple paths typically use multiple ports in the server to access one or more storage controllers. However, it is also possible to have multiple paths when there is (a) one port in the server and more than one port in the storage controller(s) or (b) more than one port in the server and one port in the controller.
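The way paths multiply with ports can be illustrated with a minimal sketch. It assumes, purely for illustration, that every server port can reach every controller port through the SAN; the function name `enumerate_paths` and the port labels are hypothetical and do not come from the text.

```python
from itertools import product


def enumerate_paths(server_ports, controller_ports):
    """Return every (server port, controller port) pair as a candidate path.

    Assumption: the SAN connects every server port to every controller port.
    """
    return list(product(server_ports, controller_ports))


# Case (a): one port in the server, two ports on the controller(s).
case_a = enumerate_paths(["s0"], ["c0", "c1"])

# Case (b): two ports in the server, one port on the controller.
case_b = enumerate_paths(["s0", "s1"], ["c0"])

# Two server ports and two controller ports give four candidate paths.
case_full = enumerate_paths(["s0", "s1"], ["c0", "c1"])
```

In both case (a) and case (b) the sketch yields two paths, which is why either arrangement is enough to give the multi-path processor an alternative route.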
A typical SAN system 10, which includes a server 12 and a storage subsystem 13, is shown in FIG. 1. The server 12 includes an application host 22 and a multi-path processor 24 that access a storage logical unit (LUN) 18 provided by the storage subsystem 13, which has two controllers 14A and 14B.
The application host 22 may access the LUN 18 via a combination of paths 20A, 20B, 20C and 20D. The multi-path processor 24 in the host 22 chooses to use one or more of these paths via one of two ports 26A and 26B. If a path fails, or a controller in the storage subsystem fails, then the multi-path processor can use an alternative path. When the path or controller is restored, the multi-path processor can consider re-using the restored path or controller. The multi-path processor generally polls the missing path or controller to establish whether and when it has become available again. If there are many paths between the host and the storage subsystem, and many LUNs are being presented to the host, it can take a considerable time to poll all the potentially available paths to see whether any ‘lost’ paths have come back.
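The failover and polling behaviour described above can be sketched as follows. This is a simplified illustration, not the multi-path processor of FIG. 1: the class `MultiPathSelector`, the `probe` callback, and the cost model in `poll_cycle_seconds` (one serial probe per path per LUN) are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class MultiPathSelector:
    """Hypothetical stand-in for a multi-path processor."""
    paths: list
    failed: set = field(default_factory=set)

    def select(self):
        """Return the first path that is not currently marked as failed."""
        for path in self.paths:
            if path not in self.failed:
                return path
        raise RuntimeError("no usable path to the LUN")

    def mark_failed(self, path):
        """Record that a path (or the controller behind it) has failed."""
        self.failed.add(path)

    def poll(self, probe):
        """Probe every failed path; return and reinstate those that answer."""
        restored = {path for path in self.failed if probe(path)}
        self.failed -= restored
        return restored


def poll_cycle_seconds(num_paths, num_luns, probe_seconds):
    """Estimate one full polling pass, assuming serial per-path, per-LUN probes."""
    return num_paths * num_luns * probe_seconds


selector = MultiPathSelector(["20A", "20B", "20C", "20D"])
selector.mark_failed("20A")
fallback = selector.select()                # an alternative path is chosen
restored = selector.poll(lambda path: True)  # pretend every probe succeeds
after_restore = selector.select()           # the restored path is usable again
```

Under the assumed serial cost model, four paths, 100 LUNs and a half-second probe would take `poll_cycle_seconds(4, 100, 0.5)`, i.e. 200 seconds, for a single pass, which illustrates why polling scales poorly as paths and LUNs increase.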
A controller reset can arise in two different circumstances: a controller failure and an intentional reset. From the multi-path processor's point of view there is no reasonable way of predicting an internal controller failure before it actually happens. If, on the other hand, the controller is being reset prior to a firmware upgrade or other maintenance action, then the multi-path processor will know that a reset is about to occur only if the reset originates from the server. Therefore, without up-to-date knowledge of the controller status, the multi-path processor cannot efficiently choose the best paths for sending data to the LUN.
There are also a number of problems in restoring full redundant use of these controllers after a controller is reset. For instance, it may take some time for the multi-path processor to realize that a path has been restored to a storage controller that has recently been reset. Even when a path has been restored, the multi-path processor may not be able to use it because the controller is not yet ready to accept data (e.g., it may not have resynchronized its cache with its partner). For these and other reasons, there is a need for the present invention.