1. Field of the Invention
The invention relates generally to clustered storage systems and more specifically relates to resuming processing of background tasks after a failover occurs.
2. Discussion of Related Art
In the field of data storage, customers demand highly resilient data storage systems that also exhibit fast recovery times for stored data. One type of storage system used to provide both of these characteristics is known as a clustered storage system.
A clustered storage system typically comprises a number of storage controllers, wherein each storage controller processes host Input/Output (I/O) requests directed to one or more logical volumes. The logical volumes reside on portions of one or more storage devices (e.g., hard disks) coupled with the storage controllers. Often, the logical volumes are configured as Redundant Array of Independent Disks (RAID) volumes in order to ensure an enhanced level of data integrity and/or performance.
A notable feature of clustered storage environments is that the storage controllers are capable of coordinating processing of host requests (e.g., by shipping I/O processing between each other) in order to enhance the performance of the storage environment. This includes intentionally transferring ownership of a logical volume from one storage controller to another. For example, a first storage controller may detect that it is currently undergoing a heavy processing load, and may assign ownership of a given logical volume to a second storage controller that has a smaller processing burden in order to increase overall speed of the clustered storage system. Other storage controllers may then update information identifying which storage controller presently owns each logical volume. Thus, when an I/O request is received at a storage controller that does not own the logical volume identified in the request, the storage controller may “ship” the request to the storage controller that presently owns the identified logical volume.
FIG. 1 is a block diagram illustrating an example of a prior art clustered storage system 150. Clustered storage system 150 is indicated by the dashed box, and includes storage controllers 120, switched fabric 130, and logical volumes 140. Note that a “clustered storage system” (as used herein) does not necessarily include host systems and associated functionality (e.g., hosts, application-layer services, operating systems, clustered computing nodes, etc.). However, storage controllers 120 and hosts 110 may be tightly integrated physically. For example, storage controllers 120 may comprise Host Bus Adapters (HBA's) coupled with a corresponding host 110 through a peripheral bus structure of host 110. According to FIG. 1, hosts 110 provide I/O requests to storage controllers 120 of clustered storage system 150. Storage controllers 120 are coupled via switched fabric 130 (e.g., a Serial Attached SCSI (SAS) fabric or any other suitable communication medium and protocol) for communication with each other and with a number of storage devices 142 on which logical volumes 140 are stored.
FIG. 2 is a block diagram illustrating another example of a prior art clustered storage system 250. In this example, clustered storage system 250 processes I/O requests from hosts 210 received via switched fabric 230. Storage controllers 220 are coupled for communication with storage devices 242 via switched fabric 235, which may be integral with or distinct from switched fabric 230. Storage devices 242 implement logical volumes 240. Many other configurations of hosts, storage controllers, switched fabric, and logical volumes are possible for clustered storage systems as a matter of design choice. Further, in many high reliability storage systems, all the depicted couplings may be duplicated for redundancy. Additionally, the interconnect fabrics may also be duplicated for redundancy.
While clustered storage systems provide a number of performance benefits over more traditional storage systems described above, the speed of a storage system still typically remains a bottleneck to the overall speed of a processing system utilizing the storage system.
For example, if a first storage controller processing a background task encounters a failure and stops functioning, it may be necessary to transfer ownership of a logical volume controlled by the storage controller to a second storage controller. The second storage controller may then initiate processing of host I/O requests directed to the logical volume. Further, because the second storage controller does not have access to the failed storage controller, the second storage controller may have to re-start, “from scratch,” the background task that was initiated by the first storage controller. Restarting a background task may therefore waste significant processing already performed by the failed controller and further extend the time to complete the background task. A background task could comprise, for example, a patrol read of a storage device, an expansion of a logical volume, or an initialization related to a logical volume. If the logical volume is a RAID volume, the potential background tasks could further comprise consistency checks for the logical volume, a rebuild for the logical volume, a copy back for the logical volume, or a migration of the logical volume from one RAID level to another.
Thus it is an ongoing challenge to increase the performance of clustered storage systems as related to processing of background tasks.