Upgrading disk drive firmware for devices that are part of a RAID subsystem is often a costly and time-consuming process. During the firmware upgrade process, which may take up to one minute, a disk drive is generally unable to respond to other I/O requests, rendering the device unavailable to all system I/O, including host-based I/O. One approach to maintaining data availability during a firmware download is to place the device being upgraded into an unusable state within the logical volume; that is, to induce a degraded mode of operation. During the upgrade process, data that changes for the affected drive is logged to a repository within the system. Following the firmware upgrade, the data that was changed during the upgrade is copied from the repository to the upgraded drive.
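By way of illustration only, the logging approach described above can be sketched as follows. The names `Drive`, `ChangeLog`, and `write_block` are hypothetical and do not come from any particular RAID controller; the sketch assumes an in-memory model in which the repository keeps only the most recent data for each changed block.

```python
class Drive:
    """Hypothetical in-memory stand-in for a disk drive."""

    def __init__(self, num_blocks, block_size=4):
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.online = True

    def write(self, block, data):
        if not self.online:
            raise IOError("drive is unavailable")
        self.blocks[block] = data


class ChangeLog:
    """Repository for writes aimed at a drive that is temporarily unavailable."""

    def __init__(self):
        self.entries = {}  # block number -> most recent data for that block

    def record(self, block, data):
        self.entries[block] = data

    def replay(self, drive):
        """Copy every logged block to the drive; return how many were copied."""
        for block, data in sorted(self.entries.items()):
            drive.write(block, data)
        copied = len(self.entries)
        self.entries.clear()
        return copied


def write_block(drive, log, block, data):
    """Write directly while the drive is online; otherwise log the change."""
    if drive.online:
        drive.write(block, data)
    else:
        log.record(block, data)


drive = Drive(num_blocks=8)
log = ChangeLog()

drive.online = False                  # firmware download begins
write_block(drive, log, 2, b"AAAA")
write_block(drive, log, 5, b"BBBB")
write_block(drive, log, 2, b"CCCC")   # supersedes the earlier change to block 2
drive.online = True                   # firmware download completes

replayed = log.replay(drive)          # only the changed blocks are copied
```

In this sketch only the two changed blocks are copied back to the drive after the upgrade, rather than the drive's full contents.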
If an unrelated drive within the same logical volume fails while the original drive is having its firmware upgraded, the volume data is no longer available. The logical volume is left in an off-line state, and the user must intervene to reestablish data availability. Moreover, for volumes configured with redundancy, there is no guarantee that data and its associated redundant data are valid and consistent once recovery has been completed.
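The loss of availability described above can be illustrated with a small sketch. It assumes, purely for illustration, a single-parity layout in which the redundant block is the bytewise XOR of the data blocks; the text above does not specify a redundancy scheme, and the helper names `xor_blocks` and `reconstruct` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe, missing):
    """Rebuild the one missing member of a stripe from the surviving members."""
    if len(missing) != 1:
        raise ValueError("single parity can rebuild at most one missing member")
    survivors = [blk for i, blk in enumerate(stripe) if i not in missing]
    return xor_blocks(survivors)


# One stripe: three data blocks plus one parity block (assumed layout).
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
stripe = data + [xor_blocks(data)]

# Member 1 is unavailable (for example, during a firmware download):
# its contents can still be derived from the other members.
rebuilt = reconstruct(stripe, {1})

# Member 1 is unavailable and member 2 fails at the same time:
# the surviving members no longer determine the missing data.
try:
    reconstruct(stripe, {1, 2})
    double_failure_recoverable = True
except ValueError:
    double_failure_recoverable = False
```

Under this assumed layout, one unavailable member leaves the data derivable, whereas two simultaneously unavailable members do not.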
Existing solutions to this problem include: (1) preventing system input/output (I/O) during drive firmware upgrades; and (2) copying all data from the drive to be upgraded to a stand-by spare drive in the system before the download is begun. In the second solution, the stand-by spare serves as a replacement for the original drive during the firmware download. The affected volume remains optimal, and thus its data is protected from a single drive failure. At this point, the desired drive can have its firmware upgraded without affecting data availability on the original volume. Once the upgrade is complete, data can be copied from the stand-by replacement to the original drive. When the copy is complete, the upgraded drive can be re-integrated into the original volume.
The first solution requires that the storage system be taken off-line from the server's perspective. For customers that require continuous uptime, this solution is unacceptable. The second approach requires all of the data on the affected drive(s) to be copied twice: first to the stand-by spare drive, and subsequently back to the original drive. For large-capacity drives, such copy processes can be time-consuming.
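The cost of the second existing solution can be made concrete with a minimal sketch. Drives are modeled as Python lists of blocks, the firmware download itself is not modeled, and the names `copy_drive` and `upgrade_with_spare` as well as the figure of 1000 blocks per drive are assumptions chosen only for illustration.

```python
def copy_drive(source, target):
    """Copy every block from source to target; return the number of blocks copied."""
    target[:] = source
    return len(source)


def upgrade_with_spare(volume, index, spare, writes_during_upgrade):
    """Upgrade volume[index] using a stand-by spare; return total blocks copied."""
    original = volume[index]

    # First full copy: original drive -> stand-by spare.
    copied = copy_drive(original, spare)
    volume[index] = spare              # the spare stands in for the original

    # The firmware download to the original drive would occur here (not modeled).
    # Host writes issued in the meantime land on the spare.
    for block, data in writes_during_upgrade:
        volume[index][block] = data

    # Second full copy: stand-by spare -> upgraded original drive.
    copied += copy_drive(spare, original)
    volume[index] = original           # re-integrate the upgraded drive
    return copied


BLOCKS_PER_DRIVE = 1000                # hypothetical drive size
volume = [[bytes(1)] * BLOCKS_PER_DRIVE for _ in range(3)]
spare = [bytes(1)] * BLOCKS_PER_DRIVE
target_drive = volume[1]

copied = upgrade_with_spare(volume, 1, spare, [(7, b"\x01")])
```

In this sketch a single changed block still results in two full-drive copies, so the total copied is twice the drive's block count regardless of how little data changed during the upgrade.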
Accordingly, it is an object of the present invention to provide a method for recovering from an unrelated disk failure within a logical volume during the period in which another of the volume's disks is temporarily unavailable.
Another object of the invention is to provide a method for recovering from an unrelated disk failure within a logical volume during the period in which another of the volume's disks is in the process of having its firmware updated.
Additional objects, advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.