The invention relates generally to disk drive systems and in particular to maintenance on a large scale disk drive system.
Disk drive systems have grown enormously in both size and sophistication in recent years. These systems can typically include many large disk drive units controlled by a complex multi-tasking disk drive controller such as the EMC Symmetrix disk drive controller. A large scale disk drive system can typically receive commands from a number of host computers and can control a large number of disk drive mass storage units, each mass storage unit capable of storing in excess of several gigabytes of data. There is every reason to expect that both the sophistication and size of the disk drive systems will increase.
As the systems grow in complexity, so also does the user's reliance upon the system for fast and reliable recovery and storage of data. Thus, it is more than a mere inconvenience to the user should the disk drive system go "down" or off-line; and even should only one disk drive go off-line, substantial interruption to the operation of the entire system can occur. For example, a disk drive storage unit may be part of a RAID array or may be part of a mirrored system. The resulting lost time can adversely affect system throughput and perceived reliability. This is true even for normally scheduled maintenance wherein, with advance warning to the user, one or more disk drives can be placed off-line for a period of time.
Many disk drive systems, such as the EMC Symmetrix disk drive system, rely upon large standardized buses to connect the host computer and the controller, and to connect the controller and the disk drive elements. Periodically, however, the protocol of the system bus must be upgraded to implement performance improvements, to fix discovered programming errors, and for other normal maintenance reasons. The effect of reprogramming the disk drive communications, for example those carried over a SCSI bus, can be significant. Having to take the drive off-line, load the new SCSI code into it, and then bring the drive back on-line can take substantial time. During this time, the drive is effectively isolated and unavailable for any other purpose. The result can be a significant disruption to the normal operation and performance of the overall computer system.
Typically, a single maintenance command is directed to a single logical volume on a physical disk drive device. Accordingly, as disk drive storage systems grow in size, and the number of logical volumes on a single physical device increases, it becomes increasingly time-consuming and cumbersome to provide a single command for each of the disk drive logical volumes. Furthermore, the repetitive nature of the commands sometimes leads to human error where one or more of the logical volumes is not provided with the commands, for example when a physical device is to be taken off-line and/or replaced. Accordingly, it is desirable to ensure that such human errors do not occur.
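By way of illustration only, the per-volume repetition described above can be sketched as a loop that issues one command to every logical volume of a physical device and reports any volume that did not accept it. The names used here (PhysicalDevice, send_to_all_volumes, and the send callback) are hypothetical and are not taken from any actual controller interface; the sketch assumes only that a device knows the list of its logical volumes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PhysicalDevice:
    """Hypothetical model of one physical drive and its logical volumes."""
    name: str
    logical_volumes: List[str] = field(default_factory=list)


def send_to_all_volumes(
    device: PhysicalDevice,
    command: str,
    send: Callable[[str, str], bool],
) -> List[str]:
    """Issue `command` once per logical volume of `device`.

    `send(volume, command)` is an assumed callback that returns True when
    the volume accepts the command. The return value lists every volume
    that did not accept it, so that no volume is silently skipped.
    """
    missed = []
    for volume in device.logical_volumes:
        if not send(volume, command):
            missed.append(volume)
    return missed
```

Because the loop is driven by the device's own list of volumes rather than by commands typed one at a time, an operator cannot omit a volume by oversight, and any volume that rejects the command is reported explicitly.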