This invention pertains to the repair of one or more failed storage drives in a multiple drive library computer system. Repair of a failed drive occurs while one or more good drives in the library system remain online. In this manner, a host may have continual and uninterrupted access to information stored within the library.
Although the online repairable system, components thereof, and method of online drive repair disclosed herein are especially advantageous in removable media type multiple drive library computer systems, they may also be adapted to other multiple component computer systems.
In a multiple drive library computer system, the first component to fail due to excessive use is often a storage drive. Typically, when a storage drive fails, the library undergoes a period of down time while the drive is repaired or replaced. During this down time, a host (whether it be a UNIX® workstation, PC network server or otherwise) cannot access any of the information contained within the library.
Since loss of information access is unacceptable, efforts have been made to minimize library down time. These efforts include Redundant Array of Inexpensive Disks (RAID) and hot repair technologies.
The RAID technology involves a method of reconstructing data which is lost in the event of a drive's failure. Data is stored in a redundant array of non-removable media type drives, such that when a failed drive is replaced, an error-correction operation can be applied to the data blocks of the remaining storage drives in the redundant array so that the data which was lost can be reconstructed. Although the RAID method helps to minimize library down time, the method is not applicable to removable media type libraries. In a removable media type library, failure of a drive does not result in loss of data. It only results in loss of access to data.
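By way of illustration only, the reconstruction step described above can be sketched as follows. The text does not state which error-correction operation is used; this sketch assumes simple byte-wise XOR parity, one common scheme in redundant arrays. The function names `xor_blocks` and `rebuild_missing_block`, and the sample block contents, are hypothetical and are not taken from any particular RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing_block(surviving_blocks):
    """Rebuild the block of a failed drive (assumed XOR parity scheme).

    With XOR parity, the XOR of all surviving data blocks and the parity
    block equals the block that was stored on the failed drive.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe: three data drives plus one parity drive.
data_blocks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity_block = xor_blocks(data_blocks)

# Suppose the second drive fails; only the other blocks remain readable.
survivors = [data_blocks[0], data_blocks[2], parity_block]
rebuilt = rebuild_missing_block(survivors)

# The rebuilt block matches what the failed drive held.
assert rebuilt == data_blocks[1]
```

The sketch shows why this approach addresses lost data rather than lost access: it recomputes the contents of the failed drive from the remaining drives, which is not needed when the media itself survives a drive failure.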
In a hot repairable system, storage drives are guided on rails into hot repair sockets. A failed drive may be removed from its socket without notification to the library system or a host. As the failed drive is pulled from its socket, a system of long and short pins is used to incrementally disconnect power to the drive, to notify the library system and the host that the drive is no longer available, and to make the necessary changes in system parameters to keep the library system online. However, there is a barrier to implementing hot repair technology in a removable media type library. The barrier exists due to the requirement of a removable media type library that the media inserting faces of its storage drives face a robotic media inserter. In order to keep such a system online, it is necessary to remove storage drives by pulling them away from the robotic inserter. To do this, the electrical connection faces of the storage drives (which in standard drives are opposite their media inserting faces) must be free of blockage (sockets, circuit boards, pins, etc.). To effect a hot repair of a removable media storage drive, a storage drive would need to be constructed wherein its media inserting face and electrical connection face were on adjacent sides, rather than on opposite sides. This is an added expense which is unnecessary in view of the online repairable system and method of online drive repair disclosed herein.
It is therefore a primary object of this invention to significantly reduce the down time of a multiple drive library computer system upon failure of one or more of the library's storage drives. In the case of a removable media type library, it is believed that library down time can be eliminated.
It is a further object of this invention to allow a host to have continual access to one or more good storage drives of a multiple drive library computer system while one or more failed drives of the library are repaired.