Blade-based computing systems are becoming increasingly popular in data center deployments. Blade-based systems provide efficient utilization of floor space, ease of installation and management, improved RAS (reliability, availability, and serviceability), reduced cabling requirements, integrated networking, and integrated storage.
In one approach to blade-based computing systems, storage controllers are packaged into blade form factors and integrated into the same enclosure as server blades. Storage devices such as disk drives are housed in a separate enclosure. To further improve packaging densities and to realize a complete system in a frame (i.e., a data center in a box), methods are being investigated to package disk drives into blades. This blade-packaging scheme for a bladed storage subsystem provides a complete solution for medium-sized configurations, achieves high density in drive packaging, and minimizes cabling requirements.
For example, two or more disk drives are mounted on a tray that is inserted into a canister, an enclosure that houses the trays and a connection interface. Additional trays are also inserted into the canister. This configuration, with multiple drives per tray, achieves higher drive density and efficient utilization of the available space along the depth of the frame as opposed to housing only a single drive per tray.
A bladed storage subsystem comprising more than one disk drive per tray presents a problem in removal and replacement of failed drives. Removing a tray comprising two or more disk drives to replace one failed drive implies that functional drives are also being removed from the storage system. A proposed solution requires the use of RAID codes with higher fault tolerance (for example, RAID 6 or RAID 51) that can tolerate the removal of all the drives on a tray. However, in this solution, some schemes (such as RAID 51) do not have high storage efficiency. Other schemes (such as 3-fault-tolerant schemes) exhibit an increased write penalty. Furthermore, some RAID schemes (such as RAID 6) may not be able to support configurations with three or more drives per tray.
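The trade-off described above can be illustrated with a minimal sketch. The sketch assumes the worst case, in which every drive on a removed tray belongs to the same array, so that an array survives tray removal only if its guaranteed fault tolerance is at least the number of drives per tray. The names RaidScheme, raid6, and raid51 are hypothetical and chosen for illustration only; the figures used are the commonly cited ones (RAID 6 tolerates any two drive losses, and RAID 51, a mirrored pair of RAID 5 sets, tolerates any three).

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class RaidScheme:
    """Illustrative model of a RAID array (hypothetical helper, not from the source)."""
    name: str
    fault_tolerance: int  # guaranteed number of simultaneous drive losses survived
    data_drives: int      # drives' worth of capacity holding user data
    total_drives: int     # all drives in the array

    @property
    def storage_efficiency(self) -> Fraction:
        # Fraction of raw capacity available for user data.
        return Fraction(self.data_drives, self.total_drives)

    def survives_tray_removal(self, drives_per_tray: int) -> bool:
        # Assumption: all drives on the removed tray belong to this array.
        return self.fault_tolerance >= drives_per_tray


def raid6(n: int) -> RaidScheme:
    """RAID 6 over n drives: two drives' worth of parity."""
    if n < 4:
        raise ValueError("RAID 6 is modeled here with at least 4 drives")
    return RaidScheme("RAID 6", 2, n - 2, n)


def raid51(n: int) -> RaidScheme:
    """RAID 51: two mirrored RAID 5 sets of n drives each (2n drives total)."""
    if n < 3:
        raise ValueError("RAID 5 is modeled here with at least 3 drives per set")
    return RaidScheme("RAID 51", 3, n - 1, 2 * n)


if __name__ == "__main__":
    for scheme in (raid6(8), raid51(8)):
        for drives_per_tray in (2, 3):
            print(
                f"{scheme.name}: efficiency={float(scheme.storage_efficiency):.2%}, "
                f"{drives_per_tray} drives/tray -> "
                f"survives={scheme.survives_tray_removal(drives_per_tray)}"
            )
```

Under these assumptions, an 8-drive RAID 6 array retains three quarters of its raw capacity but does not survive removal of a three-drive tray, whereas a RAID 51 configuration survives that removal while retaining less than half of its raw capacity.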
Another proposed solution requires relocating all data on the tray with the failed drive onto a spare tray before that tray is removed. However, relocating all data to a spare tray means that the service action cannot be performed until the lengthy relocation operation is completed. Furthermore, spare trays may not be available in all configurations.
Accordingly, a solution is required that facilitates removal and replacement of a single failed drive from a tray holding more than one drive, and that does not reduce storage efficiency, allows the maintenance action to take place after a drive failure is detected without waiting for completion of a lengthy operation such as data relocation, does not limit the number of drives per tray, and does not depend on the availability of spare trays.
What is therefore needed is a system and an associated method for servicing storage devices in a bladed storage subsystem. The need for such a solution has heretofore remained unsatisfied.