In modern storage systems that comprise multiple disks, disk failure is a common scenario. Disk failure can result from a variety of causes such as damaged media, protocol failure, internal software errors, mechanical problems, etc. Modern storage systems address disk failure and the resulting data corruption through various approaches, such as data protection schemes (e.g. RAID 5 and RAID 6), self-healing mechanisms, data redistribution, and proactive diagnosis procedures such as SMART. However, in many cases, disk failure results from a persistent cause which, if left untreated, continues to cause the disk to fail.
In some cases a disk can be fixed simply by “power-cycling” the failing disk: its power is cut off and then turned on again, which restores the disk to proper operating mode. For this reason, in case of disk failure, it can sometimes be beneficial to attempt to fix a failing disk by performing a disk power-cycle rather than invoking other data recovery mechanisms, which consume more resources and do not eliminate the actual cause of the disk failure.
The Symmetrix 2.5 (developed by EMC2® corporation), which was implemented with the parallel Small Computer System Interface (SCSI) communication protocol, featured a host bus adapter (HBA) with reset and power-cycle capabilities, which made it possible to reset and/or power-cycle an HBA along with all the disks connected to it. In the Symmetrix 2.5 system, a single HBA is connected to 4 disks. Thus, if a single disk fails, all 4 disks connected to the HBA are reset, and all 4 disks are unavailable during the reset process.
In a power-cycle process, a disk is cut off from its power source (i.e. turned off) and then reconnected to the power source (i.e. turned on). In a reset process, the disk remains connected to the power source while some of its systems are reinitialized. For example, in a reset process, non-volatile memory associated with the disk can be flushed. However, if the disk failure is caused by a software failure, a reset command is likely to fail as well, and in such cases a reset is inadequate.
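The distinction between the two processes can be illustrated with a toy model. The following Python sketch is not taken from any real storage system: the `Disk` class, its `firmware_hung` flag and the `recover` helper are hypothetical names introduced only for illustration. It also assumes, for simplicity, that a power-cycle always clears a software hang, which in practice holds only in some cases.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """Toy model of a disk; every field and method here is hypothetical."""
    powered: bool = True
    firmware_hung: bool = False  # stands in for an internal software failure

    def reset(self) -> bool:
        # A reset is carried out by the disk itself while it stays powered,
        # so in this model a hung disk cannot complete it.
        if not self.powered or self.firmware_hung:
            return False
        return True

    def power_cycle(self) -> bool:
        # Power is cut and then restored from outside the disk.
        # Assumption: restarting from a powered-off state clears the hang.
        self.powered = False
        self.firmware_hung = False
        self.powered = True
        return True


def recover(disk: Disk) -> str:
    """Try a reset first and fall back to a power-cycle if the reset fails."""
    if disk.reset():
        return "reset"
    disk.power_cycle()
    return "power-cycle"


if __name__ == "__main__":
    print(recover(Disk()))                    # healthy firmware: reset suffices
    print(recover(Disk(firmware_hung=True)))  # hung firmware: reset fails first
```

In this model a reset depends on the disk being able to process it, whereas a power-cycle is applied externally, which is why the latter can still work when the former does not.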
Furthermore, unlike the Symmetrix 2.5 system, in some modern storage systems a single HBA can be connected to tens and sometimes hundreds of disks. Therefore, power-cycling the HBA, or otherwise power-cycling all disks connected to a single HBA, is very inefficient in such storage systems, since a single failing disk would cause a large number of disks to become unavailable (even if only temporarily).
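The cost of HBA-level power-cycling can be made concrete with simple arithmetic. The sketch below is illustrative only: the function name `disks_unavailable` is hypothetical, the figure of 4 disks per HBA comes from the Symmetrix 2.5 description above, and the figure of 100 disks is an assumed example of a modern configuration.

```python
def disks_unavailable(disks_per_hba: int, hba_level: bool) -> int:
    """Number of disks taken offline in order to power-cycle one failing disk.

    hba_level=True  models power-cycling every disk behind the HBA.
    hba_level=False models power-cycling only the failing disk.
    """
    if disks_per_hba < 1:
        raise ValueError("an HBA must have at least one disk attached")
    return disks_per_hba if hba_level else 1


if __name__ == "__main__":
    # 4 is the Symmetrix 2.5 figure; 100 is an assumed modern example.
    for n in (4, 100):
        whole_hba = disks_unavailable(n, hba_level=True)
        single = disks_unavailable(n, hba_level=False)
        healthy_lost = whole_hba - single
        print(f"{n} disks per HBA: HBA-level takes {whole_hba} offline, "
              f"of which {healthy_lost} are healthy; per-disk takes {single}")
```

With 4 disks per HBA, 3 healthy disks become temporarily unavailable for each failing disk; with 100 disks per HBA, the number rises to 99.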
Today, a common approach to addressing this problem is to power-cycle the disk manually, for example by disconnecting a malfunctioning disk from its respective bay in the enclosure and then reconnecting it.