The present disclosure relates to data storage, and more specifically, to a system and method for the reuse of a problematic disk within a Redundant Array of Independent Disks (RAID) data storage system.
A RAID data storage system can be used to form a group of hard disks, or a logical hard disk, by combining multiple independent physical hard disks or disk drives in various configurations. Such configurations can provide higher data storage performance than can be obtained from a single hard disk. A hard disk group or logical hard disk can also be used to provide backup capability for data. In a RAID data storage system, a backup hard disk can be referred to as a “redundant” disk, and a hard disk that stores data for a computer user can be referred to as a “working” disk. When a working disk encounters a data storage error, which can include a hard disk failure and/or a disk “event”, the working disk may be isolated from the RAID data storage system, and the RAID data storage system can rebuild the data contained on the working disk onto the redundant disk. The redundant disk can then take the place of the working disk.
Working disk errors or events may be classified or described as “hard data storage errors”, “media errors” and “slow disk errors”. A hard data storage error can result from a serious hardware failure of the disk mechanism itself, such as a magnetic head, drive motor, or electronic component failure, through which the disk mechanism may become completely inoperable. These types of errors may be classified or described as “irrecoverable” errors. A media error can include some types of recoverable data storage errors, such as a certain sector of the disk becoming unusable. A backup redundant sector can subsequently be used to store data formerly stored on the unusable disk sector. The term “slow disk errors” can refer to certain data storage errors related to disk software which are not caused by a hardware or recordable media failure. However, both media errors and slow disk errors may reduce disk performance, and thus can reduce the performance of the entire RAID data storage system. Therefore, in a high-end data storage system, in order to maintain the performance of the data storage system, when the count of media errors and slow disk errors of a working disk reaches a certain threshold, that particular working disk can be isolated, i.e., logically or physically disconnected or removed from the RAID data storage system.
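The threshold-based isolation described above can be illustrated with a minimal sketch. The names `ErrorKind`, `WorkingDisk`, `record_error`, and `hard_failed`, as well as the threshold value of 3, are hypothetical and chosen only for illustration. The sketch also assumes that media errors and slow disk errors share a single combined count, and that a hard data storage error isolates the disk immediately; an actual RAID data storage system may track and handle these errors differently.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    HARD = "hard"    # irrecoverable hardware failure of the disk mechanism
    MEDIA = "media"  # recoverable error, e.g. an unusable sector
    SLOW = "slow"    # software-related error that degrades performance


@dataclass
class WorkingDisk:
    name: str
    soft_error_count: int = 0  # combined media + slow disk errors (assumption)
    isolated: bool = False
    hard_failed: bool = False  # set only by a hard data storage error

    def record_error(self, kind: ErrorKind, threshold: int = 3) -> None:
        """Record one error and isolate the disk when warranted.

        The default threshold of 3 is an illustrative value only.
        """
        if kind is ErrorKind.HARD:
            # Assumption: a hard error isolates the disk at once.
            self.hard_failed = True
            self.isolated = True
            return
        self.soft_error_count += 1
        if self.soft_error_count >= threshold:
            self.isolated = True


disk = WorkingDisk("disk0")
for kind in (ErrorKind.MEDIA, ErrorKind.SLOW, ErrorKind.MEDIA):
    disk.record_error(kind)
print(disk.isolated, disk.hard_failed)
```

In this sketch, a disk isolated only because its combined count reached the threshold keeps `hard_failed` set to `False`, which distinguishes it from a disk whose mechanism has become inoperable.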
According to embodiments of the present disclosure, in a high-end storage system, a working disk that is isolated due to media errors and/or slow disk errors can be effectively reused without being placed in a “failure state” to wait for a pending repair action.