RAID (redundant array of independent disks) is a technology that combines a collection of disk drives (herein referred to simply as disks) so that they can be used as a single logical unit. Throughout the description, a RAID group is interchangeably referred to as a RAID array. RAID provides greater performance and reliability (e.g., through the use of a protection scheme called “parity”) than a single disk. Because data is distributed across the disks, the performance of a RAID group depends on the performance of each disk: if a single disk experiences performance degradation, the RAID group as a whole will also experience performance degradation. Disks exhibiting poor performance should, therefore, be proactively removed from operation within the RAID group and replaced with new disks.
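The dependence of group performance on each disk can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which an operation touching every disk of a stripe completes only when the slowest disk has responded. The function name and the latency figures are hypothetical and are not taken from any particular RAID implementation.

```python
def stripe_latency_ms(per_disk_latency_ms):
    """Return the latency of one full-stripe operation.

    Simplifying assumption: the operation is issued to all disks in
    parallel and finishes only when the slowest disk has responded.
    """
    if not per_disk_latency_ms:
        raise ValueError("a RAID group needs at least one disk")
    return max(per_disk_latency_ms)


# Hypothetical per-disk latencies (milliseconds) for a four-disk group.
healthy_group = [5.0, 5.2, 4.9, 5.1]
degraded_group = [5.0, 5.2, 48.0, 5.1]  # the third disk is degraded

print(stripe_latency_ms(healthy_group))
print(stripe_latency_ms(degraded_group))
```

Under this model, a single slow disk sets the latency of the whole group, which is why a degraded disk is worth identifying and replacing early.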
Conventionally, a large number of statistics are collected and made available for determining disk-drive performance and health. These include the industry-standard Self-Monitoring, Analysis, and Reporting Technology (SMART) data for Serial Advanced Technology Attachment (SATA) disks, and various log pages for Small Computer System Interface (SCSI) disks. Analyzing this large volume of statistics, however, can be resource intensive and often results in false positives. Here, a false positive refers to a disk that has been erroneously identified as having performance degradation when it is in fact in normal operating condition. Thus, there is a need for a simple and efficient mechanism for checking disk-drive health that yields a low rate of false positives.