The present invention relates to storage arrays, and more specifically, this invention relates to identifying a lethargic drive within a storage array.
RAID (redundant array of independent disks) is a data storage approach that combines multiple storage drives into a single logical unit for purposes of increasing performance and/or data reliability. However, when a single drive in the array is slower than the others, it may negatively affect the performance of the whole array.
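As a minimal illustrative sketch of this effect, assume an operation that is striped across every drive and completes only when the slowest drive has responded. Under that assumption, the array-level latency is the maximum of the per-drive latencies. The function name and the millisecond values below are hypothetical and are not taken from any particular RAID implementation.

```python
def stripe_latency_ms(per_drive_ms):
    """Latency of one striped operation, assuming all drives work in
    parallel and the operation finishes when the slowest drive finishes."""
    return max(per_drive_ms)


# Hypothetical per-drive response times in milliseconds (illustrative only).
healthy_array = [5, 5, 5, 5]
one_slow_drive = [5, 5, 5, 45]  # a single drive responds slowly

healthy_latency = stripe_latency_ms(healthy_array)
degraded_latency = stripe_latency_ms(one_slow_drive)

print(f"healthy array:  {healthy_latency} ms")
print(f"one slow drive: {degraded_latency} ms")
```

In this simplified model, three drives responding promptly do not offset the one that does not; the whole operation is as slow as its slowest member.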
Modern RAID hardware receives a chain of operations that directs the storage of data to, and/or the retrieval of data from, individual drives of an array. When a chain of operations takes longer to complete than expected, it may not be possible to observe and time the operations directed to individual drives, and therefore it may not be possible to determine whether a specific drive is responsible for the delay. Further complexity may arise when performance assists allow operations to be chained for update writes. In such a chain, any drive may be the cause of the overall slow I/O.
Accordingly, isolating a lethargic drive (i.e., a drive that has a response time above an established norm) in a RAID configuration may be problematic. While drives that cause timeouts are easily identified because the failed command indicates the failing drive, a lethargic drive may be hidden by the RAID configuration.
In some systems, performance statistics may be accumulated only for an array as a whole, so per-drive data is lost. Further, average response time data may be available, but the slow responses of a lethargic drive may be concealed when only an average response time is viewed.
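The averaging effect described above can be illustrated with a small worked example. The sketch below uses hypothetical drive names, hypothetical response times, and an assumed norm of 20 ms, none of which come from a real system. It shows that an array-wide average can remain well under the norm even though one drive repeatedly exceeds it.

```python
from statistics import mean

# Hypothetical per-drive response times in milliseconds (illustrative only).
drive_latencies_ms = {
    "drive0": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "drive1": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "drive2": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "drive3": [5, 5, 5, 5, 45, 5, 5, 5, 45, 5],  # intermittently slow
}

NORM_MS = 20  # assumed "established norm" for a single response


def array_average(latencies):
    """Average over every sample from every drive (array-level view)."""
    all_samples = [s for samples in latencies.values() for s in samples]
    return mean(all_samples)


def slow_responses_per_drive(latencies, norm_ms):
    """Count, per drive, the responses that exceeded the norm."""
    return {
        drive: sum(1 for s in samples if s > norm_ms)
        for drive, samples in latencies.items()
    }


array_avg = array_average(drive_latencies_ms)
slow_counts = slow_responses_per_drive(drive_latencies_ms, NORM_MS)

print(f"array-wide average: {array_avg} ms (norm: {NORM_MS} ms)")
print(f"responses above norm, per drive: {slow_counts}")
```

With these made-up numbers, the array-wide average looks healthy, while the per-drive counts single out the one drive whose responses exceed the norm. Only the per-drive view, which the aggregated statistics discard, exposes the lethargic drive.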