1. Field of the Invention
The present invention relates to a technology for controlling disk drives, and in particular to a technology for controlling read checking of disk data with reduced overhead.
2. Background Information
In certain arrangements of Redundant Arrays of Inexpensive Disks (RAID arrays), facilities are provided for rebuilding data from a failed disk using data from other disks in the array. This is usually achieved by distributing (striping) copies of data from each disk across the other disks in the array, so that it can be retrieved and assembled together on a spare disk if a disk fails.
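As a minimal sketch of such a rebuild, the following assumes XOR parity as the redundancy scheme. The text above does not specify the scheme, and the three-disk layout and all names are hypothetical, chosen only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: two data blocks plus one parity block (assumed scheme).
disk0 = b"\x01\x02\x03\x04"
disk1 = b"\x10\x20\x30\x40"
parity = xor_blocks([disk0, disk1])

# Suppose disk1 fails: its block is reassembled onto a spare disk
# using only the blocks held by the surviving disks.
rebuilt = xor_blocks([disk0, parity])
```

Under this assumption, any single lost block in a stripe can be recovered, but only if every surviving block in that stripe can still be read.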
During RAID array rebuilds, failing hardware and firmware may encounter one or more further errors that prevent array rebuilds from completing and that may result in loss of access to data. When such a double fault occurs within a RAID array, the second fault is typically not discovered until the recovery action to rectify the first fault is implemented.
In a RAID array, when an array member disk is lost due to a hardware or software failure, an attempt is made to rebuild it onto a ‘spare disk’, assuming of course that one is available. This reconstruction is achieved by reading data from the existing available disks. However, if in the meantime a logical block address (LBA) on another disk within the same array has also become corrupted but gone unnoticed, the reconstruction will encounter a problem. Such an unnoticed corruption is often referred to as a “silent error”.
One existing technique to reduce the likelihood of this silent error is an action called “data scrubbing”. Depending upon how data scrubbing or any other active data integrity tool has been set up, the error may still not be found until a read is attempted, for the first time, to that particular LBA as part of the attempt to rebuild the array. This ‘double hit’ means that the array rebuild for the data from that particular LBA area cannot be successfully completed and the data is lost.
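The ‘double hit’ can be illustrated with a small sketch. It assumes XOR parity across two surviving disks and models an unnoticed media error as a block that cannot be read. The function names, the exception class and the disk contents are all hypothetical and do not come from any real controller interface.

```python
from functools import reduce


class UnreadableBlock(Exception):
    """Raised when a surviving disk cannot return the block at an LBA."""


def read_block(disk, lba):
    block = disk[lba]
    if block is None:  # None models a latent, previously unnoticed error
        raise UnreadableBlock(lba)
    return block


def rebuild(survivors, num_lbas):
    """Rebuild a lost disk; return (rebuilt blocks, list of lost LBAs)."""
    rebuilt, lost = [], []
    for lba in range(num_lbas):
        try:
            blocks = [read_block(disk, lba) for disk in survivors]
        except UnreadableBlock:
            # Second fault, discovered only now: this LBA cannot be rebuilt.
            rebuilt.append(None)
            lost.append(lba)
            continue
        rebuilt.append(
            bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))
        )
    return rebuilt, lost


# Hypothetical survivors: LBA 1 on the data disk is silently bad.
data_disk = [b"\x01\x02", None, b"\x05\x06"]
parity_disk = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
rebuilt, lost = rebuild([data_disk, parity_disk], 3)
```

In this sketch the rebuild completes for LBAs 0 and 2 but reports LBA 1 as lost, because the second fault surfaces only when the rebuild first reads that address.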
A conventional data scrubbing operation as known in the art is instigated by the host operating system, set up by the customer, and tailored to the system's individual needs. In most cases it is configured to run on a daily, weekly or monthly basis. In the worst-case scenario, LBAs may thus be checked only every 30 days, assuming, of course, that data scrubbing is activated at all.
It would thus be desirable to have a technology for controlling disk drives, and in particular a technology for controlling read checking of disk data with reduced overhead.