The disclosed subject matter relates generally to fault-tolerant data storage systems and, more particularly, to a data storage infrastructure that facilitates scalable monitoring of data.
Disk drives are designed for data storage and retrieval. As capacities and recording densities increase, disk drives are becoming less reliable in performing these functions. Certain disk behaviors contribute to corruption or loss of data stored on a disk drive.
A first type of error may occur during a write operation when the disk arm and head fail to align precisely on the track that comprises the physical data blocks to which the data is to be written. Such tracking errors can occur either when the head is misaligned such that the data is written to an unintended track, or when the head is misaligned such that the data falls in a gap between two adjacent tracks. In the former case, a Far Off-track Write, two physical blocks are placed in error: the target block is not overwritten and so contains stale data, and the block that was overwritten has lost the data that should be there. In the latter case, a Near Off-track Write, one block is placed in error because the target block is not overwritten.
A second type of error also occurs during a write operation, when the target bits on the disk are not changed as a result of the write. For example, the preamp signal may be too weak to change the magnetic setting of the bits on the platter. In this case, the data remaining on the platter is stale (i.e., the data was not updated according to the write commands issued to the drive). These errors are called dropped writes because the bits are not recorded on the platter.
Both of the above-mentioned types of write errors are called “Undetected Write Errors” because the disk either places the write data in the wrong location or fails to record it at all, and does not itself detect the problem.
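The write error types described above can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: the `ToyDisk` class, its `fault` and `landed_on` parameters, and the `blocks_in_error` helper are hypothetical names introduced for illustration, each track holds a single block, and nothing here corresponds to a real drive interface.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """Hypothetical model: one block per track, every write reports success."""
    tracks: list = field(default_factory=lambda: ["old"] * 4)

    def write(self, track, data, fault=None, landed_on=None):
        if fault is None:
            self.tracks[track] = data
        elif fault == "far_off_track":
            # Data lands on an unintended track; the target keeps stale data.
            self.tracks[landed_on] = data
        elif fault in ("near_off_track", "dropped"):
            # Data falls in the gap between tracks, or the bits are never
            # changed on the platter; either way no block is updated.
            pass
        else:
            raise ValueError(f"unknown fault: {fault}")
        # The drive does not detect any of these faults.
        return "SUCCESS"


def blocks_in_error(disk, expected):
    """Indices of blocks whose contents differ from what should be there."""
    return [i for i, (have, want) in enumerate(zip(disk.tracks, expected))
            if have != want]


def run(fault, landed_on=None):
    """Write 'new' to track 1 under the given fault; report status and damage."""
    disk = ToyDisk()
    status = disk.write(1, "new", fault=fault, landed_on=landed_on)
    expected = ["old", "new", "old", "old"]
    return status, blocks_in_error(disk, expected)


far = run("far_off_track", landed_on=2)   # ("SUCCESS", [1, 2])
near = run("near_off_track")              # ("SUCCESS", [1])
dropped = run("dropped")                  # ("SUCCESS", [1])
```

In this model the Far Off-track Write leaves two blocks in error (the stale target and the overwritten neighbor), while the Near Off-track Write and the dropped write each leave one, and all three report success.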
A third type of error is caused by misaligned head placement when reading data. In this case, the disk may read the data bits from a completely unintended track (i.e., Far Off-track Read) or from a gap between two tracks (i.e., Near Off-track Read) and return incorrect data. Both of these errors are typically transient and are corrected when a subsequent read of the same track occurs. In addition, if a track is read correctly but was the unintended target of an earlier Far Off-track Write, incorrect data will be returned.
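The transient nature of off-track reads can be sketched in the same spirit. The `read` function, its `fault` and `landed_on` parameters, and the `"<gap noise>"` placeholder are assumptions made for illustration only; the sketch simply shows a faulty read returning wrong data with a success status and a later read of the same track returning the correct data.

```python
def read(tracks, track, fault=None, landed_on=None):
    """Hypothetical read: returns (status, data); status is always SUCCESS."""
    if fault == "far_off_track":
        # Head settles on a completely unintended track.
        return "SUCCESS", tracks[landed_on]
    if fault == "near_off_track":
        # Head settles in the gap between two tracks.
        return "SUCCESS", "<gap noise>"
    return "SUCCESS", tracks[track]


tracks = ["A", "B", "C"]
first = read(tracks, 1, fault="far_off_track", landed_on=2)  # wrong data
retry = read(tracks, 1)                                      # correct data
```

Here `first` carries the contents of track 2 even though track 1 was requested, while `retry` returns the intended block; in neither case does the status indicate a problem.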
In all of the above scenarios, the drive does not detect a problem and returns a successful status notice. Other error scenarios may also occur in which the disk returns a success status but the user or application receives incorrect data. Such write or read errors can be referred to as Undetected Disk Errors (UDEs).