Data corruption is a major problem in large-scale data storage systems and in data transmission systems. In the short term, corrupted data can cause applications to return erroneous results or to fail outright. Over the long term, corrupted data may be replicated across multiple systems. In many instances, if the corruption is detected and its cause determined, the correct data may be recoverable.
Data corruption may be detected in various ways. One approach has been to associate integrity metadata, such as data checksums or embedded logical block addresses, with the data on writes and to verify the data against that integrity metadata on reads. However, while integrity metadata can be used to detect data corruption, it cannot by itself determine the cause of the corruption. For example, if a piece of data does not match its corresponding integrity metadata, either the data or the integrity metadata may be corrupt and, without additional information, it is not possible to determine which item is faulty.
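The write-time and read-time steps described above can be illustrated with a minimal sketch. It assumes CRC-32 as the checksum and an in-memory dictionary as the block store; the names StoredBlock, IntegrityError, write_block, and read_block are hypothetical and chosen only for illustration, not taken from any particular system.

```python
import zlib
from dataclasses import dataclass


class IntegrityError(Exception):
    """Raised when data and its integrity metadata disagree."""


@dataclass
class StoredBlock:
    data: bytes
    checksum: int  # CRC-32 of the data, computed at write time (assumed checksum)
    lba: int       # logical block address embedded at write time


def write_block(store: dict, lba: int, data: bytes) -> None:
    # Associate integrity metadata with the data on the write path.
    store[lba] = StoredBlock(data=data, checksum=zlib.crc32(data), lba=lba)


def read_block(store: dict, lba: int) -> bytes:
    # Verify the data against its integrity metadata on the read path.
    block = store[lba]
    if block.lba != lba:
        raise IntegrityError("embedded address does not match requested address")
    if zlib.crc32(block.data) != block.checksum:
        # The check reports a mismatch only; it cannot tell whether
        # the data or the stored checksum is the corrupted item.
        raise IntegrityError("data does not match its checksum")
    return block.data
```

In this sketch, corrupting block.data and corrupting block.checksum both raise the same IntegrityError on the next read, which is the ambiguity noted above: the mismatch is detected, but the faulty item is not identified.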
Similarly, data redundancy may be used to detect data corruption, but the same problem arises. When the original data and the redundant data do not match, without additional information, it is not possible to determine which of the two copies is correct.
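The same ambiguity can be shown for redundancy with a short sketch. The helper names copies_match and flip_first_bit are hypothetical, and the two-copy mirror is an assumption made only for illustration.

```python
def copies_match(primary: bytes, mirror: bytes) -> bool:
    """Return True when the two copies are byte-for-byte identical."""
    return primary == mirror


def flip_first_bit(data: bytes) -> bytes:
    """Simulate corruption by flipping the lowest bit of the first byte."""
    return bytes([data[0] ^ 0x01]) + data[1:]


original = b"payload"

# Case A: the primary copy is corrupted, the mirror is intact.
case_a = copies_match(flip_first_bit(original), original)

# Case B: the primary copy is intact, the mirror is corrupted.
case_b = copies_match(original, flip_first_bit(original))
```

Both cases yield the same observation, a mismatch, so the comparison alone gives no basis for choosing which copy to trust.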