In computing, silent data corruption is the problem of receiving corrupted data from a storage device, such as a hard disk drive, where the operating system and/or application receiving the data is unaware that the integrity of the received data has been compromised. As used herein, a “storage device” is a device used to store data in a computing system, and can include devices such as hard disk drives, solid state drives, smart cards, floppy drives, etc. A storage device may be a logical drive, which can be spread across multiple physical drives and/or take up only a portion of a physical drive. Corruption of the data can result from one or more problems, such as transmission errors over a physical link to the storage device, or a bug in firmware on the storage device.
Computing systems have implemented techniques to discover corrupt data that is received from a storage device. For example, an operating system may compute a checksum, such as a cyclic redundancy check (CRC), for each block written to a hard disk drive storage device. The checksum can be kept in volatile system memory. Alternatively, the operating system may pass the checksum to the hard disk drive as metadata for each block along with each data write request. As yet another alternative, the operating system may store data on one hard disk drive, and store checksums for the data on a separate hard disk drive.
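As a minimal sketch of the first alternative above, the following Python example computes one CRC-32 value per block and keeps the values in memory. The 512-byte block size, the function name block_checksums, and the use of Python's zlib.crc32 are assumptions made for illustration only; they are not drawn from any particular operating system.

```python
import zlib

BLOCK_SIZE = 512  # assumed block size in bytes, chosen only for illustration


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return one CRC-32 value for each block of `data` (hypothetical helper)."""
    return [
        zlib.crc32(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


# 1280 bytes of sample data split into blocks of 512, 512, and 256 bytes.
payload = bytes(range(256)) * 5
checksums = block_checksums(payload)  # held in memory, one entry per block
```

In this sketch the list index serves as the block number. The same per-block values could instead be sent to the storage device as metadata or written to a separate device, as in the other alternatives described above.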
Upon receiving the data back from the storage device in response to a read request, the operating system can compute a new checksum from the received data. The new checksum can be compared to the previously-stored checksum. If the two checksums match (i.e., they are the same to the extent they are expected to be the same to validate data integrity), the match indicates data integrity (i.e., that there was no corruption of the data). Conversely, a mismatch indicates that the data may have been corrupted.
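The read-time comparison described above can be sketched as follows. The storage device is simulated with a Python dictionary, and the names write_block and read_block are hypothetical; CRC-32 via zlib.crc32 is likewise an assumed choice of checksum, not one prescribed by the text.

```python
import zlib


def write_block(device: dict, checksums: dict, block_no: int, data: bytes) -> None:
    """Record a CRC-32 for the block, then write the block to the simulated device."""
    checksums[block_no] = zlib.crc32(data)
    device[block_no] = data


def read_block(device: dict, checksums: dict, block_no: int) -> bytes:
    """Read a block and verify it against the previously stored checksum."""
    data = device[block_no]
    if zlib.crc32(data) != checksums[block_no]:
        raise IOError(f"checksum mismatch on block {block_no}")
    return data


device, checksums = {}, {}
write_block(device, checksums, 0, b"example block contents")

# An unmodified block passes verification.
clean = read_block(device, checksums, 0)

# Simulate silent corruption: flip one bit without updating the checksum.
damaged = bytearray(device[0])
damaged[0] ^= 0x01
device[0] = bytes(damaged)

try:
    read_block(device, checksums, 0)
    detected = False
except IOError:
    detected = True  # the new checksum no longer matches the stored one
```

Without the comparison in read_block, the altered block would be returned to the caller as if it were valid, which is the silent corruption case described above.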