As the demand for storage continues to grow, larger and more sophisticated storage systems are being designed and deployed. For example, in the High Performance Computing (HPC) community, many sites have deployed distributed file systems containing tens of thousands of disk drives and tens of petabytes of storage. The increase in the number of components and the volume of data results in a significantly increased likelihood of data corruption.
Typically, a distributed file system is made up of a collection of servers that are presented to clients as a single large file system. Each server within the distributed file system stores data or metadata in a locally consistent backing file system that is invisible to users. In this architecture, the distributed file system uses checksums of data to detect corruption that occurs during transmission over the network from the client to the storage servers. In addition, the backing file system may compute its own, separate checksums to detect on-disk corruption.
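The two independent layers of checksumming described above can be illustrated with a minimal sketch. It is not drawn from any particular file system. The names `client_send`, `server_receive`, and `BackingStore` are hypothetical, and the choice of CRC32 for the network checksum and SHA-256 for the on-disk checksum is an assumption made only for illustration.

```python
import hashlib
import zlib


def client_send(payload: bytes) -> tuple[bytes, int]:
    """Client side: pair the payload with a CRC32 computed before transmission."""
    return payload, zlib.crc32(payload)


class BackingStore:
    """Hypothetical stand-in for one server's local backing file system."""

    def __init__(self) -> None:
        # key -> (mutable block contents, checksum recorded at write time)
        self._blocks = {}

    def write(self, key: str, data: bytes) -> None:
        # The backing store records its own checksum, independent of the wire CRC.
        self._blocks[key] = (bytearray(data), hashlib.sha256(data).hexdigest())

    def read(self, key: str) -> bytes:
        data, stored_digest = self._blocks[key]
        if hashlib.sha256(data).hexdigest() != stored_digest:
            raise OSError(f"on-disk corruption detected in block {key!r}")
        return bytes(data)

    def flip_bit(self, key: str, offset: int) -> None:
        # Demonstration only: simulate silent corruption of the stored block.
        self._blocks[key][0][offset] ^= 0x01


def server_receive(store: BackingStore, key: str, payload: bytes, wire_crc: int) -> bool:
    """Server side: verify the network checksum before handing data to the store."""
    if zlib.crc32(payload) != wire_crc:
        return False  # corrupted in transit; a real system would request a resend
    store.write(key, payload)
    return True
```

In this sketch the network checksum is checked and then discarded once the data reaches the server, and the backing store computes its own checksum from scratch. The two checks therefore cover different parts of the data path and know nothing about each other.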