Some data storage systems avoid storing duplicate data segments so that the available storage space is used more efficiently. This approach is particularly useful in backup systems. A data set of data files is transmitted from a source system to a storage system, and at some point the data files in the data set are broken into segments. To make storage more efficient, a segment that appears multiple times in the data set is stored in the storage system fewer times than it occurs. The storage system also keeps location information indicating how to reconstruct the original data files in the data set from the stored segments. Corruption can occur at several points during the segmenting, transmission, and other processing of the files in the data set. Detecting such corruption with traditional verification, however, requires reconstructing each data file from its data segments. It would therefore be valuable to determine efficiently that no corruption has occurred, so that the data files in the data set can be recovered without errors.
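The segment-level storage and the conventional, reconstruction-based verification described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the technique claimed here. It assumes fixed-size segments and SHA-256 fingerprints to detect duplicate segments. It models the "location information" as an ordered list of fingerprints per file. The names `segment`, `DedupStore`, `verify_by_reconstruction` and `SEGMENT_SIZE` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical fixed segment size in bytes; kept tiny so the example is easy to follow.
SEGMENT_SIZE = 4


def segment(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Split a file's bytes into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


@dataclass
class DedupStore:
    # Fingerprint -> segment bytes: each unique segment is stored only once.
    segments: dict[str, bytes] = field(default_factory=dict)
    # File name -> ordered fingerprints (assumed form of the "location information").
    recipes: dict[str, list[str]] = field(default_factory=dict)

    def put(self, name: str, data: bytes) -> None:
        recipe = []
        for seg in segment(data):
            fp = hashlib.sha256(seg).hexdigest()
            self.segments.setdefault(fp, seg)  # skip segments already stored
            recipe.append(fp)
        self.recipes[name] = recipe

    def reconstruct(self, name: str) -> bytes:
        """Reassemble a file by concatenating its segments in recipe order."""
        return b"".join(self.segments[fp] for fp in self.recipes[name])


def verify_by_reconstruction(store: DedupStore, name: str, expected_digest: str) -> bool:
    """Conventional check: rebuild the whole file, then compare its checksum."""
    return hashlib.sha256(store.reconstruct(name)).hexdigest() == expected_digest


if __name__ == "__main__":
    store = DedupStore()
    store.put("a", b"ABCDABCDEFGH")  # segments: ABCD, ABCD, EFGH
    store.put("b", b"EFGHABCD")      # segments: EFGH, ABCD (both already stored)
    print(len(store.segments))       # → 2
```

Note that `verify_by_reconstruction` must read back and reassemble every segment of a file before it can compare checksums. This per-file cost is the drawback of conventional verification that the preceding paragraph identifies.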