Some data storage systems attempt to store data segments with no redundancy in order to use the available storage space efficiently. This approach is particularly applicable to backup systems. The data segments are identified by content-derived identifiers, that is, identifiers derived from the data in the data segments. One example of a content-derived identifier is a hash calculated on the data segment. The content-derived identifiers are stored and used to retrieve the data segments when the original data stream is reconstructed. Since identical data segments result in the same identifier, a duplicate data segment can be identified and prevented from being stored again. Because the content-derived identifiers usually have fewer bits than the corresponding data segments, it is possible for two non-identical data segments to have the same identifier, resulting in a collision. Such a collision may lead to a unique data segment not being stored, so that it cannot be recovered in the future. Also, for specific content-derived identifiers, non-identical data segments with the same identifier may be known, opening up the possibility of malicious data corruption. It would be valuable to be able to detect collisions so that a unique data segment remains recoverable in the future. Furthermore, it would be valuable to report that a collision occurred so that system administrators can assess the collision resistance of the system.
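
The deduplication and collision scenario described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular system. The class name `DedupStore`, the `id_bytes` parameter, the use of a truncated SHA-256 digest as the identifier, and the choice to keep colliding segments in a per-identifier list are all assumptions made for illustration. The sketch compares the stored bytes whenever an identifier is already present, so a true duplicate is skipped while a collision is recorded and the unique segment is still kept.

```python
import hashlib


class DedupStore:
    """Toy segment store keyed by a content-derived identifier (hypothetical)."""

    def __init__(self, id_bytes=32):
        # Assumption: the identifier is a SHA-256 digest truncated to id_bytes.
        # Small values make collisions easy to trigger, for demonstration only.
        self.id_bytes = id_bytes
        self.segments = {}    # identifier -> list of distinct segments sharing it
        self.collisions = []  # identifiers for which a collision was detected

    def identifier(self, segment):
        """Derive the identifier from the segment's content."""
        return hashlib.sha256(segment).digest()[: self.id_bytes]

    def store(self, segment):
        """Store a segment and return a reference (identifier, index)."""
        ident = self.identifier(segment)
        bucket = self.segments.setdefault(ident, [])
        for index, existing in enumerate(bucket):
            if existing == segment:
                # True duplicate: same identifier and same bytes, store nothing.
                return ident, index
        if bucket:
            # Same identifier but different bytes: record the collision so it
            # can be reported, and still keep the unique segment below.
            self.collisions.append(ident)
        bucket.append(segment)
        return ident, len(bucket) - 1

    def retrieve(self, ref):
        """Return the segment for a reference produced by store()."""
        ident, index = ref
        return self.segments[ident][index]


if __name__ == "__main__":
    # A 1-byte identifier allows only 256 distinct values, so storing 300
    # distinct segments is guaranteed to produce collisions.
    store = DedupStore(id_bytes=1)
    refs = [store.store(str(i).encode()) for i in range(300)]
    print("collisions detected:", len(store.collisions))
    print("first segment recovered:", store.retrieve(refs[0]))
```

A store that compared identifiers only, without the byte comparison, would treat every colliding segment as a duplicate and silently drop it. The byte comparison costs a read of the existing segment, which is the trade-off this sketch accepts in order to detect and report collisions.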