Data backup is critical to storage systems. In backup systems, data deduplication is a recently introduced technical solution for reducing the storage space required. Data deduplication mainly consists of the following steps: splitting a data object into non-overlapping data chunks; generating for each data chunk an identification (ID) based on its content; if a data chunk has a unique identification, i.e. no data chunk with that identification has been stored yet, storing the data chunk on a physical storage device; and if the identification of a data chunk is the same as the identification of a data chunk previously stored on a physical storage device, discarding the data chunk and storing only a pointer to the previously stored data chunk with the same identification.
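The steps described above can be illustrated with a minimal sketch. It is not taken from any particular backup system: the fixed chunk size, the use of SHA-256 as the content-based identification, the in-memory dictionaries standing in for the physical storage device, and the names `DedupStore`, `put` and `get` are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical fixed chunk size, kept tiny for illustration


class DedupStore:
    """Toy deduplicating store (illustrative assumption, not a real system)."""

    def __init__(self):
        self.chunks = {}   # chunk ID -> chunk bytes (stands in for physical storage)
        self.objects = {}  # object name -> ordered list of chunk IDs (the pointers)

    def put(self, name, data):
        """Split data into non-overlapping chunks and store only unseen chunks."""
        ids = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            # The identification is derived from the chunk content.
            chunk_id = hashlib.sha256(chunk).hexdigest()
            if chunk_id not in self.chunks:
                # Unique identification: the chunk has not been stored yet.
                self.chunks[chunk_id] = chunk
            # In either case the object keeps only a pointer (the ID).
            ids.append(chunk_id)
        self.objects[name] = ids

    def get(self, name):
        """Rebuild a data object by following its chunk pointers."""
        return b"".join(self.chunks[chunk_id] for chunk_id in self.objects[name])
```

In this sketch, storing `b"AAAABBBBAAAA"` and then `b"BBBBCCCC"` keeps three distinct chunks rather than five, because the repeated `AAAA` and `BBBB` chunks are stored once and referenced by pointer thereafter.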
As seen above, data deduplication can significantly reduce the storage space required for backup. However, data deduplication also gives rise to the risk that data loss spreads rapidly. For example, when one data object is damaged, other data objects that reference data chunks in that data object are damaged as well. Such chain damage prevents data deduplication from being implemented effectively, and accordingly the advantageous effects of the technology cannot be achieved.
To reduce the risk that data loss spreads while still making use of data deduplication, a compromise technical solution has been proposed in the prior art, i.e. generating an integral number of copies of each data segment based on its reference count. That is, at least one copy is stored for every data segment; if a data segment is referenced multiple times, a plurality of copies are stored so as to protect this important data segment. By means of this technical solution, when a stored data segment is damaged, the data objects referencing the data segment are not damaged because copies of the data segment have been saved, and the risk that data loss spreads is thereby reduced. However, since one or more copies must be stored for each data segment, this technical solution consumes a large amount of storage space and lowers the utilization rate of storage space. As a result, the benefit of data deduplication in reducing the storage space requirement cannot be fully realized.
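The storage cost of such a scheme can be made concrete with a small sketch. The prior art described above does not specify how the number of copies is derived from the reference count, so the rule below (one copy for a segment referenced once, one more for each doubling of the reference count, capped at `max_copies`) is purely a hypothetical policy. The sketch also assumes that copies are stored in addition to the primary instance of the segment; `copies_for` and `total_stored` are illustrative names.

```python
def copies_for(ref_count, max_copies=3):
    """Hypothetical policy: whole number of copies derived from the reference count.

    ref_count 1 -> 1 copy, 2-3 -> 2 copies, 4-7 -> 3 copies, capped at max_copies.
    """
    if ref_count < 1:
        raise ValueError("a stored segment is referenced at least once")
    # bit_length() equals 1 + floor(log2(ref_count)) for positive integers.
    return min(max_copies, ref_count.bit_length())


def total_stored(segment_sizes, ref_counts, max_copies=3):
    """Total space used: each primary segment plus its copies (assumed additional)."""
    return sum(
        size * (1 + copies_for(refs, max_copies))
        for size, refs in zip(segment_sizes, ref_counts)
    )
```

Under these assumptions, two segments of 10 units each, referenced once and four times respectively, occupy 10 × 2 + 10 × 4 = 60 units instead of the 20 units needed by plain deduplication. Because every segment carries at least one copy, the space used is never less than twice the deduplicated size, which illustrates the loss of storage utilization noted above.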