Data deduplication (deduplication for short), also referred to as intelligent compression or single-instance storage, is a storage technology that automatically searches for duplicate data, retains only one unique copy of identical data, and replaces every other duplicate copy with a pointer to that single copy, thereby eliminating redundant data and reducing the required storage capacity.
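As an illustration only, the single-instance principle described above can be sketched in Python as follows. The sketch assumes fixed-size chunks, uses a SHA-256 digest as the identity of a chunk, and uses that digest as the "pointer" to the single stored copy; the class `SingleInstanceStore` and its parameters are hypothetical and are not taken from any particular product.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical toy store: each unique chunk is kept once, and every
    occurrence of it is recorded as a pointer (its SHA-256 digest)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # assumed fixed chunk size, in bytes
        self.chunks = {}              # digest -> the single stored copy
        self.files = {}               # name -> list of digests (pointers)

    def write(self, name, data):
        pointers = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Keep the chunk only if no identical copy is stored yet.
            self.chunks.setdefault(digest, chunk)
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name):
        # Follow each pointer back to the single stored copy.
        return b"".join(self.chunks[d] for d in self.files[name])


store = SingleInstanceStore()
store.write("a", b"ABCDABCDEFGH")
store.write("b", b"ABCDEFGH")
print(len(store.chunks))                    # 2 unique chunks: ABCD and EFGH
print(store.read("a") == b"ABCDABCDEFGH")   # True
```

Although 20 bytes are written in total, only two 4-byte chunks are physically kept; the remaining occurrences exist only as pointers.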
In a prior-art data deduplication solution, received data is partitioned into data blocks, and the data blocks are grouped into several data segments. An eigenvalue of each data segment is calculated by using a certain method, and each data segment is represented by its calculated eigenvalue. The eigenvalue of the data segment is then matched against the eigenvalues of data already stored in the system. The storage address corresponding to a matched eigenvalue in the system points to a storage area, which is used as a similar storage area; the data in the similar storage area is loaded into a cache, and a duplicate-data query is performed on the received data against the cached data.
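The prior-art flow above can be sketched as a toy Python model. The method of calculating the eigenvalue is not specified, so this sketch assumes, purely hypothetically, that the eigenvalue of a segment is its smallest SHA-1 block fingerprint; the block size, the segment length, and the names `SegmentDedup` and `ingest` are likewise illustrative assumptions.

```python
import hashlib

BLOCK_SIZE = 4          # hypothetical fixed block size, in bytes
BLOCKS_PER_SEGMENT = 4  # hypothetical number of blocks per data segment


def fingerprint(block: bytes) -> str:
    return hashlib.sha1(block).hexdigest()


class SegmentDedup:
    """Toy model of segment-based deduplication with a similar-area cache."""

    def __init__(self):
        self.areas = []   # storage areas; each holds a set of block fingerprints
        self.index = {}   # segment eigenvalue -> id of the area it points to

    def ingest(self, data: bytes) -> int:
        """Deduplicate `data` and return how many blocks were newly stored."""
        # 1. Partition the received data into fixed-size blocks.
        blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        stored = 0
        # 2. Group consecutive blocks into data segments.
        for s in range(0, len(blocks), BLOCKS_PER_SEGMENT):
            segment = [fingerprint(b) for b in blocks[s:s + BLOCKS_PER_SEGMENT]]
            # 3. Hypothetical eigenvalue: the smallest block fingerprint.
            eigen = min(segment)
            # 4. Match the eigenvalue; load the similar storage area into the cache.
            area_id = self.index.get(eigen)
            cache = self.areas[area_id] if area_id is not None else set()
            # 5. Duplicate query: blocks missing from the cache are treated as new.
            fresh = {fp for fp in segment if fp not in cache}
            if fresh:
                self.areas.append(fresh)
                self.index.setdefault(eigen, len(self.areas) - 1)
            stored += len(fresh)
        return stored


dedup = SegmentDedup()
data = b"AAAABBBBCCCCDDDD"
print(dedup.ingest(data))  # 4: every block is new
print(dedup.ingest(data))  # 0: all blocks are found in the cached similar area
```

Only the one storage area reached through the matched eigenvalue is loaded into the cache, which keeps the query cheap but means that blocks stored in any other area are invisible to the duplicate query.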
The inventor has found through research that existing data deduplication has the following problem. For example, data received for the first time is stored as new data. When data received for the second time has changed relative to the first-received data, the changed portion is stored separately as new data. When data received for the third time is the same as the second-received data, the stored data most similar to the third-received data is probably still the first-received data. Consequently, the portion that changed relative to the first-received data is again regarded as new data and stored, even though it has in fact already been stored. It can therefore be seen that, in prior-art deduplication processing, the more data is stored, the more storage areas the data is dispersed across, and the lower the overall deduplication performance becomes.
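The three-step example above can be reproduced with a small simulation. This is a simplified, hypothetical model: blocks are identified directly by short labels, eigenvalue matching is approximated by counting how many incoming blocks an existing storage area already holds, only the single most similar area is loaded into the cache, and new blocks are written to a fresh storage area. The function `simulate` and the labels are illustrative assumptions.

```python
from collections import Counter


def simulate(versions):
    """Hypothetical model: one most-similar storage area is chosen per
    incoming version, and only that area is loaded into the cache."""
    areas = []     # each storage area: set of block identifiers
    location = {}  # block identifier -> area where it was first stored
    stored_per_version = []
    for blocks in versions:
        votes = Counter(location[b] for b in blocks if b in location)
        cache = set()
        if votes:
            # Most similar area = the one sharing the most blocks (ties -> older area).
            best = max(votes, key=lambda a: (votes[a], -a))
            cache = areas[best]
        # Blocks not in the cached area are treated as new (order kept, repeats removed).
        fresh = [b for b in dict.fromkeys(blocks) if b not in cache]
        if fresh:
            areas.append(set(fresh))
            for b in fresh:
                location.setdefault(b, len(areas) - 1)
        stored_per_version.append(len(fresh))
    return stored_per_version, len(areas)


v1 = ["A", "B", "C", "D"]
v2 = ["A", "B2", "C", "D"]  # second version: block B changed to B2
v3 = list(v2)               # third version identical to the second
print(simulate([v1, v2, v3]))  # ([4, 1, 1], 3)
```

In this model the third version again selects the first storage area as most similar, so block `B2` is stored a second time although it already exists in the second area, and the data ends up dispersed over three areas instead of two.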