Data deduplication (deduplication for short), also called intelligent compression or single-instance storage, is a storage technology that automatically searches for duplicate data, retains only one unique copy of identical data, and replaces each of the other duplicate copies with a pointer to that single copy, thereby eliminating redundant data and reducing the storage capacity required.
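The idea of retaining one copy and replacing the remaining duplicates with pointers can be illustrated with a minimal sketch. The class name SingleInstanceStore, its methods, and the use of a SHA-256 digest as the identity of a data block are assumptions made only for illustration; the text above does not prescribe any particular fingerprint or data structure.

```python
import hashlib


class SingleInstanceStore:
    """Toy block store that keeps one copy per distinct block (illustrative only)."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> the single stored copy
        self.pointers = []  # one fingerprint per written block, in write order

    def write(self, block: bytes) -> str:
        # Assumption: a SHA-256 digest is used to recognize identical data.
        fp = hashlib.sha256(block).hexdigest()
        if fp not in self.blocks:
            self.blocks[fp] = block   # first occurrence: keep the unique copy
        self.pointers.append(fp)      # every occurrence: record only a pointer
        return fp

    def read_all(self) -> bytes:
        # Follow the pointers to rebuild the original sequence of blocks.
        return b"".join(self.blocks[fp] for fp in self.pointers)


store = SingleInstanceStore()
for block in [b"alpha", b"beta", b"alpha", b"alpha"]:
    store.write(block)
```

In this sketch four blocks are written but only two distinct copies are kept, and the original sequence can still be rebuilt by following the pointers.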
Cluster data deduplication (cluster deduplication for short) refers to a technology that organizes multiple deduplication physical nodes into a cluster to improve deduplication performance and capacity. In the cluster deduplication technology in the prior art, generally, a physical node receiving a data stream divides the data stream into several data blocks and groups the obtained data blocks. For each group, the physical node samples a part of the metadata information of the data blocks in the group and sends the sampled metadata information to all physical nodes in the cluster system for a query. Each physical node in the cluster system stores known data blocks and corresponding metadata information, and compares the sampled metadata information with the metadata information that it stores. The target physical node having the most duplicate data blocks is obtained from the query results, and information about all data blocks of the data group corresponding to the sampled metadata information is then sent to the target physical node for a duplicate data query.
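The prior-art routing flow described above can be sketched as follows. This is a simplified single-process model under stated assumptions, not an implementation from any real system: the names Node, split_into_blocks, group_blocks, and route_group are hypothetical, blocks are of fixed size, the metadata information is a SHA-256 fingerprint, sampling takes every second block of a group, and a tie between nodes is resolved in favor of the first node.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    # Assumption: the metadata information of a block is its SHA-256 digest.
    return hashlib.sha256(block).hexdigest()


def split_into_blocks(stream: bytes, block_size: int):
    # Assumption: fixed-size blocks; the text does not specify a chunking method.
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


def group_blocks(blocks, group_size: int):
    return [blocks[i:i + group_size] for i in range(0, len(blocks), group_size)]


class Node:
    """One physical node holding known blocks and their fingerprints."""

    def __init__(self, name: str):
        self.name = name
        self.index = {}  # fingerprint -> stored block

    def count_matches(self, sampled) -> int:
        # Answer a query: how many sampled fingerprints are already known here?
        return sum(1 for fp in sampled if fp in self.index)

    def deduplicate(self, group) -> int:
        # Duplicate data query for a whole group; returns the number of new blocks stored.
        new_blocks = 0
        for block in group:
            fp = fingerprint(block)
            if fp not in self.index:
                self.index[fp] = block
                new_blocks += 1
        return new_blocks


def route_group(group, nodes, sample_every: int = 2):
    # Sample part of the group's metadata, then query every node in the cluster.
    sampled = [fingerprint(b) for b in group[::sample_every]]
    matches = [node.count_matches(sampled) for node in nodes]
    target = nodes[matches.index(max(matches))]  # first node wins a tie
    return target, len(nodes)                    # one query per node


nodes = [Node("n0"), Node("n1"), Node("n2")]
nodes[1].deduplicate([b"AAAA", b"BBBB"])  # n1 already knows two blocks

groups = group_blocks(split_into_blocks(b"AAAABBBBCCCCDDDD", 4), 4)
target, queries_sent = route_group(groups[0], nodes)
new_blocks = target.deduplicate(groups[0])
```

In this sketch the group is routed to n1, which stores only the two blocks it has not seen before. Note that route_group issues one query per node for every group, so the number of queries in this model grows with the size of the cluster.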
Through research, the inventor finds that, in the cluster deduplication technology in the prior art, the sampled metadata information needs to be sent to all the physical nodes for a query, which leads to a large number of interactions between the physical nodes in a deduplication process. When a cluster system contains many physical nodes, the amount of calculation that each physical node performs during deduplication increases with the number of physical nodes in the cluster system, which degrades the deduplication performance of the system.