With the development of information technologies, the volume of data that needs to be stored increases rapidly. Deduplication technology has been introduced to ease the conflict between an ever-growing data volume and relatively limited storage space.
In a specific implementation, the deduplication technology mainly includes the following steps:
Step 1: A storage device divides a data stream into data blocks by using a fixed-length chunking algorithm or a variable-length chunking algorithm.
Step 2: The storage device calculates a fingerprint of each of the data blocks, where a fingerprint is also referred to as a characteristic value.
Step 3: The storage device compares the fingerprint of each of the data blocks with fingerprints of unique data blocks (also referred to as non-duplicate data blocks) that have been stored in the storage device. The storage device performs step 4 when the fingerprint of a data block is the same as the fingerprint of a unique data block that has been stored in the storage device, and performs step 5 when the fingerprint of the data block is different from the fingerprints of all the unique data blocks that have been stored in the storage device.
Step 4: The storage device does not store the data block again, and increases by 1 the reference count of the stored unique data block that has the same fingerprint as the data block.
Step 5: The storage device sequentially stores, in the order of the logical addresses (LAs) of the data blocks, the data block as a unique data block at a physical address (PA) of a data container of the storage device; sequentially stores, in the same order of logical addresses, metadata of the fingerprint of the data block at a physical address of a fingerprint container of the storage device; generates an address identifier of the metadata of the fingerprint; establishes a mapping between the address identifier and the metadata of the fingerprint; and performs step 6. The metadata of the fingerprint of the data block includes the fingerprint of the data block and the physical address at which the data block is stored. The address identifier of the metadata of a fingerprint may be the physical address at which the metadata of the fingerprint is stored. In another implementation manner, the address identifier may be a logical identifier that uniquely identifies the metadata of the fingerprint. Specifically, the storage device may allocate a globally unique identifier to the metadata of the fingerprint corresponding to each unique data block, such that the address identifiers of the metadata of fingerprints of multiple unique data blocks whose logical addresses are contiguous increase linearly. The mapping between the address identifier and the metadata of the fingerprint is established so that the metadata of the fingerprint can be loaded for a fingerprint query in a subsequent deduplication operation.
Step 6: The storage device establishes mappings between the logical addresses of the data blocks and the fingerprints, and establishes mappings between the fingerprints and the physical addresses at which the unique data blocks are stored. A storage device having a deduplication function must ensure that a unique data block stored in the storage device can be accessed by using a logical address, and that the fingerprint corresponding to the unique data block is deleted after the stored unique data block is deleted. Therefore, in such a storage device, the logical address of a data block, the fingerprint of the data block, and the physical address of the unique data block corresponding to the fingerprint are each indispensable in a mapping.
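For illustration only, steps 1 to 6 above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular storage device: the class and field names (`DedupStore`, `FingerprintMetadata`, `fp_index`, and so on) are hypothetical, fixed-length chunking with a 4096-byte block is assumed, SHA-256 is assumed as the fingerprint function, containers are modeled as in-memory lists whose indexes stand in for physical addresses and address identifiers, the reference count of a newly stored unique block is assumed to start at 1, and a logical-address mapping is assumed to be recorded for duplicate blocks as well so that they remain readable.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed fixed chunk size in bytes (hypothetical value)


@dataclass
class FingerprintMetadata:
    fingerprint: bytes      # fingerprint of the unique data block
    physical_address: int   # where the unique data block is stored
    ref_count: int = 1      # assumption: a new unique block starts at 1


@dataclass
class DedupStore:
    data_container: list = field(default_factory=list)         # PA -> unique block
    fingerprint_container: list = field(default_factory=list)  # address identifier -> metadata
    fp_index: dict = field(default_factory=dict)   # fingerprint -> address identifier
    la_to_fp: dict = field(default_factory=dict)   # step 6: LA -> fingerprint
    fp_to_pa: dict = field(default_factory=dict)   # step 6: fingerprint -> PA

    def write(self, start_la: int, stream: bytes) -> None:
        # Step 1: divide the stream into fixed-length blocks.
        for offset in range(0, len(stream), BLOCK_SIZE):
            block = stream[offset:offset + BLOCK_SIZE]
            la = start_la + offset // BLOCK_SIZE
            # Step 2: calculate the fingerprint of the block.
            fp = hashlib.sha256(block).digest()
            # Step 3: compare with fingerprints of stored unique blocks.
            meta_id = self.fp_index.get(fp)
            if meta_id is not None:
                # Step 4: duplicate; do not store it, bump the reference count.
                self.fingerprint_container[meta_id].ref_count += 1
            else:
                # Step 5: store the unique block and its fingerprint metadata
                # sequentially; list indexes play the role of the PA and of
                # the linearly increasing address identifier.
                pa = len(self.data_container)
                self.data_container.append(block)
                meta_id = len(self.fingerprint_container)
                self.fingerprint_container.append(FingerprintMetadata(fp, pa))
                self.fp_index[fp] = meta_id
                self.fp_to_pa[fp] = pa
            # Step 6: map the logical address to the fingerprint
            # (assumption: done for duplicates too).
            self.la_to_fp[la] = fp

    def read(self, la: int) -> bytes:
        # Resolve LA -> fingerprint -> PA, then fetch the unique block.
        return self.data_container[self.fp_to_pa[self.la_to_fp[la]]]


store = DedupStore()
store.write(0, b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE)
```

In this usage example, three logical blocks are written but only two unique blocks are stored, while all three logical addresses still resolve through the two mappings of step 6. Overwriting a logical address and deleting blocks (which would require decrementing reference counts) are deliberately omitted from the sketch.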
However, although continual deduplication of stored data saves physical space of the storage device, the large quantity of mapping relationships established in step 6 occupies a large amount of memory space of the storage device.
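To give a rough sense of scale, the memory occupied by the two mappings of step 6 can be estimated with the back-of-the-envelope sketch below. Every number in it is an assumption chosen purely for illustration and does not come from any measured system: the function name `mapping_memory_bytes`, the 8-byte logical and physical addresses, the 20-byte fingerprint, the 4 KiB block size, the 1 TiB logical capacity, and the 50% share of unique blocks are all hypothetical, and per-entry overhead of the underlying data structures is ignored.

```python
def mapping_memory_bytes(logical_capacity, block_size, unique_ratio,
                         la_size=8, fp_size=20, pa_size=8):
    """Estimate the bytes needed for the two mappings of step 6.

    All sizes are hypothetical defaults; index-structure overhead is ignored.
    """
    n_blocks = logical_capacity // block_size   # one LA -> fingerprint entry per block
    n_unique = int(n_blocks * unique_ratio)     # one fingerprint -> PA entry per unique block
    la_to_fp = n_blocks * (la_size + fp_size)
    fp_to_pa = n_unique * (fp_size + pa_size)
    return la_to_fp + fp_to_pa


# Assumed scenario: 1 TiB of logical data, 4 KiB blocks, half of the blocks unique.
total = mapping_memory_bytes(2**40, 4096, 0.5)
print(total / 2**30, "GiB")  # prints: 10.5 GiB
```

Under these assumed sizes, the mappings alone would take on the order of 10 GiB of memory per TiB of logical data, which illustrates why the mapping relationships become a memory burden as the stored volume grows.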