Data deduplication, also referred to as intelligent compression or single instance storage, is a storage technology that automatically searches for duplicate data, retains only one copy of identical data, and replaces every other duplicate copy with a pointer to that unique copy, so as to eliminate redundant data and reduce the required storage capacity.
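For illustration only, the principle described above may be sketched as follows. The class name `SingleInstanceStore`, the use of SHA-256 as the block fingerprint, and the in-memory dictionaries are assumptions made for this sketch and are not taken from any particular deduplication product.

```python
import hashlib


class SingleInstanceStore:
    """Toy single-instance store (hypothetical): identical blocks are kept once."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> the single retained copy of a block
        self.pointers = []  # logical block sequence, recorded as fingerprints

    def write(self, block: bytes) -> bool:
        """Store a block and return True if it duplicated an existing block."""
        fingerprint = hashlib.sha256(block).hexdigest()
        is_duplicate = fingerprint in self.blocks
        if not is_duplicate:
            self.blocks[fingerprint] = block  # first occurrence: keep the data
        # Duplicate or not, the logical stream only records a pointer.
        self.pointers.append(fingerprint)
        return is_duplicate

    def read_all(self) -> bytes:
        """Rebuild the original stream by following the pointers."""
        return b"".join(self.blocks[fp] for fp in self.pointers)


store = SingleInstanceStore()
results = [store.write(b) for b in (b"alpha", b"beta", b"alpha", b"alpha")]
```

In this sketch four blocks are written, but only two distinct blocks are physically retained; the remaining two writes add pointers only.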
A metadata querying solution in an existing data deduplication technology includes the following: a metadata cache (Metadata Cache), a Bloom filter (Bloom Filter), a full index table (Full Index Table), and a container (Container), where the Metadata Cache is used to cache metadata; the Bloom Filter is used to filter out new data blocks to reduce the number of disk accesses; the Full Index Table is used to index the storage position of metadata on the disk; and the Container is used to store the data blocks and metadata that remain after duplicate data is deleted.
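The role of the Bloom filter among these components may be illustrated by the minimal sketch below. The bit-array size, the number of hash functions, and the derivation of bit positions from SHA-256 are assumptions chosen for brevity. A Bloom filter never reports an added key as absent, but it may occasionally report an absent key as present, which is why a positive answer still requires a lookup in the index table.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch; the parameters are illustrative assumptions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely new"; True means "possibly seen before".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


seen = BloomFilter()
seen.add(b"fingerprint-of-block-1")
```

A negative answer from `might_contain` allows a block to be classified as new without touching the disk, which is the access reduction referred to above.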
An existing metadata querying process is as follows: the metadata cache is first searched for a piece of metadata (Metadata) to be queried; if the same metadata is found in the metadata cache, the block corresponding to the metadata is a duplicate block; if the same metadata is not found in the metadata cache, the Bloom filter is searched; if the metadata is not found in the Bloom filter, the corresponding block is a new block; if the metadata is found in the Bloom filter, the full index table is searched for a corresponding container; and if the corresponding container is found in the full index table, the corresponding block is a duplicate block, and all metadata in the corresponding container is loaded into the metadata cache.
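The querying process described above may be sketched as follows. The function name `query_metadata` and the in-memory structures are hypothetical. A plain Python set stands in for the Bloom filter to keep the sketch short. Treating a miss in the index table as a new block (the case of a Bloom filter false positive) is an assumption of this sketch, since the process as described does not address that case.

```python
def query_metadata(fp, cache, bloom, full_index, containers):
    """Classify the block with fingerprint fp as 'duplicate' or 'new'.

    cache:      set of fingerprints held in the metadata cache
    bloom:      membership structure (a set stands in for a Bloom filter)
    full_index: dict mapping fingerprint -> container id (on disk in practice)
    containers: dict mapping container id -> list of fingerprints it holds
    """
    # Step 1: search the metadata cache.
    if fp in cache:
        return "duplicate"
    # Step 2: a miss in the Bloom filter means the block is definitely new.
    if fp not in bloom:
        return "new"
    # Step 3: search the full index table for the corresponding container.
    container_id = full_index.get(fp)
    if container_id is None:
        # Assumption: a false positive of the filter is treated as a new block.
        return "new"
    # Step 4: load all metadata of that container into the cache.
    cache.update(containers[container_id])
    return "duplicate"


cache = set()
bloom = {"fp1", "fp2", "fpX"}          # "fpX" simulates a false positive
full_index = {"fp1": 0, "fp2": 0}
containers = {0: ["fp1", "fp2"]}

r_unseen = query_metadata("fp9", cache, bloom, full_index, containers)
r_index = query_metadata("fp1", cache, bloom, full_index, containers)
r_cache = query_metadata("fp2", cache, bloom, full_index, containers)
r_false = query_metadata("fpX", cache, bloom, full_index, containers)
```

In this usage, the query for `"fp1"` reaches the index table and loads the whole container into the cache, so the subsequent query for `"fp2"` is answered from the cache without a further index lookup.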
However, the index table in the prior art is a full index table that includes indices of the metadata of all blocks. Such an index table occupies an extremely large amount of space and is therefore stored on a disk, so that querying it leads to a large number of disk I/O operations. As a result, querying performance is degraded.