With the rapid development of information technologies, data volumes are increasing explosively, and data storage systems that can ensure security, high reliability, high extensibility, and the like of data storage are becoming a main focus of future research.
In a data storage system, to ensure high-performance write operations, including insertion, update, and deletion, the implementation manner used in the prior art is append-only. In the append-only manner, updating or deleting data does not modify existing data but is handled like an insertion operation: update data and deletion data are also written to the storage medium as new records, and the final data is obtained by combining the records.
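The append-only manner described above can be sketched as follows. This is a minimal illustration only, not the implementation of any particular storage system: the record layout (sequence number, primary key, operation, value) and the names `append` and `merge` are assumptions introduced for the example.

```python
from typing import Optional

# Hypothetical record layout: (sequence number, primary key, operation, value).
Record = tuple[int, str, str, Optional[str]]


def append(log: list[Record], key: str, op: str, value: Optional[str] = None) -> None:
    """Write a new record to the end of the log; existing records are never modified."""
    log.append((len(log), key, op, value))


def merge(log: list[Record]) -> dict[str, Optional[str]]:
    """Combine the records in write order to obtain the final data."""
    final: dict[str, Optional[str]] = {}
    for _seq, key, op, value in log:
        if op in ("insert", "update"):
            final[key] = value      # a later record overrides an earlier one
        elif op == "delete":
            final.pop(key, None)    # a deletion record removes the key
    return final


log: list[Record] = []
append(log, "k1", "insert", "a")
append(log, "k2", "insert", "b")
append(log, "k1", "update", "c")   # an update is written as a new record
append(log, "k2", "delete")        # a deletion is written as a new record
print(merge(log))  # {'k1': 'c'}
```

Four records are written even though only one key survives, which illustrates why the number of stored records grows until the records are combined.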
Insertion, update, and deletion of data are generally performed in the form of files. Each file may include multiple records, and a record can be uniquely identified by the primary key of the data. Therefore, when data write operations are performed in the foregoing manner, many records are generated, including insertion records, update records, deletion records, and the like, and when data is queried, records that have not yet been combined are also loaded and queried.
To facilitate data queries, the prior art generally uses a Bloom filter, in which the primary keys of the data in a file are stored separately. A Bloom filter is a probabilistic data structure with extremely high space efficiency that uses a bit array to represent a set. When an element is added to the set, the element is mapped to K locations in the bit array using K hash functions, and the bit values at the K locations are set to 1. When data is queried, the Bloom filter is first checked to determine whether it stores the primary key of the to-be-queried data. If the Bloom filter indicates that it stores the primary key of the to-be-queried data, the file corresponding to the Bloom filter is loaded and queried; if it does not, the query ends.
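The per-file Bloom filter lookup described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the class name `BloomFilter`, the method names, the bit-array size, the use of salted SHA-256 to derive the K hash functions, and the file names are all hypothetical and are not taken from the prior art being described.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bit array plus K hash functions (sizes here are arbitrary assumptions)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str) -> Iterator[int]:
        # Derive K hash values by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


# One Bloom filter per file, each storing the primary keys of that file.
files = {"file_a": ["k1", "k2"], "file_b": ["k3"]}
filters: dict[str, BloomFilter] = {}
for name, keys in files.items():
    bf = BloomFilter()
    for key in keys:
        bf.add(key)  # K hash calculations per primary key
    filters[name] = bf


def candidate_files(key: str) -> list[str]:
    """Return the files that would have to be loaded and queried for this key."""
    return [name for name, bf in filters.items() if bf.might_contain(key)]


print(candidate_files("k1"))
```

Because a Bloom filter never reports a stored key as absent, `file_a` is always among the candidates for `k1`; a file that does not contain the key is skipped unless a false positive occurs.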
In the process of implementing the present invention, it is found that the prior art has the following problem. Because the quantity of files is extremely large and a corresponding Bloom filter is generated for each file to store the primary keys of its data, many different Bloom filters are generated. Each time a primary key is stored, multiple hash calculations must be performed using several hash functions, and the corresponding bit values of the Bloom filter must be modified according to the obtained hash values. The amount of calculation is therefore relatively large, especially when a file contains a relatively large quantity of primary keys, and an extremely large quantity of system resources is occupied.