The Bloom filter and the counting Bloom filter are known as data structures that can be used for testing whether given data exists in a data set.
To start with, the Bloom filter will be explained referring to FIGS. 1A and 1B.
As illustrated in FIG. 1A, the Bloom filter is, for example, a bit array of m bits (m=18 in FIG. 1A). When the Bloom filter is used, k hash functions (k=3 in FIG. 1A) are also prepared (defined), and each hash value obtained by each hash function is associated with one bit of the Bloom filter. Then, with respect to each existing data, k hash values of the existing data are obtained using the k hash functions, and “1” is set to the bit (the initial value of which is zero) associated with each obtained hash value in the Bloom filter.
At the time of testing the existence of given data (hereinafter referred to as the judgment target data), k hash values of the judgment target data are obtained using the k hash functions. Then, if any of the bits associated with the obtained hash values is zero, the judgment target data is judged to be new data. If not, the judgment target data is judged to be either existing data or new data whose associated bits happen to have all been set by other data (a false positive); the Bloom filter alone cannot distinguish between the two.
Thus, by using the Bloom filter, it is possible to judge the existence of the judgment target data (whether the judgment target data is definitely new data or data that may be existing data) rapidly and with a small storage capacity. The Bloom filter is, however, a data structure from which registered information cannot be deleted.
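The registration and judgment procedure described above can be sketched as follows. This is only an illustrative sketch: the class name BloomFilter, the method names, and the way the k hash functions are derived (SHA-256 with an index prefix) are assumptions made for the example and are not taken from the figures.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: an m-bit array used with k hash functions."""

    def __init__(self, m=18, k=3):
        self.m = m              # number of bits (m=18 as in FIG. 1A)
        self.k = k              # number of hash functions (k=3 as in FIG. 1A)
        self.bits = [0] * m     # every bit starts at zero

    def _positions(self, data):
        # Assumption: the i-th hash function is SHA-256 over "i:data",
        # reduced modulo m so that each hash value maps to one bit.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{data}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, data):
        # Registration: set "1" to each bit associated with a hash value.
        for p in self._positions(data):
            self.bits[p] = 1

    def might_contain(self, data):
        # False -> definitely new data (some associated bit is zero).
        # True  -> existing data, or new data that is a false positive.
        return all(self.bits[p] for p in self._positions(data))


bf = BloomFilter()
for item in ("x", "y", "z"):
    bf.add(item)
```

After the three registrations, might_contain returns True for each of "x", "y" and "z"; for any other data it returns False when at least one associated bit is still zero, and True otherwise (a false positive).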
Specifically, in FIG. 1A, the bits of the Bloom filter which represent the existence of data x are the 2nd, 6th and 14th bits from the left. Therefore, for the purpose of deleting the data x from the data set {x, y, z}, it is conceivable to change each of the 2nd, 6th and 14th bits of the Bloom filter to zero. However, if such an update is made to the Bloom filter, as shown in FIG. 1B, the data w, which is the same as the data z contained in the data set, is judged to be new data, because a bit shared by x and z has been cleared. Since such a situation destroys the property of the Bloom filter that false negative results are never yielded, registered information cannot be deleted from the Bloom filter.
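The deletion problem can be reproduced with a small sketch that uses fixed bit positions instead of real hash functions. Only the positions for x (the 2nd, 6th and 14th bits from the left, written here as zero-based indices 1, 5 and 13) come from the description; the positions assigned to y and z are hypothetical values chosen so that z shares one bit with x.

```python
M = 18  # number of bits, as in FIG. 1A

# Zero-based bit positions per data item.
POSITIONS = {
    "x": (1, 5, 13),   # 2nd, 6th and 14th bits from the left (from the text)
    "y": (3, 5, 10),   # hypothetical positions
    "z": (7, 13, 16),  # hypothetical positions; bit 13 is shared with x
}


def build(items):
    """Register every item by setting its associated bits to 1."""
    bits = [0] * M
    for item in items:
        for p in POSITIONS[item]:
            bits[p] = 1
    return bits


def might_contain(bits, item):
    """True if every bit associated with the item is set."""
    return all(bits[p] for p in POSITIONS[item])


bits = build(["x", "y", "z"])
found_before = might_contain(bits, "z")   # True: z is registered

# Naive "deletion" of x: clear the bits associated with x.
for p in POSITIONS["x"]:
    bits[p] = 0

# w is the same data as z, so it maps to the same positions as z.
found_after = might_contain(bits, "z")    # False: a false negative
```

Because clearing the bits of x also clears bit 13, which z relies on, data identical to z is now reported as new even though z was never removed.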
Next, the counting Bloom filter will be discussed referring to FIGS. 2A and 2B.
The counting Bloom filter is an improved version of the Bloom filter which allows registered information to be deleted. As schematically shown in FIG. 2A, the counting Bloom filter (hereinafter referred to as the CBF) is configured to store n (≧2) bits of information per hash value, so that each storage area can hold a count rather than a single bit.
Contents of the CBF are updated as follows.
In the case of registering information of an existing data: “1” is added to the value of each of the storage areas associated with the k hash values obtained from the existing data.
In the case of deleting information of a data: “1” is subtracted from the value of each of the storage areas associated with the k hash values obtained from the data to be deleted.
Specifically, in FIG. 2A, the storage areas of the CBF which represent the existence of data x are the 2nd, 6th and 14th storage areas from the left. Therefore, when deleting the data x from the data set {x, y, z}, the value of each of the 2nd, 6th and 14th storage areas from the left of the CBF is decremented by “1”. As a result, the CBF changes to the state illustrated in FIG. 2B, i.e., a state in which data identical with any data within the data set {y, z} (w in FIG. 2B, for example) is not wrongly judged to be new data.
Thus, the CBF allows registered information to be deleted. However, the memory size required to implement the CBF is n times as large as that required for the Bloom filter.
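The update rules above can be sketched as follows. This is an illustrative sketch only: the class name CountingBloomFilter, the choice n=4, the SHA-256 based hash derivation, and the error handling for saturated or zero counters are all assumptions made for the example, not details taken from the figures.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative CBF: m storage areas, each holding an n-bit counter."""

    def __init__(self, m=18, k=3, n=4):
        self.m = m
        self.k = k
        self.max_count = (1 << n) - 1   # largest value an n-bit area can hold
        self.counters = [0] * m

    def _positions(self, data):
        # Assumption: the i-th hash function is SHA-256 over "i:data", modulo m.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{data}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, data):
        # Registration: add 1 to each associated storage area.
        # Sketch only: no rollback is attempted if a counter is saturated.
        for p in self._positions(data):
            if self.counters[p] == self.max_count:
                raise OverflowError("storage area is saturated")
            self.counters[p] += 1

    def remove(self, data):
        # Deletion: subtract 1 from each associated storage area.
        # Must only be called for data that was registered earlier.
        for p in self._positions(data):
            if self.counters[p] == 0:
                raise ValueError("data was not registered")
            self.counters[p] -= 1

    def might_contain(self, data):
        # False -> definitely new data; True -> possibly existing data.
        return all(self.counters[p] > 0 for p in self._positions(data))


cbf = CountingBloomFilter()
for item in ("x", "y", "z"):
    cbf.add(item)
cbf.remove("x")
still_found = cbf.might_contain("y") and cbf.might_contain("z")
```

Deleting x only undoes the increments that x itself contributed, so every storage area used by y or z keeps a count of at least 1 and still_found is True regardless of how the hash values collide.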