Bloom filters provide a space-efficient way to store data that can be used to test whether an element is a member of a set. A Bloom filter may comprise a bit array of m bits. One or more hash functions, e.g., k hash functions, may be used to map a given item to a corresponding one or more locations in the array. For example, an element A may be mapped to a filter location by computing the hash of the element A modulo the size of the array. As an element is added to the set, the corresponding bits may be set, e.g., by changing an initial/default value of “0” to “1”.
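By way of illustration only, the mapping described above may be sketched as follows. The class name BloomFilter, the method names add and might_contain, and the use of SHA-256 salted with an index to simulate k hash functions are assumptions made for this sketch and are not drawn from any particular implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                # size of the array
        self.k = k                # number of hash functions
        self.bits = bytearray(m)  # one byte per bit for clarity; all start at 0

    def _locations(self, item: str):
        # Assumption: the k hash functions are simulated by salting SHA-256
        # with the index i. Each hash is reduced modulo the array size m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Set the bit at every location the item maps to (0 -> 1).
        for loc in self._locations(item):
            self.bits[loc] = 1

    def might_contain(self, item: str) -> bool:
        # True only if every mapped bit is set. This may be a false positive,
        # but it is never a false negative for an item that was added.
        return all(self.bits[loc] for loc in self._locations(item))
```

For example, after `bf = BloomFilter(m=1024, k=3)` and `bf.add("A")`, the call `bf.might_contain("A")` returns True, whereas the same call on a filter to which nothing has been added returns False.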
When a Bloom filter is used to determine membership in a set, false positives are possible, since for two or more different items the respective hash values modulo the array size may be the same. However, false negatives are not possible, since if the element is already a member of the set the corresponding bit(s) in the filter would be found to have been set.
In some applications, a Bloom filter may be used to determine whether an element is already in a set. If the filter result is positive, a further query, e.g., of a database table, may be performed to determine conclusively whether the element is in the set. If the filter result is negative, the database query does not need to be performed.
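The filter-then-query pattern described above may be sketched as follows. This is a hypothetical example: GuardedStore, its table attribute (an in-memory set standing in for a database table), the db_queries counter, and the values of M and K are all assumptions made for illustration.

```python
import hashlib

M = 4096  # array size in bits (illustrative value)
K = 3     # number of hash functions (illustrative value)


def locations(item: str):
    # Assumption: K hash functions simulated by salting SHA-256 with i.
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % M


class GuardedStore:
    """A set-membership store whose lookups are screened by a Bloom filter."""

    def __init__(self) -> None:
        self.bits = bytearray(M)
        self.table = set()   # stand-in for a database table
        self.db_queries = 0  # counts how often the "database" is consulted

    def insert(self, item: str) -> None:
        self.table.add(item)
        for loc in locations(item):
            self.bits[loc] = 1

    def contains(self, item: str) -> bool:
        # Negative filter result: the item is not in the set, so the
        # database query is skipped.
        if not all(self.bits[loc] for loc in locations(item)):
            return False
        # Positive filter result: may be a false positive, so the
        # database is queried to determine membership conclusively.
        self.db_queries += 1
        return item in self.table
```

In this sketch, a lookup against an empty store returns False without incrementing db_queries, while a lookup of a previously inserted item consults the stand-in table exactly once.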
Typically, for an array of a given size, the probability of false positives increases as more elements are added to the set, and does so at a specific, calculable rate that depends on the size of the array, the number of hash functions, and the number of elements added. The false positive rate can be reduced by increasing the size of the array, but typically resizing requires that the entire filter be rebuilt, e.g., by iterating over the elements in the set to populate the newly-resized filter array. For a set having a very large number of elements, the time, computing, and other resources required to rebuild the filter after resizing may be prohibitive.
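As a rough illustration, under the common assumption that the k hash functions are independent and uniformly distributed, the false positive probability after n elements have been added to an m-bit array is approximately (1 - e^(-kn/m))^k. The sketch below computes that estimate and shows why resizing typically implies a rebuild: each location is computed modulo the array size, and the elements cannot be recovered from the bits, so every element must be hashed again into the new array. The function names and the salted SHA-256 hashing are assumptions made for this example.

```python
import hashlib
import math


def estimated_false_positive_rate(m: int, k: int, n: int) -> float:
    # Standard approximation, assuming independent, uniform hash functions.
    return (1.0 - math.exp(-k * n / m)) ** k


def build_filter(elements, m: int, k: int) -> bytearray:
    # Populating an array of size m requires iterating over every element,
    # since each location is computed modulo m and changes when m changes.
    bits = bytearray(m)
    for item in elements:
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits
```

Under this approximation, doubling m for fixed k and n lowers the estimated rate, but obtaining the larger filter requires calling build_filter again over all of the elements, the cost of which grows with the size of the set.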