The growing importance of operations such as packet-content inspection, packet classification based on non-IP headers, and maintenance of flow state has led to increased interest in networking applications of Bloom filters. Bloom filters offer a relatively simple way to implement set-membership queries in hardware. However, a Bloom filter answers membership queries only probabilistically, and it can produce too many false positives.
A Bloom filter is a randomized, memory-efficient data structure for performing membership queries. Let S be a set of n keys, where each key represents a search term such as might be found in a data packet or datagram. The Bloom filter stores this set in a bitmap (filter) of m bits by hashing each element of S into the bitmap using k independent uniform hash functions h1, h2, . . . , hk. A bit in the filter is set to one if and only if one or more keys hash to that location. To check whether a term x belongs to the set, x is hashed by computing h1(x), h2(x), . . . , hk(x), and x is declared a member of S if all the bits at the computed hash locations are set to one in the filter. Assuming uniform hash functions, it is easy to determine the probability that a membership query results in a false positive: after n keys have been inserted, this probability is f = (1 - (1 - 1/m)^(kn))^k, which is approximately (1 - e^(-kn/m))^k.
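The following Python sketch illustrates the structure described above; it is an illustration only, not the method of this description. The k independent uniform hash functions are approximated here by salting SHA-256 with the function index, which is an assumption made for the example, and the false-positive estimate uses the standard formula for uniform hash functions. The sample search terms are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Standard Bloom filter: an m-bit bitmap probed by k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, key: str):
        # Assumed stand-in for k independent uniform hash functions:
        # salt SHA-256 with the hash index i and reduce modulo m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        # A bit becomes one if one or more keys hash to that location.
        for pos in self._positions(key):
            self.bits[pos] = 1

    def query(self, key: str) -> bool:
        # Declared a member only if all k hashed bits are one.
        return all(self.bits[pos] for pos in self._positions(key))


def false_positive_rate(m: int, n: int, k: int) -> float:
    # A given bit stays zero after n*k uniform hashes with probability
    # (1 - 1/m)**(n*k); a false positive needs all k probed bits set.
    return (1 - (1 - 1 / m) ** (n * k)) ** k


def false_positive_rate_approx(m: int, n: int, k: int) -> float:
    # Common approximation (1 - e^(-kn/m))^k.
    return (1 - math.exp(-k * n / m)) ** k


# Usage with hypothetical search terms.
bf = BloomFilter(m=1024, k=4)
for term in ["GET /admin", "cmd.exe", "/etc/passwd"]:
    bf.add(term)
print(bf.query("cmd.exe"))  # True: inserted keys are never missed
print(false_positive_rate(1024, 3, 4), false_positive_rate_approx(1024, 3, 4))
```

Because a bit is never cleared, an inserted key always passes the query; only non-members can be misreported.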
In a paper by Lumetta and Mitzenmacher (S. Lumetta and M. Mitzenmacher, “Using the power of two choices to improve Bloom filters”, preprint available at http://www.eecs.harvard.edu/michaelm), each key is inserted using one of c sets of hash functions, selected so as to reduce the number of ones in the Bloom filter. With this approach, each query must be hashed with all c sets of hash functions, and the query is declared to be in the set if it passes the membership test for any one of these c sets.
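A minimal Python sketch of the c-choice scheme follows. It assumes a salted SHA-256 hash family (a hypothetical choice) and assumes that, on insertion, the set of hash functions that turns on the fewest currently-zero bits is chosen; the cited paper may use a different selection rule. The class name and sample terms are illustrative only.

```python
import hashlib


class MultiChoiceBloomFilter:
    """Sketch: c sets of k hash functions; each key is inserted with one set."""

    def __init__(self, m: int, k: int, c: int):
        self.m, self.k, self.c = m, k, c
        self.bits = [0] * m

    def _positions(self, term: str, group: int) -> list[int]:
        # Hypothetical hash family: salt SHA-256 with (set index, hash index).
        out = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{group}:{i}:{term}".encode()).digest()
            out.append(int.from_bytes(digest[:8], "big") % self.m)
        return out

    def add(self, term: str) -> int:
        # Assumed selection rule: pick the set of hash functions that would
        # turn on the fewest zero bits, so fewer ones accumulate.
        def new_ones(group: int) -> int:
            return sum(1 for p in set(self._positions(term, group)) if not self.bits[p])

        best = min(range(self.c), key=new_ones)
        for p in self._positions(term, best):
            self.bits[p] = 1
        return best

    def query(self, term: str) -> bool:
        # Every query is hashed with all c sets; it passes if any single
        # set finds all of its k bits set to one.
        return any(
            all(self.bits[p] for p in self._positions(term, g))
            for g in range(self.c)
        )


# Usage with hypothetical search terms.
mbf = MultiChoiceBloomFilter(m=256, k=3, c=2)
terms = [f"signature-{i}" for i in range(20)]
for t in terms:
    mbf.add(t)
print(all(mbf.query(t) for t in terms))  # True
print(sum(mbf.bits), "ones set out of", mbf.m)
```

The trade-off is visible in `query`: fewer ones in the bitmap can lower the chance that a single set of hash functions matches spuriously, but each lookup now costs c times as many hash computations, and a non-member is reported if any of the c sets matches.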