In peer-to-peer networks and other contexts in which data is transmitted over a computer network, it may be desirable to test for the presence (or absence) of desired data before actually transmitting the desired data over the network. In this way, network capacity and other network resources may be conserved, and the speed with which desired data is identified and retrieved may be increased. For example, a Bloom filter may be utilized to support the handling of queries which seek to determine whether or not a particular data item is included within a larger dataset.
More specifically, a Bloom filter is a data structure designed to indicate whether a given data item exists in a corresponding dataset. Thus, a query seeking a particular data item may consult a corresponding Bloom filter to determine whether the desired data item is included within the corresponding dataset. In particular, for example, the Bloom filter may be transmitted to, and stored at, the source of the query. Then, the Bloom filter may be consulted at the source of the query to determine whether the desired data item is present at the remote storage site of the larger dataset. In this way, only queries which are potentially capable of being satisfied using the corresponding dataset need be transmitted to the remote storage site.
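As an illustration only, the membership test described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the class name `BloomFilter`, the method names `add` and `might_contain`, the helper `should_send_query`, and the use of SHA-256 with double hashing to derive bit positions are all hypothetical choices made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a fixed-size bit array probed by several hash positions.

    All names here are hypothetical and chosen for illustration.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits        # fixed when the filter is created
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive the probe positions by double hashing the two
        # halves of one SHA-256 digest; any set of independent hashes would do.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every probed bit for the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def should_send_query(local_filter: BloomFilter, item: str) -> bool:
    """Consult the locally stored filter before querying the remote site."""
    return local_filter.might_contain(item)
```

In such a sketch, the remote storage site would populate the filter with `add` and transmit it to the source of the query, which would then call `should_send_query` and transmit the query only when the result is true. A false result is definitive, whereas a true result indicates only that the data item may be present.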
Bloom filters are relatively small in size compared to their corresponding dataset, so that fewer network resources are required to transmit a given Bloom filter as compared to its corresponding dataset. Moreover, as previously mentioned, network resources may be conserved by the elimination of the transmission of a significant number of queries which would not have been satisfied in any event. Still further, such Bloom filters are known to provide very fast execution of membership queries, as well as very fast updates or other modifications to the Bloom filter itself.
Bloom filters, however, are prone to providing false positives, thereby erroneously indicating that a specified data item is present within a dataset, when in fact the data item is not present. Consequently, e.g., in the example scenarios described above, such false positive results may result in unnecessary and wasteful transmissions of queries across the network. Moreover, the rate of false positives in a given Bloom filter is generally inversely related to the size of the Bloom filter, so that, for datasets of the same size, a larger Bloom filter may provide fewer false positives than a relatively smaller Bloom filter. However, use of larger Bloom filters may diminish the advantages described above with respect to conservation of network resources and transmission and storage of such Bloom filters. Moreover, conventional Bloom filters must generally be sized at the time of creation and therefore may be difficult or impossible to increase in size without recreating the desired, larger Bloom filter in its entirety. Thus, for these and other reasons, it may be difficult to utilize Bloom filters to facilitate and optimize network queries in a manner which is efficient, dynamic, and convenient for users of such Bloom filters.
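The relationship between filter size and false positive rate may be illustrated with the standard approximation p ≈ (1 − e^(−kn/m))^k, where m is the number of bits in the filter, n is the number of stored data items, and k is the number of hash functions. The following is a sketch under stated assumptions: the function names are hypothetical, and the formula is the commonly used textbook estimate, which assumes independent, uniformly distributed hash functions.

```python
import math


def false_positive_rate(num_bits: int, num_items: int, num_hashes: int) -> float:
    """Textbook estimate p = (1 - e^(-k*n/m))^k of the false positive rate."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def optimal_num_hashes(num_bits: int, num_items: int) -> int:
    """Hash count that minimizes the estimate: k = (m / n) * ln 2, at least 1."""
    return max(1, round(num_bits / num_items * math.log(2)))


# Compare two filter sizes for the same hypothetical dataset of 1,000 items.
num_items = 1000
for num_bits in (8000, 16000):
    k = optimal_num_hashes(num_bits, num_items)
    p = false_positive_rate(num_bits, num_items, k)
    print(f"bits={num_bits} hashes={k} estimated false positive rate={p:.5f}")
```

Under this estimate, doubling the number of bits for the same dataset lowers the expected false positive rate considerably, at the cost of a filter that is twice as large to transmit and store. Because the number of bits is fixed when the filter is created, moving to the larger size generally means rebuilding the filter from the underlying dataset.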