As Web usage increases, so does user-perceived fetch latency. Distributed Web caching systems reduce fetch latency by having each Web server maintain, in its cache memory, information representative of its own Web objects as well as those of other, neighboring Web servers. By allowing a desired Web object to be retrieved from a neighbor's cache that is closer to the client than the original source, such distributed or cooperative Web caching systems reduce user-perceived fetch latency.
For such distributed or cooperative Web caching systems to be effective, Web servers must have reasonably accurate information regarding the contents of other Web server caches. One possibility is for Web servers to periodically broadcast a list of their contents to neighboring Web servers. The natural form of this list would be a list of Uniform Resource Locators, or URLs, such as “http://www.yahoo.com.” A list written in this textual form might be quite long, as Web servers may cache thousands, tens of thousands, or more Web pages. Moreover, these lists must be broadcast often enough that they remain mostly accurate even as the contents of the Web server caches change over time. Hence this straightforward solution may lead to significant network traffic, undermining the possible advantages of distributed Web caching.
Fan et al., in an article entitled “Summary Cache: a Scalable Wide-area Web Cache Sharing Protocol,” appearing in Proceedings of SIGCOMM '98 (1998, pp. 254-265), incorporated herein by reference, disclose a distributed, Bloom filter Web cache server that maintains Bloom filter array data representative of its own Web objects, as well as of those of other, neighboring Web servers. The server broadcasts the Bloom filter array data that represents the contents of its cache to its neighbors whenever sufficiently many changes have occurred since the last broadcast. If, upon a query miss, a Web server wishes to determine whether a neighboring Web server has a page in its cache, it checks the Bloom filter array data for that neighbor. Message traffic is reduced in Fan et al., since Web servers do not broadcast URL lists corresponding to the exact contents of their memory caches, but rather the succinct Bloom filter array data representative thereof.
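The summary exchange described above can be illustrated with a minimal Python sketch. It is not the implementation of Fan et al.: the class names `BloomFilter` and `CacheServer`, the use of SHA-256 with double hashing, and the default filter size and hash count are all assumptions chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Bit array in which each key sets, and is tested against, k positions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Assumption: derive k positions from one SHA-256 digest by double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        # True for every added key; may also be True for a key never added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class CacheServer:
    """Hypothetical cache server holding its own summary and its neighbors'."""

    def __init__(self, name: str, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.name = name
        self.cache = {}               # URL -> cached object
        self.summary = BloomFilter(num_bits, num_hashes)
        self.neighbor_summaries = {}  # neighbor name -> copy of its summary

    def store(self, url: str, body: bytes) -> None:
        self.cache[url] = body
        self.summary.add(url)

    def broadcast(self, neighbors) -> None:
        # Send a snapshot of the bit array, not the list of URLs.
        for neighbor in neighbors:
            snapshot = BloomFilter(self.summary.num_bits, self.summary.num_hashes)
            snapshot.bits = bytearray(self.summary.bits)
            neighbor.neighbor_summaries[self.name] = snapshot

    def candidates(self, url: str):
        # On a local miss, list the neighbors whose summary claims the URL.
        return [name for name, summary in self.neighbor_summaries.items()
                if url in summary]
```

In this sketch a server that misses locally would call `candidates(url)` and query only the neighbors returned. A neighbor that holds the URL always appears in the result, whereas a neighbor that does not hold it may occasionally appear as well.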
The size of the Bloom filter array data is typically determined by the transmission protocol of the communications infrastructure. While the Bloom filter has a zero probability of producing false negatives when queried, for Bloom filter array data of a given size even an optimally configured filter has a non-zero probability of producing false positives. That is, it may incorrectly report that an element is in a set when in fact it is not, which leads to more message traffic and to increased user-perceived fetch latency.
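For orientation, the false positive behavior can be quantified with the standard textbook approximation for a Bloom filter. The symbols are introduced here and do not appear in the text above: m is the number of bits in the array, n the number of stored elements, and k the number of hash functions, which are assumed to be independent and uniformly distributed.

```latex
% Approximate false positive probability after n insertions
p \approx \left(1 - e^{-kn/m}\right)^{k}

% Number of hash functions that minimizes p for fixed m and n
k_{\mathrm{opt}} = \frac{m}{n}\,\ln 2

% Resulting minimum, which is non-zero for any finite m/n
p_{\min} \approx \left(\tfrac{1}{2}\right)^{k_{\mathrm{opt}}} \approx (0.6185)^{m/n}
```

Under these assumptions, once the array size m is fixed by the transmission constraints, the false positive probability can be driven no lower than this minimum by tuning k alone.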
There is thus a need to reduce the probability of producing false positives in distributed, summary cache Web servers.