Information about the availability of resources in a communication device is currently used to optimize exchange of digital resources between communication devices.
This is for example the case for caching web proxies that work cooperatively. Each caching web proxy caches a set of digital resources in cache memories and shares its cache information with the other caching web proxies. Then, when a caching web proxy receives a request for a digital resource not stored in its cache memories, it can use the information about the availability of resources in the other caching web proxies to select one of them to handle the request, avoiding requesting each of the other caching web proxies.
Information about the availability of resources in a communication device is generally implemented through a Bloom filter.
A Bloom filter is a compact data structure for a probabilistic representation of a set of elements. In the example above, the elements are the digital resources hosted by the communication device.
Bloom filter theory is for example disclosed in “Less hashing, same performance: Building a better bloom filter” (A. Kirsch and M. Mitzenmacher, Random Structures & Algorithms 33, no. 2 (2008), pp. 187-218).
A Bloom filter representing a set makes it possible to check whether an element is a member of the set. For an element that is a member of the set, the Bloom filter will always return the correct result, i.e. that the element is a member of the set. That means that false negatives are not possible. However, for an element that is not a member of the set, the Bloom filter may wrongly return, with a low probability, that the element is a member of the set. That means that false positives are possible. The probability for a Bloom filter to return a false positive is its error rate.
A Bloom filter of size m is composed of k hash functions having values in the range [0, m−1] and of an array of m Boolean values. m should be greater than k, and generally m>>k. For example, m=18 and k=3.
To store an element, such as a digital resource, in the Bloom filter, the k hash functions are applied to it to obtain k hash values v1, . . . , vk. Then, for each value vi, the corresponding Boolean value in the array is set to ‘true’, i.e. the Boolean value having the index vi in the array: in other words, the Boolean value at index v1 in the array is set to ‘true’; the Boolean value at index v2 in the array is set to ‘true’; and so on.
Another element may be added to the Bloom filter by again setting the Boolean values at index vi (calculated for that specific element) to ‘true’. Of course, some of these Boolean values may already be set to ‘true’ due to the previous addition of other elements.
Based on a Bloom filter so constructed, the presence of an element in the set represented by the Bloom filter can be tested. To do so, the k hash functions are applied to the element to test, to obtain k hash values v1, . . . , vk. Then, for each value vi, the corresponding Boolean value in the array (Boolean value at index vi) is retrieved.
If all Boolean values at index vi (i=1 . . . k) have the value ‘true’, then the Bloom filter returns that the element to test belongs to the set it represents. However, there is a probability that this result is false: a Bloom filter can return a false positive. This is because the Boolean values at index vi (i=1 . . . k) for the tested element may all have been set to ‘true’ when other elements, which are members of the set, were added. In general, with a correctly configured Bloom filter, the probability of a false positive is quite low.
Otherwise (if any one of these Boolean values is not ‘true’), the Bloom filter returns that the element to test does not belong to the set it represents. In such a case, the result is always correct: a Bloom filter never returns a false negative. This is because if at least one of the Boolean values is not ‘true’, then the tested element has never been added to the Bloom filter.
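The store and test operations described above can be sketched as follows. This is a minimal illustration only: the class name `BloomFilter`, the method names, and the way the k hash functions are derived (SHA-256 salted with the hash-function number) are assumptions made for the example and are not taken from the cited publications.

```python
import hashlib


class BloomFilter:
    """Array of m Boolean values addressed by k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _indexes(self, element: str):
        # Assumption: the k hash functions are simulated by salting SHA-256
        # with the function number i; each value falls in the range [0, m-1].
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{element}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, element: str) -> None:
        # Set the Boolean value at each index v1..vk to True.
        for v in self._indexes(element):
            self.bits[v] = True

    def might_contain(self, element: str) -> bool:
        # True only if every Boolean value at v1..vk is True.
        # A True answer may be a false positive; False is always correct.
        return all(self.bits[v] for v in self._indexes(element))


# Hypothetical resources, using the sizes m=18 and k=3 from the example above.
bf = BloomFilter(m=18, k=3)
bf.add("/img/logo.png")
bf.add("/js/app.js")
print(bf.might_contain("/img/logo.png"))  # True: an added element is always found
```

The method is named `might_contain` rather than `contains` to recall that a positive answer is only probable, whereas a negative answer is certain.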
An issue addressed by the invention concerns the probability of false positives, i.e. the probability that an element is reported to be in the set when it is not.
A mathematical study of Bloom filters shows that the probability of false positive for a Bloom filter storing n elements is roughly equal to: $p \approx \left(1 - e^{-kn/m}\right)^{k}$
This formula enables computation of the optimal number k of hash functions to use for minimizing this value when n and m are given:
$k = \frac{m}{n} \ln 2$
Using this value, the probability of false positive can be estimated as: $p \approx 2^{-k}$
For example, when using 10 bits per element (n elements represented by n*10 bits or Boolean values), the number of hash functions should be chosen close to: $k = 10 \ln 2 \approx 6.93$
Using k=7 hash functions leads to a false-positive rate of: $p \approx 2^{-7} \approx 0.008$
This means that with 10 bits per element, the false-positive error rate is below 1%.
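The arithmetic of this example can be checked with a short sketch. The function names and the choice n=100 are assumptions made for the illustration; only the two formulas come from the text above.

```python
import math


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Approximation p = (1 - e^(-kn/m))^k for n elements, m bits, k hashes."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(n: int, m: int) -> float:
    """Number of hash functions minimizing p: k = (m/n) ln 2."""
    return (m / n) * math.log(2)


n = 100        # hypothetical number of elements
m = 10 * n     # 10 bits per element
k_real = optimal_k(n, m)   # about 6.93
k = round(k_real)          # 7
p = false_positive_rate(n, m, k)

print(f"optimal k = {k_real:.2f}, using k = {k}")
print(f"p from the first formula = {p:.4f}, 2^-k = {2 ** -k:.4f}")
```

Both estimates stay below 1%; they differ slightly because k=7 is a rounding of the optimal value 6.93.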
As introduced above, Bloom filters are used by cooperative caching web proxies to share their caching content. The publication “Cache digests” (A. Rousskov and D. Wessels, Computer Networks and ISDN Systems 30, no. 22-23 (1998), pp. 2155-2168) gives an overview of this sharing of caching content.
Each caching web proxy creates a cache digest, i.e. values resulting from hash functions, which represents the content of its cache memories using a Bloom filter.
The caching web proxies share their cache digest with the other cooperative caching web proxies.
On receiving a request for a resource not in its cache memories, a caching web proxy uses the cache digest of the other caching web proxies to check whether any of them has that requested resource in its cache memories. If so, the request is sent to the caching web proxy caching the requested resource.
In this way, the number of requests between proxies is greatly reduced, saving network bandwidth.
A false positive happens with a probability of p. A false positive means that a caching web proxy sends a request to one of the other caching web proxies for a resource that the latter does not store in its cache memories.
Even if false positives consume bandwidth, the overall result is still a large decrease of bandwidth usage.
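The digest exchange and peer selection described above might look like the following sketch. It is not the implementation of the cited “Cache digests” publication: the digest size `M`, the number of hash functions `K`, the hash construction, the function names and the proxy names are all assumptions made for the illustration.

```python
import hashlib
from typing import Dict, Iterable, List, Optional

M, K = 1024, 7  # assumed digest size (bits) and number of hash functions


def indexes(url: str) -> List[int]:
    # Assumption: K hash functions simulated by salting SHA-256.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()[:8], "big") % M
        for i in range(K)
    ]


def make_digest(cached_urls: Iterable[str]) -> int:
    """Build a cache digest: an M-bit integer used as an array of Booleans."""
    digest = 0
    for url in cached_urls:
        for v in indexes(url):
            digest |= 1 << v
    return digest


def in_digest(digest: int, url: str) -> bool:
    return all((digest >> v) & 1 for v in indexes(url))


def select_peer(url: str, peer_digests: Dict[str, int]) -> Optional[str]:
    """Return the first peer whose digest claims to hold url, else None."""
    for peer, digest in peer_digests.items():
        if in_digest(digest, url):
            return peer  # may be a false positive, with probability p
    return None


# Hypothetical peers and URLs.
peers = {
    "proxy-b": make_digest(["http://example.com/a.css", "http://example.com/b.js"]),
    "proxy-c": make_digest(["http://example.com/c.png"]),
}
print(select_peer("http://example.com/a.css", peers))  # proxy-b
```

When `select_peer` returns `None`, no peer holds the resource, since a digest never produces a false negative; the proxy can then go directly to the origin server without querying any peer.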
Publication U.S. Pat. No. 7,937,428 discloses a system and a method for generating and using a dynamic Bloom filter. In detail, several cascaded Bloom filters are used to represent a set of elements, wherein each time a new element is added to the set, it is also added to a current Bloom filter of the several Bloom filters. As the false-positive error rate of a Bloom filter grows with the number of elements it represents, a Bloom filter is considered full when the number of elements it represents reaches a predefined limit, which corresponds to its error rate reaching a corresponding limit. When the current Bloom filter is full, an additional Bloom filter is created and becomes the current Bloom filter for new elements to add to the set. Given that several Bloom filters then coexist, any request for an element involves checking the current Bloom filter and the previous Bloom filters.
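The cascading principle summarized above can be illustrated with the following sketch. It is only a reading of that summary, not the implementation disclosed in the cited patent: the class name, the `capacity` parameter, the hash construction, and the choice to create the additional filter lazily (on the first addition after the current one is full) are assumptions.

```python
import hashlib
from typing import List


class DynamicBloomFilter:
    """Cascade of fixed-size Bloom filters; the last one is the current one."""

    def __init__(self, m: int, k: int, capacity: int):
        self.m, self.k, self.capacity = m, k, capacity
        self.filters: List[List[bool]] = [[False] * m]
        self.count = 0  # elements added to the current filter

    def _indexes(self, element: str) -> List[int]:
        # Assumption: k hash functions simulated by salting SHA-256.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{element}".encode("utf-8")).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, element: str) -> None:
        if self.count >= self.capacity:
            # Current filter is full: start a new one for further additions.
            self.filters.append([False] * self.m)
            self.count = 0
        current = self.filters[-1]
        for v in self._indexes(element):
            current[v] = True
        self.count += 1

    def might_contain(self, element: str) -> bool:
        # The current filter and all previous filters must be checked.
        idx = self._indexes(element)
        return any(all(bits[v] for v in idx) for bits in self.filters)


dbf = DynamicBloomFilter(m=64, k=3, capacity=2)
for name in ["a", "b", "c", "d", "e"]:  # hypothetical elements
    dbf.add(name)
print(len(dbf.filters))  # 3
```

Each filter in the cascade keeps its own error rate bounded, but a lookup now tests every filter, so the chance that at least one of them returns a false positive grows with the number of filters.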
The inventors wished to apply the above Bloom-filter-based sharing of cache information to the “Push” model of communication, in particular to the SPDY protocol.
On the Web, the usual paradigm is the “Pull” model whereby a client device, such as a web browser, requests a main digital resource, such as an HTML web page, from a web server device and receives the requested main resource in response.
The client device then parses the received HTML web page to discover which secondary resources referenced therein (e.g. images, scripts, etc.) are required for fully rendering the web page. It then requests them from the server device and upon receiving the requested secondary resources, it displays the entire web page.
However, new technologies, such as SPDY (standing for “SPeeDY”) improving the well-known HTTP protocol (standing for “Hypertext Transfer Protocol”) for sending web pages over the Internet, have emerged which also provide the above-mentioned “Push” model.
SPDY makes it possible for the server device to push resources to the client device, on its own initiative, over the same network connection as initiated by the original client request. This makes it possible for the server device to push the secondary resources referenced in the main resource requested by the client device, before the latter discovers they are needed.
Thanks to the SPDY push of resources, web pages can be loaded faster and their rendering be obtained faster.
Sharing the cache information of the requesting client device is also desirable to avoid wasting bandwidth, since it enables the server device to reduce the set of resources to push: the server will only push resources not yet in the client cache memories.
It is therefore desirable to provide information about the client device's cache memories to the server device, using for example a Bloom-filter-based representation.
For example, when requesting a main resource from a given web server device, the client device can search its cache memories for all the resources already received, in particular those received from that given web server.
The client device then creates a Bloom filter for representing those resources and sends the created Bloom filter array within the request to the web server device.
From the received Bloom filter array, the web server device is then aware of which resources are already available in the client device and which ones need to be sent to it, with a degree of certainty limited by the probability of false positive.
In the above example of a main requested resource and secondary resources referenced therein, the server device can determine which secondary resources are not yet available in the client device using the Bloom filter array and decide to send those not yet available resources to the client device.
To be explicit, in this context a false positive means that the server device will not push a resource (such as a secondary resource) even though the client device does not have it in its cache memories.
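The exchange between the client device and the server device can be sketched as follows. This is an illustration under assumptions, not a SPDY implementation: the filter size `M`, the number of hash functions `K`, the hash construction, the function names and the resource paths are all hypothetical, and the transport of the Bloom filter array within the request is left out.

```python
import hashlib
from typing import Iterable, List

M, K = 256, 7  # assumed filter size (bits) and number of hash functions


def indexes(url: str) -> List[int]:
    # Assumption: K hash functions simulated by salting SHA-256.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()[:8], "big") % M
        for i in range(K)
    ]


def client_filter(cached_urls: Iterable[str]) -> List[bool]:
    """Client side: Bloom filter array representing the cached resources."""
    bits = [False] * M
    for url in cached_urls:
        for v in indexes(url):
            bits[v] = True
    return bits


def resources_to_push(secondary: Iterable[str], bits: List[bool]) -> List[str]:
    """Server side: keep only the resources the filter reports as absent."""
    return [url for url in secondary if not all(bits[v] for v in indexes(url))]


# Hypothetical page with two secondary resources, one already cached.
bits = client_filter(["/style.css"])
to_push = resources_to_push(["/style.css", "/app.js"], bits)
print(to_push)
```

A cached resource is never pushed again, because the filter has no false negatives. A resource that is missing from the client cache is skipped only in the case of a false positive, in which case the client has to request it explicitly.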
This transposition of the Bloom-filter-based sharing of cache information to SPDY to optimize push of secondary resources does not appear fully satisfactory.
In particular, initial experiments have shown that the naïve implementation of the Bloom-filter-based sharing is only partially successful. While Bloom filters provide a compact representation of the client device's cache memories, the experimental false positive rates are much higher than expected. This is probably due to the small number of resources that are processed. Indeed, the approximations given above for computing the probability of false positive are liable to be no longer valid for small numbers.
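One possible contributor can be illustrated numerically. The sketch below compares the exponential approximation given earlier with a finer-grained textbook expression, $\left(1 - (1 - 1/m)^{kn}\right)^{k}$, in which the probability that a given bit is still unset after kn hash evaluations is not replaced by an exponential. This second expression does not appear in the text above, is itself an estimate that assumes ideal hash functions, and is used here as an assumption; the sketch does not reproduce the experiments mentioned above.

```python
import math


def approx_rate(n: int, m: int, k: int) -> float:
    """Exponential approximation: (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def finer_rate(n: int, m: int, k: int) -> float:
    """Textbook estimate without the exponential step: (1 - (1 - 1/m)^(kn))^k."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


# Hypothetical sizes, both with 10 bits per element and k=7.
for n in (2, 1000):
    m = 10 * n
    a, f = approx_rate(n, m, 7), finer_rate(n, m, 7)
    print(f"n={n}: approximation={a:.5f}, finer estimate={f:.5f}, ratio={f / a:.3f}")
```

With only two elements the finer estimate is noticeably higher than the approximation, whereas with a thousand elements the two nearly coincide. Real hash functions and non-integer optimal values of k may widen the gap further for small sets.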
In addition, the false-positive rate is the same for all the resources, and in particular is independent of the importance of each resource. Consequently, failing to send a needed secondary resource of high (or even critical) importance for rendering the main resource statistically happens as often as failing to send a needed secondary resource of normal importance. This is clearly not satisfactory, and a way to reduce the false-positive rate for important resources should be sought, for example to improve the web page downloading time, or at least the downloading time perceived by the user.
The present invention has been designed to overcome at least one of the above drawbacks, in particular to provide more efficient cache information, i.e. a more efficient data structure representing the availability of resources at a communication device.
A straightforward possibility is to increase the size m of the Bloom filter, and thereby its precision. This is possible in some cases, when the number of resources in the client device's cache memories is sufficiently small. But in other cases, this would make the client request prohibitively large, since the increase in accuracy would also apply to resources of normal importance, with very little benefit.
The present invention has been devised to address at least one of the foregoing concerns, in particular to provide an augmented data structure representing the availability of resources at a communication device with reasonable and controlled increase of size.
The present invention may apply to the Push model of SPDY but also to any case where cache information representing the availability of resources in cache is generated.