The present invention relates generally to compressing data, and more particularly to lossy compression.
In information technology, “lossy” compression is the class of data encoding methods that use inexact approximations (or partial data discarding) to represent the encoded content. Such compression techniques reduce the amount of data that would otherwise be needed to store, handle, and/or transmit the represented content. The data reduction achievable with lossy compression is often much greater than that achievable with lossless data compression techniques.
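As a generic illustration of “inexact approximation,” the sketch below uses uniform scalar quantization, one common lossy building block; it is not presented as the method of the present invention. The function names `quantize`, `dequantize`, and `bits_per_sample` are hypothetical. Each sample is replaced by the index of the nearest multiple of a step size, so nearby inputs collapse to the same value and cannot be told apart after decoding, while each index needs far fewer bits than the original floating-point sample.

```python
import math


def quantize(samples: list[float], step: float) -> list[int]:
    """Lossy step: map each sample to the index of its nearest multiple of `step`."""
    return [round(x / step) for x in samples]


def dequantize(indices: list[int], step: float) -> list[float]:
    """Reconstruct an approximation; the rounding error cannot be undone."""
    return [i * step for i in indices]


def bits_per_sample(lo: float, hi: float, step: float) -> int:
    """Bits needed to code one index for samples in [lo, hi] (hypothetical helper)."""
    levels = round((hi - lo) / step) + 1
    return max(1, math.ceil(math.log2(levels)))


samples = [0.12, 0.49, 0.51, 0.98]
step = 0.25
restored = dequantize(quantize(samples, step), step)
print(restored)                       # [0.0, 0.5, 0.5, 1.0]
print(bits_per_sample(0.0, 1.0, step))  # 3 bits per sample instead of 64
```

Note that 0.49 and 0.51 both decode to 0.5: the distinction between them is discarded, which is exactly the trade of fidelity for size that characterizes lossy coding. A larger `step` yields fewer levels (fewer bits) and a larger maximum error of `step / 2`.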
With well-designed lossy compression technology, substantial data reduction is often possible before the resulting degradation becomes noticeable to the user. Even once the degradation is noticeable, further data reduction may be desirable in some applications (e.g., to enable real-time communication over a channel with a limited bit rate, to reduce the time needed to transmit the content, or to reduce the required storage capacity).
A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible but false negatives are not; thus, a Bloom filter has a 100% recall rate. In other words, a query returns either “possibly in set” or “definitely not in set.” Elements can be added to the set but not removed (though this limitation can be addressed with a “counting” filter). The more elements that are added to the set, the higher the probability of false positives.
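The behavior described above can be sketched as a minimal Bloom filter: an array of `m` bits, initially all zero, probed at `k` positions per element. This is an illustrative sketch, not a production implementation; deriving the `k` positions from a single SHA-256 digest by double hashing is an assumption made here for brevity, and the class and function names are hypothetical. Adding an element sets its `k` bits; a query reports “possibly in set” only if all `k` bits are set, so an added element is never reported absent, while unrelated elements may collide with bits set by others.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k hash positions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str) -> Iterator[int]:
        # Assumed scheme (double hashing): h_i(x) = h1(x) + i * h2(x) mod m,
        # with h1 and h2 taken from two slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set the k bits for this item; bits are never cleared, so no removal.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly in set"; False means "definitely not in set".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    # Standard approximation (1 - e^(-k*n/m))^k; it rises as n grows.
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["alpha", "beta", "gamma"]:
    bf.add(word)
print(bf.might_contain("alpha"))  # True: added items are never reported absent
```

The `false_positive_rate` helper makes the last point of the paragraph concrete: for fixed `m` and `k`, the estimated false-positive probability increases with the number of inserted elements `n`. A counting variant would replace each bit with a small counter so that deletions can decrement it.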