Many organizations need to track the number of times an event, such as the appearance of a value, occurs. For example, an organization may wish to determine the number of times a webpage has been viewed, the number of DNS requests made for a particular URL (Uniform Resource Locator), and so on. As another example, when indexing a large number of documents by phrase, it may be beneficial to count the number of times an n-gram of words, also referred to as a shingle, occurs in the content of crawled web pages. An n-gram is a group of n sequential words, where n is usually between 2 and 8, although it can be any number. Such a count may be stored for each event, for example for each shingle, and the indexing system can use the counts to optimize the index, for example by locating the phrase posting lists for different shingles within the index according to the respective count for each shingle. But for space-critical applications, such as an index of events with hundreds of billions of records or an index of events stored in main memory, storing the actual number of event occurrences may require too much memory. In such situations the system may store, instead of a true counter, a logarithmic counter that estimates the actual count.
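As an illustration of the shingle counting described above, the following is a minimal sketch in Python. The function name `count_shingles`, the whitespace tokenization, and the lowercasing are assumptions made for this example only and are not taken from any particular indexing system.

```python
from collections import Counter


def count_shingles(text: str, n: int = 2) -> Counter:
    """Count every run of n consecutive words (an n-gram, or shingle) in text."""
    # Assumption: words are separated by whitespace and case is ignored.
    words = text.lower().split()
    # Slide a window of n words across the text, one word at a time.
    shingles = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(shingles)


counts = count_shingles("the quick brown fox jumps over the quick brown dog", n=2)
```

Here each distinct shingle keeps an exact integer count, which is the per-event storage cost that becomes a concern at the scale of hundreds of billions of records.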
A logarithmic count estimates the actual count by its order of magnitude. For example, a binary (base 2) logarithmic count with a value of “2” may represent an actual count ranging from 2^2 (or 4) to 2^3−1 (or 7). Similarly, a base 10 logarithmic count with a value of “2” may represent an actual count ranging from 10^2 (or 100) to 10^3−1 (or 999). The memory savings become substantial at higher orders of magnitude. For example, a base 2 logarithmic counter with a value of 15, which can be represented in as few as four bits, corresponds to an actual count between 32,768 and 65,535, which requires at least 16 bits to store. While a logarithmic counter saves a substantial amount of memory, it has limited use as an active counter that must continue to track new occurrences of an event. The limitation arises because, when the system encounters a new occurrence of the event, the logarithmic counter cannot simply be increased by one. Doing so would raise the estimate by a full order of magnitude, rather than adding one to the estimated count of events represented by the value of the counter.
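The arithmetic above can be checked with a short Python sketch. The helper names `log_count` and `count_range` are hypothetical and chosen only for this illustration; the sketch assumes the logarithmic value is the integer floor of the logarithm of the actual count.

```python
from typing import Tuple


def log_count(actual: int, base: int = 2) -> int:
    """Return the largest k such that base**k <= actual."""
    if actual < 1:
        raise ValueError("a logarithmic count needs at least one occurrence")
    k = 0
    # Integer arithmetic avoids floating-point error near exact powers.
    while base ** (k + 1) <= actual:
        k += 1
    return k


def count_range(k: int, base: int = 2) -> Tuple[int, int]:
    """Inclusive range of actual counts represented by logarithmic value k."""
    return base ** k, base ** (k + 1) - 1


low, high = count_range(15)        # range represented by a base 2 value of 15
bits_for_log = (15).bit_length()   # bits needed to store the value 15
bits_for_high = high.bit_length()  # bits needed to store the largest actual count

# Naively adding one to the logarithmic value moves the estimate to the
# next order of magnitude instead of adding a single occurrence.
before = count_range(2)
after = count_range(2 + 1)
```

In this sketch, `before` and `after` differ by a whole power of the base, which is the limitation the passage describes for a counter that must keep absorbing new occurrences.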