Many organizations need to track the number of unique values that occur for another given value (i.e., a key value). For example, to detect malicious Domain Name System (“DNS”) requests, it may be desirable to know how many unique IP addresses are requesting a particular Uniform Resource Locator (“URL”). Alternatively, to assist in the creation of an index for searching document repositories, or to create search results for an Internet search, it may be beneficial to count the number of unique URLs for documents that contain an n-gram of words (also referred to as a shingle). An n-gram is a group of n sequential words, where n is a number, usually between 2 and 8, although it can be any number. In order to count unique items, a system must remember which items have already been encountered. For example, to determine whether a specific shingle has already been counted for the contents of a document associated with a URL, the system needs to determine whether that URL has already been seen and counted for that shingle.
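The shingle example above can be illustrated with a minimal sketch. It assumes whitespace tokenization, lowercasing, and n = 3; the function names `shingles` and `count_unique_urls` and the sample URLs are hypothetical and chosen only for illustration. The sketch keeps a set of URLs per shingle, which is the "remember what has been seen" step described above.

```python
from collections import defaultdict


def shingles(text, n=3):
    """Return every run of n sequential words in text (assumed tokenization: whitespace)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_unique_urls(docs, n=3):
    """Count, for each shingle (the key), how many distinct URLs contain it."""
    seen = defaultdict(set)  # shingle -> set of URLs already counted for it
    for url, text in docs:
        for shingle in shingles(text, n):
            seen[shingle].add(url)  # adding a URL already in the set has no effect
    return {shingle: len(urls) for shingle, urls in seen.items()}


# Hypothetical sample data; the first document is encountered twice.
docs = [
    ("http://example.com/a", "the quick brown fox"),
    ("http://example.com/b", "the quick brown dog"),
    ("http://example.com/a", "the quick brown fox"),
]
counts = count_unique_urls(docs)
```

Here `counts["the quick brown"]` is 2 because two distinct URLs contain that shingle, and the repeated document is not counted again. The cost is that every URL is retained in full for every shingle it appears in.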
One method of counting unique items is to store each item, as it is encountered, with its key in an index table, so that the table includes a row for each item-key pair. But this method requires a large amount of storage. To reduce the amount of memory required to store a row in such a table, some systems may store a fingerprint of the item in the index. A fingerprint is a much smaller code, intended to uniquely identify the item, that is generated from a larger data item. For example, a fingerprint of a few bits may be generated from the characters comprising a URL. However, for an index with hundreds of billions of records, even the memory savings of using a fingerprint may be inadequate because each key value will still require one record in the index for each unique fingerprint.
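The fingerprint-based index described above might be sketched as follows. This is an illustration under stated assumptions, not a particular system's implementation: the fingerprint is taken to be an 8-byte BLAKE2b digest from Python's standard `hashlib`, and the names `fingerprint` and `FingerprintIndex` are hypothetical. Because distinct items can in principle share a fingerprint, the resulting count may slightly undercount.

```python
import hashlib


def fingerprint(item: str, num_bytes: int = 8) -> int:
    """Reduce an arbitrarily long item to a fixed-size integer code (assumed: BLAKE2b)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=num_bytes).digest()
    return int.from_bytes(digest, "big")


class FingerprintIndex:
    """Index table holding one (key, fingerprint) row per unique item-key pair."""

    def __init__(self):
        self.rows = set()   # the index table: one row per (key, fingerprint)
        self.counts = {}    # key -> number of unique fingerprints seen

    def add(self, key: str, item: str) -> bool:
        """Record item for key; return True only if it was not seen before."""
        row = (key, fingerprint(item))
        if row in self.rows:
            return False
        self.rows.add(row)
        self.counts[key] = self.counts.get(key, 0) + 1
        return True

    def unique_count(self, key: str) -> int:
        return self.counts.get(key, 0)


# Hypothetical usage: count unique URLs per shingle.
index = FingerprintIndex()
index.add("the quick brown", "http://example.com/a")
index.add("the quick brown", "http://example.com/b")
index.add("the quick brown", "http://example.com/a")  # duplicate, ignored
```

Each row now stores a small fixed-size code instead of the full URL, but the table still grows by one row for every unique item under every key, which is the limitation noted above.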