This invention relates to information storage and retrieval systems, and, more particularly, to the use of hashing techniques in caching systems.
Techniques for caching frequently used data have been in use for many decades. They provide fast access to information that would otherwise require long retrieval times or lengthy computation. A cache is a storage mechanism that holds a desired subset of data that is stored in its entirety elsewhere, or data that results from a lengthy computation. Its purpose is to make future accesses to a stored data item faster. A cache is usually dynamic in nature: items stored in it may not reside there permanently, and items whose future usefulness is questionable are frequently replaced by items predicted to be more useful. Typically, but not exclusively, older items are replaced by newer ones. The routing caches used by Internet servers to provide quick access to network routing information are one example of a successful application of caching.
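The replacement behavior described above can be illustrated with a minimal Python sketch. It assumes a fixed capacity and a simple first-in, first-out policy, in which the oldest stored item is evicted to make room for a new one. `BoundedCache`, `compute`, and `slow_square` are hypothetical names for illustration only, and practical caches often use other replacement policies, such as least-recently-used.

```python
from collections import OrderedDict
from typing import Any, Callable


class BoundedCache:
    """Holds at most `capacity` results; evicts the oldest stored item (FIFO)."""

    def __init__(self, capacity: int, compute: Callable[[Any], Any]) -> None:
        self.capacity = capacity
        self.compute = compute  # hypothetical stand-in for slow retrieval or computation
        self.items: "OrderedDict[Any, Any]" = OrderedDict()  # insertion order = age

    def get(self, key: Any) -> Any:
        if key in self.items:
            return self.items[key]  # cache hit: fast path
        value = self.compute(key)   # cache miss: slow path
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the oldest inserted item
        self.items[key] = value
        return value


calls = []

def slow_square(n: int) -> int:  # hypothetical "lengthy computation"
    calls.append(n)
    return n * n

cache = BoundedCache(2, slow_square)
cache.get(1)
cache.get(2)
cache.get(1)                      # hit: slow_square is not called again
cache.get(3)                      # miss: evicts key 1, the oldest item
print(calls, list(cache.items))   # [1, 2, 3] [2, 3]
```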
Records stored in a computer-controlled storage mechanism such as a cache are retrieved by searching the stored records for a particular key value. A record is defined to be a logical unit of information, and its key is a distinguished field (or collection of fields) within the record. The stored record whose key matches the search key value is then retrieved. Although data caching can be done using a variety of techniques, hashing has become a popular way of building a cache because of its speed advantage over other information retrieval methods: it requires very few key comparisons to locate a requested record.
Hashing methods use a hashing function that operates on a key (in technical terms, maps the key) to produce a storage address in the storage space, called the hash table, which is a large one-dimensional array of record locations. The desired record is then accessed directly at this storage address. Hashing techniques are described in the classic text by D. E. Knuth entitled The Art of Computer Programming, Volume 3, Sorting and Searching, Addison-Wesley, Reading, Mass., 1973, pp. 506-549; in Data Structures and Program Design, Second Edition, by R. L. Kruse, Prentice-Hall, Incorporated, Englewood Cliffs, N.J., 1987, Section 6.5, “Hashing,” and Section 6.6, “Analysis of Hashing,” pp. 198-215; and in Data Structures with Abstract Data Types and Pascal, by D. F. Stubbs and N. W. Webre, Brooks/Cole Publishing Company, Monterey, Calif., 1985, Section 7.4, “Hashed Implementations,” pp. 310-336.
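A minimal Python sketch of this mapping from a key to a storage address follows. It folds the bytes of a string key into 2-byte chunks, sums them, and reduces the sum modulo the table size. Folding and modulo arithmetic are two of the typical hashing functions named in the next paragraph. The function name `fold_hash` and the 2-byte chunk width are illustrative assumptions, not part of any cited method.

```python
def fold_hash(key: str, table_size: int, chunk: int = 2) -> int:
    """Map a string key to an address in [0, table_size) by folding, then modulo."""
    data = key.encode("utf-8")
    total = 0
    # Folding: split the key into chunk-byte pieces and add them together.
    for i in range(0, len(data), chunk):
        total += int.from_bytes(data[i:i + chunk], "big")
    # Modulo arithmetic: reduce the folded value to a valid table address.
    return total % table_size


table_size = 101
table = [None] * table_size          # the hash table: a one-dimensional array
addr = fold_hash("AB", table_size)   # 0x4142 = 16706, and 16706 % 101 = 41
table[addr] = ("AB", "record for AB")
print(addr, table[addr])             # the record is then accessed directly
```

This sketch ignores collisions, which any real table must resolve as described next.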
Hashing functions are designed to translate the universe of keys into addresses uniformly distributed throughout the hash table. Typical hashing functions include truncation, folding, transposition, and modulo arithmetic. A disadvantage of hashing is that more than one key will inevitably translate to the same storage address, causing collisions in storage, so some form of collision resolution must be provided. Resolving collisions within the hash table itself, by probing other elements of the table, is called open addressing. A simple and often-used open addressing strategy is linear probing, which views the storage space as logically circular and searches forward from the initial storage address to the first empty storage location.
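The following is a minimal Python sketch of linear probing. It assumes non-negative integer keys and the modulo hash h(k) = k mod M, and the class and method names are hypothetical. Deletion is omitted: removing a record from a linear-probing table needs extra care (for example, tombstone markers) so that later searches do not stop early at the emptied slot.

```python
from typing import Any, List, Optional, Tuple


class LinearProbingTable:
    """Open-addressed hash table that resolves collisions by linear probing."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots: List[Optional[Tuple[int, Any]]] = [None] * size

    def _home(self, key: int) -> int:
        return key % self.size  # modulo-arithmetic hashing function

    def insert(self, key: int, value: Any) -> int:
        """Store (key, value) and return the number of probes used."""
        addr = self._home(key)
        for probes in range(1, self.size + 1):
            slot = self.slots[addr]
            if slot is None or slot[0] == key:  # empty slot, or update in place
                self.slots[addr] = (key, value)
                return probes
            addr = (addr + 1) % self.size  # step forward; table is circular
        raise RuntimeError("hash table is full")

    def search(self, key: int) -> Optional[Any]:
        addr = self._home(key)
        for _ in range(self.size):
            slot = self.slots[addr]
            if slot is None:          # an empty slot ends the probe sequence
                return None
            if slot[0] == key:
                return slot[1]
            addr = (addr + 1) % self.size
        return None


t = LinearProbingTable(7)
print(t.insert(3, "a"), t.insert(10, "b"), t.insert(17, "c"))  # 1 2 3
print(t.search(17), t.search(24))                              # c None
```

Keys 3, 10, and 17 all hash to address 3, so each later insertion probes one more slot than the previous one.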
Another method for resolving collisions is called external chaining. In this technique, each hash table location is a pointer to the head of a linked list of records, all of whose keys map under the hashing function to that very hash table address. The linked list is itself searched sequentially when retrieving, inserting, or deleting a record, and insertion and deletion are done by adjusting pointers in the linked list.
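External chaining can be sketched in Python with a hand-built singly linked list per table address. This is a minimal sketch, assuming integer keys and a modulo hash, and `Node` and `ChainedTable` are hypothetical names. Insertion links a new node in at the head of its chain, and deletion unlinks a node by adjusting its predecessor's pointer.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Node:
    key: int
    value: Any
    next: Optional["Node"] = None


class ChainedTable:
    """Hash table in which each location points to the head of a linked list."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.heads: List[Optional[Node]] = [None] * size

    def _home(self, key: int) -> int:
        return key % self.size

    def search(self, key: int) -> Optional[Any]:
        node = self.heads[self._home(key)]
        while node is not None:  # sequential search of the chain
            if node.key == key:
                return node.value
            node = node.next
        return None

    def insert(self, key: int, value: Any) -> None:
        addr = self._home(key)
        node = self.heads[addr]
        while node is not None:
            if node.key == key:
                node.value = value  # key already present: update in place
                return
            node = node.next
        # Link the new record in at the head of the chain.
        self.heads[addr] = Node(key, value, self.heads[addr])

    def delete(self, key: int) -> bool:
        addr = self._home(key)
        prev, node = None, self.heads[addr]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.heads[addr] = node.next  # unlink the head node
                else:
                    prev.next = node.next         # bypass the deleted node
                return True
            prev, node = node, node.next
        return False


t = ChainedTable(5)
for k, v in [(1, "a"), (6, "b"), (11, "c")]:  # all three keys map to address 1
    t.insert(k, v)
print(t.search(6))                             # b
print(t.delete(6), t.search(6), t.search(11))  # True None c
```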
Open addressing and external chaining each enjoy advantages over the other. External chaining can make better use of memory, because it does not require initial pre-allocation of maximum storage, and it supports concurrency, because individual linked lists can easily be locked. However, its individual record access time can be slower because of memory allocation and de-allocation and pointer dereferencing. Furthermore, because successive records in a linked list rarely reside in physically consecutive memory locations, external chaining cannot take advantage of memory paging and physical memory caching.
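The concurrency point above can be illustrated with a hypothetical Python sketch in which each chain has its own `threading.Lock`. Threads working on different chains therefore never wait on the same lock. For brevity, Python lists stand in for the linked lists, and the names `LockedChains` and `worker` are assumptions for illustration.

```python
import threading
from typing import Any, List, Optional, Tuple


class LockedChains:
    """Chained hash table with one lock per chain (hypothetical sketch)."""

    def __init__(self, size: int) -> None:
        self.size = size
        # Python lists stand in for the per-address linked lists.
        self.chains: List[List[Tuple[int, Any]]] = [[] for _ in range(size)]
        self.locks = [threading.Lock() for _ in range(size)]

    def insert(self, key: int, value: Any) -> None:
        i = key % self.size
        with self.locks[i]:  # lock only the chain this key maps to
            chain = self.chains[i]
            for j, (k, _) in enumerate(chain):
                if k == key:
                    chain[j] = (key, value)
                    return
            chain.append((key, value))

    def search(self, key: int) -> Optional[Any]:
        i = key % self.size
        with self.locks[i]:
            for k, v in self.chains[i]:
                if k == key:
                    return v
        return None


def worker(table: LockedChains, start: int) -> None:
    for k in range(start, start + 250):
        table.insert(k, 2 * k)


table = LockedChains(64)
threads = [threading.Thread(target=worker, args=(table, s)) for s in range(0, 1000, 250)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(all(table.search(k) == 2 * k for k in range(1000)))  # True
```

A single table-wide lock would serialize every operation. Per-chain locks let operations on unrelated chains proceed independently, although in CPython the global interpreter lock limits true parallelism, so the sketch shows the locking structure rather than a speedup.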
In the design of routing caches, it is important to protect the system against a security threat known as a denial of service (DOS) attack. Attackers could target the routing cache by sending the server carefully crafted service requests aimed at creating excessive collisions, thereby degrading cache storage and retrieval times. (DOS attacks of this kind are called algorithmic complexity attacks.) While effective techniques exist to protect against such attacks in chain hashing, there is a need for a data cache that provides the speed of open-addressed hashing while also avoiding vulnerability to algorithmic complexity attacks and allowing maximum concurrent access to records.
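The effect of such crafted requests can be sketched with a linear-probing table whose hashing function, h(k) = k mod M, is assumed to be known to the attacker. This setup and the function name `total_probes` are hypothetical and chosen only for illustration. Benign keys 0 through n-1 land in distinct slots and need one probe each. Keys chosen as multiples of M, however, all share home address 0, so the i-th such insertion needs i probes and the total grows as n(n+1)/2.

```python
def total_probes(keys, size):
    """Insert distinct keys into a linear-probing table; return total probes.

    Assumes len(keys) <= size, so an empty slot always exists.
    """
    table = [None] * size
    total = 0
    for key in keys:
        addr = key % size               # publicly known modulo hash
        probes = 1
        while table[addr] is not None:  # collision: keep probing forward
            addr = (addr + 1) % size
            probes += 1
        table[addr] = key
        total += probes
    return total


M, n = 1024, 500
benign = list(range(n))                 # spread across distinct addresses
crafted = [i * M for i in range(n)]     # every key hashes to address 0
print(total_probes(benign, M))   # 500
print(total_probes(crafted, M))  # 125250, i.e. n(n+1)/2
```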
A hashing technique for dealing with expiring data, confined to linear probing, is known and disclosed in U.S. Pat. No. 5,121,495, issued Jun. 9, 1992. Although that technique can be used to generally reduce the number of probes required to locate a record, it suffers from the following drawbacks: it does not limit the number of probes to a predetermined number; and it is confined strictly to linear probing and single-threading, and does not extend to other open-address collision resolution techniques or to multi-threading. Accordingly, there is a need to develop open-address hashing techniques that overcome these inadequacies.