The present invention relates to computer-based information storage and retrieval systems, to the use of hashing techniques in such systems, and more particularly, to the use of hashing techniques in networking applications.
Information or data stored in a computer-controlled storage mechanism can be retrieved by searching for a particular key in the stored records. The stored record with a key matching the search key is then retrieved. Such searching techniques require repeated accesses or probes into the storage mechanism to perform key comparisons. In large storage and retrieval systems, such searching, even if augmented by efficient search algorithms such as a binary search, often requires an excessive amount of time.
Another well-known and much faster method for storing and retrieving information from computer store involves the use of so-called xe2x80x9chashingxe2x80x9d techniques. These techniques are also sometimes called scatter-storage or key-transformation techniques. In a system using hashing, the key is operated upon by an operator to produce a storage address in the storage space. The operator is called a hashing function, and the storage space is called a hash table. The storage address is then used to access the desired storage location directly with fewer storage accesses or probes than sequential or binary searches. Hashing techniques are described in the classic text by D. Knuth entitled The Art of Computer Programming, Volume 3, Sorting and Searching, pp. 506-549, Addison-Wesley, Reading, Mass., 1973, and more recently, in the contemporary classic text of R. Sedgewick entitled xe2x80x9cAlgorithms in C++xe2x80x9d, pp. 231-243, Addison-Wesley, Reading, Mass., 1992.
Hashing functions are designed to translate the universe of keys into addresses uniformly distributed throughout the hash table. Typical hashing operations include truncation, folding, transposition and modulo arithmetic. A disadvantage of hashing techniques is that more than one key can translate into the same storage address, causing xe2x80x9ccollisionsxe2x80x9d in storage or retrieval operations. Some form of collision-resolution strategy must therefore be provided. For example, the simple strategy of searching forward from the initial storage address to the first empty storage location will resolve the collision. This technique is called linear probing. If the hash table is considered to be circular so that addresses beyond the end of the table map back to the beginning of the table, then the linear probing is done with xe2x80x9copen addressing,xe2x80x9d i.e., with the entire hash table as overflow spare in the event that a collision occurs.
An alternative to linear probing is a technique commonly referred to as xe2x80x9cdouble hashingxe2x80x9d or xe2x80x9cmultiple hashingxe2x80x9d. When more than one key translates into the same storage address using the first hash function, the collision can be resolved by selecting a different hash function and xe2x80x9crehashingxe2x80x9d those keys (that had returned identical results using the first hash function) in order to differentiate between them. Of course, there is a finite probability that more than one key will translate into the same storage address using the second hash function, in which case the new collision can be resolved by selecting a (different) third hash function and xe2x80x9crehashingxe2x80x9d those keys once again in order to differentiate between them. This process can be repeated until all collisions have been resolved. According to Sedgewick, double hashing uses fewer probes, on the average, than linear probing. Sedgewick cites several examples of improved hashing methods, but cautions against
xe2x80x98premature use of advanced methods except by experts with serious searching applications, because separate chaining and double hashing are simple, efficient, and quite acceptable for most applications.xe2x80x99
One area in which multiple hashing is less effective or even problematic is network applications. Although the average speed is an important parameter in such applications, a more important and often overriding requirement is a highly predictable, deterministic operation. For example, voice and video recordings can be transmitted as data via the Internet using a digital data channel. The Internet network utilizes routers to direct the data from the sending address to the destination address. Routers using multiple hashing routines to locate the destination address and deliver these data packets will have a characteristically high variance in the time required to locate the address. In most cases, typically about 70%-80% of the time, the multiple hashing technique will locate the destination address in the first memory access. However, in about 20%-30% of the time, a second memory access is required. Often, a third, fourth or fifth memory access is required in order to locate the address. Moreover, in the case of voice transmission, a high variance of this kind results in a broken up, non-uniform sound message. These disturbances are often referred to as xe2x80x9cjittersxe2x80x9d.
There is therefore a recognized need for, and it would be highly advantageous to have, a high-performance method for directly referencing records in a table that is more deterministic and has appreciably less variance than existing multiple hashing techniques, a method that both provides fast access to the memory and is efficient with regard to memory space.
The present invention is a search method for storing and retrieving information records that is fast, efficient with regard to memory usage, and deterministic.
According to one aspect of the present invention there is provided a method for storing a key using a hash table, the method comprising the steps of: (a) hashing the key using a first hash function, thereby transforming the key into a table address in the hash table; (b) hashing the key using a second hash function, distinct from the first hash function, thereby transforming the key into a key signature; and (c) entering the key signature in a signature page located at the table address.
According to further features in preferred embodiments of the invention described below, the method for storing and retrieving information records further comprises the step of: (d) entering a pointer in the signature page, the pointer corresponding to the key signature and pointing to the key.
According to another aspect of the present invention there is provided a method for retrieving a tabulated key using a search key, comprising the steps of: (a) providing a plurality of signature pages, each of the plurality of signature pages having a table address and space for at least one key signature, each of at least one key signature having (i) a corresponding tabulated key, and (ii) an associated key pointer pointing to the corresponding tabulated key; (b) hashing the search key using a first hash function, thereby transforming the search key into a table address; (c) hashing the search key using a second hash function, distinct from the first hash function, thereby transforming the search key into a search key signature, and (d) searching the plurality of signature pages for a signature page having the table address of step (b) for at least one match between the search key signature and at least one key signature.
According to further features in preferred embodiments of the invention described below, the method for storing and retrieving information records further comprises the step of: (e) if the search for the at least one match succeeds: searching for a match between the search key and the corresponding tabulated key.
According to another aspect of the present invention there is provided a system for storing and retrieving data using a search key, the system comprising: (a) a data storage module having a plurality of tabulated signature pages, each of tile tabulated signature pages having a table address and at least one key signature having an associated pointer, the pointer pointing to a corresponding key in the data storage module, and (b) a processor that is operative to perform operations including: (i) hashing the search key using a first hash function, thereby transforming the search key into one of the table addresses; (ii) hashing the search key using a second hash function, distinct from the first hash function, thereby transforming the search key into a search key signature; and (iii) searching the tabulated signature page having the one of the table addresses for a matching key signature to the search key signature.
According to further features in preferred embodiments of the invention described below, the processor is further operative for: (iv) if the search for the matching key signature succeeds: searching for a match between the search key and the corresponding key of the matching key signature.
According to still further features in preferred embodiments of the invention described below, the processor includes an Address and Policy Resolution Logic (APRL) unit.
According to still further features in preferred embodiments of the invention described below, the system further comprises an input/output unit for transferring, to and from the processor, information selected from the group consisting of tabulated keys, search keys and instructions.
According to still further features in preferred embodiments of the invention described below, the input/output unit includes a switching fabric controller.
According to still further features in preferred embodiments of the invention described below, the input/output unit includes a link controller.
According to still further features in preferred embodiments of the invention described below, the link controller is a MAC controller.