The present invention relates generally to associative memory systems, and more particularly to associative memory systems that use hash functions.
Associative memory systems can typically receive a first set of data values ("keys") as inputs. Each key maps to an associated data value of a second set ("associated data"). Keys and their associated data form a database. The database can then be searched by applying a key value to the associative memory system.
Associative memory systems have a variety of applications. Some applications may be optimized to accommodate large data structures, others for transaction accuracy or reliability, and still others for search speed or update speed.
A content addressable memory (CAM) is one type of associative memory. While CAMs can provide relatively fast search speeds, CAMs also have a relatively high component cost. Therefore we seek to achieve high associative memory throughput using denser, less expensive random access memories (RAMs).
One way to provide fast access times is to form an associative memory system in which a RAM memory location is provided for every possible input key. One example of such a system is shown in FIG. 7. The system of FIG. 7 can receive input key values having "n" bits. Three key values are shown as K1, K2 and K3. Input key values can be applied to a memory 700 that includes 2^n entries. Consequently, for each possible input key value, there is a corresponding memory 700 entry. In the particular arrangement of FIG. 7, the memory 700 is a random access memory, and key values can be applied to the memory 700 as addresses. Three entries corresponding to the key values K1, K2 and K3 are shown. Each entry is accessed by an address that is a key value, and stores data associated with the key value. For example, the application of key value K1 results in the associated data value DATA Z being provided by memory 700.
A system with direct mapping can be feasible when the number of possible input key values is small, as for example when the key is a binary number only a few bits wide. However, for wider key values (larger key domain), direct mapping is impractical, as the resulting memory size becomes undesirably large. Further, in most applications, a system stores only a tiny fraction of all possible key value permutations. In such a case, a direct mapping approach results in inefficient use of memory.
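By way of illustration only, the direct mapping arrangement described above can be sketched in a few lines of Python. The names `direct_table_entries` and `DirectMappedMemory` are hypothetical and do not appear in the figures; the sketch simply assumes that a key is used unchanged as a memory address.

```python
def direct_table_entries(key_bits: int) -> int:
    """Number of entries a direct-mapped memory needs for n-bit keys (2^n)."""
    return 1 << key_bits


class DirectMappedMemory:
    """Hypothetical model of a memory addressed directly by the key value."""

    def __init__(self, key_bits: int):
        # One entry per possible key; None marks an entry with no stored data.
        self.entries = [None] * direct_table_entries(key_bits)

    def store(self, key: int, data):
        self.entries[key] = data

    def search(self, key: int):
        # A single memory access; no key comparison is needed.
        return self.entries[key]
```

For a key only a few bits wide the table is small, whereas `direct_table_entries(128)` evaluates to 2^128, which shows why direct mapping becomes impractical for wider keys.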
For larger key domains, hashing is another conventional approach. A hash function translates values in one address space to values in a smaller address space. For example, if a system received 128-bit key values, such key values could be translated by a hash function into a set of 16-bit hash bucket addresses.
"Collisions" present the major practical challenge in using hash functions for associative data systems. In our 128-bit key example, if a hash function h(x): {0,1}^128 → {0,1}^16 maps 128-bit keys to 16-bit hash bucket indices, a simple counting argument shows that many different possible 128-bit keys must hash to each of the 64K different addressable locations ("buckets"). If the keys stored in the associative memory system include multiple keys that hash to the same bucket b, then when an input search key hashes to bucket b, some further "collision resolution" method is required to determine which of the keys stored in bucket b, if any, matches the search key. Further, even if a bucket b holds only one key, and a search key hashes to the same bucket b, it is possible that the search key is not the same as the key stored in the table, but is an "alias" of that key that just happens to hash to the same bucket. Therefore, even when a single candidate search result is found, the key stored in the table must be compared against the input search key to resolve such aliases.
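The alias problem described above can be illustrated with the following minimal sketch. The XOR-folding function `fold_hash` is an assumption chosen only because it is easy to follow by hand; it is not a hash function taken from this description. The table is modeled as a Python dictionary from bucket index to a single (key, data) pair.

```python
def fold_hash(key: int) -> int:
    """XOR-fold a key into 16 bits (illustrative hash, assumed for this sketch)."""
    h = 0
    while key:
        h ^= key & 0xFFFF
        key >>= 16
    return h


def search(table: dict, key: int):
    """Look up a key in a table holding at most one (key, data) pair per bucket."""
    entry = table.get(fold_hash(key))
    if entry is None:
        return None  # empty bucket: no stored key hashes here
    stored_key, data = entry
    # The final compare rejects aliases that merely share the bucket.
    return data if stored_key == key else None


stored = 0x1234
alias = (0x0001 << 16) | 0x1235  # a different key that folds to the same bucket
table = {fold_hash(stored): (stored, "DATA")}
```

Here `stored` and `alias` hash to the same bucket, so `search(table, alias)` reaches the entry for `stored` and returns a result only because of the final key comparison, which in this case reports no match.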
It has been proven mathematically that there exist, for any particular static set of keys and any table size larger than the number of keys, one or more "perfect" hash functions for which no two keys in the set collide. However, mathematical results have also shown that for large key sets (thousands to millions of keys), the computational complexity of finding such perfect hash functions is extremely high; and further, the storage complexity of describing a hash function that has been found is also high. These results make perfect hashing impractical for large, dynamic data sets.
A number of conventional approaches have been proposed for addressing hash collisions. One possible approach would be to select a new hashing function, and then re-translate the entire current data structure into a new data structure without a collision. Such an approach is undesirable as it can consume considerable time and computing resources.
Other conventional approaches for addressing hash function collisions include using a "linked-list." A linked list can access a number of memory entries in series. An example of a system having a linked-list is shown in FIG. 8.
A key value K21 is applied to a hash function 800. The output of hash function 800 is an address to a memory 802. In FIG. 8, three different table entries (for keys K01, K97 and K21) map to the same memory location or hash bucket. Thus, the address for one entry 804 is shown as (H(K01)=H(K97)=H(K21)). The entry 804 includes one of the key values, K01, and its associated data. Further, the entry 804 is linked with a linked-list "next" pointer 806 to a second entry 808 that includes the key value K97 and its associated data. Entry 808 is linked with a linked-list "next" pointer to a third entry 810 having the key value K21 and its associated data. The "next" pointer of this third entry is null, indicating that there are no more entries in the list.
In the arrangement of FIG. 8, when the key value K21 is applied, hash function 800 accesses entry 804. The applied key value K21 is compared to the stored key value K01. Because the key values are different, the next entry 808 at the linked-list pointer 806 is accessed. The applied key value K21 is then compared to the stored key value K97. Because the key values are again different, accesses continue according to the linked-list "next" pointer of entry 808. Entry 810 is then accessed. The applied key value K21 is once again compared to a stored key value, in this case K21. Because the key values are the same, the corresponding associated data DATA can be provided as an output value.
A drawback to the above-described arrangement is that multiple memory read accesses and compare operations may be required, up to the length of the longest linked-list in the table in a worst case search. The length of the longest linked list depends on the table contents and can grow large.
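The linked-list search of FIG. 8 can be sketched as follows, counting the memory accesses needed to reach a key. The class name `Entry`, the function name `chained_search`, and the data strings are hypothetical placeholders; only the key order K01, K97, K21 is taken from the figure description.

```python
class Entry:
    """One table entry: a stored key, its associated data, and a 'next' pointer."""

    def __init__(self, key, data, next_entry=None):
        self.key = key
        self.data = data
        self.next = next_entry  # None plays the role of the null pointer


def chained_search(head, key):
    """Walk the list from the hash bucket's first entry.

    Returns (data, accesses); data is None when no stored key matches.
    """
    accesses = 0
    entry = head
    while entry is not None:
        accesses += 1  # one memory read and one key compare per entry
        if entry.key == key:
            return entry.data, accesses
        entry = entry.next
    return None, accesses


# Chain corresponding to entries 804 -> 808 -> 810 (data values are placeholders).
entry_810 = Entry("K21", "DATA K21")
entry_808 = Entry("K97", "DATA K97", entry_810)
entry_804 = Entry("K01", "DATA K01", entry_808)
```

Searching this chain for K21 takes three accesses, while a search for a key that is not stored must also traverse the whole list before it can report a miss.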
Another conventional approach for addressing hashing function collisions includes a search tree. In one particular case, a search tree uses a number of search criteria to arrive at the desired associated data. An example of a collision resolution system having a binary search tree is shown in FIG. 9.
The example of FIG. 9 includes some of the same general items as FIG. 8. A key value K31 is applied to a hash function 900. The output of hash function 900 is an address to a memory 902. In FIG. 9, four different key values (K62, K45, K72 and K31) hash to the same memory entry. Thus, the address for one entry 904 is shown as (H(K62)=H(K45)=H(K72)=H(K31)). The entry 904 can activate a binary search operation to select among the data associated with the four possible key values (K62, K45, K72 and K31). As just one example, a particular pointer value SEARCH can be stored in entry 904. The output of this value SEARCH can cause a particular binary tree search to be performed.
One particular binary search arrangement is illustrated by search steps 906-1 to 906-3d. In search step 906-1, the applied key value is compared to a predetermined value to select two of the four possible key values. Search steps 906-2a and 906-2b can select one key value from two. Search steps 906-3a to 906-3d can provide the data associated with a particular key value at the leaf level. In FIG. 9, data values DATA I, DATA J, DATA K and DATA L are associated with key values K31, K45, K62 and K72, respectively. At the selected leaf of the binary tree search, a compare against a stored key value is performed, to resolve aliasing.
A drawback to the above-described arrangement is that the various search steps add to an access time. In particular, a binary search among "m" different values can require log2(m) search and compare steps. The number of collisions that occur at each table location is dependent on the contents of the table. For randomly distributed hash function output, the number of collisions per location tends to be relatively small, but there is a certain probability of encountering a larger number of collisions, which would result in longer search time. This property makes it impossible to set a tight upper bound, or worst case, on the number of search steps. In many real-time applications, deterministic performance is required. Further, for maximum throughput performance, it is desirable to fully pipeline a search algorithm such that each discrete step can be executed by separate dedicated hardware. Without a deterministic number of steps, an algorithm cannot be fully pipelined in this way.
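The dependence of the step count on the number of colliding keys can be seen in the following sketch of a binary search over the keys that share one bucket. The function name `tree_search` is hypothetical, keys are modeled as integers (31, 45, 62, 72 standing in for K31, K45, K62, K72), and the tree is assumed to be stored as a sorted list rather than as the explicit nodes of FIG. 9.

```python
def tree_search(sorted_entries, key):
    """Binary search among colliding (key, data) pairs sorted by key.

    Returns (data, steps), where steps counts the halving decisions made
    before the final compare against the stored key.
    """
    if not sorted_entries:
        return None, 0
    lo, hi = 0, len(sorted_entries)
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        if key < sorted_entries[mid][0]:
            hi = mid  # continue in the lower half
        else:
            lo = mid  # continue in the upper half
    stored_key, data = sorted_entries[lo]
    # Final compare at the leaf resolves aliasing.
    return (data if stored_key == key else None), steps


bucket = [(31, "DATA I"), (45, "DATA J"), (62, "DATA K"), (72, "DATA L")]
```

With four colliding keys every search takes two halving decisions plus the leaf compare; a bucket holding more keys would take more, which is why the step count cannot be fixed in advance.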
It would be desirable to arrive at some way of providing an associative data system that can have the memory size advantages of search systems using hash functions, but not suffer from indeterminate access times that may arise from hash collisions. Such a system, to be practical, must also permit efficient update of table contents, without the large pre-processing times required for perfect hash functions of large key sets.
According to one embodiment, an associative data system can receive input key values. A first hashing function maps the input key values into first output values. The number of first output values is smaller than the number of all possible key values. When a set of different table key values collides at the same first output value, a second small perfect hash function maps the set of colliding key values to second output values. Thus, essentially all key searches can be accomplished in two accesses.
According to another embodiment, a first hash function maps input key values to first memory locations. When multiple keys map to the same first memory location, the first memory location is a "chunk pointer" entry that provides second hash function parameters. Second hash function parameters are used to generate a second perfect hash value that selects between the multiple key values that collide at a particular first memory location.
According to one aspect of the above embodiment, a pointer entry can include a chunk base address within a second memory. The chunk base address can be combined with second outputs generated by the second hash function to generate a second memory address. The second memory address stores a pointer to a location in a third memory, where a key value and associated data corresponding to the key value are stored.
According to another aspect of the above embodiment, when one input key maps to a first memory location at which there is no collision, the first memory location points to an entry in a third memory that includes a key value and associated data corresponding to a table key.
According to another aspect of the above embodiment, when no input keys map to a first memory location, the first memory location can be a null entry that includes data indicating that there is no stored data that is associated with the corresponding input key, or with any input key that hashes to that address.
According to one aspect of the above embodiments, a second hash function can map input key values to a second output space. The size of the second output space can be selectable.
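The following is a minimal sketch of how the three kinds of first memory entry (null, direct pointer, and chunk pointer) might cooperate in a lookup. It is an illustration under stated assumptions, not the claimed implementation: the toy hash functions `h1` and `h2`, the small prime `P`, the eight-entry first memory, and all function names are hypothetical, and stored keys are assumed to be distinct integers smaller than `P`.

```python
P = 31            # toy prime for the second hash (assumption)
FIRST_SIZE = 8    # toy first memory size (assumption)


def h1(key):
    return key % FIRST_SIZE


def h2(key, a, size):
    return ((a * key) % P) % size


def find_perfect_param(keys, size):
    """Try multipliers until the colliding keys land in distinct slots."""
    for a in range(1, P):
        if len({h2(k, a, size) for k in keys}) == len(keys):
            return a
    return None


def build(items):
    third = list(items.items())              # third memory: (key, data)
    buckets = {}
    for idx, (key, _) in enumerate(third):
        buckets.setdefault(h1(key), []).append(idx)
    first = [("NULL",)] * FIRST_SIZE         # null entries by default
    second = []                              # second memory: pointers into third
    for b, idxs in buckets.items():
        if len(idxs) == 1:
            first[b] = ("DIRECT", idxs[0])   # no collision: point at third memory
            continue
        keys = [third[i][0] for i in idxs]
        size = 1
        while size < len(keys):
            size *= 2
        a = find_perfect_param(keys, size)
        while a is None:                     # selectable size: grow and retry
            size *= 2
            if size > P:
                raise ValueError("no perfect second hash found")
            a = find_perfect_param(keys, size)
        base = len(second)                   # chunk base address
        second.extend([None] * size)
        for i in idxs:
            second[base + h2(third[i][0], a, size)] = i
        first[b] = ("CHUNK", base, a, size)  # chunk pointer entry
    return first, second, third


def lookup(first, second, third, key):
    entry = first[h1(key)]
    if entry[0] == "NULL":
        return None
    if entry[0] == "DIRECT":
        ptr = entry[1]
    else:
        _, base, a, size = entry
        ptr = second[base + h2(key, a, size)]
        if ptr is None:
            return None
    stored_key, data = third[ptr]
    return data if stored_key == key else None  # final compare resolves aliases
```

In this sketch a search touches the first memory once, the second memory at most once, and the third memory at most once, regardless of how many keys collide at a given first memory location. The search cost for a chunk is paid when the table is built, where a multiplier is found by trial for each small set of colliding keys.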
According to another aspect of the embodiments, a first and/or second hash function can include Galois field multiplication and division (modulo operation).
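One common reading of Galois field multiplication followed by a modulo operation is carry-less (polynomial) multiplication over GF(2), reduced modulo a generator polynomial. The sketch below follows that reading as an assumption; the function names, the choice of multiplier, and the choice of polynomial are hypothetical and are not specified in the text above. Integers are interpreted as bit vectors of polynomial coefficients.

```python
def clmul(x: int, y: int) -> int:
    """Carry-less multiply: polynomial product over GF(2)."""
    result = 0
    while y:
        if y & 1:
            result ^= x  # addition in GF(2) is XOR
        x <<= 1
        y >>= 1
    return result


def gf2_mod(value: int, poly: int) -> int:
    """Remainder of polynomial division of value by poly over GF(2)."""
    degree = poly.bit_length() - 1
    while value.bit_length() > degree:
        # Cancel the leading term by XOR-ing in a shifted copy of poly.
        value ^= poly << (value.bit_length() - 1 - degree)
    return value


def gf_hash(key: int, multiplier: int, poly: int) -> int:
    """Hash a key to fewer bits: multiply by a constant, reduce modulo poly."""
    return gf2_mod(clmul(key, multiplier), poly)
```

The output width equals the degree of `poly`, so a degree-16 polynomial would yield 16-bit bucket indices. Changing `multiplier` gives a different member of the same hash family, which is one way that hash function parameters stored in a table entry could select a particular function.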