Embodiments of the present invention relate to memory lookup operations using hash functions and, particularly, to such operations that are designed for large scale memories.
“Hashing” generally describes a technique for searching for data within a memory system. Given a set of input data, a hashing function generates an index value. When applied to a memory, the index value should cause requested data to be read therefrom. Unfortunately, depending upon the hash function used, index values may not uniquely identify the requested data. It is possible that a hash function can generate the same index value for two or more unique input values. This is called a “collision.” To guard against the possibility of collisions, the index value typically is used as a pointer to a linked list of data. Each element in the linked list typically contains the data being sought (called the “payload” data herein), a copy of the input data to which it relates, and a pointer to the next element in the linked list. In such systems, it becomes necessary to examine each element in the linked list serially until the copy of the input data confirms that responsive data has been found or until the linked list is exhausted.
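The conventional linked list arrangement described above can be illustrated with a minimal sketch. The names `Node`, `ChainedTable` and `lookup`, and the toy byte-sum hash, are assumptions chosen for illustration and are not drawn from any particular implementation. The lookup reports how many list elements were read serially before a hit or a miss could be declared.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: bytes                      # copy of the input data
    payload: str                    # the data being sought
    next: Optional["Node"] = None   # pointer to the next element in the list


class ChainedTable:
    """Hypothetical hash table whose index values point to linked lists."""

    def __init__(self, num_buckets: int) -> None:
        self.buckets = [None] * num_buckets

    def _index(self, key: bytes) -> int:
        # Toy hash (an assumption): byte sum modulo the table size.
        # It collides easily, which makes the serial walk visible.
        return sum(key) % len(self.buckets)

    def insert(self, key: bytes, payload: str) -> None:
        i = self._index(key)
        # New elements are placed at the head of the list.
        self.buckets[i] = Node(key, payload, self.buckets[i])

    def lookup(self, key: bytes) -> Tuple[Optional[str], int]:
        """Return (payload or None, number of elements read)."""
        reads = 0
        node = self.buckets[self._index(key)]
        while node is not None:
            reads += 1                  # one sequential memory read
            if node.key == key:         # the stored copy confirms a hit
                return node.payload, reads
            node = node.next            # next address is known only now
        return None, reads              # list exhausted: a miss


table = ChainedTable(8)
table.insert(b"ad", "first")
table.insert(b"bc", "second")   # same byte sum as b"ad", so same index
```

In this sketch, `table.lookup(b"ad")` must read past the colliding element before finding its own, and a lookup of `b"da"`, which was never inserted but hashes to the same index, walks the whole list before reporting a miss.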
In those systems described above, the serial examination of each element in the linked list wastes time. It can be particularly disadvantageous in high-performance applications or those involving massive data sets (millions of memory entries or more). Consider, for example, the process of searching an established connection table to support the well-known transmission control protocol (TCP). Given an input tuple that includes an IP source address, an IP destination address, a TCP source port and a TCP destination port, the process must search a memory to retrieve data representative of the connection state. Using a conventional linked list implementation, as the number of active connections grows, the rate of collisions and the length of the linked lists also grow. Hypothetically, if an index hits a linked list with six entries, a system must read each entry in order to detect a match. Because each entry in the list includes a pointer to the next entry in the list, the various entries cannot be read in parallel. Up to six sequential memory reads would be required before it could be determined whether the input data hit or missed the memory. Thus, the latency problems of such implementations can be severe.
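The growth in list length can be sketched as follows, assuming IPv4 addresses, a CRC-32 checksum standing in for the hash function, and a deliberately small table of 64 indices. The helper names `tuple_key`, `bucket_index` and `chain_lengths`, and the example addresses and ports, are hypothetical. The longest list at any index equals the worst-case number of sequential reads for a lookup.

```python
import ipaddress
import struct
import zlib
from collections import Counter


def tuple_key(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Pack the TCP 4-tuple into a 12-byte key (two IPv4 addresses, two ports)."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))


def bucket_index(key: bytes, num_buckets: int) -> int:
    # CRC-32 is only a stand-in here; any hash function could be substituted.
    return zlib.crc32(key) % num_buckets


def chain_lengths(num_connections: int, num_buckets: int) -> Counter:
    """Count how many connections land on each index value."""
    counts = Counter()
    for n in range(num_connections):
        # Illustrative connections: one client, one server, varying source port.
        key = tuple_key("10.0.0.1", "192.0.2.7", 1024 + n, 80)
        counts[bucket_index(key, num_buckets)] += 1
    return counts


for connections in (64, 256, 1024):
    lengths = chain_lengths(connections, 64)
    # The longest list bounds the sequential reads needed for one lookup.
    print(connections, "connections -> longest list:", max(lengths.values()))
```

With 1024 connections spread over 64 indices, at least one list must hold 16 or more entries regardless of the hash function chosen, so at least one lookup requires that many dependent reads.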
In one well-known TCP implementation, IP source and destination addresses each are represented as 32 bit quantities and TCP source and destination ports are represented as 16 bit quantities. To accommodate all possible variations in these values, a TCP connection table would require 2^96 entries if implemented without a hash function. A hash function that generates a 32 bit hash value, however, reduces the size of the connection table to 2^32 entries (about 4.3 billion entries). In another TCP implementation, where IP source and destination addresses are represented as 128 bit values, a TCP connection table would require 2^288 entries. The 32 bit hash value again would reduce the size of the connection table to 2^32 entries. In this latter implementation, a vastly larger number of unique combinations of input data would map to the same 4.3 billion hash values, which raises the potential for collisions when compared to the first implementation.
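The table sizes follow from adding the widths of the four input fields. The short calculation below works through that arithmetic; the constant and function names are chosen here for illustration only.

```python
PORT_BITS = 16   # width of each TCP port
HASH_BITS = 32   # width of the hash value


def key_bits(addr_bits: int) -> int:
    """Total input width: two IP addresses plus two TCP ports."""
    return 2 * addr_bits + 2 * PORT_BITS


bits_32bit_addr = key_bits(32)     # 32 + 32 + 16 + 16
bits_128bit_addr = key_bits(128)   # 128 + 128 + 16 + 16

table_entries = 2 ** HASH_BITS     # entries addressable by the hash value

# Number of distinct input values that share each index value, on average.
inputs_per_index_32 = 2 ** (bits_32bit_addr - HASH_BITS)
inputs_per_index_128 = 2 ** (bits_128bit_addr - HASH_BITS)

print(bits_32bit_addr, bits_128bit_addr, table_entries)
```

The input widths come to 96 and 288 bits respectively, and a 32 bit hash value addresses 4,294,967,296 entries. Each index value therefore stands for 2^64 possible inputs in the first case and 2^256 in the second.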
The inventors perceive a need in the art for a high performance hashing algorithm that provides improved performance for large scale memories. They further perceive a need in the art for a hash-based lookup system that avoids the problems of serial reads throughout linked list data structures.