The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Network elements, such as routers and switches, are capable of performing operations relative to packet flows that pass through those network elements. A packet flow is defined by a set of specified attributes; all data packets that possess the specified attributes belong to the packet flow. For example, if a particular packet flow is defined by a particular source Internet Protocol (IP) address and a particular destination IP address, then all IP data packets containing both the specified source IP address and the specified destination IP address in their packet headers belong to the particular packet flow. Only data packets that possess a packet flow's specified attributes belong to that packet flow.
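As a minimal sketch, flow membership can be expressed as a check that a packet carries every attribute value the flow specifies. The `Packet` fields and the `belongs_to_flow` helper are hypothetical illustrations and are not taken from any particular network element:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical header fields; real flow definitions may use other attributes.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def belongs_to_flow(packet: Packet, flow_attrs: dict) -> bool:
    """Return True only if the packet has every attribute value the flow specifies."""
    return all(getattr(packet, name) == value for name, value in flow_attrs.items())


# A flow defined by a source and a destination IP address, as in the example above.
flow = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"}
web = Packet("10.0.0.1", "10.0.0.2", 40000, 80)
other = Packet("10.0.0.1", "10.0.0.3", 40000, 80)
print(belongs_to_flow(web, flow))    # True: both specified attributes match
print(belongs_to_flow(other, flow))  # False: the destination address differs
```

Attributes that the flow does not specify, such as the ports here, do not affect membership.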
Typically, when a network element receives a data packet, the network element classifies the data packet into a packet flow by reading specified attributes of the data packet (e.g., source IP address and destination IP address) and combining the specified attributes into an input key that is provided as input into a hash function. Based on this input key, the hash function produces an output value that is called a hash key. The quantity of unique hash keys that the hash function can produce is less than the quantity of unique input keys that the hash function can accept. Consequently, the hash function must produce the same hash key from at least two different input keys. For example, if the hash function produces a hash key by dividing an input key by 6 and then outputting the integer remainder of that division, then the hash function will produce a hash key of 1 for all input keys that are one more than a multiple of 6 (e.g., 1, 7, and 13).
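The input-key construction and the divide-by-6 example can be sketched as follows. The encoding used by the hypothetical `make_input_key` (source address in the upper 32 bits, destination address in the lower 32 bits) is an assumption chosen for illustration; a real network element may combine attributes differently:

```python
import ipaddress


def make_input_key(src_ip: str, dst_ip: str) -> int:
    """Combine two IPv4 addresses into one 64-bit input key.

    Hypothetical encoding: the source address forms the upper 32 bits and the
    destination address forms the lower 32 bits.
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src << 32) | dst


def mod6_hash(input_key: int) -> int:
    """The toy hash function from the text: the remainder of input_key / 6."""
    return input_key % 6


# Input keys that are one more than a multiple of 6 all produce hash key 1.
print([mod6_hash(k) for k in (1, 7, 13, 19)])  # [1, 1, 1, 1]

# Two different flows whose input keys differ by 6 collide on the same hash key.
k1 = make_input_key("10.0.0.1", "10.0.0.2")
k2 = make_input_key("10.0.0.1", "10.0.0.8")
print(k1 != k2, mod6_hash(k1) == mod6_hash(k2))  # True True
```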
Once the hash function has produced the hash key for a particular data packet, the network element uses the hash key as an index into a multi-entry hash table. For example, if the hash key is N, then the hash table entry that corresponds to the hash key typically is the Nth entry in the hash table. Each entry of the hash table may contain a pointer or reference to a corresponding entry of a separate multi-entry data structure called a flow table. After the network element has located the hash table entry that corresponds to the hash key, the network element follows the hash table entry's pointer or reference to locate a corresponding flow table entry.
Each populated entry of the flow table contains, among other information, an identifier. Assuming that the network element has located a populated flow table entry referenced by the hash table entry, the network element compares the flow table entry's identifier with the input key that was provided to the hash function to generate the hash key. If the identifier matches the input key, then the network element performs operations that are associated with, and possibly specified within, the flow table entry. For example, the network element may increment a counter that is contained in the flow table entry to keep a tally of the number of a particular flow's data packets that have been received by the network element.
However, because the hash function may produce the same hash key for multiple different input keys, the identifier might differ from the input key. Under these circumstances, a hash collision has occurred, and the operations associated with the flow table entry should not be performed. To compensate for hash collisions, a technique called “paging” may be employed.
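The single-reference lookup, counter update, and collision check described above can be sketched as follows, assuming (hypothetically) that each hash table entry holds either a flow table index or `None`, and reusing the divide-by-6 hash function. `FlowEntry` and `count_packet` are illustrative names only:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FlowEntry:
    identifier: int        # the input key recorded when the entry was populated
    packet_count: int = 0  # tally of this flow's packets


def count_packet(input_key: int,
                 hash_table: List[Optional[int]],
                 flow_table: List[FlowEntry],
                 hash_fn: Callable[[int], int]) -> bool:
    """Look up one packet's flow and increment its counter.

    Returns False when the hash table entry is empty or when a hash collision
    is detected (the stored identifier differs from the input key).
    """
    ref = hash_table[hash_fn(input_key)]  # the Nth entry for hash key N
    if ref is None:
        return False
    entry = flow_table[ref]               # follow the reference into the flow table
    if entry.identifier != input_key:
        return False                      # hash collision: leave the entry untouched
    entry.packet_count += 1
    return True


def mod6_hash(key: int) -> int:
    return key % 6


flow_table = [FlowEntry(identifier=7)]
hash_table: List[Optional[int]] = [None] * 6
hash_table[mod6_hash(7)] = 0              # entry 1 refers to flow table entry 0

print(count_packet(7, hash_table, flow_table, mod6_hash))   # True
print(count_packet(13, hash_table, flow_table, mod6_hash))  # False (hash collision)
print(flow_table[0].packet_count)                           # 1
```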
Each hash table entry may be visualized as a row of the hash table. Under the paging technique, each hash table entry may contain multiple columns called pages. Each page of a hash table entry may contain a separate pointer or reference to a different flow table entry. Using the paging technique, after the network element has located the hash table entry that corresponds to the hash key, the network element follows the pointer or reference in the first page of that hash table entry to locate a flow table entry. If the identifier contained in that flow table entry does not match the input key, then the network element follows the pointer or reference in the second page to locate a different flow table entry. The process continues for each successive page of the hash table entry until the network element locates either a flow table entry that contains an identifier that matches the input key (making that flow table entry the “matching” flow table entry) or, if there are no matching identifiers, an unused page that does not yet contain a reference to any flow table entry. If the network element locates an unused page, then the network element sets that page to refer to a currently unpopulated flow table entry and populates that flow table entry. As part of populating a flow table entry, the network element assigns the input key to the flow table entry's identifier.
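A minimal sketch of the paging technique follows. It assumes that each hash table entry is a fixed-length list of pages holding flow table indices (or `None` when unused), and that pages are filled in order and never freed, so the first unused page ends the search. `PAGES_PER_ENTRY`, `FlowEntry`, and `find_or_insert` are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

PAGES_PER_ENTRY = 2  # hypothetical value; the number of pages is configurable


@dataclass
class FlowEntry:
    identifier: int
    packet_count: int = 0


def find_or_insert(input_key: int,
                   hash_table: List[List[Optional[int]]],
                   flow_table: List[FlowEntry],
                   hash_fn: Callable[[int], int]) -> Optional[int]:
    """Return the index of the matching flow table entry, creating one if needed.

    Returns None when every page is used and none refers to a matching entry.
    """
    row = hash_table[hash_fn(input_key)]
    for page_index, ref in enumerate(row):
        if ref is None:
            # Unused page: populate a new flow table entry and point the page at it.
            flow_table.append(FlowEntry(identifier=input_key))
            row[page_index] = len(flow_table) - 1
            return row[page_index]
        if flow_table[ref].identifier == input_key:
            return ref  # the matching flow table entry
    return None


def mod6_hash(key: int) -> int:
    return key % 6


hash_table = [[None] * PAGES_PER_ENTRY for _ in range(6)]
flow_table: List[FlowEntry] = []
print(find_or_insert(1, hash_table, flow_table, mod6_hash))   # 0: first page of entry 1
print(find_or_insert(7, hash_table, flow_table, mod6_hash))   # 1: second page, same entry
print(find_or_insert(1, hash_table, flow_table, mod6_hash))   # 0: found again
print(find_or_insert(13, hash_table, flow_table, mod6_hash))  # None: entry 1 is full
```

With two pages per entry, the third distinct key that hashes to entry 1 finds neither a matching page nor an unused one; the sketch reports this as `None` rather than inventing an eviction policy, which the text does not describe.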
The number of pages per hash table entry is configurable. For any given hash function, as the number of pages per hash table entry increases, the average time required to perform the above process also increases.
Where there are many pages per hash table entry, it might take a long time to locate a matching page within a particular hash table entry. To reduce the average time required to locate a matching page, multiple separate hash tables may be used. Each hash table corresponds to a different hash function. Using this multiple hash table approach, the network element inputs the input key into each hash function. Each hash function produces a separate hash key, and each hash key corresponds to an entry in a separate hash table. For example, given the same input key, a first hash function might produce a first hash key that corresponds to an entry in a first hash table, and a second hash function might produce a second hash key that corresponds to an entry in a second hash table. In a manner similar to that described above in relation to a single hash table entry, the network element determines whether any of the several corresponding hash table entries contains a page that refers to a matching flow table entry. If the network element determines that none of the corresponding hash table entries contains a page that refers to a matching flow table entry, then the network element selects the hash table entry that has the most unused pages, and sets an unused page of that hash table entry to refer to a flow table entry as described above.
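The multiple hash table approach can be sketched in the same style. The two hash functions (remainders modulo 6 and modulo 7), the table sizes, and the tie-breaking rule (the earlier table wins when unused-page counts are equal) are assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Row = List[Optional[int]]  # one hash table entry: its pages (flow table indices or None)


@dataclass
class FlowEntry:
    identifier: int


def find_or_insert_multi(input_key: int,
                         hash_tables: Sequence[List[Row]],
                         hash_fns: Sequence[Callable[[int], int]],
                         flow_table: List[FlowEntry]) -> Optional[int]:
    """Search one entry per hash table; on a miss, insert into the emptiest entry."""
    rows = [table[fn(input_key)] for table, fn in zip(hash_tables, hash_fns)]

    # Look for a page that refers to a matching flow table entry in any table.
    for row in rows:
        for ref in row:
            if ref is not None and flow_table[ref].identifier == input_key:
                return ref

    # No match: pick the entry with the most unused pages (max keeps the first on ties).
    best = max(rows, key=lambda row: row.count(None))
    if best.count(None) == 0:
        return None  # every candidate entry is full
    flow_table.append(FlowEntry(identifier=input_key))
    best[best.index(None)] = len(flow_table) - 1
    return len(flow_table) - 1


# Hypothetical hash functions and table sizes, two pages per entry.
hash_fns = [lambda k: k % 6, lambda k: k % 7]
hash_tables = [[[None, None] for _ in range(6)], [[None, None] for _ in range(7)]]
flow_table: List[FlowEntry] = []

find_or_insert_multi(1, hash_tables, hash_fns, flow_table)  # tie: first table, entry 1
find_or_insert_multi(7, hash_tables, hash_fns, flow_table)  # second table's entry 0 is emptier
print(hash_tables[0][1], hash_tables[1][0])                 # [0, None] [1, None]
```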
For practical and economic reasons, the hash tables typically are stored within smaller, faster memory, while the flow table typically is stored within larger, slower memory. Reading an identifier from the flow table to determine whether the identifier matches the input key is a relatively expensive operation in terms of computing resources. Consequently, it is desirable to minimize the number of times that an identifier is read from the flow table.
Under one theoretical approach, the identifier might be stored in a hash table entry. Because the memory in which the hash tables are stored typically is faster than the memory in which the flow table is stored, identifiers could be obtained from the hash tables more rapidly. However, identifiers often are quite large; an identifier might comprise 389 bits, for example. Because the memory in which the hash tables are stored typically is smaller than the memory in which the flow table is stored, storing such large identifiers in hash table entries often is not feasible.
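A rough calculation with hypothetical table dimensions shows why storing full identifiers in the hash tables is costly. Only the 389-bit identifier size comes from the text; the entry count, page count, and 24-bit reference width are assumptions:

```python
def table_bytes(entries: int, pages_per_entry: int, bits_per_page: int) -> int:
    """Bytes needed for a hash table whose pages each hold bits_per_page bits."""
    return entries * pages_per_entry * bits_per_page // 8


ENTRIES = 2 ** 20      # hypothetical: about one million hash table entries
PAGES = 4              # hypothetical pages per entry
IDENTIFIER_BITS = 389  # identifier size from the example above
REFERENCE_BITS = 24    # hypothetical width of a flow table reference

MIB = 2 ** 20
print(table_bytes(ENTRIES, PAGES, IDENTIFIER_BITS) / MIB)  # 194.5
print(table_bytes(ENTRIES, PAGES, REFERENCE_BITS) / MIB)   # 12.0
```

Under these assumptions, storing identifiers instead of references makes the hash table roughly 16 times larger.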
According to another approach, compression mechanisms are used in order to generate and store compressed keys, rather than the full identifiers, in the hash tables. Compression may be accomplished by hashing the full identifiers, for example. This compression-based approach allows a network element to compress an input key and compare it with the compressed keys that are stored in the pages of a hash table entry. The network element does not need to read a full identifier from a flow table entry unless the compressed key that is stored in a page that refers to the flow table entry matches the compressed input key. Because it is possible for different input keys to be compressed into the same compressed input key, it is still necessary to check the full identifier from the flow table entry at least once to ensure that a “compression collision” has not occurred.
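The compression-based comparison can be sketched as follows. The 16-bit compressed key derived from `zlib.crc32` is a hypothetical choice of compression, and each page is assumed to hold a `(compressed key, flow table index)` pair. The function counts flow table reads to show that pages with mismatching compressed keys are rejected without touching the flow table:

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

Page = Optional[Tuple[int, int]]  # (compressed key, flow table index), or None if unused


@dataclass
class FlowEntry:
    identifier: int


def compress(key: int) -> int:
    """Hypothetical compression: the low 16 bits of a CRC-32 over the key's bytes.

    Assumes non-negative keys of at most 512 bits (a 389-bit identifier fits).
    """
    return zlib.crc32(key.to_bytes(64, "big")) & 0xFFFF


def lookup_compressed(input_key: int, row: List[Page],
                      flow_table: List[FlowEntry]) -> Tuple[Optional[int], int]:
    """Return (matching flow table index or None, number of flow table reads)."""
    compressed = compress(input_key)
    reads = 0
    for page in row:
        if page is None or page[0] != compressed:
            continue  # rejected using only the (fast) hash table memory
        reads += 1    # compressed keys match: read the full identifier
        if flow_table[page[1]].identifier == input_key:
            return page[1], reads
        # Identifiers differ: a compression collision; keep searching.
    return None, reads


flow_table = [FlowEntry(identifier=99), FlowEntry(identifier=5)]
row: List[Page] = [
    (compress(5) ^ 1, 0),  # different compressed key: skipped without a read
    (compress(5), 0),      # simulated compression collision with key 99's entry
    (compress(5), 1),      # the genuine entry for key 5
    None,                  # unused page
]
print(lookup_compressed(5, row, flow_table))  # (1, 2)
```

The first page is skipped without a flow table read, the second costs a read that exposes the simulated compression collision, and the third confirms the match.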
Thus, applying the compression-based approach to the multiple hash table approach described above, a network element determines whether any of the several corresponding hash table entries (one in each hash table) contains a page that contains a compressed key that matches the compressed input key. Unfortunately, when more than one such hash table entry, in separate hash tables, contains a page with a compressed key that matches the compressed input key, the network element has to read multiple identifiers from the flow table and compare each with the input key. Each read-and-compare operation degrades the network element's performance.
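To illustrate this cost, the following sketch places hand-chosen keys so that, in each of three hash tables, the entry for input key 100 contains a page whose compressed key matches. The hash functions, the 4-bit compression, and the key values are hypothetical; the point is that each compressed match forces a separate flow table read, one per table in this arrangement:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Page = Optional[Tuple[int, int]]  # (compressed key, flow table index), or None if unused


@dataclass
class FlowEntry:
    identifier: int


def compress(key: int) -> int:
    # Hypothetical, deliberately lossy compression: keep only the low 4 bits.
    return key & 0xF


def lookup_multi(input_key: int,
                 hash_tables: Sequence[List[List[Page]]],
                 hash_fns: Sequence[Callable[[int], int]],
                 flow_table: List[FlowEntry]) -> Tuple[Optional[int], int]:
    """Check one entry per hash table; return (match or None, flow table reads)."""
    compressed = compress(input_key)
    reads = 0
    for table, fn in zip(hash_tables, hash_fns):
        for page in table[fn(input_key)]:
            if page is None or page[0] != compressed:
                continue
            reads += 1  # each compressed match costs a read from slower memory
            if flow_table[page[1]].identifier == input_key:
                return page[1], reads
    return None, reads


# Three hypothetical hash functions, one per table, with two pages per entry.
hash_fns = [lambda k: k % 5, lambda k: k % 7, lambda k: k % 11]
hash_tables = [[[None, None] for _ in range(m)] for m in (5, 7, 11)]

# Keys 20, 212, and 100 all compress to 4, and each is stored in the entry
# that key 100 hashes to in one of the three tables.
flow_table = [FlowEntry(20), FlowEntry(212), FlowEntry(100)]
hash_tables[0][20 % 5][0] = (compress(20), 0)
hash_tables[1][212 % 7][0] = (compress(212), 1)
hash_tables[2][100 % 11][0] = (compress(100), 2)

print(lookup_multi(100, hash_tables, hash_fns, flow_table))  # (2, 3)
```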
As described above, a network element may input a first input key into two different hash functions to obtain two different hash keys. The network element can use the first hash key to locate a first hash table entry in a first hash table, and the network element can use the second hash key to locate a second hash table entry in a second hash table. If the network element cannot find, among the pages of the first and second hash table entries, a compressed key that matches the compressed input key, then the network element may insert the compressed input key into an unused page of either the first or the second hash table entry. Unfortunately, even if the network element checks all of the populated pages of the first and second hash table entries for a matching compressed key before inserting the compressed input key into an unused page of the first hash table entry, the compressed input key might already exist in a third hash table entry. That third hash table entry corresponds to a second input key which, like the first input key, also corresponds to the first hash table entry.
For example, given two input keys K1 and K2, K1 might hash to entry X in the first hash table and entry Y in the second hash table, and K2 might hash to entry X in the first hash table and entry Z in the second hash table. Furthermore, due to the “lossiness” of compression, the compressed versions of input keys K1 and K2 might be identical. Checking to make sure that neither entry X nor entry Y contains the compressed key before inserting the compressed key into entry X does not protect against the possibility that entry Z already contains the compressed key. If the compressed key is inserted into entry X, then the compressed key ends up being aliased, undesirably, across entry X in the first hash table and entry Z in the second hash table; both entries correspond to input key K2.
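The K1/K2 scenario can be reproduced with hard-wired stand-ins. The functions `h1`, `h2`, and `compress` and the key values are hypothetical, chosen so that K1 maps to X and Y, K2 maps to X and Z, and both keys compress identically. The hypothetical `naive_insert` checks only the inserting key's own entries, as described above:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Page = Optional[Tuple[int, int]]  # (compressed key, flow table index), or None if unused


@dataclass
class FlowEntry:
    identifier: int


K1, K2 = 101, 202  # hypothetical input keys
X, Y, Z = 0, 0, 1  # X indexes the first table; Y and Z index the second table
SECOND_TABLE_INDEX = {K1: Y, K2: Z}


def h1(key: int) -> int:
    return X                        # both keys hash to entry X in the first table


def h2(key: int) -> int:
    return SECOND_TABLE_INDEX[key]  # K1 -> Y and K2 -> Z in the second table


def compress(key: int) -> int:
    return 0xAB                     # lossy: K1 and K2 compress to the same value


first: List[List[Page]] = [[None, None]]                 # entry X
second: List[List[Page]] = [[None, None], [None, None]]  # entries Y and Z
flow_table: List[FlowEntry] = []


def candidate_rows(key: int) -> List[List[Page]]:
    return [first[h1(key)], second[h2(key)]]


def lookup(key: int) -> Tuple[Optional[int], int]:
    """Return (matching flow table index or None, number of flow table reads)."""
    reads = 0
    for row in candidate_rows(key):
        for page in row:
            if page is not None and page[0] == compress(key):
                reads += 1
                if flow_table[page[1]].identifier == key:
                    return page[1], reads
    return None, reads


def naive_insert(key: int, row: List[Page]) -> None:
    """Insert into row after checking only this key's own candidate entries."""
    if any(page is not None and page[0] == compress(key)
           for r in candidate_rows(key) for page in r):
        return
    flow_table.append(FlowEntry(key))
    row[row.index(None)] = (compress(key), len(flow_table) - 1)


naive_insert(K2, second[Z])  # K2 is stored in entry Z first
naive_insert(K1, first[X])   # checks X and Y only, so the copy in Z goes unseen
print(lookup(K2))            # (0, 2)
```

Looking up K2 now encounters the aliased compressed key in both X and Z, so it needs two flow table reads to find its entry.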
If compressed keys that match the compressed input key are found in both the first and the second hash table entries discussed above, then the correct flow table entry may be determined by comparing each of the full identifiers from the corresponding flow table entries with the original input key. However, this requires multiple flow table lookups, which complicates algorithms and adversely affects performance. If there are N hash tables, then, in the worst-case scenario, N flow table lookups might need to be performed to determine the correct flow table entry. Based on the foregoing, there is a clear need for a solution that can ensure that the above scenario does not occur.