Hashing is a conventional technique commonly used in various applications for mapping a set of signal elements (arguments) to a limited range of numeric identifiers (keys) by means of a hash function. In hashing, a given signal element is mapped to an identifier based only on the signal element or appropriate parts thereof as input to the hash function, without any knowledge of the mapping between other signal elements and identifiers. Ideally, signal elements having the same content should be mapped to the same identifier, whereas signal elements of different contents should be mapped to different identifiers. However, hash functions are usually not capable of mapping all unique signal elements to distinct identifiers, and there is a considerable risk of different elements being mapped to the same identifier (a hash collision, also referred to as a clash).
Therefore, a lot of research has been directed towards finding optimized hash functions with random and uniform distribution characteristics. However, the number of hash collisions is usually still considerable in many applications even though a “good” hash function is used. In many cases the number of hash collisions may be considerable even when the number of unique and simultaneously active signal elements to be mapped to the identifiers is as low as 30–40% of the total number of identifiers.
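The load figures above can be checked with a short calculation. The following is a minimal sketch, assuming an idealized hash function that maps each element uniformly and independently to one of the identifiers; the function name and the example size of 128 identifiers are illustrative choices, not taken from any particular system.

```python
def expected_colliding_fraction(n_elements: int, n_identifiers: int) -> float:
    """Expected fraction of elements that share an identifier with another element.

    Assumes an ideal hash: each element lands on any identifier with equal
    probability, independently of all other elements.
    """
    if n_elements < 2:
        return 0.0
    # An element is collision-free only if none of the other
    # n_elements - 1 elements lands on the same identifier.
    p_alone = (1.0 - 1.0 / n_identifiers) ** (n_elements - 1)
    return 1.0 - p_alone


if __name__ == "__main__":
    identifiers = 128  # illustrative identifier range
    for load in (0.30, 0.40):
        elements = round(load * identifiers)
        fraction = expected_colliding_fraction(elements, identifiers)
        print(f"load {load:.0%}: about {fraction:.0%} of elements collide")
```

Under these assumptions, roughly a quarter to a third of the active elements are involved in a collision at loads of 30–40%, which is consistent with the observation that a "good" hash function alone does not remove the problem.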
Other attempts at reducing hash collisions include resolving the collisions by means of various complicated circuitry, for example as described in U.S. Pat. No. 5,920,900.
U.S. Pat. No. 6,097,725 describes a method for searching a bit field address in an ATM system by computing a hash key for pointing to a first address among a large number of addresses followed by sequential reading of a smaller number of subsequent entries until a match occurs.
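The general idea of hashing to a first address and then reading a small number of subsequent entries can be sketched as follows. This is only an illustration of bounded sequential probing, not the method of the cited patent; the table size, the probe limit, the placeholder hash function and the wrap-around at the end of the table are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

TABLE_SIZE = 64  # assumed number of table entries
MAX_PROBES = 4   # assumed limit on sequential reads after the first address

Entry = Optional[Tuple[int, str]]


def hash_key(address: int) -> int:
    """Placeholder hash function: points to the first candidate entry."""
    return address % TABLE_SIZE


def insert(table: List[Entry], address: int, value: str) -> bool:
    """Store the value in the first free entry within the probe window."""
    start = hash_key(address)
    for offset in range(MAX_PROBES):
        slot = (start + offset) % TABLE_SIZE
        if table[slot] is None or table[slot][0] == address:
            table[slot] = (address, value)
            return True
    return False  # probe window exhausted


def lookup(table: List[Entry], address: int) -> Optional[str]:
    """Read entries sequentially from the hashed position until a match occurs."""
    start = hash_key(address)
    for offset in range(MAX_PROBES):
        entry = table[(start + offset) % TABLE_SIZE]
        if entry is not None and entry[0] == address:
            return entry[1]
    return None


if __name__ == "__main__":
    table: List[Entry] = [None] * TABLE_SIZE
    insert(table, 5, "first")
    insert(table, 69, "second")  # 69 hashes to the same first address as 5
    print(lookup(table, 69))
```

The sequential reads resolve a limited number of collisions per first address, at the cost of extra memory accesses for every lookup that does not match on the first entry.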
For a more thorough understanding of conventional hashing and the problems associated therewith, hashing will now be described with reference to the particular problem of selecting context identifiers in Internet Protocol (IP) header compression.
IP header compression reduces the negative impacts of large IP headers significantly and allows efficient bandwidth utilization. Header compression is generally based on the observation that in a packet stream, most header fields are identical in consecutive packets. For simplicity one may think of a packet stream, sometimes also referred to as a session, as all the packets sent from a particular source address and port to a particular destination address and port using the same transport protocol. A basic principle for compressing headers is to establish an association between the non-changing fields of the headers in a packet stream and a context identifier (CID), which is selected to represent the headers. Headers are then replaced by compressed headers, each of which contains CID information and possibly also information that is unique to the individual packet header.
FIG. 1 is a schematic diagram illustrating a full header 1 with CID association as well as a corresponding compressed header 2. Typically, the header fields can be categorized into different categories depending on how the fields are expected to change between consecutive headers in a packet stream. Header compression standards such as RFC 2507 and RFC 2508 of the Internet Engineering Task Force provide such a classification for IPv6 base and extension headers, IPv4, TCP and UDP headers. In these standards, fields that are not expected to change are classified as NO_CHANGE, fields that can be inferred from other fields are classified as INFERRED, and fields that change in an unpredictable manner are classified as RANDOM. Information in RANDOM fields is normally included in the compressed headers, whereas information in INFERRED fields does not need to be included in the compressed headers.
FIG. 2 is a schematic diagram of two interconnected routers A and B, each with header compression/decompression capabilities. The routers 10, 20 are interconnected by one or more (bi-directional) links. Each router comprises a compressor 11/21 and a decompressor 13/23, each of which is connected to a respective context memory 12-1, 12-2/22-1, 22-2. To compress the headers of a packet stream, the compressor 11 in router A selects, for each packet header, a CID to represent the non-changing fields of the header and stores header information, possibly together with additional information, as a compression context in the context memory 12-1 of router A. The initial packet of the packet stream is transmitted with a full header (FH), including the selected CID, over a given link to router B, allowing the decompressor 23 of router B to extract the compression context and the CID. The extracted compression context is stored in the context memory 22-2 of router B. Subsequent packets belonging to the same packet stream are then transmitted with compressed headers (CH) to router B. The decompressor 23 of router B can use the corresponding CID values to look up the appropriate compression context in the context memory 22-2 of router B, thus restoring the compressed headers to their original form. In order to alleviate problems with incorrect decompression, full headers are typically transmitted periodically, or with an exponentially increasing interval in slow-start mode, to refresh the compression context.
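The exchange of full and compressed headers between a compressor and a decompressor can be sketched as follows. This is a simplified illustration under stated assumptions: the class names and packet representation are hypothetical, headers are reduced to a tuple of non-changing fields, per-packet changing fields are ignored, and the periodic full-header refresh is omitted.

```python
from typing import Dict, Optional, Tuple

Header = Tuple[str, int, str, int, str]  # assumed: src addr, src port, dst addr, dst port, protocol
Packet = Tuple[str, int, Optional[Header]]  # ("FH" or "CH", CID, header if full)


class Compressor:
    """Keeps a context memory on the sending side (hypothetical sketch)."""

    def __init__(self) -> None:
        self.context_memory: Dict[int, Header] = {}

    def compress(self, cid: int, header: Header) -> Packet:
        if self.context_memory.get(cid) != header:
            # New or changed context: store it and send a full header.
            self.context_memory[cid] = header
            return ("FH", cid, header)
        # Context already established: the CID alone represents the header.
        return ("CH", cid, None)


class Decompressor:
    """Keeps a context memory on the receiving side (hypothetical sketch)."""

    def __init__(self) -> None:
        self.context_memory: Dict[int, Header] = {}

    def decompress(self, packet: Packet) -> Header:
        kind, cid, header = packet
        if kind == "FH" and header is not None:
            # Extract and store the compression context from the full header.
            self.context_memory[cid] = header
            return header
        # Restore the header from the context indexed by the CID.
        return self.context_memory[cid]


if __name__ == "__main__":
    compressor, decompressor = Compressor(), Decompressor()
    header: Header = ("10.0.0.1", 5004, "10.0.0.2", 5004, "UDP")
    for _ in range(3):
        packet = compressor.compress(17, header)
        print(packet[0], decompressor.decompress(packet) == header)
```

In this sketch only the first packet of the stream carries the full header; every later packet is restored from the context that the decompressor stored under the same CID.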
Although many aspects of header compression are specified in detail in existing header compression standards, the CID selection mechanism is not. The maximum range of CID values is specified in the standards. TCP packets and non-TCP packets normally use separate sets of CID values with different maximum ranges. Different routers have to negotiate which CID range to use before initiating transmission. In general, different links also use separate sets of CID values. The actual mechanism for generating and selecting CID values, however, is unspecified.
There are some basic requirements on CID generation and selection. The CID values should be unique for all packet streams that are active on a given link at any given time so that different streams are mapped to different CID values. If two or more active packet streams map to the same CID (clashing), the degree of compression is reduced since each clash requires a new full header, redefining the context of the CID, to be transmitted instead of a compressed header. Generating a unique CID for each new packet stream is therefore very important for the overall efficiency of the compression algorithm.
CID selection is also complicated by the fact that there is no mechanism for determining when a stream has terminated.
Conventional methods for generating CID values are typically based on hashing, taking the non-changing header fields as input to a hash function to generate a corresponding CID value.
In header compression applications, the total number of possible headers may be extremely large, while the CID range is typically limited to at most 2^8 (256) values for TCP traffic and 2^16 (65,536) values for non-TCP traffic.
FIG. 3 illustrates hash-based generation of CID values for addressing a context memory according to the prior art. For an incoming packet, the header fields classified as NO_CHANGE are used as input to a hash coder 30. A suitable hash function is implemented in the hash coder 30 to generate a CID value based on the given input. The generated CID value acts as an index to the context memory 12/22 and points to a specific address in the context memory to be used for storing corresponding header information as compression context. The context memory 12/22 has a limited size, here illustrated with 128 memory positions from 0 to 127. The corresponding CID values that are used for addressing the context memory define a CID space ranging from 0 to 127 (the CID range being equal to 128).
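The hash-based CID generation of FIG. 3 can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: CRC-32 is used only as a stand-in for whatever hash function the hash coder implements, and the function name, field layout and serialization are assumptions made for the example.

```python
import zlib
from typing import List, Optional, Tuple

CID_RANGE = 128  # context memory positions 0 to 127, as in FIG. 3

Fields = Tuple[str, int, str, int, str]  # assumed NO_CHANGE fields of a stream


def generate_cid(no_change_fields: Fields) -> int:
    """Map the NO_CHANGE header fields to a CID in the range 0..CID_RANGE-1."""
    # Serialize the fields deterministically; the separator is an arbitrary choice.
    data = "|".join(str(field) for field in no_change_fields).encode("utf-8")
    # CRC-32 is a stand-in hash; the modulo folds it into the CID space.
    return zlib.crc32(data) % CID_RANGE


if __name__ == "__main__":
    context_memory: List[Optional[Fields]] = [None] * CID_RANGE
    stream: Fields = ("10.0.0.1", 5004, "10.0.0.2", 5004, "UDP")
    cid = generate_cid(stream)
    # The CID is used directly as the index into the context memory.
    context_memory[cid] = stream
    print(cid)
```

Because the CID depends only on the fields of the stream itself, nothing in this scheme prevents a second stream from being assigned a position that is already in use.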
FIG. 4 illustrates the problem of CID clashing as two packet streams map to the same CID value. If a first packet belonging to stream X is mapped to the CID value 120 in the CID space, the corresponding header is stored as compression context in position 120 of the context memory. When a subsequent packet belonging to another stream Y is also mapped to the CID value 120, we have a CID clash. When the clash occurs, the CID value 120 is redefined to represent the new stream Y and the compression context previously stored in memory position 120 is overwritten by the header of the new packet belonging to stream Y. In the overall header compression scheme, this also means that the full header of the new packet of stream Y has to be transmitted to the decompressor on the receiving side. The two packet streams X and Y will continue to clash during the entire time period in which both packet streams are active, alternately overwriting each other's compression contexts and necessitating the transmission of full header packets. In conventional hash-based CID generation, clashes will be common even when the number of simultaneously active sessions is relatively small compared to the total CID range, leading to a significant reduction of the compression efficiency.
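The cost of a persistent clash can be made concrete with a small simulation. The sketch below is illustrative only: the function name is hypothetical, the CID assignments are given directly rather than computed by a hash function, and each stream is represented by its name instead of its header.

```python
from typing import Dict, Iterable


def count_full_headers(packets: Iterable[str], cid_of: Dict[str, int]) -> int:
    """Count how many packets must be sent with a full header.

    packets: sequence of stream names, one per transmitted packet.
    cid_of:  assumed mapping from stream name to its CID.
    """
    context_memory: Dict[int, str] = {}
    full_headers = 0
    for stream in packets:
        cid = cid_of[stream]
        if context_memory.get(cid) != stream:
            # The position is empty or holds another stream's context:
            # overwrite it and transmit a full header.
            context_memory[cid] = stream
            full_headers += 1
    return full_headers


if __name__ == "__main__":
    interleaved = ["X", "Y"] * 5  # ten packets, streams X and Y alternating
    print(count_full_headers(interleaved, {"X": 120, "Y": 120}))  # clashing CIDs
    print(count_full_headers(interleaved, {"X": 120, "Y": 121}))  # distinct CIDs
```

With both streams mapped to CID 120 and their packets strictly alternating, every one of the ten packets needs a full header, whereas distinct CIDs require only one full header per stream.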
In computer systems using cache memories, a similar problem is encountered when several memory addresses are mapped to the same cache line.
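The analogous situation in a cache can be illustrated with a short sketch, assuming a direct-mapped cache; the line size and the number of lines are arbitrary example values.

```python
LINE_SIZE = 64  # assumed bytes per cache line
NUM_LINES = 64  # assumed number of lines in the cache


def cache_line(address: int) -> int:
    """Return the cache line index a memory address maps to (direct-mapped)."""
    return (address // LINE_SIZE) % NUM_LINES


if __name__ == "__main__":
    # Two addresses exactly one cache size (LINE_SIZE * NUM_LINES bytes) apart.
    first, second = 0x0040, 0x1040
    print(cache_line(first), cache_line(second))
```

Addresses that differ by a multiple of the cache size map to the same line and evict each other when accessed alternately, much like two packet streams sharing one CID.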