Hash functions compute a digest (or fingerprint) of data (or a key) with the aim of allocating each data/key instance to a deterministic hash bucket. To be effective, a hash function should distribute keys to buckets nearly uniformly, so that no bucket is overloaded while other buckets remain empty. Many hash functions are known, each with different properties. For data lookup purposes, the hash function need not be cryptographically strong, but its distribution quality and computation speed are important.
Certain central processing units (hereinafter “CPUs”) implement instructions that have been reported to provide fast calculation of certain hash functions. One notable example is the CRC32 instruction in the Intel SSE 4.2 instruction set. The CRC32 function that this instruction computes has been recommended as a good hash function.
Hashes are regularly computed in networking environments. It would be advantageous to use the CRC32 instruction as a hash function in such environments. However, a problem with the CRC32 function is that it fails to produce high-quality hashes from short hash inputs, which are likely in many lookup scenarios. Examples of such inputs include short (4-byte, 8-byte, and/or 12-byte) inputs in which only one bit is set to 1 and all other bits are set to 0.
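The short-input case described above can be explored with a small software harness. The following Python sketch is illustrative only and is not taken from the test referred to in this section. It assumes the Castagnoli polynomial (CRC-32C), which is the polynomial used by the SSE 4.2 CRC32 instruction, together with the conventional initial value and final inversion of 0xFFFFFFFF. The helper names `one_bit_inputs` and `bucket_histogram`, the modulo mapping of hash to bucket, and the bucket count of 16 are hypothetical choices made for illustration.

```python
from collections import Counter

# Bit-reflected form of the Castagnoli polynomial 0x1EDC6F41 (CRC-32C).
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes) -> int:
    """Bitwise software CRC-32C (assumed init and final XOR of 0xFFFFFFFF)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 fell out.
            crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def one_bit_inputs(length: int):
    """Yield every `length`-byte input with exactly one bit set to 1."""
    for bit in range(length * 8):
        yield (1 << bit).to_bytes(length, "little")


def bucket_histogram(num_buckets: int, lengths=(4, 8, 12)) -> list:
    """Count how many one-bit inputs land in each bucket (hash % num_buckets)."""
    counts = Counter()
    for length in lengths:
        for key in one_bit_inputs(length):
            counts[crc32c(key) % num_buckets] += 1
    return [counts[b] for b in range(num_buckets)]


if __name__ == "__main__":
    hist = bucket_histogram(16)  # 16 buckets is an arbitrary example size
    print("keys per bucket:", hist)
    print("empty buckets:", hist.count(0), "max load:", max(hist))
```

A histogram with many empty buckets alongside heavily loaded ones would indicate the kind of poor distribution described above. The result depends on the bucket count and on how the hash is mapped to a bucket, so several configurations should be tried before drawing conclusions.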
Another test case computes a hash for distributing packets across multiple output ports and measures the distribution of test packets to ports. Here the quality criterion is a distribution to the output ports that is as uniform as possible, so that no port is burdened with excess traffic while other ports receive too little. This hash is also calculated over a short input. The plain CRC32 hash function fails this test as well.
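A port-distribution measurement of this kind may be sketched as follows. This Python example is a minimal illustration under stated assumptions and does not reproduce the test packets or the port-selection rule of the test referred to above. The 4-byte keys built from consecutive address-like values, the port count of 8, the modulo port selection, and the helper names `port_loads` and `imbalance` are all hypothetical. The hash is a table-driven software CRC-32C with the conventional 0xFFFFFFFF initial value and final inversion.

```python
import struct

# Bit-reflected Castagnoli polynomial (CRC-32C).
_POLY = 0x82F63B78

# Precompute the 256-entry lookup table, one entry per input byte value.
_TABLE = []
for _n in range(256):
    _c = _n
    for _ in range(8):
        _c = (_c >> 1) ^ _POLY if _c & 1 else _c >> 1
    _TABLE.append(_c)


def crc32c(data: bytes) -> int:
    """Table-driven software CRC-32C (assumed init and final XOR of 0xFFFFFFFF)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def port_loads(keys, num_ports: int) -> list:
    """Count packets per output port, selecting the port as hash % num_ports."""
    loads = [0] * num_ports
    for key in keys:
        loads[crc32c(key) % num_ports] += 1
    return loads


def imbalance(loads) -> float:
    """Ratio of the busiest port's load to a perfectly even share (1.0 is ideal)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    # Hypothetical test packets: 4-byte keys from consecutive address-like values.
    keys = [struct.pack(">I", 0x0A000000 + i) for i in range(1024)]
    loads = port_loads(keys, 8)
    print("packets per port:", loads)
    print("imbalance ratio:", imbalance(loads))
```

An imbalance ratio close to 1.0 corresponds to the uniform distribution sought above, while larger values indicate that one port is burdened with excess traffic. Substituting a different hash for `crc32c` in `port_loads` allows candidate hash functions to be compared on the same set of keys.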