Compression schemes, such as Lempel-Ziv-based compression schemes, are often used in data centers to compress data, thereby enabling compute devices in the data center to store more customer data in a given amount of data storage capacity and/or transmit more customer data in a given amount of network bandwidth. When compressing data pursuant to a Lempel-Ziv-based scheme, the compute device searches a history buffer (e.g., a sliding window of previous data from an input stream) for the longest string that matches a string starting at the present position in the input stream (e.g., a number of bytes into the input stream). To do so, the compute device typically produces a hash by applying a hash function to a prefix of the string starting at the present position, the prefix consisting of the symbol at that position (e.g., a byte or other unit of data) and a predefined number of additional symbols from the input stream. In typical systems, the total length of the prefix is three symbols. Typical systems then use the resulting hash as an index into a hash table that includes, for each hash, a set of pointers that point to other strings in the history buffer that produced the same hash.
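The indexing step can be illustrated with a minimal Python sketch, assuming a three-byte prefix and a multiplicative hash chosen only for illustration. The names `prefix_hash`, `build_hash_table`, `PREFIX_LEN`, and `HASH_BITS` are hypothetical and are not taken from any particular compressor. Every position whose prefix hashes to the same value, including unrelated prefixes that merely collide, is appended to the same list of pointers.

```python
from collections import defaultdict

PREFIX_LEN = 3   # prefix length used by typical systems
HASH_BITS = 15   # hypothetical table size: 2**15 buckets


def prefix_hash(data: bytes, pos: int) -> int:
    """Hash the PREFIX_LEN symbols starting at pos (illustrative multiplicative hash)."""
    value = int.from_bytes(data[pos:pos + PREFIX_LEN], "little")
    # Keep the low 32 bits of the product, then take its top HASH_BITS bits.
    return ((value * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)


def build_hash_table(data: bytes) -> dict[int, list[int]]:
    """Map each hash to the positions (pointers) whose prefix produced it, oldest first."""
    table: dict[int, list[int]] = defaultdict(list)
    for pos in range(len(data) - PREFIX_LEN + 1):
        table[prefix_hash(data, pos)].append(pos)
    return dict(table)
```

For the input `b"abcabc"`, positions 0 and 3 share the prefix `abc` and therefore land in the same bucket; on real data, each bucket can accumulate a long list of such pointers.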
In typical compression systems, the compute device then compares one or more of the strings referenced by the pointers found in the hash table to the string at the present position to find one or more matches, and selects the longest matching string. The compute device then replaces the symbols in the string at the present position with a much shorter reference back to the earlier occurrence of that string, to produce compressed output data. However, because the prefix is only three symbols long, the set of pointers to other strings in the history buffer that produced the same hash can be relatively long. As such, to improve the chances of finding a long string that matches the string at the present position (i.e., has the same sequence of symbols), and thereby obtain a relatively high compression ratio, the compute device may be required to search through many (e.g., a hundred or more) pointers and compare each referenced string to the present string for a potential match, incurring significant latency in producing the compressed output data.
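The chain search and replacement can be sketched as a greedy LZ77-style loop. In this hedged Python example, the hypothetical parameter `max_chain` caps how many pointers are examined per position, and `comparisons` counts the string comparisons performed, which is the quantity that drives latency. For brevity, the sketch keys its table on the raw three-byte prefix rather than on a hash of it, so it never visits colliding entries. The names `compress`, `decompress`, `MIN_MATCH`, and `WINDOW` are illustrative assumptions, not any specific compressor's API.

```python
from collections import defaultdict

MIN_MATCH = 3        # prefix length; shorter matches are emitted as literals
WINDOW = 32 * 1024   # hypothetical history-buffer (sliding window) size


def compress(data: bytes, max_chain: int = 128):
    """Greedy LZ77-style tokenizer. Returns (tokens, comparisons), where each
    token is a literal byte (int) or a (distance, length) back-reference."""
    chains: dict[bytes, list[int]] = defaultdict(list)  # prefix -> earlier positions
    tokens, comparisons, pos = [], 0, 0

    def insert(p: int) -> None:
        if p + MIN_MATCH <= len(data):
            chains[data[p:p + MIN_MATCH]].append(p)

    while pos < len(data):
        best_len, best_dist = 0, 0
        if pos + MIN_MATCH <= len(data):
            candidates = chains[data[pos:pos + MIN_MATCH]]
            # Visit the most recent pointers first, at most max_chain of them.
            for cand in reversed(candidates[-max_chain:]):
                if pos - cand > WINDOW:
                    break  # remaining candidates are older still
                comparisons += 1
                length = 0
                while pos + length < len(data) and data[cand + length] == data[pos + length]:
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, pos - cand
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))  # short reference to earlier string
            for p in range(pos, pos + best_len):
                insert(p)
            pos += best_len
        else:
            tokens.append(data[pos])
            insert(pos)
            pos += 1
    return tokens, comparisons


def decompress(tokens) -> bytes:
    """Rebuild the input by copying each back-reference from earlier output."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)
```

For example, `compress(b"abcabcabcabc")` yields the tokens `[97, 98, 99, (3, 9)]` after a single comparison. Raising `max_chain` can uncover longer matches at the cost of more comparisons per position; lowering it bounds latency but may leave longer matches undiscovered.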