This disclosure generally relates to data compression. More specifically, this disclosure relates to prefix compression for keyed values.
Numerous techniques exist for compressing data. A few popular examples are: (1) Phillip W. Katz, “String searcher, and compressor using same,” U.S. Pat. No. 5,051,745, (2) David A. Huffman, “A Method for the Construction of Minimum-Redundancy Codes,” Proceedings of the IRE (Institute of Radio Engineers), pp. 1098-1101 (September 1952), and (3) Jacob Ziv and Abraham Lempel, “A Universal Algorithm for Sequential Data Compression,” IEEE Transactions on Information Theory, Vol. IT-23, No. 3 (May 1977).
Existing techniques suffer from a number of drawbacks. Specifically, when these techniques are configured for high compression, they are typically slow. Additionally, in some techniques, decompression simply reproduces the large dataset that was originally compressed. While this type of compression helps with data transfer, it does not help the consumer process the data: the consumer must perform the potentially lengthy decompression and then read through a large amount of repeated string data. If the consumer uses string interning, it still needs to process every incoming string, e.g., by hashing it. Moreover, even when string interning is used, a considerable amount of duplicate data can remain if numerous strings differ only by a suffix. In other words, the common prefix shared by many strings is still duplicated in memory.
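The last point can be illustrated with a minimal Python sketch. It is only an illustration of the drawback described above, not the technique of this disclosure. The key strings and the helper names `interned_chars` and `prefix_factored_chars` are hypothetical, and character counts are used as a rough stand-in for memory use.

```python
import os
import sys


def interned_chars(keys):
    """Characters held after interning: one full copy per distinct string."""
    pool = {sys.intern(k) for k in keys}
    return sum(len(k) for k in pool)


def prefix_factored_chars(keys):
    """Characters held if the shared prefix is stored once, plus each distinct suffix."""
    pool = sorted(set(keys))
    # os.path.commonprefix compares character by character, not by path component.
    prefix = os.path.commonprefix(pool)
    return len(prefix) + sum(len(k) - len(prefix) for k in pool)


# Hypothetical keyed values that differ only by a suffix (one exact duplicate).
keys = [
    "server.region.us-east.cpu",
    "server.region.us-east.mem",
    "server.region.us-east.cpu",
    "server.region.us-east.disk",
]

print(interned_chars(keys))         # 76: the duplicate is removed, the prefix is stored 3 times
print(prefix_factored_chars(keys))  # 32: the 22-character prefix is stored once
```

Interning removes the exact duplicate, but each of the three remaining strings still carries its own copy of the 22-character prefix.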
Some compression techniques are based on custom-coded statistical encoding, and they suffer from the same drawbacks. Note that existing compression libraries most likely use statistical encoding as well, possibly after first applying a transform to the data.
The amount of data being produced continues to increase at unprecedented rates, and there is a continuing need for techniques and systems to compress data, thereby improving the efficiency with which data can be stored and communicated.