Two methods of data compression and decompression that can be integrated, separately as well as jointly, into software and hardware are Ziv-Lempel compression and Huffman Coding. Ziv-Lempel is an example of parse-tree-based compression, in which a value conversion dictionary is stored as a tree and parsed to locate conversion values. Huffman Coding, meanwhile, is a type of entropy coding, which compresses digital data by representing frequently occurring patterns with few bits and rarely occurring patterns with many bits. While Ziv-Lempel enables the compression of data of a variable length into symbols of a fixed length, Huffman Coding enables the compression of data of a fixed length into code words of a variable length. Systems utilizing these types of compression can store a symbol (in Ziv-Lempel) and/or a code word (in Huffman Coding) in place of the data, and through decompression, the data represented by the symbol and/or the code word can be retrieved.
In Ziv-Lempel, program code searches plain text for entries in a pre-determined dictionary and substitutes unique symbols, all of a consistent length, for the identified entries. To enable compression and decompression, one or more resources in the computer system store the dictionary, which can be represented by one or more parse trees. Because the symbols utilized to represent the identified text are all of a fixed length, if a given string is not represented by a symbol in the dictionary, the fixed length must be extended to enable compression of this string. Because all symbols are of the same width, compressing this one unrepresented string requires widening every symbol, which inflates the size of the system as a whole.
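The dictionary substitution described above can be illustrated with a minimal Python sketch. It assumes a small static dictionary held as a nested-dict parse tree and a greedy longest-match search; the names `build_parse_tree`, `symbol_width`, `compress`, and `decompress` are hypothetical and are not drawn from any particular implementation.

```python
def build_parse_tree(entries):
    """Store the dictionary as a tree; the key None marks the end of an
    entry and holds that entry's symbol (its index in `entries`)."""
    root = {}
    for symbol, entry in enumerate(entries):
        node = root
        for ch in entry:
            node = node.setdefault(ch, {})
        node[None] = symbol
    return root


def symbol_width(entries):
    """Bits needed so that every symbol has the same fixed width."""
    return max(1, (len(entries) - 1).bit_length())


def compress(text, root):
    """Replace the longest dictionary entry found at each position
    with its fixed-width symbol."""
    symbols, i = [], 0
    while i < len(text):
        node, match, match_end, j = root, None, i, i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:  # a complete dictionary entry ends here
                match, match_end = node[None], j
        if match is None:
            raise ValueError(f"no dictionary entry matches at offset {i}")
        symbols.append(match)
        i = match_end
    return symbols


def decompress(symbols, entries):
    """Look each symbol back up in the dictionary."""
    return "".join(entries[s] for s in symbols)
```

With the hypothetical dictionary `["a", "b", "ab", "abc"]`, every symbol fits in 2 bits. Adding a fifth entry pushes `symbol_width` to 3, so every symbol in the system grows by one bit to accommodate a single new string.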
In Huffman Coding, the frequency of a certain string is inversely related to the length of the code word used to represent it upon compression. Thus, strings that appear with a higher frequency are represented by shorter code words, while strings that appear less frequently are represented by longer code words. In order to translate a string into a code word, or vice versa, the frequency/rank of the word and/or the words themselves must be stored. In a type of Huffman Coding called Canonical Huffman Coding, the memory stores the ranks of the strings. The efficiency of systems utilizing this compression and the resources required to store the code words are determined by the frequency of certain data strings. Thus, a lack of repetition in values could result in an increased storage requirement.
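As an illustration of the relationship between frequency, rank, and code word length, the following Python sketch derives code lengths from a frequency table and then assigns canonical code words in rank order. The function names and the tie-breaking rule (shorter length first, then symbol order) are assumptions made for this example; other canonical orderings are possible.

```python
import heapq
from itertools import count


def code_lengths(freqs):
    """Huffman code length per symbol: repeatedly merge the two least
    frequent groups; each merge adds one bit to every member."""
    if len(freqs) == 1:
        return {s: 1 for s in freqs}
    tie = count()  # tie-breaker so the heap never compares the lists
    heap = [(f, next(tie), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, g1 = heapq.heappop(heap)
        f2, _, g2 = heapq.heappop(heap)
        for s in g1 + g2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), g1 + g2))
    return lengths


def canonical_codes(lengths):
    """Rank symbols by (length, symbol) and assign consecutive code
    words; only the ranks and lengths need to be stored."""
    ranked = sorted(lengths, key=lambda s: (lengths[s], s))
    codes, code, prev_len = {}, 0, 0
    for s in ranked:
        code <<= lengths[s] - prev_len  # widen when the length grows
        codes[s] = f"{code:0{lengths[s]}b}"
        code += 1
        prev_len = lengths[s]
    return ranked, codes
```

For the hypothetical frequencies `{"a": 5, "b": 2, "c": 1, "d": 1}`, the most frequent string receives the one-bit code word `0`, while the two rarest strings receive the three-bit code words `110` and `111`.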
Both Ziv-Lempel and Huffman Coding have drawbacks that impact software and hardware systems through limitations in data retention and retrieval. Ziv-Lempel and Huffman Coding can be utilized together within the same system, which can decrease overhead overall, but even this scheme has inefficiencies: it not only introduces an additional memory access but may also increase the amount of system memory required overall. An example of a hardware architecture where these two compression methods are jointly implemented is the z/Architecture offered by International Business Machines Corporation (IBM); z/Architecture is a registered trademark of International Business Machines Corporation, Armonk, N.Y., USA. In hardware design, storage (bandwidth) can be finite on certain resources, so eliminating any inefficiencies in data retention and retrieval is desirable in order to maximize the functionality of the existing hardware.
Systems combining Ziv-Lempel and Huffman Coding can take advantage both of situations where compression from variable length to fixed length is more desirable and of situations where compression from fixed length to variable length is more desirable. But when these methods are combined, the system performs a translation from symbol (in Ziv-Lempel) to rank (in Canonical Huffman Coding). Performing this translation requires an additional memory access that can be both expensive and unpredictable. This lookup is additional because, as explained above, data compression and decompression with Ziv-Lempel and Huffman Coding already necessitate memory accesses. Performing this additional lookup may also require additional space in memory. Thus, a need exists for a method that takes advantage of existing data compression technologies while reducing additional memory accesses or space.
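The additional lookup described above can be made concrete with a short Python sketch of a combined scheme. It assumes that each Ziv-Lempel symbol has already been assigned a Huffman code length, and it models the symbol-to-rank translation as a separate table with one entry per dictionary symbol. The table names and layout are hypothetical and do not represent the z/Architecture implementation.

```python
def build_tables(code_len):
    """code_len[s] is the code length assumed for Ziv-Lempel symbol s.
    Returns the translation table plus the canonical code tables."""
    rank_to_symbol = sorted(range(len(code_len)),
                            key=lambda s: (code_len[s], s))
    symbol_to_rank = [0] * len(code_len)  # extra table: one entry per symbol
    for rank, sym in enumerate(rank_to_symbol):
        symbol_to_rank[sym] = rank
    rank_to_code, code, prev_len = [], 0, 0
    for sym in rank_to_symbol:
        code <<= code_len[sym] - prev_len
        rank_to_code.append(f"{code:0{code_len[sym]}b}")
        code += 1
        prev_len = code_len[sym]
    return symbol_to_rank, rank_to_symbol, rank_to_code


def encode(symbols, symbol_to_rank, rank_to_code):
    """Turn fixed-width symbols into variable-length code words."""
    out = []
    for sym in symbols:
        rank = symbol_to_rank[sym]      # the additional memory access
        out.append(rank_to_code[rank])  # the lookup Huffman already needs
    return "".join(out)


def decode(bits, rank_to_symbol, rank_to_code):
    """Recover the symbols; the rank is translated back on the way out."""
    code_to_rank = {c: r for r, c in enumerate(rank_to_code)}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in code_to_rank:
            symbols.append(rank_to_symbol[code_to_rank[current]])
            current = ""
    return symbols
```

In this sketch, every symbol emitted by the Ziv-Lempel stage costs one read of `symbol_to_rank` before the code word can be fetched, and that table grows with the dictionary, which corresponds to the extra access and extra space noted above.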