This application is related to the application having Ser. No. 09/820,493, filed on the same date as this application, entitled “System and Method for Encoding and Decoding Data”, by inventor Terrence Shannon. That application is incorporated herein by this reference.
Data is often compressed in order to reduce computer memory requirements or to reduce transmission time. One popular data compression technique may be referred to as “entropy coding”.
Huffman coding is one example of an entropy coding technique. A Huffman encoder will typically utilize a Huffman statistical model to convert a string of tokens into a series of variable-length entropy codes, also called codewords. The Huffman encoder operates to assign shorter codewords to the tokens that occur most often and longer codewords to the tokens that occur least often. The codewords that are used in Huffman encoding are typically obtained from one or more tables, known as Huffman Tables.
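By way of illustration only, the assignment of shorter codewords to more frequent tokens can be sketched as follows. This is a minimal textbook construction, not an implementation taken from this application: the function name `huffman_code` is hypothetical, the table is built from the token counts of the input itself, and the codewords are represented as strings of "0" and "1" for readability.

```python
import heapq
from collections import Counter


def huffman_code(tokens):
    """Return a dict mapping each distinct token to a bit-string codeword.

    Illustrative sketch: the statistical model is simply the token
    counts of the input (an assumption made for this example).
    """
    freq = Counter(tokens)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct token still needs a one-bit codeword.
        return {next(iter(freq)): "0"}

    # Heap entries are (weight, tie_breaker, {token: partial_codeword}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {tok: ""}) for i, (tok, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codewords.
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in codes1.items()}
        merged.update({t: "1" + c for t, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1

    return heap[0][2]


table = huffman_code("aaaabbc")
encoded = "".join(table[t] for t in "aaaabbc")
```

In this example the token "a" occurs four times and receives a one-bit codeword, while "b" and "c" each receive two-bit codewords. The resulting dictionary plays the role of a Huffman Table.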
While prior art entropy coding techniques have proven to be very useful, improvements in these techniques are always needed in order to reduce the memory requirements of computing systems as well as to improve the speed at which data can be communicated from one computer to another.
The present invention may be implemented as a method of compressing a first string of tokens. The first string of tokens includes a group of tokens that immediately follow a token having a first value. The method includes selecting a token value that occurs in the group based upon the group's local frequencies, and producing a second string by substituting each occurrence of a token-pair unit in the first string with a single token having a second value, wherein the token-pair unit includes a first token having the first value and a second token having the selected value.
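The method summarized above can be sketched as follows. This is an illustrative sketch only: it assumes that selecting "based upon the group's local frequencies" means choosing the value that most often follows the first value, which is one possible selection rule. The function name `substitute_pair` and its parameters are hypothetical.

```python
from collections import Counter


def substitute_pair(tokens, first_value, new_value):
    """Replace each (first_value, selected) pair with new_value.

    Returns (second_string, selected). The selected value is assumed
    here to be the most frequent follower of first_value.
    """
    # Local frequencies: counts of the tokens that immediately
    # follow a token having first_value.
    followers = Counter(
        nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == first_value
    )
    if not followers:
        return list(tokens), None

    selected = followers.most_common(1)[0][0]

    # Scan left to right, collapsing each token-pair unit into one token.
    out = []
    i = 0
    while i < len(tokens):
        is_pair = (
            i + 1 < len(tokens)
            and tokens[i] == first_value
            and tokens[i + 1] == selected
        )
        if is_pair:
            out.append(new_value)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out, selected


second, selected = substitute_pair(list("abacabad"), "a", "AB")
```

For the first string `a b a c a b a d`, the tokens following `a` are `b`, `c`, `b`, `d`, so `b` is selected and the second string becomes `AB a c AB a d`, which is two tokens shorter. The new value must not already occur in the first string if the substitution is to be reversible.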
The present invention may also be implemented as an apparatus for compressing a first string of tokens. The first string of tokens includes a first group of tokens each having a first value and a second group of tokens each immediately following a unique one of the tokens in the first group. The apparatus is operable to identify a token value from the second group based upon the local frequencies of the second group. The apparatus is further operable to define a token-pair unit, the token-pair unit including a first token having the first value and a second token having the identified value. The apparatus is further operable to convert the first string into a second string in part by substituting each occurrence of the token-pair unit in the first string with a single token having a third value.
Other aspects and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.