(1) Field of Invention
This invention relates to the field of data compression and decompression. More specifically, this invention relates to data encoding and decoding.
(2) Description of Prior Art
Data compression is a process of converting data such that the converted data can be encoded in fewer bits or bytes. When decompressed, the compressed data is recovered to its original form. Compressed data saves storage since smaller-sized data takes less space to store. Compressed data can also enhance effective data transmission rates since smaller-sized data requires less time to transmit. Compression techniques have long been used in storage media such as tapes, disks, and flash cards, as well as in communication equipment such as modems and Ethernet. With the rapidly growing demand for the Internet as well as handheld and wireless devices, compression plays a bigger role than ever in data transmission and storage.
As shown in FIG. 1, a typical LZ data compression method processes an input data stream 10 to generate a compressed output data stream 17 by comparing an unprocessed data portion 12 of the input data stream 10 to already processed data in a history buffer 11. If a data string 13 found in the history buffer 11 matches a current data string 14 in the unprocessed data portion 12, the current data string 14 is replaced by a pointer (P, L) that corresponds to an offset P 15 and a match length L 16. The offset P 15 and match length L 16 are then encoded individually with either a fixed-length coding or a variable-length coding process. Thus, the current data string 14 is represented by a shorter coded pointer data (P, L) 18 in the compressed output data stream 17.
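The matching step described above can be illustrated with a minimal sketch. It is not the method of any particular prior-art compressor: the brute-force search, the function names, the token layout, the default window size, and the minimum match length are all assumptions made for illustration. The offset P is assumed here to be the distance back from the current position, and matches are restricted to data already in the history buffer.

```python
def find_longest_match(data, pos, window=4096, min_len=2):
    """Return (P, L) for the longest match of data[pos:] found in the
    history buffer, or None. P is assumed to be the distance back from pos."""
    start = max(0, pos - window)
    best_offset, best_len = 0, 0
    for cand in range(start, pos):
        length = 0
        # Compare only against already processed data (no overlap past pos).
        while (cand + length < pos and pos + length < len(data)
               and data[cand + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_offset, best_len = pos - cand, length
    if best_len >= min_len:
        return best_offset, best_len
    return None


def lz_tokens(data, window=4096, min_len=2):
    """Replace matched strings with ("ptr", P, L); emit literals otherwise."""
    tokens = []
    pos = 0
    while pos < len(data):
        match = find_longest_match(data, pos, window, min_len)
        if match:
            tokens.append(("ptr",) + match)
            pos += match[1]
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def lz_expand(tokens):
    """Rebuild the original string from literals and (P, L) pointers."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, offset, length = tok
            start = len(out) - offset
            out.extend(out[start:start + length])
    return "".join(out)
```

For the input "abcabcabc", this sketch emits three literals followed by the pointers (3, 3) and (6, 3). In an actual compressor, each P and L would then be passed to a fixed-length or variable-length coder.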
As shown in FIG. 2, fixed-length coding uses a fixed number of bits to encode a range of numbers. For example, if the number “7” were coded in a 13-bit fixed-length coding scheme, the encoded bit string would be “0000000000111”. In another case, if the number “4096” were coded in a 13-bit fixed-length coding scheme, the encoded bit string would be “1000000000000”. Thus, an n-bit fixed-length coding scheme encodes up to 2^n numbers ranging from 0 to (2^n − 1). Variable-length coding, on the other hand, uses a variable number of bits to encode a range of numbers. There are many different variable-length coding schemes suited to different needs. FIG. 3 illustrates an example of a variable-length coding table that encodes up to 8,191 numbers ranging from 0 to 8,190. Variable-length coding typically encodes smaller numbers in fewer bits and larger numbers in more bits. For example, coding the number “4096” using the variable-length coding table shown in FIG. 3 would take 25 bits. However, coding the number “7” using the same variable-length coding table would only take 7 bits. Thus, compared to variable-length coding, fixed-length coding generally tends to be more efficient in coding larger numbers, but less efficient in coding smaller numbers.
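The trade-off between the two coding styles can be checked with a short sketch. The fixed-length function follows the 13-bit example given above. The table of FIG. 3 is not reproduced here, so the variable-length function substitutes a well-known code, Elias gamma applied to n + 1. This stand-in is an assumption: it yields the same bit counts as the two examples quoted above, but it is not asserted to be the table of FIG. 3. The function names are hypothetical.

```python
def fixed_length(n, width=13):
    """Encode n as a bit string of exactly `width` bits."""
    if not 0 <= n < (1 << width):
        raise ValueError("value does not fit in the given width")
    return format(n, "0{}b".format(width))


def elias_gamma_from_zero(n):
    """Stand-in variable-length code: Elias gamma of (n + 1).

    The codeword is (k - 1) zero bits followed by the k-bit binary
    form of n + 1, so small values get short codewords.
    """
    if n < 0:
        raise ValueError("value must be non-negative")
    binary = format(n + 1, "b")
    return "0" * (len(binary) - 1) + binary


for value in (7, 4096):
    print(value,
          len(fixed_length(value)),           # always 13 bits
          len(elias_gamma_from_zero(value)))  # 7 bits for 7, 25 bits for 4096
```

Under this stand-in, the variable-length code is shorter than 13 bits only for values below 127, which illustrates why neither scheme is preferable across the whole range of values.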
Encoding data efficiently is essential to achieving a good compression result. Prior methods typically encode match length L 16 with variable-length coding and offset P 15 with fixed-length coding. While match length L 16 is normally a small number, offset P 15 falls within a wide range of values determined by the size of the history buffer 11. Offset P 15 has an upper bound that increases as the size of the history buffer 11 increases. Thus, coding offset P 15 strictly with fixed-length coding, as most of the prior methods do, is less desirable when offset P 15 turns out to be a small number, e.g. “7”. On the other hand, coding offset P 15 strictly with variable-length coding is also less desirable when offset P 15 happens to be a large number, e.g. “4096”. Thus, combining different coding schemes into a single coding scheme can maximize the efficiency of data encoding.
Since smaller data requires fewer bits to encode, prior methods try to reduce the number of bits needed to encode data through methods such as replacing a matching data string with pointer data (P, L), as illustrated in FIG. 1. However, it is desirable to further reduce the value of the data before encoding.
An offset-difference coding process for improving coding efficiencies and compression performance described in accordance with the principles of this invention comprises an encoding process and a decoding process. The encoding process of offset-difference coding encodes paired input data by first determining the greater of the two input data, then calculating the difference between the two input data, replacing the larger input data with the calculated difference, and encoding said calculated difference and the smaller input data. The encoding process of offset-difference coding also generates an indicator if the larger input data that is replaced by said calculated difference is not statistically larger than the smaller input data. The decoding process of offset-difference coding decodes encoded data in an input data stream by first detecting whether an indicator exists, then decoding the encoded data and restoring the original data in response to detecting whether an indicator exists.
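The encoding and decoding steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the pair is taken to be (offset, length), the offset is assumed to be the statistically larger member, ties are treated as the expected case, and the indicator is modeled as a boolean rather than as bits in the output stream. The function names are hypothetical, and the subsequent bit-level coding of the two values is omitted.

```python
def od_encode(offset, length):
    """Return (indicator, smaller, difference) for the pair (offset, length).

    Assumption: offset is the statistically larger member, so the indicator
    is raised only when length turns out to be the larger one.
    """
    if offset >= length:
        # Expected case: the offset is replaced by the difference.
        return (False, length, offset - length)
    # Unexpected case: the length is replaced by the difference.
    return (True, offset, length - offset)


def od_decode(indicator, smaller, difference):
    """Restore the original (offset, length) pair."""
    larger = smaller + difference
    if indicator:
        return (smaller, larger)  # length was the larger value
    return (larger, smaller)      # offset was the larger value


print(od_encode(12, 10))  # (False, 10, 2)
print(od_encode(3, 9))    # (True, 3, 6)
```

In the first example the value to be encoded drops from 12 to 2, which a variable-length coder would typically represent in fewer bits.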
An alternative embodiment of an offset-difference coding process, composite offset-difference coding, described in accordance with the principles of this invention compares the calculated difference with a predetermined first threshold if the larger input data is statistically larger than the smaller input data, and compares the calculated difference with a predetermined second threshold if the larger input data is not statistically larger than the smaller input data. Only if the calculated difference does not exceed the applicable threshold does the encoding process of composite offset-difference coding replace the larger input data with the calculated difference and encode the calculated difference and the smaller input data. The encoding process of composite offset-difference coding also generates an identifier if the calculated difference does not exceed the applicable predetermined threshold, as well as an indicator if the larger input data that is replaced by the calculated difference is not statistically larger than the smaller input data. The decoding process of composite offset-difference coding decodes encoded data in an input data stream by first detecting whether an identifier exists, then detecting whether an indicator exists if an identifier is detected, and decoding the encoded data in response to detecting whether an identifier exists or whether both an identifier and an indicator exist. The offset-difference coding process and the composite offset-difference coding process minimize the number of bits required to encode data by further reducing the value of the data, thereby improving the coding efficiency and compression performance of LZ-based compression methods without compromising memory resources or compression/decompression speed.
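The composite variant can be sketched in the same style. The threshold values below are hypothetical placeholders, since no values are given above. The offset is again assumed to be the statistically larger member of the pair, and the identifier and indicator are modeled as booleans rather than as bits in the output stream. The function names and the tuple layout are illustrative only.

```python
FIRST_THRESHOLD = 255   # hypothetical value, used when offset >= length
SECOND_THRESHOLD = 15   # hypothetical value, used when length > offset


def cod_encode(offset, length, t1=FIRST_THRESHOLD, t2=SECOND_THRESHOLD):
    """Return (identifier, indicator, first, second).

    With an identifier, (first, second) is (smaller value, difference);
    without one, it is the unmodified (offset, length) pair.
    """
    if offset >= length:
        diff = offset - length
        if diff <= t1:
            return (True, False, length, diff)
    else:
        diff = length - offset
        if diff <= t2:
            return (True, True, offset, diff)
    # Difference exceeds the applicable threshold: keep the original pair.
    return (False, False, offset, length)


def cod_decode(identifier, indicator, first, second):
    """Restore the original (offset, length) pair."""
    if not identifier:
        return (first, second)
    larger = first + second
    if indicator:
        return (first, larger)  # length was the larger value
    return (larger, first)      # offset was the larger value


print(cod_encode(12, 10))    # (True, False, 10, 2)
print(cod_encode(4096, 10))  # (False, False, 4096, 10)
print(cod_encode(3, 9))      # (True, True, 3, 6)
```

The second example shows the purpose of the thresholds in this sketch: when the difference is large, the substitution is skipped and the original pair is passed through to whatever coder follows.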