1. Technical Field
The present invention relates generally to an improved data processing system, and in particular, to a method and apparatus for compressing data. Still more particularly, the present invention provides a method and apparatus for reducing dictionary sizes in the compression of data.
2. Description of Related Art
With more and more documents, graphics, video, and databases being created and used, storage space for these and other types of data has become an issue. Oftentimes, the amount of data may exceed the currently available storage space, requiring the removal of some data or the acquisition of additional storage space. Another solution to this storage space issue is the use of data compression to make more space available.
Data compression involves encoding data to take up less storage space. Digital data is compressed by finding repeatable patterns of binary zeros and ones. The more patterns that can be found, the more the data can be compressed. Text can generally be compressed to about 40 percent of its original size, and graphics files from 20 percent to 90 percent. Some files can only be compressed by a small amount. The amount of compression that may occur depends on the type of file and the compression algorithm used.
Numerous compression methods are presently used. Two major compression technologies are Huffman coding and Lempel-Ziv-Welch (LZW), which are examples of the statistical and dictionary compression methods, respectively. These compression techniques are based on a dictionary approach. In all such techniques, repeating patterns in the input data are essentially replaced by index numbers, referred to as code words, that identify the patterns when they were first encountered. Because the compressed file includes the dictionary so that the file can be uncompressed back to its original state, the size of the dictionary also plays a role in the effective compression rate.
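The dictionary approach described above can be illustrated with a minimal sketch of textbook LZW encoding. This is an illustration only, not the method of the present invention; the function name lzw_compress and the choice to return the dictionary alongside the code words (so that its size can be inspected) are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def lzw_compress(data: bytes) -> Tuple[List[int], Dict[bytes, int]]:
    """Textbook LZW encoding sketch (hypothetical helper, illustration only)."""
    # Start with one dictionary entry per possible byte value (0..255).
    dictionary = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            # The longer pattern is already known; keep extending it.
            current = candidate
        else:
            # Emit the code word for the longest known pattern, then
            # record the new pattern under the next free index number.
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes, dictionary


codes, dictionary = lzw_compress(b"ABABABA")
print(codes)                  # code words replacing the repeated patterns
print(len(dictionary) - 256)  # number of patterns added to the dictionary
```

In this sketch, every new pattern adds an entry to the dictionary, so the dictionary grows with the variety of the input as well as its length.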
Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions to reduce dictionary sizes to improve compression ratios.
The present invention provides a method, apparatus, and computer instructions for compressing data. A data segment within the data to be compressed is selected. A determination is made as to whether the data segment matches a previous data segment within the data, based on a transform performed on the data segment. The data segment is replaced with a code word in response to a determination that a match is present between the data segment and the previous data segment. These steps are repeated for subsequent data segments within the data until all of the data has been processed to form compressed data.
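The steps summarized above can be sketched as follows. The text does not specify the transform, the segment size, or the output format, so this example makes several assumptions: fixed-size segments, a SHA-256 hash standing in for the transform, and a token list of literals and code words as the compressed form. The names compress_segments and decompress_segments are hypothetical.

```python
import hashlib


def compress_segments(data: bytes, segment_size: int = 4):
    """Replace repeated segments with code words (illustrative sketch)."""
    first_seen = {}  # transform result -> code word (index into literals)
    literals = []    # segments emitted as literals, in order of appearance
    tokens = []
    for start in range(0, len(data), segment_size):
        # Select the next data segment.
        segment = data[start:start + segment_size]
        # Assumed transform: a hash of the segment.
        digest = hashlib.sha256(segment).digest()
        code_word = first_seen.get(digest)
        # Confirm the match byte for byte to guard against hash collisions.
        if code_word is not None and literals[code_word] == segment:
            tokens.append(("code", code_word))
        else:
            first_seen.setdefault(digest, len(literals))
            literals.append(segment)
            tokens.append(("literal", segment))
    return tokens


def decompress_segments(tokens) -> bytes:
    """Rebuild the original data from the token list."""
    literals = []
    output = []
    for kind, value in tokens:
        if kind == "literal":
            literals.append(value)
            output.append(value)
        else:
            output.append(literals[value])
    return b"".join(output)


tokens = compress_segments(b"abcdabcdwxyzabcd")
print(tokens)
print(decompress_segments(tokens))
```

Comparing transform results rather than the segments themselves keeps each lookup cheap; the byte-for-byte check is a design choice of this sketch to keep the round trip lossless.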