1. Technical Field
The present invention relates to a method and system for data processing in general, and in particular to a method and system for compressing files. Still more particularly, the present invention relates to a method and system for compressing files utilizing a dictionary array.
2. Description of the Prior Art
Dictionary-based compression algorithms derived from the classical Lempel-Ziv scheme are used in a variety of compression software, ranging from the age-old UNIX "compress" utility to the more recent gzip, pkzip, and winzip compression programs. In a dictionary-based compression algorithm, an input file is typically examined sequentially, byte by byte, for unique patterns. Each unique pattern found in the input file is then stored in a dictionary in association with a compact label. During the course of the examination of the input file, when a repeat pattern is found, the repeat pattern is replaced by its corresponding compact label. Accordingly, the more repeat patterns that are found in the input file, the higher the compression ratio will be. The sequence of data, including compact labels, is subsequently stored along with the dictionary as a compressed file. To decompress the file, each compact label in the compressed file is located in the dictionary, and the corresponding pattern is inserted into the file in lieu of the compact label, thus restoring the compressed file to its original form.
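The general scheme described above can be illustrated with a minimal sketch. It is not the code of any of the named utilities. The names `compress`, `decompress`, and `chunk_size` are hypothetical. The use of fixed-size patterns is a simplifying assumption, since practical Lempel-Ziv variants discover variable-length patterns, and for brevity every pattern, including its first occurrence, is emitted as a label.

```python
def compress(data: bytes, chunk_size: int = 4):
    """Return (labels, dictionary) for data split into fixed-size patterns.

    Assumption: patterns are fixed-size chunks; this only illustrates the
    label-substitution idea, not a real Lempel-Ziv parser.
    """
    label_of = {}    # pattern -> compact label (an integer index)
    dictionary = []  # label -> pattern; stored along with the labels
    labels = []
    for i in range(0, len(data), chunk_size):
        pattern = data[i:i + chunk_size]
        if pattern not in label_of:
            # First time this pattern is seen: add it to the dictionary.
            label_of[pattern] = len(dictionary)
            dictionary.append(pattern)
        # Every pattern is represented in the output by its label.
        labels.append(label_of[pattern])
    return labels, dictionary


def decompress(labels, dictionary):
    """Look up each label in the dictionary and re-insert its pattern."""
    return b"".join(dictionary[label] for label in labels)


sample = b"abcdabcdabcdwxyzabcd"
labels, dictionary = compress(sample)
restored = decompress(labels, dictionary)
```

In this sketch the repeated pattern `abcd` is stored once in the dictionary and referenced four times by its label, and decompression recovers the original bytes by lookup alone.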
Because the dictionary is stored along with the data as part of the compressed file in the dictionary-based approach, the size of the dictionary is the limiting factor in achieving a good compression ratio. In fact, the dictionary size is such a dominant factor in compression efficiency that a compression ratio of better than 50% is generally unachievable with randomly chosen binary files. Another drawback associated with the dictionary-based approach is that the overhead of writing the dictionary during compression and reading it during decompression is relatively large.
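The effect of dictionary size on the stored file can be made concrete with a small, hedged calculation. The function `stored_size` and its parameters `chunk_size` and `label_bytes` are hypothetical, and the model assumes fixed-size patterns and one-byte labels, which limits it to at most 256 unique patterns; it is a rough size estimate, not a measurement of any real compressor.

```python
def stored_size(data: bytes, chunk_size: int = 4, label_bytes: int = 1) -> int:
    """Estimate stored bytes: label stream plus the dictionary itself.

    Assumptions: fixed-size patterns, fixed-width labels, and no other
    file-format overhead.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    unique_patterns = set(chunks)
    label_stream = len(chunks) * label_bytes
    dictionary = sum(len(p) for p in unique_patterns)
    return label_stream + dictionary


repetitive = b"abcd" * 64   # 256 bytes, a single unique pattern
varied = bytes(range(256))  # 256 bytes, every pattern unique

repetitive_size = stored_size(repetitive)  # 64 labels + 4-byte dictionary
varied_size = stored_size(varied)          # 64 labels + 256-byte dictionary
```

Under these assumptions the repetitive input shrinks from 256 bytes to 68, while the input with no repeated patterns grows to 320 bytes, because the dictionary alone is as large as the original data.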
With the ever-increasing demand for data throughput in today's information age, data compactness is very important. For example, in applications such as palm-top computers or personal digital assistants, where the space for data storage is rather limited, it is crucial to have a relatively high data compression ratio for data that can be stored in a compressed form. As a result, any improvement in compression ratio that yields a more compact form of data for storage and transmission can result in a considerable competitive advantage. Consequently, it would be desirable to provide an improved dictionary-based method for compressing files within a data-processing system.