The present invention relates to data compression, and more particularly to dictionary-based data compression.
Data compression is an area of active exploration and development in conjunction with a variety of applications. Data compression refers to a data conversion wherein the converted data is represented in fewer bytes than the unconverted data without information loss. Such conversion is possible, for example, because standard codes utilize more bits for letters, words, and phrases than are actually necessary to represent the information therein. Presently, data compression is perhaps of most interest in the fields of communication and storage. In communications, data compression results in shorter transmission time and therefore reduced communication cost. In storage devices, compression allows more information to be stored in a given physical storage area.
Prior data compression schemes have included techniques for converting textual data to numeric data on a word-by-word basis. Such techniques are illustrated in U.S. Pat. Nos. 4,295,124 issued Oct. 13, 1981 to Roybal and entitled COMMUNICATION METHOD AND SYSTEM; 3,393,270 issued July 16, 1968 to Simjian and entitled COMMUNICATION SYSTEM EMPLOYING CHARACTER COMPARISON AND CODE TRANSLATION; and 4,558,302 issued Dec. 10, 1985 to Welch and entitled HIGH SPEED DATA COMPRESSION AND DECOMPRESSION APPARATUS AND METHOD. Although these techniques produce some data compression, they do not provide the compression rates desired in many segments of the industry.
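By way of illustration only, the general idea of converting textual data to numeric data on a word-by-word basis might be sketched as follows. This sketch is not taken from any of the patents cited above; the function names `build_codebook`, `encode`, and `decode` and the sample sentence are hypothetical, and the codebook is assumed to be shared between the sender and the receiver.

```python
def build_codebook(words):
    """Assign each distinct word an integer code, in order of first appearance."""
    codebook = {}
    for word in words:
        if word not in codebook:
            codebook[word] = len(codebook)
    return codebook


def encode(text, codebook):
    """Replace each space-separated word with its numeric code."""
    return [codebook[word] for word in text.split()]


def decode(codes, codebook):
    """Invert the codebook and rebuild the text (assumes single-space separators)."""
    reverse = {code: word for word, code in codebook.items()}
    return " ".join(reverse[code] for code in codes)


# Hypothetical sample input: eleven words, only five of them distinct.
text = "the cat saw the dog and the dog saw the cat"
codebook = build_codebook(text.split())
codes = encode(text, codebook)
restored = decode(codes, codebook)
```

In this sketch each word is replaced by a single small integer, so repeated words cost no more than their code. The saving is limited by the need to store or transmit the codebook itself and by the number of bits required for each code.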
Another compression technique is frequency-based compression. This technique is adaptive to the character stream and provides compression for the most frequently occurring information. Again, the achieved compression rates are not as high as desired.
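As a minimal sketch of the frequency-based idea, the following example builds a Huffman code, in which the most frequently occurring characters receive the shortest bit strings. Huffman coding is chosen here only as a familiar example; the text above does not name a particular algorithm. The sketch is also static (frequencies are counted once over the whole input) rather than adaptive to the character stream, and the function name `huffman_code` and the sample string are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Map each character to a bit string; frequent characters get shorter codes."""
    freq = Counter(text)
    # Heap entries are (weight, tie_breaker, {char: code_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(weight, i, {ch: ""}) for i, (ch, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Only one distinct character: give it a one-bit code.
        return {ch: "0" for ch in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # lightest subtree
        w2, _, t2 = heapq.heappop(heap)  # second lightest subtree
        merged = {ch: "0" + code for ch, code in t1.items()}
        merged.update({ch: "1" + code for ch, code in t2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical sample: 'a' occurs 8 times, 'b' 4, 'c' 2, 'd' 1.
sample = "aaaaaaaabbbbccd"
table = huffman_code(sample)
encoded = "".join(table[ch] for ch in sample)
```

For this sample the encoded form occupies 25 bits, against 120 bits at eight bits per character. The figure excludes the cost of conveying the code table to the decoder.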
Other data compression techniques remove unnecessary bits and/or bytes (e.g., blank spaces) from the data stream. However, these techniques provide relatively little compression relative to present industry demands.