1. Field of the Invention
The present invention relates to methods for compressing digital data. More particularly, the present invention relates to a method for compressing digital data using flag bit partitioning.
2. Discussion of Background
Information processing systems, data transmission systems and the like frequently store large amounts of binary data in a mass memory storage device or transfer binary data from one memory storage device to another. Memory storage devices include tape drives, hard disk drives and other magnetic or optical media, all of which have a limited amount of space. To make more efficient use of the fixed storage capacity of memory storage devices, methods have been developed to "compress" the stream of data before it is stored. "Compressing the data" means that data is not stored literally but rather, where possible, some of the data is replaced with shorter expressions of it. These shorter expressions can be decoded to restore the data to its original, literal condition when the data is brought from storage.
There are two major families of data compression methods, both derived from methods developed by Ziv and Lempel. The first family is known as LZ77 and the second as LZ78. Both families compress the input data stream by dividing it into a sequence of data strings (each string typically being at least one byte in length) and then replacing each string that repeats a previous string with a shorter code indicating that the replaced string duplicates an earlier one.
In LZ78-based compression methods, a dictionary of data strings is built as the strings are read for the first time. When a data string is encountered again, it is represented in storage by a short code that references the location of the string in the dictionary. One of the most widely used LZ78-based compression methods is Lempel-Ziv-Welch (referred to as LZW), which is described in U.S. Pat. No. 4,558,302, issued to Welch on Dec. 10, 1985.
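The dictionary-building step described above may be illustrated by the following minimal sketch in the style of LZW. It is a simplification under stated assumptions, not the method of the cited patent: the function name `lzw_compress` is hypothetical, the dictionary is assumed to be pre-loaded with all 256 single-byte strings, and the codes are returned as plain integers rather than packed into a bit stream.

```python
def lzw_compress(data: bytes) -> list:
    """Return a list of integer codes for `data` (simplified LZW-style sketch)."""
    # Assumption: the dictionary starts with every single-byte string.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            # The string has been seen before; keep extending it.
            current = candidate
        else:
            # Emit the code of the longest known string, then record the
            # new string in the dictionary on its first appearance.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


print(lzw_compress(b"ABABAB"))  # [65, 66, 256, 256]
```

In this example the six input bytes are represented by four codes, because the string "AB" is entered in the dictionary the first time it is read and is referenced by its code (256) thereafter.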
In contrast to the independent dictionary used in LZ78-based methods, LZ77-based compression methods use previously read input data byte strings as the dictionary. Therefore, the codes representing the repeated data byte strings consist of "pointers" that point to matching strings of data bytes, rather than indexes into an independent dictionary. Each pointer is an ordered pair of values representing a length and an offset. The length indicates the number of data bytes in the string being repeated, while the offset indicates the location of the initial data byte in that string.
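The manner in which a decoder might expand such a pointer may be sketched as follows. The sketch assumes, for illustration only, that the offset is counted backward from the end of the data decoded so far; the function name `copy_match` is hypothetical.

```python
def copy_match(output: bytearray, length: int, offset: int) -> None:
    """Append `length` bytes copied from `offset` bytes back in `output`."""
    # Assumption: offset is a backward distance from the current end of output.
    start = len(output) - offset
    for i in range(length):
        # Copying one byte at a time lets the source region overlap the
        # bytes being written, so a short string can repeat many times.
        output.append(output[start + i])


decoded = bytearray(b"abc")
copy_match(decoded, length=6, offset=3)
print(decoded.decode())  # abcabcabc
```

Here a single pointer with a length of 6 and an offset of 3 restores six data bytes from the three bytes that precede it.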
In LZ77-based compression, the pointer is sometimes longer than the string of data bytes it represents, creating data "expansion" for that string. Consequently, a variation of LZ77-based compression methods was introduced by Storer (see U.S. Pat. No. 4,876,541) and Szymanski to eliminate this problem. In their method, which is known as LZSS, symbols taken directly from the input data stream are used whenever a pointer would be longer than the repeating data string being represented. In addition, a flag bit is added to each pointer and to each data byte to distinguish one from the other.
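The choice between a pointer and a literal data byte, and the flag bit that distinguishes them, may be sketched as follows. This is a minimal, unoptimized illustration under stated assumptions: the names `lzss_tokens`, `MIN_MATCH`, `MAX_MATCH` and `WINDOW` are hypothetical, their values are chosen only for the example, and a flag value of 1 is assumed to denote a pointer and 0 a literal data byte.

```python
MIN_MATCH = 3    # assumed: shorter matches are emitted as literal bytes
MAX_MATCH = 18   # assumed: longest string a single pointer may represent
WINDOW = 4096    # assumed: how far back a pointer may reach


def lzss_tokens(data: bytes) -> list:
    """Return (flag, payload) tokens; payload is a byte or (length, offset)."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_len, best_off = 0, 0
        # Brute-force search of the previously read data for the longest match.
        for cand in range(max(0, pos - WINDOW), pos):
            length = 0
            while (length < MAX_MATCH
                   and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - cand
        if best_len >= MIN_MATCH:
            tokens.append((1, (best_len, best_off)))  # flag 1: pointer
            pos += best_len
        else:
            tokens.append((0, data[pos]))             # flag 0: literal byte
            pos += 1
    return tokens


print(lzss_tokens(b"abcabcabc"))
# [(0, 97), (0, 98), (0, 99), (1, (6, 3))]
```

The first three data bytes have no earlier match and are emitted literally; the remaining six are replaced by one pointer. Because a match shorter than `MIN_MATCH` is never encoded as a pointer, no pointer is emitted that would be longer than the string it represents, provided `MIN_MATCH` is chosen to suit the pointer size.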
Although the LZSS method compresses data into a smaller amount of memory, the flag bits required to distinguish pointers from data bytes cause LZSS-based compression and subsequent decompression to occur more slowly, because each flag bit "bumps" the eighth data bit of a byte to the first position of the next register. As a result, in decompressing the data, the stream must be read bit by bit. Thus, there is a need for improving the speed of data manipulation in LZSS-based compression and decompression methods without significantly decreasing the degree of compression.
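The loss of byte alignment described above may be seen in the following minimal sketch, which packs literal data bytes as nine-bit units, each consisting of a flag bit followed by eight data bits. The function name `pack_flagged_literals` is hypothetical, and the sketch assumes a flag value of 0 for a literal, most-significant-bit-first packing, and zero padding of the final byte.

```python
def pack_flagged_literals(data: bytes) -> bytes:
    """Pack each byte as a 0 flag bit plus 8 data bits, MSB first."""
    bits = 0     # bit accumulator
    nbits = 0    # number of valid bits currently held in the accumulator
    out = bytearray()
    for value in data:
        # Shifting by 9 and OR-ing the byte leaves a 0 flag bit ahead of it.
        bits = (bits << 9) | value
        nbits += 9
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1  # keep only the bits not yet written
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)  # zero-pad the last byte
    return bytes(out)


print(pack_flagged_literals(b"\xff\xff").hex())  # 7fbfc0
```

The two input bytes 0xFF and 0xFF emerge as 0x7F, 0xBF and 0xC0: after the first flag bit, no data byte begins on a byte boundary, so neither the compressor nor the decompressor can move whole bytes and each must shift and mask individual bits.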