Lossless data compression is known. Lossless data compression may be used wherever a data file must pass through a process that limits the volume of the data yet still requires that the data be accurate. Examples of such a process include data storage and retrieval systems and data transmission systems.
Where the limiting process is data storage and retrieval, the object is to be able to store large amounts of data in a memory device that would otherwise be too small for the data. A processor associated with the memory device first compresses the data under an appropriate compression process for storage within the memory device. The same or another processor may decompress the data using a similar process.
Where the process is a communication system, the data inputs to a voice channel may include a real-time voice signal and control information necessary to maintain the voice channel. The communication system may require a certain information transfer rate for each voice signal on each channel to maintain acceptable voice quality in the presence of a variable rate of control information. Data compression (e.g., Huffman coding) may be used in such a case to pack the control information and voice signal into an allocated spectrum without loss of voice quality.
Where the process is a data transmission system that carries data and control information exclusively, such as for interconnections between computers, data compression may become even more important. In such cases data compression is important not only to reduce the cost of maintaining such connections but also to optimize the value and utility of such connections between geographically diverse data processing and storage facilities.
As an adjunct to the benefits of data communications between computers, data compression has also become important in the area of data storage. Data compressed for purposes of communication may also be stored in the compressed state, and vice versa. Improved communication and storage efficiency makes remote storage and retrieval practical at geographically diverse research or manufacturing facilities.
Data compression has also had an impact on small portable data processing devices. As data processors (e.g., laptop, notebook, or palmtop computers) have become smaller and more portable, and product features more dependent upon stored software, it has often become necessary to make more efficient use of the higher-capacity hard disk drives and floppy disks that have also become available.
Because of the availability of the larger drives, specialized portable data processing devices (e.g., data loggers) have become practical in applications (e.g., electroencephalography, electrocardiography, etc.) that were not practical in prior years. Analog data that previously had to be stored in an analog format on audio tape, and later converted to a digital format and processed for relevant information, may now be stored digitally. The specialized data loggers now typically process and store the information digitally on a hard disk drive. Unfortunately, data compression methods have not kept pace with the recent developments in data logging.
Lossless data compression is generally implemented using one of two types of modeling. The first of the two types of modeling is referred to as statistical modeling and the other is referred to as dictionary-based modeling. Statistical modeling is a compression technique based upon the statistical probability of occurrence of a particular symbol in any given time period. The more likely a particular symbol is to occur, the fewer bits are used to describe that symbol.
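The relationship between symbol probability and code length can be illustrated with a short sketch. It assumes the usual information-theoretic rule that a symbol of probability p ideally costs -log2(p) bits; the function name `ideal_code_lengths` and the sample string are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def ideal_code_lengths(data):
    """Map each symbol to -log2(p), where p is its observed relative frequency.

    Illustrative helper only: it estimates probabilities from the data itself.
    """
    counts = Counter(data)
    total = len(data)
    return {sym: -math.log2(n / total) for sym, n in counts.items()}

# Hypothetical sample: 'a' occurs half the time, 'c' and 'd' one time in eight.
lengths = ideal_code_lengths("aaaabbcd")
for sym, bits in sorted(lengths.items()):
    print(sym, bits)
```

The most frequent symbol in the sample receives the shortest ideal length, and the rarest symbols receive the longest.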
Statistical modeling (e.g., Huffman coding, Shannon-Fano coding, etc.) is typically based upon use of a look-up table (static model) of compression codes. Compression codes (and the look-up tables containing the compression codes) are often modeled to the type of data that is to be compressed and are often adapted during use.
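As one possible illustration of how such a look-up table may be derived, the following sketch builds a Huffman code table from the symbol counts of a sample. The function name `build_huffman_table` is hypothetical, and the sketch takes its statistics from the sample itself, which is only one of several ways a table may be modeled to a type of data.

```python
import heapq
from collections import Counter

def build_huffman_table(data):
    """Return {symbol: bit-string} for a Huffman code built from symbol counts."""
    counts = Counter(data)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tiebreak, {symbol: code built so far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # the two least likely subtrees
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

table = build_huffman_table("aaaabbcd")
```

Frequent symbols end up near the root of the merge tree and so receive short codes, while rare symbols are merged first and receive long codes.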
During use, as each symbol is received by a compressing processor, the symbol is compared with the look-up table and the compression code retrieved and substituted into an output stream in place of the received symbol. The compression code may then be transmitted or stored depending on the process for which compression is being used.
Decompression of data under statistical modeling is also based upon a look-up table. Again, as a compression code is received (i.e., from memory or from a communication receiver) the code is compared with a look-up table. A corresponding symbol is identified from the look-up table and used as an output of the decompression process.
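The table-driven compression and decompression described above can be sketched as follows. The table `CODE_TABLE` is a hypothetical four-symbol prefix code chosen for illustration, and bits are represented as characters of a string for readability rather than packed into bytes.

```python
# Hypothetical static look-up table: a prefix code for four symbols.
CODE_TABLE = {"a": "0", "b": "10", "c": "110", "d": "111"}
# Reverse table used by the decompressor.
DECODE_TABLE = {code: sym for sym, code in CODE_TABLE.items()}

def compress(symbols):
    """Substitute each received symbol with its compression code."""
    return "".join(CODE_TABLE[s] for s in symbols)

def decompress(bits):
    """Accumulate bits until they match a code, then emit the symbol."""
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in DECODE_TABLE:
            out.append(DECODE_TABLE[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

encoded = compress("abacad")
decoded = decompress(encoded)
```

Because no code in the table is a prefix of another, the decompressor can recognize the end of each code without separators.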
Dictionary-based compression schemes are also based upon look-up tables. Unlike statistical schemes, however, dictionary-based schemes look for groups of symbols within the look-up table (i.e., the dictionary). As input symbols are read, a compressing processor looks for groups of symbols that appear within the dictionary. If a match for the symbol-string is found, a pointer or index identifying the string is output. The longer the match, the better the compression ratio.
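A minimal sketch of matching against a fixed dictionary is given below. The dictionary contents, the token format ("ref" for a dictionary index, "lit" for an unmatched symbol), and the function names are assumptions made for illustration; the sketch prefers the longest matching entry at each position.

```python
# Hypothetical static dictionary of common symbol strings.
DICTIONARY = ["the ", "and ", "ing", "tion"]

def compress_with_dictionary(text, dictionary=DICTIONARY):
    """Replace dictionary strings with ("ref", index); pass other symbols as ("lit", symbol)."""
    # Try longer entries first so the longest match wins.
    by_length = sorted(enumerate(dictionary), key=lambda p: len(p[1]), reverse=True)
    out = []
    i = 0
    while i < len(text):
        for index, phrase in by_length:
            if text.startswith(phrase, i):
                out.append(("ref", index))
                i += len(phrase)
                break
        else:
            # No dictionary entry matches here; emit the symbol unchanged.
            out.append(("lit", text[i]))
            i += 1
    return out

def decompress_with_dictionary(tokens, dictionary=DICTIONARY):
    """Expand each index back into its dictionary string."""
    return "".join(dictionary[v] if kind == "ref" else v for kind, v in tokens)

tokens = compress_with_dictionary("the action")
restored = decompress_with_dictionary(tokens)
```

In this sample, ten input symbols become four tokens, two of which are dictionary indices.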
Recent improvements in dictionary compression schemes (e.g., LZ77, developed by Jacob Ziv and Abraham Lempel) rely on an adaptive dictionary. Under LZ77, currently received data is compared with a dictionary of previously transmitted data. Where symbol-string matches are found, the matches are encoded as dictionary pointers into the output stream.
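The adaptive-dictionary idea can be sketched in simplified form as follows. This is an illustrative LZ77-style coder rather than a faithful reproduction of any particular implementation: the token layout (offset, length, next symbol), the window size, and the function names are assumptions, and the search is a brute-force scan of the window.

```python
def lz77_compress(data, window=255):
    """Encode data as (offset, length, next_symbol) tokens against prior data."""
    out = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Search the previously seen data (the adaptive dictionary) for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one symbol early so a literal next symbol always remains.
            while i + length < len(data) - 1 and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(tokens):
    """Rebuild the data by copying from the already-decoded output."""
    out = []
    for off, length, sym in tokens:
        for _ in range(length):
            out.append(out[-off])  # copy one symbol from 'off' positions back
        out.append(sym)
    return "".join(out)

tokens = lz77_compress("abababab")
restored = lz77_decompress(tokens)
```

A match may overlap the position being encoded, which is why the decompressor copies one symbol at a time from its own output.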
Prior compression techniques (i.e., statistical and dictionary compression methods), while effective, are based upon an assumption that the transmitted data has repetitive elements that may be used in compressing the transmitted signal based upon an appropriate algorithm for identifying the repetitive elements. Where the transmitted data does not contain repetitive elements then the prior art compression techniques break down.
In data logging, for instance, where the events logged contain certain random elements (e.g., noise, or other nonrepetitive events), compression techniques are not especially effective. Because of the importance of data gathering, a need exists for a compression technique that is effective with nonrepetitive data.
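The dependence of such techniques on repetitive elements can be observed with a short experiment. The sketch below uses Python's standard `zlib` module, whose DEFLATE format combines LZ77-style matching with Huffman coding; the buffer sizes and variable names are arbitrary choices made for illustration, and random bytes stand in for noise-like logged data.

```python
import os
import zlib

# 10,000 bytes of a repeating pattern versus 10,000 random bytes.
repetitive = b"ABCD" * 2500
random_like = os.urandom(10_000)

# Ratio of compressed size to original size (smaller is better).
ratio_rep = len(zlib.compress(repetitive)) / len(repetitive)
ratio_rand = len(zlib.compress(random_like)) / len(random_like)

print(f"repetitive: {ratio_rep:.3f}")
print(f"random:     {ratio_rand:.3f}")
```

The repetitive buffer shrinks to a small fraction of its size, while the random buffer does not shrink appreciably and may grow slightly because of format overhead.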