Data compression schemes are widely employed in many areas, including communication systems and computer systems. Where communication systems are concerned, for example, one way to make more effective use of available bandwidth is to transmit data in a compressed format. This permits the data to travel efficiently along data networks, such as local area networks (LANs), wide area networks (WANs), and the like, without unduly constraining network resources. In this regard, innovative data compression and decompression schemes have evolved to improve the effective throughput of communication links. In the field of computer systems, it is also desirable to compress data because this offers greater effective storage capacity. Thus, for a storage device of a given capacity, more information can be stored in a compressed format than in an uncompressed format.
The primary objective of data compression technologies is to minimize the amount of data transmitted or stored. Most compression schemes operate by detecting repeatable patterns or redundancies in the data and leveraging these patterns to compress the data. Generally speaking, the greater the redundancy the more efficient the compression scheme because redundant data may be represented with fewer bits, thereby reducing the total number of bits necessary to represent the information.
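As a purely illustrative sketch of how redundancy translates into a shorter representation, the following Python fragment applies a simple run-length encoding to two inputs of equal length. The function names `rle_encode` and `rle_decode` and the sample inputs are assumptions introduced here for illustration; they are not drawn from any particular scheme discussed in this background.

```python
from itertools import groupby


def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse each run of identical bytes into a (byte value, run length) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]


def rle_decode(pairs: list[tuple[int, int]]) -> bytes:
    """Expand (byte value, run length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in pairs)


# Two hypothetical 16-byte inputs: one highly redundant, one with no repeated runs.
redundant = b"A" * 12 + b"B" * 4
varied = b"ABCDEFGHIJKLMNOP"

for label, payload in (("redundant", redundant), ("varied", varied)):
    pairs = rle_encode(payload)
    # The round trip must reproduce the input exactly.
    assert rle_decode(pairs) == payload
    print(f"{label}: {len(payload)} bytes -> {len(pairs)} (value, length) pairs")
```

The redundant input collapses to two pairs, whereas the varied input needs one pair per byte and would therefore grow rather than shrink if each pair were stored in two bytes.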
While there are a variety of compression schemes known in the art, they can all be considered to fall within one of two major categories: “lossless” or “lossy”. A “lossless” data compression technique is employed when it is imperative that the restored data be identical to the original data, that is, when one can ill afford to lose a single bit of data during the compression/decompression process. Situations in which a lossless technique is necessary include, for example, the compression of executable code, word processing files, tabulated numbers, etc. On the other hand, if absolute data integrity is not essential and some degradation from the original data can be tolerated, then a “lossy” compression technique may be preferred. Lossy compression methods, such as those promulgated by the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), are commonly used to manipulate digitized video and voice data, image files and the like, while lossless compression techniques are commonly used in computer systems to maximize the storage capacity of media such as hard disks. To this end, well known lossless compression methodologies include both statistical and dictionary-based approaches: Huffman coding (symbol-entropy-based), run-length encoding (RLE) and modified forms of RLE, and dictionary-based (string table) methods such as the Lempel-Ziv-Welch (LZW) algorithm.
Designing compression and decompression algorithms that will excel in most situations is exceedingly difficult. Often, the best approach is to identify the type of data to be compressed and design an algorithm particularly suited for that data type, with the understanding that the algorithm will likely perform poorly if applied to data of a different type. This concept can be appreciated, for example, when compression schemes such as PKZip and Bzip are applied against data which has been previously encrypted: because encrypted data exhibits little detectable redundancy, such schemes typically achieve little or no reduction in size.
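The sensitivity of a general-purpose compressor to the type of its input can be illustrated with a short Python sketch using the standard `zlib` module, which implements the DEFLATE method. Two assumptions are made here for illustration only: the helper name `compressed_size` is hypothetical, and bytes drawn from `os.urandom` stand in for previously encrypted data, which they resemble statistically but are not.

```python
import os
import zlib


def compressed_size(payload: bytes) -> int:
    """Size in bytes of the payload after DEFLATE compression at the highest level."""
    return len(zlib.compress(payload, 9))


# Hypothetical inputs of equal length: a repeated phrase, and random bytes
# used as a stand-in for encrypted data (an assumption of this sketch).
patterned = b"the quick brown fox " * 500
random_like = os.urandom(len(patterned))

for label, payload in (("patterned", patterned), ("random", random_like)):
    print(f"{label:>9}: {len(payload)} bytes in, {compressed_size(payload)} bytes out")
```

On a typical run the patterned input shrinks to a small fraction of its original size, while the random input comes out no smaller, and usually a few bytes larger because of container overhead.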
In the past, various approaches have been taken to compress both patterned (or redundant) data and random data. For example, the lossless LZW method is described in U.S. Pat. No. 4,558,302 to Welch. Another example of a lossless compression approach is described in U.S. Pat. No. 5,594,435 to Remillard. U.S. Pat. No. 5,488,364 to Cole relates to a recursive data compression approach in which data is reconfigured in a manner that increases bit redundancy and is thereafter compressed in an iterative or recursive manner until the desired compression ratio is obtained.
Not surprisingly, compression of random or encrypted data is inherently more difficult than compression of patterned or redundant data, since there are usually no patterns to be found. Randomized digital data of this nature, thus, cannot typically be compressed by normal compression algorithms. U.S. Pat. No. 5,486,826, also to Remillard, employs entropy adjustment in connection with the compression of randomized digital data, irrespective of whether a prior compression technique has been applied to the information. Entropy, in this context, is a measure of the randomness of the information. U.S. Pat. No. 5,533,051 to James discusses a variety of data compression approaches, one of which appears to be particularly directed to compacting a stream of randomly distributed data. According to this particular scheme, the data stream is divided into a plurality of blocks of randomly distributed data, at least one of which is selected and divided into first and second portions. The occurrences of a pre-determined word within the first portion of the block are counted and the data within the second portion is compressed.
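To make the notion of entropy concrete, the following Python sketch estimates the Shannon entropy of a byte sequence from its observed byte frequencies, expressed in bits per byte. This is a general textbook measure offered for illustration; it is not the entropy adjustment of the Remillard patent, and the function name `shannon_entropy_bits_per_byte` and the sample inputs are assumptions introduced here.

```python
import math
import os
from collections import Counter


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Estimate entropy from byte frequencies: the sum of p * log2(1/p) over observed values."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


# Hypothetical samples: a constant run, a short repeated phrase, and random bytes.
samples = {
    "constant": b"\x00" * 4096,
    "patterned": b"abcabcabc" * 455,
    "random": os.urandom(65536),
}

for label, payload in samples.items():
    print(f"{label:>9}: {shannon_entropy_bits_per_byte(payload):.3f} bits per byte")
```

A value near 8 bits per byte, as expected for the random sample, indicates that a coder working on individual byte frequencies has essentially nothing to remove, whereas lower values indicate redundancy that a compressor can exploit.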
Unquestionably, a great deal of research has been conducted in attempts to compress random or encrypted data, as well as data which has previously been compressed. The above-mentioned patents provide some evidence of the amount of work performed in these areas. Aside from patented technology, reports also periodically circulate in the field making lofty claims of compression capabilities, but it can sometimes be difficult to verify such claims, or even the existence of the programs which reportedly achieve them.
The pervasive nature of compression/decompression schemes illustrates the continuing need to provide improved approaches for effectively compressing and decompressing data. While various compression (and decompression) algorithms exist which are suitable for use with patterned data, a need particularly remains for a compression algorithm which will also reliably compress random data. Along these same lines, it is desirable to have both a compression and a decompression scheme for use with random data, and particularly one which is flexible in nature so as to allow a user to selectively tailor various parameters to suit the user's particular needs.
The foregoing examples of the related art and its/their related limitations are intended to be illustrative and not exclusive. Other limitations may become apparent to those practiced in the art upon a reading of the specification and a study of the drawings.