The present disclosure relates to systems and methods for processing digital data. More particularly, the invention relates to systems and methods that improve data transmission efficiency.
Printer manufacturers, such as Hewlett-Packard, place great emphasis on cost minimization without sacrificing function or performance. Printers generally receive data to be printed in a high level language (HLL) or a page description language (PDL). The printers must receive this data, load any non-resident fonts identified by the print language, and convert the data stream into a recognizable data stream for the particular print engine. Conventional printers rely on various combinations of hardware, firmware, software, and internal memory to convert the received data stream into a printed product. The internal memory is often a random access memory (RAM) and read and write operations to the RAM often present a data processing bottleneck in the printing system.
The amount of RAM required in a particular printer design is directly related to the memory required to meet the needs of the height of the print line that the print engine can produce. For example, entry level laser printers may include a print engine capable of generating a xc2xexe2x80x3 high strip of printed information. In order to generate a printed page, the printing system may receive, translate, and buffer that amount of data required to generate a plurality of the xc2xexe2x80x3 high strips. Once a predetermined amount of information is available in a format understood by the laser mechanism, the printer may be configured to start the laser.
Conventional laser printing mechanisms cannot be stopped once the printing process has been initiated. Once the laser is started, the data processing system is in a race with the laser mechanism to continuously process the high level data sent by a host computing device into a format recognizable by the laser mechanism before the laser mechanism processes the buffered data. If the data processing system fails to remain ahead of the laser mechanism, the laser mechanism is generally programmed to abort the printing process. Consequently, printer manufacturers attempting to lower the per unit cost of their printers are interested in further exploiting data compression techniques in order to increase data processing efficiencies thereby reducing the amount of RAM required in the print system to maintain the laser mechanism. The resulting reduction in the amount of RAM required reduces the per unit cost of the associated printer.
With this and other continuous demands for improvements in data transmission efficiency and data storage capacities, improved data compression techniques are continually sought. Of the many classes of data compression, one of the most useful is the class of dictionary based compression techniques. Among these, the most useful today are the so-called Ziv-Lempel variable-length encoding procedures credited to J. Ziv and A. Lempel who suggested the xe2x80x9cLZ1xe2x80x9d length offset encoding scheme. The LZ1 process uses a sliding xe2x80x9chistoryxe2x80x9d window when processing past source data strings as the dictionary. Matches are encoded as a xe2x80x9cmatch lengthxe2x80x9d and an xe2x80x9coffsetxe2x80x9d from an agreed position.
Because LZ1 scrolls the source string over a fixed sized sliding history window to create an adaptive dictionary, identification of duplicate xe2x80x9cmatchingxe2x80x9d strings in the source data is at first difficult, but becomes very efficient over time. Once a matching string is encoded as a xe2x80x9clengthxe2x80x9d and xe2x80x9coffset,xe2x80x9d the necessary decoding process is rapid and efficient, requiring no dictionary pre-load.
Sliding window data compression processes suffer from what may be called xe2x80x9cstart-up lossesxe2x80x9d and xe2x80x9cnon-redundancy lossesxe2x80x9d in compression efficiency. Because each source string or block begins with an empty xe2x80x9cdictionary,xe2x80x9d the first source symbols must be transmitted as raw words without compression. Similarly, a string of input data which has already been encrypted or compressed and lacks substantial redundancy, lacks the matches required for compression and the source symbols must also be transmitted as raw words without compression.
Only after accumulating a substantial dictionary, by having the sliding window fill up with input data having substantial redundancy, are matches found for increasing numbers of sub-strings which allow encoding efficiency to increase to desired compression ratios.
In the original LZ1 arrangement, called, xe2x80x9cLZ77,xe2x80x9d all source input is output in the form of a tripartite token having the length and offset together with a flag, which is the first character of the compressed sub-string. Techniques known as adaptive data compression (ALDC) overcome the problem when a non-redundant character is encountered by not sending the three part token, but instead providing the character unchanged, and providing it with a designation to indicate that it is not compressed. The unchanged raw character together with the designation is called a xe2x80x9cliteral.xe2x80x9d A typical designation is an added xe2x80x9czeroxe2x80x9d bit for each word of the source string. Thus, when encountering a string of non-redundant input data, the compression is expanded by a much smaller length than is likely with the original LZ1 technique. However, LZ1 techniques such as ALDC still must actually expand the data by one bit for every word, typically a 9/8 expansion to output them as literals.
Because of this problem, alternative dictionary based compression techniques have been designed to offer special advantages in particular circumstances. An example is LZ2 compression (also known as LZ78 or the related version known as LZW) which captures redundancies and maintains them in a dictionary, as described in xe2x80x9cThe Data Compression Book,xe2x80x9d M. Nelson, M and T Publishing, 1991, pp. 277-311, the contents of which are hereby incorporated by reference. Thus, the opportunity for having redundancies is expanded, at the cost of an expanded dictionary buffer. In LZ2, the expansion for literals may be more than one bit.
In addition to the aforementioned LZ compression techniques, a host of other known data compression techniques are known. For example, delta row encoders monitor changes in a bitmap used to describe a subsequent row or string of data from a bitmap used to describe a previous row. Under certain conditions, (e.g., slowly changing data from row to row) a delta row compression scheme may be particularly efficient. Other known data compression techniques such as Huffman and run length encoding may be well suited to other data conditions.
U.S. Pat. No. 6,008,743 describes a method and apparatus for switching between data compression modes by comparing the data compression efficiencies of the compression modes. The ""743 Patent uses a multi-bit mode switch character to controllably select a particular data compression mode. Significantly, the switching technique described in the ""743 Patent enables a data compression mode switch only upon an occurrence of an accumulated comparison measure reaching a threshold value.
Stated another way, a straight comparison is not used in formulating the switching decision. Rather, switching between modes takes into account the potential cost of the swap (in terms of the length of the added special character indicating the switch). More specifically, the apparatus disclosed in the ""743 Patent does not switch between data compression modes until the savings (in terms of the length of the data saved) in making the switch is likely to meet a threshold which is directly related to the cost of the switch. Thus, the modes for compressing the input data are switched only upon the comparison indicating the compression efficiency of a proposed mode is less than the compression efficiency of the present mode by the threshold value.
U.S. Pat. No. 5,870,036 describes a method and apparatus for compressing data using a plurality of data compression mechanisms. Representative samples of each block of data are tested to select an appropriate one of a plurality of data compression mechanisms to apply to the block. The block is then compressed using the selected mechanism and the compressed block is provided with an identifier of the selected mechanism. In accordance with the teachings of the ""036 Patent and similar to the teachings of the ""743 Patent, if the best sample compression does not satisfy a threshold condition (e.g., 30% compression), then it is decided not to compress the data.
In an associated system for decompressing data, the coding identifier of the selected compression mechanism is examined, identified, and used to select an appropriate one of the plurality of decompression mechanisms to apply to the block. Multi-bit control codes and compression system identifiers provided in the output data as described above to indicate a special situation in data handling techniques, such as switching between compression modes, degrade the efficiency of the compression by the length of the character. From the above, it will be appreciated that it is desirable to provide further advances in data compression/decompression systems.
In response to these and other shortcomings of the prior art, the present invention relates to systems and methods for a data compression/decompression data processing system. Briefly described, in architecture, a data compressing system can be implemented with an input memory device configured to receive a data stream; a plurality of data compressors coupled to an output of the input memory, each of the plurality of data compressors configured to apply a data compression algorithm to the data stream to generate a respective compressed version of the data stream; and a controller coupled with each of the plurality of data compressors. In preferred embodiments, the controller is configured to determine which of the plurality of data compressors is most efficient over a select portion of the data stream, to create a record of the selected data compression algorithm, and to forward the data.
A related data decompressor may be implemented with an input memory device configured to receive a compressed data stream; a controller coupled to an output of the input memory, the controller configured to generate a control signal responsive to the data compression algorithm used to compress the data stream; and a plurality of controllably selectable data decompressors coupled to an output of the input memory, wherein a particular data decompressor is selectively coupled to the input memory in response to the control signal.
Some embodiments of the data compression/decompression system can also be viewed as providing methods for reducing memory requirements in a communication system. In this regard, a method for compressing data may include the following steps: a) receiving a data stream; b) applying a plurality of data compression algorithms to the data stream; c) determining which of the plurality of data compression algorithms was most effective; d) buffering the compressed data; and e) generating a record indicative of the applied data compression algorithm. Conversely, a method for decompressing data may include: a) receiving a compressed data stream; b) identifying which of a plurality of data decompression algorithms is appropriate to apply over a sub-portion of the data stream; c) determining the length of the compressed sub-portion of the data stream; and d) applying the identified data decompression algorithm to the received compressed data stream for the length in bytes generated in the determining step.