This invention relates to data compression for imaginal data processing systems and, more particularly, to digital data compression for binary raster scanned imaging systems and the like.
Documents (e.g., printed and written pages, drawings, and photographs) essentially are more or less continuous, two dimensional patterns of reflectance. Accordingly, imaginal data processing systems classically include raster input scanners for serially remapping or converting the information content (i.e., graphic images) of input documents into corresponding, one dimensional video signals and raster output scanners for serially printing replicas or facsimiles of the input documents in response to the video signals. There are hybrid systems in which provision is made for converting to or from a raster scan video signalling format so that raster input and output scanners may be interfaced with devices having other signalling formats, such as teletypewriter terminals using an ASCII code. Usually, however, raster input and output scanners are employed in complementary combinations to form so-called raster scanned imaging systems.
Raster input and output scanning feature a characteristic scan structure, whereby a graphic image is represented by a video signal containing a predetermined number of picture elements (sometimes referred to as "pixels") for each of a plurality of substantially equidistantly spaced scan lines. Thus, the resolution of a raster scanner is customarily expressed in terms of a given number of scan lines/inch along, say, a vertical axis by a given number of picture elements or line pairs/inch along an orthogonal or horizontal axis. For example, the Xerox 200 Telecopier facsimile transceiver, which is manufactured and sold by Xerox Corporation, offers a choice of speed dependent resolutions which are conventionally specified (with reference to nominal document transmission times for a standard 8 1/2 inch × 11 inch document) in units of scan lines/inch vertically by picture elements/inch horizontally as being approximately: 96 × 96 for document transmission times of 3 and 6 minutes; 64 × 96 for a document transmission time of 4 minutes; and 77 × 80 for a document transmission time of 2 minutes. While those are more or less standard resolutions for existing facsimile systems, it should be understood that they are close to the lower end of the useful range of resolutions for raster scanners in general. Significantly coarser resolutions are normally avoided because they involve an unacceptably high risk of losing essential image detail.
Raw video signals of the foregoing type commonly contain a significant amount of redundant information. Therefore, if a video circuit for a raster input or output scanner comprises a limited bandwidth transmission medium or a limited capacity storage medium, increased data handling efficiency can often be realized by including an upstream data compression stage for removing redundant information from the video signal and a downstream data decompression stage for restoring that information. Binary video signals are especially well suited to data compression and decompression because the picture elements are either black or white ("1" or "0"), thereby excluding all intermediate shades of gray. For that reason, substantial effort and expense have been devoted to the development of digital data compression and decompression methods and means.
Run length encoding and decoding have gained widespread attention as techniques for compressing and decompressing, respectively, binary video signals having a raster scan format. Basically, the encoding converts the white and/or black runs of a binary video signal into corresponding message codes, and the decoding reconverts those codes into white and/or black runs of the appropriate length to reconstruct the video signal. In that context, a "run" is defined as being an uninterrupted series of one or more picture elements at the same logic level, and the "length" of a run is determined by the number of picture elements therein.
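By way of illustration only, the run length encoding and decoding just described may be sketched as follows. The function names and the (level, length) pair representation are assumptions made for this example; in practice each pair would be replaced by a message code.

```python
from itertools import groupby


def rle_encode(scan_line):
    """Return one (logic_level, run_length) pair per run in a binary scan line."""
    return [(level, len(list(group))) for level, group in groupby(scan_line)]


def rle_decode(runs):
    """Rebuild the scan line by expanding each (logic_level, run_length) pair."""
    scan_line = []
    for level, length in runs:
        scan_line.extend([level] * length)
    return scan_line


# Hypothetical 10-element scan line: 0 = white, 1 = black.
scan_line = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1]
runs = rle_encode(scan_line)  # [(0, 4), (1, 2), (0, 3), (1, 1)]
restored = rle_decode(runs)
```

Ten picture elements are thus represented by four runs, and decoding restores the original scan line exactly.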
To carry out the encoding, the message codes are preselected to uniquely identify the lengths of the encoded runs. Preferably, in keeping with the teachings of D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of the I.R.E., September 1952, pp. 1098-1101, the message codes are of variable length (i.e., different bit counts) and are assigned to the run lengths which are to be encoded in accordance with a predetermined run length probability distribution to the end that the code assigned to a given run length is no longer than the code assigned to a less probable run length.
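A minimal sketch of such a variable length code assignment, following the general Huffman construction, is given below. The probability values are invented purely for illustration, and the function name huffman_codes is an assumption of this example rather than part of any cited work.

```python
import heapq
from itertools import count


def huffman_codes(probabilities):
    """Map each run length to a prefix-free bit string by Huffman's construction."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(p, next(tiebreak), {run: ""}) for run, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single run length still needs one bit
        return {run: "0" for run in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least probable groups, prefixing one bit to each side.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {run: "0" + code for run, code in codes0.items()}
        merged.update({run: "1" + code for run, code in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical run length probability distribution (illustrative values only).
distribution = {1: 0.40, 2: 0.25, 3: 0.20, 4: 0.10, 5: 0.05}
codes = huffman_codes(distribution)
code_lengths = {run: len(code) for run, code in codes.items()}
```

For this assumed distribution the most probable run length receives a one bit code and the two least probable run lengths receive four bit codes, so that no run length has a longer code than a less probable one.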
Unfortunately, however, an unbounded set of documents does not yield a meaningful run length probability distribution because the redundancy of all graphic images, as an unlimited class, is completely random. Thus, to take advantage of a run length probability distribution in assigning the message codes, it is necessary to focus on a subset of documents which share a common image characteristic. For example, to optimize the data compression provided for ordinary business correspondence, the run length probability distribution may be based on run length frequency statistics gathered by prescanning a relatively few sample documents composed primarily of alphanumeric characters. Of course, that subset still permits of sufficient variations in page coverage and formatting and in character size and style to warrant weighting the run length frequency statistics in favor of those samples which are subjectively judged to most closely approach a preconceived norm.
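The gathering of weighted run length frequency statistics may be sketched as follows. This is only an assumed procedure: the per-sample weights, the pooling of white and black runs into one histogram, and the function name run_length_distribution are all choices made for this example.

```python
from collections import Counter
from itertools import groupby


def run_length_distribution(samples):
    """Estimate run length probabilities from (weight, scan_lines) samples.

    Each run contributes its sample's weight, so samples judged closer to
    the preconceived norm can be given more influence.
    """
    totals = Counter()
    for weight, scan_lines in samples:
        for line in scan_lines:
            for _, group in groupby(line):
                totals[len(list(group))] += weight
    grand_total = sum(totals.values())
    return {length: w / grand_total for length, w in totals.items()}


# Two hypothetical one-line "documents"; the first is weighted twice as heavily.
samples = [
    (2.0, [[0, 0, 1, 0, 0]]),
    (1.0, [[0, 1, 1, 1, 0]]),
]
distribution = run_length_distribution(samples)
```

The resulting probabilities sum to one and could serve as the predetermined distribution on which the message code assignment is based.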
Others have recognized that the basic run length encoding process can be modified to achieve increased data compression. In general, the proposed modifications have been directed toward increasing the average length of the runs which are presented for encoding.
More particularly, H. E. White et al., "Dictionary Look-Up Encoding of Graphic Data," Picture Bandwidth Compression, ed. T. S. Huang, Gordon and Breach, 1972, pp. 267-281, suggest the encoding of the "derivative or transitional equivalent" of the original image. To accomplish that, the definition of a run is expanded to include not only an uninterrupted series of picture elements of one logic level, but also a single terminating picture element of the opposite logic level.
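One possible reading of that expanded run definition is sketched below; it is not taken from the cited reference. The sketch assumes that the transitional equivalent marks each picture element that differs from its predecessor, that the level preceding the first picture element is white (0), and that any trailing elements lacking a terminating element are counted separately. All function names are hypothetical.

```python
def to_transitions(scan_line, start_level=0):
    """Mark with 1 every picture element that differs from the one before it."""
    transitions, prev = [], start_level
    for pel in scan_line:
        transitions.append(pel ^ prev)
        prev = pel
    return transitions


def encode_terminated_runs(transitions):
    """Return lengths of runs of 0s including their terminating 1, plus the
    count of trailing 0s that have no terminating element."""
    runs, length = [], 0
    for bit in transitions:
        length += 1
        if bit == 1:
            runs.append(length)
            length = 0
    return runs, length


def decode_terminated_runs(runs, tail, start_level=0):
    """Invert encode_terminated_runs and to_transitions."""
    transitions = []
    for length in runs:
        transitions.extend([0] * (length - 1) + [1])
    transitions.extend([0] * tail)
    level, scan_line = start_level, []
    for bit in transitions:
        level ^= bit  # a 1 flips the current logic level
        scan_line.append(level)
    return scan_line


scan_line = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
runs, tail = encode_terminated_runs(to_transitions(scan_line))
restored = decode_terminated_runs(runs, tail)
```

Under these assumptions every run is specified by a single length, since its terminating element is implied, and the original scan line is recovered exactly.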
Another interesting proposal relates to a pre-encoding process known as difference modulation. To perform that process, corresponding picture elements for successive scan lines are differentially compared, thereby generating a binary prediction signal (hereinafter referred to as a difference modulated video signal) which distinguishes the picture elements for the later scan line which are at the same logic level as the corresponding picture elements for the preceding scan line from those that are not. Run length encoding of the difference signal can usually be carried out with relatively few message code bits because there normally is sufficient inter-scan line redundancy to cause the difference signal to have relatively long runs at a logic level indicating that the picture elements for the two scan lines are the same. However, there is the risk that errors made in recovering the picture elements for one scan line will be propagated through subsequent scan lines. Consequently, to limit the propagation of those errors, it is desirable to periodically encode and decode a scan line of raw or unmodulated picture elements, such as suggested in U.S. Pat. No. 3,830,966 of W. H. Aldrich et al., which issued Aug. 30, 1974, for "Apparatus and Method for Transmitting a Bandwidth Compressed Digital Signal Representation of a Visible Image."
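The difference modulation process, together with the periodic insertion of an unmodulated scan line, may be sketched as follows. The refresh interval (refresh_every), the "raw"/"diff" tags, and the function names are assumptions made for this example and are not taken from the cited patent.

```python
def difference_modulate(scan_lines, refresh_every=4):
    """Replace each scan line by its element-wise difference (XOR) from the
    preceding line, except that every refresh_every-th line is sent raw."""
    encoded = []
    for i, line in enumerate(scan_lines):
        if i % refresh_every == 0:
            encoded.append(("raw", list(line)))
        else:
            prev = scan_lines[i - 1]
            # 0 = same as the preceding line, 1 = different
            encoded.append(("diff", [a ^ b for a, b in zip(line, prev)]))
    return encoded


def difference_demodulate(encoded):
    """Recover the scan lines; a raw line stops any earlier error from spreading."""
    scan_lines = []
    for kind, data in encoded:
        if kind == "raw":
            scan_lines.append(list(data))
        else:
            prev = scan_lines[-1]
            scan_lines.append([d ^ p for d, p in zip(data, prev)])
    return scan_lines


# Hypothetical image: three similar scan lines followed by a blank one.
image = [
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
encoded = difference_modulate(image, refresh_every=2)
restored = difference_demodulate(encoded)
```

In this example the second scan line is identical to the first, so its difference signal is a single run of eight 0s, which illustrates why the difference signal tends to have long runs.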