The invention relates to data compression techniques for digital images, and more particularly, to an apparatus for encoding the ordinary and extended run-length codes required by such techniques.
A digital image is a two-dimensional array of image points, each of which represents the light intensity of a small area of a physical picture. For black/white images, each image point is a single bit of information with a value of either 0 or 1 to indicate, respectively, that the corresponding area of the picture is light or dark. These images are normally generated by scanning pictorial data, such as 8 1/2 × 11 inch documents. Thereafter, the scanned pictorial data can be stored, viewed from a display, transmitted, or printed.
A variety of data compression techniques have been devised for reducing the storage requirements for digital images, and for reducing the bandwidth required for their transmission. Most of these techniques are based on some form of run-length coding.
In its simplest form, run-length coding of images involves two steps. First, there is the partitioning of each row of the image array into a sequence of runs, with each run comprising one or more adjacent image points with the same binary value, i.e., 0 or 1. Second, it is necessary to replace each run of image points with a single integer that specifies the length of the run. For example, a run of 10 successive image points with the value of 0 can be replaced by the single integer 10. It is not necessary to identify explicitly the binary value of each run. It is sufficient to specify the binary value of the first run in each row, since the binary values of successive runs alternate between 0 and 1.
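The two steps just described can be illustrated with a minimal Python sketch. The function names `encode_row` and `decode_row` are hypothetical and are introduced only for illustration; the sketch assumes that a row is given as a list of 0/1 values and that the binary value of the first run is carried alongside the run lengths.

```python
from itertools import groupby


def encode_row(row):
    """Partition a row of 0/1 image points into runs.

    Returns (first_value, run_lengths). Only the binary value of the
    first run is kept, because successive runs alternate between 0 and 1.
    """
    if not row:
        return None, []
    lengths = [len(list(group)) for _, group in groupby(row)]
    return row[0], lengths


def decode_row(first_value, lengths):
    """Rebuild the row from the first run's value and the run lengths."""
    row, value = [], first_value
    for n in lengths:
        row.extend([value] * n)
        value = 1 - value  # runs alternate between 0 and 1
    return row


row = [0] * 10 + [1] * 3 + [0] * 2
first_value, lengths = encode_row(row)  # a run of ten 0's becomes the integer 10
restored = decode_row(first_value, lengths)
```

Decoding the pair returned by `encode_row` reproduces the original row, which shows that no per-run binary value needs to be stored.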
More efficient run-length coding techniques use variable-length binary code words, rather than integers, to represent the lengths of the various runs. The run-length codes used with such techniques are designed so that the shorter code words are used to represent more frequently occurring runs and the longer code words are used for less frequently occurring runs. For typical applications the runs of lengths 1 to 5 occur most frequently, and the probability of occurrence for successively longer runs tends to decrease steadily thereafter. The single exception is the longest possible run. Such a run can, for example, result from an all-white line on a printed page, which occurs frequently. Since the probability of occurrence of a run tends to decrease with the length of the run, the length of the code word used to represent a run generally increases with the length of the run. For example, a run of length 20 is normally represented by a code word that is longer than the code word used for a run of length 10.
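The relationship between run length and code word length can be illustrated with a short sketch. The Elias gamma code is used here purely as a stand-in for a variable-length code whose code words grow with the run length; it is an assumption of this example, not a code named in the text, and it makes no special provision for the longest possible run.

```python
def elias_gamma(n):
    """Return the Elias gamma code word for a run of length n >= 1.

    The code word is the binary form of n, preceded by one 0 for each
    binary digit after the first, so longer runs get longer code words.
    """
    if n < 1:
        raise ValueError("run length must be at least 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


word_10 = elias_gamma(10)  # 7 bits
word_20 = elias_gamma(20)  # 9 bits, longer than the code word for 10
```

A practical code would be matched to measured run-length statistics, and would typically assign a short code word to the longest possible run as well.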
A slightly different group of run-length coding techniques has been used when the number of image points with a binary value of 0 far exceeds the number of image points with a binary value of 1. These techniques partition each row of the image array into a number of runs of 0's, each separated by a single 1. Then, only the runs of 0's are encoded. Although it is sometimes necessary to encode the run of no 0's that appears between two adjacent 1's in a row of the image array, it is not necessary to encode any runs of 1's. This strategy is particularly effective when used in conjunction with predictive encoding, which transforms an original image array into a new array that includes few 1's. See, for example, L. Bahl et al., U.S. Pat. No. 3,769,453, "Finite Memory Adaptive Predictor."
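The partitioning into runs of 0's can be sketched as follows. The function name `zero_runs` is hypothetical, and the separate handling of 0's that trail the last 1 in a row is an assumption of this sketch; the text does not say how such a trailing run is treated.

```python
def zero_runs(row):
    """Split a row into runs of 0's, each terminated by a single 1.

    Returns (runs, trailing), where runs[i] is the number of 0's that
    precede the i-th 1, and trailing is the number of 0's after the
    last 1. Two adjacent 1's yield a run of length 0.
    """
    runs, count = [], 0
    for bit in row:
        if bit == 0:
            count += 1
        else:
            runs.append(count)  # a 1 closes the current run of 0's
            count = 0
    return runs, count


runs, trailing = zero_runs([0, 0, 0, 1, 1, 0, 0, 1, 0])
# runs holds 3, 0, 2: the 0 is the empty run between the adjacent 1's
```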
Finally, a few sophisticated data compression techniques for images use run-length codes that have been extended to include a number of special code words in order to represent certain special situations. These special code words are in addition to the regular code words used to represent runs. An example is the code described by I. Gorog et al. in the article entitled "An Experimental Low Cost Graphic Information Distribution Terminal," 1971 SID International Symposium of Technical Papers. Gorog's code includes three special code words for special situations. These special situations occur when a run in one row of an image array either ends directly beneath the end point of a corresponding run in the previous row, or ends one position to the left or right of this end point.
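The three special situations can be pictured with a small sketch. The function name `classify_run_end` and the labels it returns are hypothetical, and the end positions of the current run and of the corresponding run in the previous row are assumed to be already known as column indices; the sketch does not attempt to reproduce the actual code words of the cited article.

```python
def classify_run_end(end, prev_end):
    """Classify where a run ends relative to the previous row's run.

    end and prev_end are column indices of the two end points.
    Returns a label for one of the three special situations, or None
    when the run must be represented by a regular code word.
    """
    offset = end - prev_end
    if offset == 0:
        return "BENEATH"  # ends directly beneath the previous end point
    if offset == -1:
        return "LEFT"     # ends one position to the left
    if offset == 1:
        return "RIGHT"    # ends one position to the right
    return None


situation = classify_run_end(end=41, prev_end=42)  # "LEFT"
```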
The primary disadvantage of previous run-length coding systems is that they have used an ordinary or extended run-length code which represented a compromise among three coding objectives. The objectives are high efficiency for typical images, uniformly high efficiency for a class of images, and an economical implementation. In this regard, reference should be made to N. Abramson, "Information Theory and Coding," McGraw Hill Book Co., New York, 1963 at pp. 85-88 for a discussion of code efficiency. Abramson's efficiency measure is based upon the value of a symbol from an information source S, which can be measured in terms of an equivalent number of binary digits needed to represent one symbol from that source. The average value of a symbol from S is denoted by H(S). Note that

$$H(S) = \sum_i p_i \log_2 \frac{1}{p_i}$$

where $p_i$ is the probability of the ith source symbol. Given that l is the average code word length for any uniquely decodable code for the source, it is the case that l cannot be less than H(S). Accordingly, the efficiency of the code is the ratio H(S)/l.
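The efficiency measure can be computed directly, as in the following sketch. The function names and the four-symbol source are assumptions chosen for illustration; the probabilities are not taken from any measured image.

```python
from math import log2


def entropy(probs):
    """H(S): average number of binary digits per source symbol."""
    return sum(p * log2(1 / p) for p in probs if p > 0)


def average_length(probs, lengths):
    """Average code word length l for the given code word lengths."""
    return sum(p * n for p, n in zip(probs, lengths))


def efficiency(probs, lengths):
    """Code efficiency, the ratio H(S)/l."""
    return entropy(probs) / average_length(probs, lengths)


probs = [0.5, 0.25, 0.125, 0.125]               # illustrative source
variable = efficiency(probs, [1, 2, 3, 3])      # variable-length code
fixed = efficiency(probs, [2, 2, 2, 2])         # fixed-length code
```

For this source H(S) is 1.75 binary digits. The variable-length code has an average length of 1.75 and therefore an efficiency of 1, while the fixed-length code has an average length of 2 and an efficiency of 0.875.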
Taking the above coding objectives into account, the most easily implemented run-length code, which uses the fixed-length binary integer i as a code word for runs of length i, is not nearly as efficient as a variable-length code. On the other hand, the most efficient extended run-length code possible for a sample of images is the Huffman code based on the relative frequencies of runs and special situations in that sample. However, since run-length codes for images typically require 1,000 to 5,000 code words, the Huffman code is normally difficult to implement.
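The construction of a Huffman code from relative frequencies can be sketched as follows. The function name `huffman_code`, the symbol `"SAME_END"` standing for a special situation, and the frequencies are all hypothetical; a table for real images would hold the 1,000 to 5,000 entries mentioned above, which is the source of the implementation difficulty.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """Build a Huffman code from a {symbol: frequency} mapping.

    Returns {symbol: bit string}. The two least frequent subtrees are
    merged repeatedly, prefixing their code words with 0 and 1.
    """
    if len(freqs) == 1:
        return {sym: "0" for sym in freqs}
    tiebreak = count()  # keeps heap entries comparable on equal frequency
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Run lengths 1 to 3 plus one special situation, with made-up counts.
code = huffman_code({1: 50, 2: 25, 3: 15, "SAME_END": 10})
```

With these counts the most frequent symbol receives a one-bit code word and the two least frequent symbols receive three-bit code words.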
A second disadvantage of most previous run-length coding systems is their inflexibility: they are only able to implement a single code specifically designed for a particular data compression technique and a particular class of images. Although a few previous systems can be adjusted to implement the new code required for a new data compression technique or a new class of images, these systems must use an amount of table storage that is proportional to the number of code words. Since ordinary and extended run-length codes require 1,000 to 5,000 code words, this table storage is quite large, and hence is expensive.
A third disadvantage of most previous run-length coding systems is their awkward treatment of the special code words required by extended run-length codes. In most cases these special code words have necessitated a great deal of special circuitry.
A fourth disadvantage of most previous run-length coding systems is that the encoder serves both to identify the lengths of successive runs and to generate the code words appropriate for these lengths. Such encoders must generate, for each bit of a run received, the code word that would be appropriate if the run should end with this bit. This generation of unnecessary code words is quite time-consuming.