Data compression and decompression is a highly useful technique which finds considerable application in image facsimile transmission and storage systems, where it serves to reduce the redundant transmission of data. For example, in the reproduction of an image of printed material in which data is derived to represent the presence or absence of print within incremental areas in a rectilinear matrix, it is unnecessarily burdensome on the data processing system employed to require transmission of redundant data, such as would occur in the generation of separate signals for each incremental area in, for example, the margin of the printed page to be reproduced.
One method of data compression is the assignment of variable length binary symbols to each possible data message to be transmitted. An arrangement of these symbols in which the length of each symbol is inversely related to the probability of occurrence of the message it represents is termed a minimum redundancy code. That is, the most probable message may be represented by a single binary bit, the second most probable message may be represented by two binary bits, and the third, fourth and fifth most probable messages may be represented by four binary bits each. The binary symbols associated with each message are called "Huffman Codes" and are described in the publication by David A. Huffman, "A Method for the Construction of Minimum Redundancy Codes", Proc. I.R.E., September, 1952, Volume 40, pp. 1098-1101. A portion of a Huffman minimum redundancy code structure is set forth below in Table I.
TABLE I
______________________________________________________
Symbol Index   Probability   Symbol Size   Symbol
______________________________________________________
 0             .621           1            0
 1             .078           3            100
 2             .037           4            1010
 3             .033           5            10110
 4             .030           5            10111
 5             .026           5            11000
 6             .023           5            11001
 7             .019           5            11010
 8             .017           6            110110
 9             .014           6            110111
 .             .              .            .
 .             .              .            .
 .             .              .            .
31             .002          10            1111111111
______________________________________________________
Several characteristics of the employed minimum redundancy code formatting should be noted. One feature is that minimum binary values are used to define valid codes among binary symbols of a particular code length. The maximum binary values of symbols of that code length are not valid symbols, but are instead assigned as the initial bits of binary symbols having a larger number of bit positions. More specifically, and by way of example with reference to Table I, there are two bit values having a single bit position. These values are "0" and "1". The employed minimum redundancy code format assigns the code "0" to the most probable message, while the code "1" is not a valid binary symbol, but instead is the prefix of the symbol "100". It is fundamental in this regard that a valid binary symbol of a particular bit length cannot employ the same bit value permutation which serves as the initial bit structure or prefix of a valid binary symbol of a greater bit length.
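The prefix rule just stated can be checked mechanically. The following is a minimal sketch in Python, offered for illustration only; the function name `is_prefix_free` and the list `TABLE_I_SYMBOLS` are hypothetical, and the list contains only those rows of Table I that are actually shown, the elided rows being omitted.

```python
# Symbols taken from the rows of Table I that are shown; elided rows are omitted.
TABLE_I_SYMBOLS = [
    "0", "100", "1010", "10110", "10111", "11000",
    "11001", "11010", "110110", "110111", "1111111111",
]

def is_prefix_free(symbols):
    """Return True if no symbol is a prefix of another symbol in the list."""
    ordered = sorted(symbols)
    # In lexicographic order, a symbol that is a prefix of any other symbol
    # is also a prefix of its immediate successor, so adjacent pairs suffice.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))

if __name__ == "__main__":
    print(is_prefix_free(TABLE_I_SYMBOLS))          # the listed rows obey the rule
    print(is_prefix_free(TABLE_I_SYMBOLS + ["1"]))  # "1" is a prefix of "100"
```

Adding the single bit "1" as a symbol violates the rule, which is why that value is reserved as a prefix of longer symbols.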
Another feature of minimum redundancy codes is that there is a predictable number of different binary symbols of each symbol length within a minimum redundancy format containing a specified number of different binary symbols with a known probability of occurrence. That is, and with reference to Table I, in a minimum redundancy code format containing 31 different binary symbols with known probability, there is one binary symbol having a bit length of one, there is one binary symbol having a bit length of three, there is one binary symbol having a bit length of four, and there are five binary symbols having a bit length of five. The number of valid binary symbols of each bit length within a particular minimum redundancy code format having a prescribed probability distribution is certain. However, where the probability of occurrence within a minimum redundancy code format varies, the number of binary symbols of each length may also vary. For example, the code format of Table I contains no binary symbols two bits in length. However, one binary symbol two bits in length could be employed in a minimum redundancy code format having a higher probability for the occurrence of the second symbol.
Heretofore, techniques of data compression in transmission and storage applications have often avoided utilizing the existing compression capability of minimum redundancy codes because of the expense and complexity of their implementation. While the degree of compression achieved is highly variable and depends to a large extent upon the observable activity of the events to be coded, the minimum redundancy code format size, the electronic components employed and other features, target compression ratios in excess of those of other techniques are readily achievable under most conditions.
It is an object of the present invention, therefore, to improve the degree of data compression obtainable in data transmission under comparable operating conditions and utilizing a common minimum redundancy code as contrasted with prior data compression/decompression techniques.
Yet an additional object of the invention is to effectuate data compression and decompression without the necessity for storing the entire array of binary symbols within a minimum redundancy code format. Prior art devices require large storage capacity memories, such as large ROMs, to store the lengthy binary symbols associated with low probability events. This requisite storage capacity exists in prior devices despite the fact that large memory storage areas are left vacant by the higher probability binary symbols. According to present practice, binary symbols are stored in a memory storage device, such as a ROM, at an addressable location or a plurality of locations. In response to a message input, the minimum redundancy binary symbols are accessed out of the ROM. Either an excessively large ROM is required to access out the long binary symbols in a single parallel output, or recirculating techniques are required to access out a single binary symbol in response to a message input. In either event, inordinately large data storage capacity is required.
The present invention obviates the entire binary symbol storage problem, however, by eliminating the necessity for storing the minimum redundancy binary symbols. Instead, the present invention takes advantage of the fact that no valid binary symbol can employ the bit permutation of a shorter valid binary symbol as a prefix. It is thereby possible to store the required symbol lengths and to regenerate each symbol as it is required for use rather than to store the entire variable length code ensemble. This regenerative technique is especially powerful in decoding, where table or ROM addressing by symbol would require storing a table with data in only one out of every thirty-two locations with respect to the smallest symbol in the case of the minimum redundancy code of Table I.
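By way of illustration, the regeneration of symbols from stored lengths may be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the hardware arrangement of the invention: the function name `regenerate_symbols` is hypothetical, the per-length counts are read off the rows of Table I that are shown (the count of two for six-bit symbols reflects only the two listed rows), and ordinary integers are used in place of registers.

```python
def regenerate_symbols(counts_by_length):
    """Rebuild the binary symbols from the number of symbols of each length.

    counts_by_length[0] is the count of 1-bit symbols, counts_by_length[1]
    the count of 2-bit symbols, and so on (an assumed layout).
    """
    symbols = []
    code = 0  # smallest unused value at the current length
    for length, count in enumerate(counts_by_length, start=1):
        for _ in range(count):
            # Minimum binary values are assigned to valid symbols.
            symbols.append(format(code, "0{}b".format(length)))
            code += 1
        # The remaining (larger) values become prefixes of longer symbols.
        code <<= 1
    return symbols

if __name__ == "__main__":
    # Lengths 1..6 as visible in Table I: 1, 0, 1, 1, 5, 2 symbols.
    for index, symbol in enumerate(regenerate_symbols([1, 0, 1, 1, 5, 2])):
        print(index, symbol)
```

With these counts the sketch reproduces the symbols of index 0 through 9 of Table I, in order, without any symbol having been stored.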
One prior proposed system for minimum redundancy code processing of an array of 69 variable length messages requires read-only memory storage for a two level code ensemble with the necessary bit length of the maximum length symbol. This requires a 69 by 10 bit ROM, a size table (69 by 4 bit ROM), a threshold table for each size against the maximum length (20 by 10 bit ROM), and a translation table (69 by 5 bit ROM). For the same code ensemble, the system of the present invention requires read-only memory storage for only the count of symbols for each size (20 by 4 bit ROM), and a two way translation table (138 by 5 bit ROM). In addition, the aforesaid prior proposed system requires internal registers to be of the maximum symbol length of 10 bits. The present invention, on the other hand, requires only four and five bit internal registers.
In the foregoing prior proposed method of encoding minimum redundancy codes, a 10 bit shift register and a four bit size register are required to encode the required minimum redundancy bit format of Table I. The shift register is loaded with the desired symbol followed by zero bits to fill out the register. The size register is loaded with the symbol length, both values being obtained from ROM storage. The symbol is then shifted out one bit at a time from the shift register while the size register is decremented. The process is continued until the size register becomes zero, resulting in the transmission of the appropriate length symbol.
A corresponding proposed decoding technique recognizes the presence of a minimum redundancy code in a particular size symbol only when the binary code representation of the symbol is smaller than the stored binary encoded threshold value of minimum redundancy codes of that symbol size. That is, with reference to Table I, if the binary symbol is five bits in length, the designation "10110" is stored along with a number indicative of the relative position of the specific symbol among the symbols of that size. Again with reference to Table I, the prior proposed technique for storing the symbol of index number "6" would require binary storage of the threshold value of minimum redundancy codes of five bits in length, i.e. the value "10110", along with associated storage of the number "100", which denotes that the identified symbol is the fourth sequential one of those symbols which are five bits in length. Symbols of size 1 and 2 bits beginning with zero are decoded by separate circuitry. Symbols beginning with a 1 are shifted bit by bit through a 10 bit shift register. A subtraction of the shift register contents from the threshold is made following each shift. If the subtraction result is positive, a run length is decoded, and the shift register is reset after decoding. If a negative number results, the shift procedure continues.
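The prior proposed encoding step described above, in which a symbol is shifted out of a 10 bit register while a size register counts down, may be sketched in Python as follows. This is an illustrative software model only; the name `shift_out` and the constant `REGISTER_BITS` are hypothetical, and the symbol and its length are passed in directly rather than read from ROM storage.

```python
REGISTER_BITS = 10  # width of the prior proposed shift register

def shift_out(symbol, size):
    """Model shifting a stored symbol out of a fixed-width register.

    symbol: the binary symbol as a string of "0"/"1" characters.
    size:   the symbol length, as would be loaded into the size register.
    """
    mask = (1 << REGISTER_BITS) - 1
    # Load the symbol followed by zero bits to fill out the register.
    register = int(symbol, 2) << (REGISTER_BITS - size)
    transmitted = []
    while size > 0:
        # Emit the most significant bit, then shift left by one position.
        transmitted.append(str((register >> (REGISTER_BITS - 1)) & 1))
        register = (register << 1) & mask
        size -= 1  # decrement the size register
    return "".join(transmitted)

if __name__ == "__main__":
    print(shift_out("11001", 5))  # symbol of index 6 in Table I
```

The model makes plain why the register must be as wide as the longest symbol in this approach: the entire symbol is resident in the register before the first bit is transmitted.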
The present invention represents a vast improvement over this prior operation. The present invention recognizes that the threshold minimum redundancy code values for each binary symbol length can be regenerated even if only the symbol size count is stored instead of the symbol size threshold code value. This involves storage of much smaller binary numbers. Storage of the symbol size count in lieu of the symbol size threshold value is possible only when one realizes that the size threshold value can be regenerated within the confines of a register containing only a small portion of the binary size threshold value. The sequential regeneration of threshold values corresponding to symbols of increasing size can be performed within the confines of a shift register of size log₂ n bits, where n is the total number of messages. The carry function performed by the addition of bits to the least significant bit position of such a register cannot affect the value of bits already shifted out of the register. For the employed minimum redundancy code formatting, however, any effect of the carry function will occur within the least significant log₂ n bit positions, so that bits of a binary symbol threshold preceding the log₂ n least significant bit positions will never be altered.
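Decoding from stored size counts alone may be illustrated by the following Python sketch, in which the threshold for each symbol length is regenerated as the bits arrive. It is a sketch under stated assumptions, not the circuit of the invention: the name `decode_one` is hypothetical, the counts are those visible in Table I, and unbounded Python integers are used, so the limited log₂ n bit register width discussed above is not modeled.

```python
def decode_one(bits, counts_by_length):
    """Decode the first symbol in `bits`; return (symbol index, bits consumed).

    bits:             string of "0"/"1" characters beginning with one symbol.
    counts_by_length: counts_by_length[0] is the number of 1-bit symbols, etc.
    """
    code = 0   # value of the bits received so far
    first = 0  # regenerated threshold: smallest valid symbol of this length
    index = 0  # index of that smallest symbol
    for length, count in enumerate(counts_by_length, start=1):
        if length > len(bits):
            break
        code |= int(bits[length - 1])
        if code - first < count:
            # The received bits fall among the valid symbols of this length.
            return index + (code - first), length
        # Otherwise advance the threshold to the next symbol length.
        index += count
        first = (first + count) << 1
        code <<= 1
    raise ValueError("no valid symbol found in the supplied bits")

if __name__ == "__main__":
    counts = [1, 0, 1, 1, 5, 2]  # lengths 1..6 as visible in Table I
    print(decode_one("11001", counts))
```

Only the per-length counts are consulted; the threshold `first` is rebuilt by addition and shifting at each step rather than fetched from a threshold table.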
A further object of the invention is a reduction of register sizes and a reduction of required ROM memory size in the compression and decompression of data using minimum redundancy codes. The size reduction achieved is on the order of two to one for the implementation of minimum redundancy code formats of 69 codes. The present invention allows much larger code lengths to be processed, since no register is required to be as long as the maximum code length.
An additional object of the invention is to combine both the encoding and decoding functions into the same arrangement of hardware registers and ROM storage. This is possible because decoding involves a regeneration of the size threshold value, which is the same technique utilized to derive the minimum redundancy binary symbols in encoding.
A related object is to increase the flexibility of system application. It is possible to utilize the same hardware arrangement with a simple change in ROM contents. Thus, different code ensembles may be utilized with the same hardware configuration. Any smaller code ensemble can always be substituted for a larger one, and the concept can be scaled up easily to expand the code ensemble.
Another object of the invention is to economically provide for the storage of more than one minimum redundancy binary symbol ensemble or format so that the optimum choice of a compression scheme can be made for each image to be prepared. With the elimination of the necessity of creating and storing symbols in advance, it becomes entirely feasible to scan a document, compress the data into temporary storage, evaluate the probability statistics, and then create the optimum minimum redundancy code format for that particular document in real time as it is being transmitted.
Comprehension of the underlying concepts of the invention and of the particular techniques employed in the implementation thereof may be enhanced by explanation with reference to the accompanying drawings.