Data compression schemes were developed in order to reduce the amount of data transmitted. Reducing the amount of data that must be sent in order to transmit an image results in a faster transmission of the image. Images include pictorial images as well as textual images.
In an equal-length, or fixed-length, encoding method, each data element in an image to be transmitted is assigned a codeword of the same length. For example, an image to be transmitted may contain only four unique symbols W, X, Y, and Z, where W occurs fifty times, X occurs thirty times, Y occurs ten times, and Z occurs ten times. If each symbol is represented by a codeword of length two (e.g., 00, 01, 10, and 11, respectively), then it would take (50×2)+(30×2)+(10×2)+(10×2)=200 bits to transmit the image. Here, the average codeword length is 200/100=2 bits.
In an unequal-length, or variable-length, encoding method, each data element in an image to be transmitted may be assigned a codeword of a different length. To minimize the total number of bits that must be transmitted, the shorter codewords are assigned to the data elements that appear most frequently in the image and the longer codewords are assigned to the data elements that occur less frequently in the image. By employing an unequal-length encoding method, the average length of a codeword transmitted is shorter than the average length of an equal-length codeword and, therefore, fewer bits are required to transmit an image. Using the example above and assigning codewords 0, 10, 110, and 111 to the symbols W, X, Y, and Z, respectively, it would take (50×1)+(30×2)+(10×3)+(10×3)=170 bits to transmit the image. This is less than the 200 bits required to send the image using the equal-length encoding method described above. In this example, the average unequal-length codeword length is 170/100=1.7 bits.
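The arithmetic in the two examples above can be checked with a short sketch. The symbol counts and codewords are taken from the text; the function names `total_bits` and `average_length` are illustrative only.

```python
# Symbol counts and codeword tables from the worked example in the text.
counts = {"W": 50, "X": 30, "Y": 10, "Z": 10}
fixed_code = {"W": "00", "X": "01", "Y": "10", "Z": "11"}
variable_code = {"W": "0", "X": "10", "Y": "110", "Z": "111"}


def total_bits(counts, code):
    """Bits needed to send every occurrence of every symbol."""
    return sum(n * len(code[symbol]) for symbol, n in counts.items())


def average_length(counts, code):
    """Average codeword length, weighted by how often each symbol occurs."""
    return total_bits(counts, code) / sum(counts.values())


if __name__ == "__main__":
    for name, code in (("equal-length", fixed_code), ("unequal-length", variable_code)):
        print(name, total_bits(counts, code), average_length(counts, code))
```

The unequal-length table saves bits only because the one-bit codeword is given to the most frequent symbol; swapping the codewords of W and Z would make the total larger than 200 bits.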
Huffman coding is one form of unequal-length encoding method. Huffman coding is a method of constructing codewords for symbols in order to minimize the length of the average codeword. The frequency of occurrence of each unique symbol in the image must first be determined. A codeword is then assigned to each unique symbol in the image. To achieve the goal of minimizing the average length of the codewords, the two least frequent symbols are repeatedly combined into a single node of a binary tree until one tree remains, so that the shortest codewords are assigned to the symbols that occur most frequently in the image and the longer codewords are assigned to the symbols that occur less frequently in the image. One of the difficulties in using an unequal-length encoding method where one codeword is assigned to each unique symbol to be transmitted is that the length of the codewords tends to grow rather quickly as the number of unique symbols grows. The longer the codewords, the more memory is required to store them. Storage may not be a problem for a user with a stand-alone mainframe computer that has nearly unlimited storage capability, but with the ever-increasing use of hand-held devices having ever-increasing functionality, there is a need for a data encoding method that not only minimizes the number of bits to be transmitted but also minimizes the number of codewords required to encode the data to be transmitted. The present invention is just such a method.
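As an illustration of how a Huffman code may be constructed, the following is a minimal sketch of the standard textbook procedure (repeatedly merging the two least frequent entries). It is not taken from the cited references, and the function name `huffman_code` is hypothetical. The symbol frequencies are those of the W, X, Y, Z example above.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """Build a prefix-free code from a {symbol: frequency} mapping.

    Textbook construction: repeatedly merge the two least frequent
    entries, prefixing '0' to one group's codewords and '1' to the other's.
    """
    if len(freqs) == 1:
        # Degenerate case: a single symbol still needs a one-bit codeword.
        return {symbol: "0" for symbol in freqs}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(f, next(tiebreak), {symbol: ""}) for symbol, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, group0 = heapq.heappop(heap)
        f1, _, group1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in group0.items()}
        merged.update({s: "1" + w for s, w in group1.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]


if __name__ == "__main__":
    freqs = {"W": 50, "X": 30, "Y": 10, "Z": 10}
    code = huffman_code(freqs)
    for symbol in sorted(code):
        print(symbol, code[symbol])
    print("total bits:", sum(freqs[s] * len(code[s]) for s in freqs))
```

For these frequencies the resulting codeword lengths are 1, 2, 3, and 3 bits for W, X, Y, and Z, which matches the lengths of the codewords 0, 10, 110, and 111 used in the example; the particular bit patterns may differ, since more than one optimal assignment exists.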
Data encoding may be used to encode each data element of an image to be transmitted, where the data element may be of a relatively high level of abstraction (e.g., a letter of an alphabet) or a primitive element (e.g., a dot, or pixel, of a certain color). Each data element may be encoded and transmitted individually. If the image to be transmitted contains a limited number of types of data elements (e.g., white pixels and black pixels in a facsimile transmission), and each type of data element tends to appear in a string, or run, of elements of that type, then the data that must be transmitted may be greatly simplified. For example, a line of a facsimile may consist of 23 white pixels followed by 56 black pixels followed by 75 white pixels followed by 34 black pixels followed by 16 white pixels. In a conventional method, 204 codewords would be sent, one codeword per data element in the line. Instead, only five codewords need be sent: a codeword representing a run of 23 white pixels, a codeword representing a run of 56 black pixels, a codeword representing a run of 75 white pixels, a codeword representing a run of 34 black pixels, and a codeword representing a run of 16 white pixels. Such an encoding method is referred to as run-length encoding. Run-length encoding greatly simplifies data encoding, but prior art run-length encoding methods still require a rather large number of codewords to represent all of the possible run lengths for each type of data element (e.g., pixels of any color). The prior art method described below illustrates this point.
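The facsimile line in the example above can be reduced to its runs with a short sketch. The one-character pixel labels "W" and "B" and the function names are assumptions made for illustration only.

```python
from itertools import groupby


def run_lengths(pixels):
    """Collapse a sequence of pixels into (color, run length) pairs."""
    return [(color, sum(1 for _ in group)) for color, group in groupby(pixels)]


def expand(runs):
    """Inverse of run_lengths: rebuild the pixel string from its runs."""
    return "".join(color * length for color, length in runs)


if __name__ == "__main__":
    # The line from the text: 23 white, 56 black, 75 white, 34 black, 16 white.
    line = "W" * 23 + "B" * 56 + "W" * 75 + "B" * 34 + "W" * 16
    runs = run_lengths(line)
    print(len(line), "pixels ->", len(runs), "runs:", runs)
```

Each of the five pairs would then be mapped to one codeword, which is why the number of distinct codewords depends on how many different run lengths must be representable.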
The International Telecommunication Union (ITU) sets international telecommunication definitions and standards for facsimile equipment. One such standard is referred to as group 3 and is a standard for a facsimile device that enables a typical A4 (or United States 8.5 by 11.0 in.) page to be transmitted by a digital modem over a telephone-type circuit in one minute or less by employing digital-data compression techniques. Devices that follow the standard of group 3 are most relevant to the present invention. The ITU standard concerning group 3 (i.e., ITU-T Recommendation T.4) discloses the use of a modified Huffman code for unequal-length encoding. The modified Huffman code used in ITU-T Recommendation T.4 contains codewords that were generated using a numeral base of 64. One codeword is used to represent each unique data element to be transmitted. Two sets of 64 codewords each represent run lengths of 0, 1, . . . , 63 white and black pixels, respectively. Two sets of 27 codewords each represent run lengths of 64, 128, . . . , 1728 white and black pixels, respectively. One set of 13 codewords represents run lengths of 1792, 1856, . . . , 2560 pixels of either color. Run lengths of 2624 pixels or more are indicated by using the codeword for 2560 pixels as many times as needed, followed by codewords for the remainder. One codeword is used to represent an end-of-line (EOL) symbol. Therefore, to encode an 8.5 by 11.0 in. page (e.g., 5100 pixels per line at 600 dots, or pixels, per inch) using the modified Huffman code as in ITU-T Recommendation T.4, (2×64)+(2×27)+13+1=196 codewords are required. The present invention discloses a method that may be used to encode the same type of facsimile transmission using far fewer codewords.
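The codeword count above, and the way a run length may be expressed in base 64, can be sketched as follows. This is a hedged illustration based only on the description in this section, not on the normative text of the Recommendation. The function name `split_run` is hypothetical, and the handling of runs of 2560 pixels or more (emitting the 2560-pixel codeword until the remainder is smaller than 2560) is an assumption.

```python
# Run lengths covered by each codeword set, as described in the text.
TERMINATING = range(0, 64)          # 0..63, one set per color
MAKEUP = range(64, 1729, 64)        # 64..1728, one set per color
EXTENDED = range(1792, 2561, 64)    # 1792..2560, shared by both colors

# Two colors for the first two sets, one shared set, plus one EOL codeword.
CODEWORD_COUNT = 2 * len(TERMINATING) + 2 * len(MAKEUP) + len(EXTENDED) + 1


def split_run(length):
    """Split a run length into the run-length values whose codewords are sent.

    Assumed scheme: emit 2560 while at least 2560 pixels remain, then at most
    one multiple of 64, and always finish with a value in 0..63.
    """
    if length < 0:
        raise ValueError("run length must be non-negative")
    parts = []
    while length >= 2560:
        parts.append(2560)
        length -= 2560
    multiple_of_64 = (length // 64) * 64
    if multiple_of_64:
        parts.append(multiple_of_64)
    parts.append(length % 64)
    return parts


if __name__ == "__main__":
    print("codewords required:", CODEWORD_COUNT)
    for run in (23, 75, 5100):
        print(run, "->", split_run(run))
```

Under these assumptions a run of 75 pixels is sent as the codeword for 64 followed by the codeword for 11, so no single codeword for 75 is needed; the table nevertheless still holds 196 entries.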
U.S. Pat. No. 4,096,527, entitled "RUN LENGTH ENCODING AND DECODING METHODS AND MEANS"; U.S. Pat. No. 4,626,829, entitled "DATA COMPRESSION USING RUN LENGTH ENCODING AND STATISTICAL ENCODING"; U.S. Pat. No. 4,922,545, entitled "FACSIMILE IMAGE ENCODING METHOD"; U.S. Pat. No. 5,357,546, entitled "MULTIMODE AND MULTIPLE CHARACTER STRING RUN LENGTH ENCODING METHOD AND APPARATUS"; U.S. Pat. No. 5,541,595, entitled "VARIABLE LENGTH CODE DECODER FOR SIMULTANEOUS DECODING THE MOST SIGNIFICANT BITS AND THE LEAST SIGNIFICANT BITS OF A VARIABLE LENGTH CODE"; and U.S. Pat. No. 5,710,639, entitled "SCAN LINE COMPRESSED FACSIMILE COMMUNICATION SYSTEM," each disclose an encoding method involving equal-length encoding, unequal-length encoding, run-length encoding, the use of Huffman codes, or the application of the same to facsimile transmission, but none of these patents discloses a method of minimizing the number of codewords required to transmit an image as does the present invention. U.S. Pat. Nos. 4,096,527; 4,626,829; 4,922,545; 5,357,546; 5,541,595; and 5,710,639 are hereby incorporated by reference into the specification of the present invention.