The present invention relates to a data compression/expansion method and apparatus which compress or expand binary image data such as font data, and to a printing apparatus.
Conventional printing apparatuses and word processors store image data such as font data in an internal ROM. When a character code is input and output of the character to a recording medium or a screen is instructed, the font data corresponding to the character code is read out in accordance with the instruction. The font data is developed on a bit map, and an image is formed. In a printer which can perform recording at high resolution, such as an ink-jet recording device or a laser beam printer, a large amount of image data is required to print a predetermined area, and higher resolution is also required for the font data. Thus, as the resolution increases, the number of dots forming a single character increases, which increases the total amount of font data.
FIG. 2 is a diagram showing, as an example, the font data of the alphabetical character "B" formed on a 48×48 matrix. In such font data, 288 bytes (2304 bits) are required for each character. In the case of the Japanese writing system, which has approximately 7,000 characters in a single font, a memory capacity of approximately 2M bytes is required to store all the character fonts. This results in increased cost.
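The storage figures above follow from simple arithmetic, which can be sketched as below. The function names `bitmap_bytes` and `font_bytes` are hypothetical and chosen only for illustration; the sketch assumes one bit per dot and no per-character overhead.

```python
def bitmap_bytes(width: int, height: int) -> int:
    """Bytes needed for an uncompressed glyph at 1 bit per dot (rounded up)."""
    return (width * height + 7) // 8


def font_bytes(width: int, height: int, num_chars: int) -> int:
    """Bytes needed to store num_chars uncompressed glyphs of the same size."""
    return bitmap_bytes(width, height) * num_chars


per_char = bitmap_bytes(48, 48)    # 2304 bits -> 288 bytes
total = font_bytes(48, 48, 7000)   # 7,000 characters -> about 2M bytes
print(per_char, total)
```

For a 48×48 matrix and 7,000 characters this gives 2,016,000 bytes, which is consistent with the figure of approximately 2M bytes.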
To solve the above problem, compressing the character font has been considered. As a method of compressing binary font data, the run length method is well known. Font data can also be compressed by MH coding or MMR coding, which are used in systems that transfer images, such as facsimile machines.
A font data compression method is described with reference to FIG. 3.
In FIG. 3, font data composed of a 20×20 dot matrix is shown. The black portion 301 is the data portion to be printed or displayed. In this font data, since 90% of the data is white data, white data appears continuously with high frequency. In fact, columns 1-7 and 16-20 consist entirely of white data in FIG. 3. When a column contains black data, the header 302, which provides one bit for each column, is marked for that column. When a column contains no black data, the header 302 is not marked. Thus, by providing the header 302 and referring to it for the presence of black data, the font data can be stored with the all-white columns excluded. Alternatively, the header 302 may be provided in word units or byte units to indicate the presence of black dots.
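The column-header scheme described above can be illustrated with a minimal sketch. It assumes a bitmap held as a list of rows with 1 for a black dot and 0 for a white dot; the names `compress_columns`, `expand_columns` and `compressed_bits` are hypothetical and are not taken from the specification.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # bitmap[row][col], 1 = black dot, 0 = white dot


def compress_columns(bitmap: Bitmap) -> Tuple[List[int], List[List[int]]]:
    """Return a one-bit-per-column header and only the columns with black data."""
    height, width = len(bitmap), len(bitmap[0])
    header, columns = [], []
    for c in range(width):
        col = [bitmap[r][c] for r in range(height)]
        if any(col):
            header.append(1)      # header bit marked: column is stored
            columns.append(col)
        else:
            header.append(0)      # header bit not marked: column is omitted
    return header, columns


def expand_columns(header: List[int], columns: List[List[int]], height: int) -> Bitmap:
    """Rebuild the full bitmap, filling omitted columns with white dots."""
    bitmap = [[0] * len(header) for _ in range(height)]
    stored = iter(columns)
    for c, flag in enumerate(header):
        if flag:
            col = next(stored)
            for r in range(height):
                bitmap[r][c] = col[r]
    return bitmap


def compressed_bits(header: List[int], columns: List[List[int]], height: int) -> int:
    """Header bits plus the bits of every stored column."""
    return len(header) + len(columns) * height
```

In this sketch, a 20×20 glyph whose black dots all fall in columns 8-15 is stored as 20 header bits plus 8 columns of 20 bits each. The saving depends entirely on how many columns are all white.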
In general, run length coding is performed in accordance with the run lengths of black and white data. However, the data must be analyzed before coding to determine an appropriate code allocation (in fact, the MH coding method is an international standard designed to compress data efficiently when run length codes are allotted to the image data of a standard original). The average amount of information in a group of run lengths is given by the entropy of the image information of that group. For example, if the probability that a run length is k is $P_k$, the entropy of a single run length, regarded as an information source, can be expressed as follows:

$$H_{\mathrm{run}} = -\sum_{k=1}^{L} P_k \log_2 P_k \qquad (1)$$

wherein "L" is the number of groups of run length.
Equation (1) indicates the theoretical lower limit of the average number of bits required for a binary expression of a single run length. Since the actual number of bits of each code is an integer, the actual average is larger than $H_{\mathrm{run}}$.
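As an illustration of the entropy of a single run length, the following sketch splits a sequence of dots into runs and computes the entropy of the observed run lengths in bits. The function names `run_lengths` and `run_entropy` are hypothetical, and the probabilities are estimated as relative frequencies in the sample, which is an assumption of this sketch.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def run_lengths(bits: Iterable[int]) -> List[Tuple[int, int]]:
    """Split a sequence of dots into (value, length) runs, in order."""
    runs: List[Tuple[int, int]] = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((b, 1))               # start a new run
    return runs


def run_entropy(lengths: Iterable[int]) -> float:
    """Entropy in bits of the run lengths, with P_k taken as relative frequency."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


row = [0, 0, 1, 1, 1, 0]
runs = run_lengths(row)
black = [length for value, length in runs if value == 1]
white = [length for value, length in runs if value == 0]
print(runs, run_entropy(black), run_entropy(white))
```

Black and white runs are separated before the entropy is computed, since their length distributions generally differ.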
FIG. 4 is a chart showing an example of the frequency of appearance of white runs and black runs in approximately 8000 character patterns. The chart shows the run lengths of white data and black data of font data whose maximum run length is 48×48 bits (2304 bits), the frequency of appearance of each run length, and the average run length. From the data, the entropy of black data $H_{\mathrm{black}}$ is 3.18, while that of white data $H_{\mathrm{white}}$ is 4.64. For example, when a black run of the average length 5.83 is coded in binary, 3.18 bits are required according to $H_{\mathrm{black}}$. This means that a single black dot can be expressed by 3.18/5.83 bits. However, this holds only for black dots. When white runs are also taken into account, the entropy per pixel is

$$H_{\mathrm{pel}} = \frac{H_{\mathrm{black}} + H_{\mathrm{white}}}{\bar{r}_{\mathrm{black}} + \bar{r}_{\mathrm{white}}} \qquad (2)$$

wherein $\bar{r}_{\mathrm{black}}$ and $\bar{r}_{\mathrm{white}}$ are the average run lengths of black data and white data, respectively.

This value is regarded as a theoretical compression rate. In this embodiment, $H_{\mathrm{pel}}$ is 0.4238, that is, the compressed data occupies 42.38% of the information amount of the original image data. However, since each code has an integral number of bits, and an identifier is needed to distinguish black data from white data, the actual value is larger than 0.4238. Thus, when the run length method is used, the font data can be reduced at the theoretical compression rate of 42.38%. However, when the error between the address table used to refer to the font data and the actual value is considered (since a fractional number of bits cannot be expressed), the actual compression rate becomes approximately 45.0%.

For example, when three character fonts of the Japanese writing system (mincho-tai, gothic-tai, and mouhitsu-tai) are stored internally in a printing system without compression, a memory capacity of approximately 8M bytes (48×48 bits per character) is required. On the other hand, if run length coding is performed on the data, only 8M × 0.45 = 3.60M bytes are required. However, the problem of cost still remains.
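The arithmetic in this passage can be checked with a short sketch. The per-pixel entropy is taken here as the sum of the black and white entropies divided by the sum of the average black and white run lengths, which is an assumed form; the average white run length is not stated in the text, so it is left as a parameter. All function names are hypothetical.

```python
def bits_per_dot(entropy_bits: float, avg_run_length: float) -> float:
    """Bits needed per dot when a run of average length costs entropy_bits."""
    return entropy_bits / avg_run_length


def entropy_per_pel(h_black: float, h_white: float,
                    avg_black: float, avg_white: float) -> float:
    """Assumed form: summed entropies over summed average run lengths."""
    return (h_black + h_white) / (avg_black + avg_white)


def compressed_size(original_bytes: float, rate: float) -> float:
    """Size after compression, where rate is compressed size / original size."""
    return original_bytes * rate


# Values quoted in the text: H_black = 3.18, average black run = 5.83.
print(bits_per_dot(3.18, 5.83))

# Three fonts, approximately 8M bytes uncompressed, at a 45.0% rate.
print(compressed_size(8_000_000, 0.45))
```

Note that the compression rate in this document is the ratio of compressed size to original size, so a smaller value means stronger compression.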
In a case where the header indicating the presence of black data is added to the font data as shown in FIG. 3, the compression rate is approximately 84.0% when the header is added on a column-by-column basis, 81.6% on a word-by-word basis, and 76.5% on a byte-by-byte basis. This is because actual font data contains few characters with large areas consisting only of white dots and many characters in which black dots are scattered.