This invention relates generally to the printing and displaying of text fonts, and more particularly to data compression techniques which are especially applied to printing and storing of the Japanese, Chinese, and Korean language character sets, each of which has several thousand unique characters that must be available in a computer system to allow the eloquent expression of the respective language.
Japanese, Chinese, and Korean writing differ from the written forms of Indo-European languages in that each character usually represents a whole word and sometimes even a phrase. Because of this, the written language is not as inextricably tied to the spoken language as it is where an alphabet is used. For example, even though the People's Republic of China (PRC) has a population that speaks hundreds of different dialects of Chinese (that really are different languages), the Chinese all write in a common text that is universally and perfectly understandable throughout the PRC. (This is just what Emperor Han of China intended when he commanded 2,000 years ago that all of China must use the same writing so that his far-flung bureaucracy could function.) In contrast, the English language alphabet of 26 letters is used to assemble words ranging from a single letter to words comprising dozens of letters. The spellings in English are (more or less) directly related to how spoken English is enunciated. Each of the written European languages is inextricably bonded with the corresponding spoken language.
Because the Far Eastern language writing characters were coined two thousand years ago, certain more modern concepts and objects lacked a corresponding written character. Combinations of the earlier characters came to be employed to represent the new word or phrase. For example, the equivalent of the word "bright" had not existed in the original Japanese Kanji (which is a descendant of the Chinese Hansi). There was, however, a character for "sun" (FIG. 1(a)) and a character for "moon" (FIG. 1(b)). The light from the sun and the moon together would be bright. Therefore the character for "bright" (FIG. 1(c)) is the concatenation of the earlier two characters. Similarly, the character for "mountain pass" is the concatenation of three characters representing: "mountain" (FIG. 1(d)), "up" (FIG. 1(e)), and "down" (FIG. 1(f)).
The original thinking may have been that you must go up a mountain to go down through a notch at the summit that is the mountain pass. The character for mountain pass results (FIG. 1(g)). And as a final example, the character representing "lend" is made up from the characters for "substitute" (FIG. 1(h)) and "shell" (FIG. 1(i)). Since shells were used thousands of years ago as money, the modern combination is really "substitute money," which is the real effect of lending. The character for "lend" therefore has two parts (FIG. 1(j)).
It has been determined by the present inventor that the storage of a nearly complete Japanese Kanji character set (approximately 7,000 characters) in a computer memory will usually require 512K bytes. Storing just the basic single-element subset of that set, with the multi-element characters entered as combinations of the single-element characters, requires only a quarter as much, i.e., 128K bytes. Such a method of reducing the storage requirements is a type of data compression and is an advantage of the present invention.
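The storage saving described above can be illustrated with a minimal sketch. The names `BASE_GLYPHS`, `COMPOSITES`, and `components_of` are hypothetical, the glyph payloads are placeholders, and the table layout is an assumption made for illustration rather than the storage format of the invention. Only the 512K and 128K byte figures and the example characters are taken from the text.

```python
# Hypothetical sketch: multi-element characters stored as references to
# single-element characters instead of as full glyph data.

# Placeholder glyph data for single-element characters (sizes are assumed).
BASE_GLYPHS = {
    "sun": b"\x00" * 64,
    "moon": b"\x00" * 64,
    "mountain": b"\x00" * 64,
    "up": b"\x00" * 64,
    "down": b"\x00" * 64,
}

# Each composite character holds only the names of its components.
COMPOSITES = {
    "bright": ("sun", "moon"),
    "mountain pass": ("mountain", "up", "down"),
}


def components_of(name):
    """Return the base glyph records needed to draw the named character."""
    if name in BASE_GLYPHS:
        return [BASE_GLYPHS[name]]
    return [BASE_GLYPHS[part] for part in COMPOSITES[name]]


def compression_ratio(full_bytes, reduced_bytes):
    """Ratio of full-set storage to reduced-set storage."""
    return full_bytes / reduced_bytes


# Figures quoted in the text: 512K bytes for the full set, 128K bytes reduced.
ratio = compression_ratio(512 * 1024, 128 * 1024)
```

Under these assumptions, `components_of("bright")` yields the two records for "sun" and "moon", and `ratio` reflects the one-quarter storage figure given above.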
Another characteristic of the written Far Eastern languages is that the perfect written expression of each character will include brush stroke artifacts. These include the fat beginning of a brush stroke and the thin tail at the end of the same stroke. These artifacts are impossible to represent with a single line of uniform width. They can, however, be represented by bitmap and outline font methods. Bitmap and outline font methods are conventional, e.g., the Apple Computer Macintosh II uses both for display and printing. Such artifacts and details in written English language characters can make one font distinguishable from another. For example, the capital "I" in a popular laser printer font called "Times" has serifs, but the capital "I" in another popular laser printer font called "Helvetica" does not. Being able to see details as fine as a serif is even more important in Kanji or in Hansi, because very fine differences can be the only way a reader will be able to discern between two otherwise very similar characters having two totally distinct meanings.
In the bitmap method, each character is represented by a matrix of dots, some black and some white. The fewer the total dots used in the matrix, the more ragged the final characters will appear. High quality therefore requires using bitmap matrices having at least 16 dots per side. The number of bits required to store the bitmap goes up as the square of the number of dots on a side. Bitmaps have the advantage of being easily communicated to a printer by a computer, but have the concomitant disadvantage of consuming large amounts of memory. Bitmaps have the further disadvantage of not being readily scaled up or down in size. In the Apple Macintosh, each bitmapped font has a whole set stored for each of the common point sizes (e.g., 8, 10, 12, 14, 18, 20, and 24). Odd sizes, or larger sizes, are interpolated or extrapolated, with rather poor results. If the whole Japanese Kanji character set were to be stored in bitmap form, the amount of memory consumed would be intolerable. Keeping more than one size would only exacerbate the problem. The temptation in the prior art is then to store less than the full set.
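The quadratic growth of bitmap storage can be made concrete with a short calculation. This is a sketch assuming one bit per dot and no compression. The function name `bitmap_bytes` and the matrix sizes other than 16 dots per side are illustrative assumptions, and the 7,000 character count is the approximate figure quoted earlier in the text.

```python
def bitmap_bytes(dots_per_side, n_chars=1):
    """Bytes needed for n_chars square bitmaps at one bit per dot."""
    bits = dots_per_side * dots_per_side
    bytes_per_char = (bits + 7) // 8  # round up to whole bytes
    return n_chars * bytes_per_char


N_KANJI = 7000  # approximate size of the Kanji set quoted in the text

# Assumed matrix sizes; only the 16-dot minimum comes from the text.
storage = {side: bitmap_bytes(side, N_KANJI) for side in (16, 24, 32, 48)}
```

Doubling the number of dots per side quadruples the storage: under these assumptions a 16-dot set needs 224,000 bytes and a 32-dot set needs 896,000 bytes, for a single size only.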
The outline font method scales up and down easily and produces high quality smooth edges at all sizes. Coordinates for the position and formulas for the scaling and borders of characters are maintained in the outline font method. Only one set is necessary for each character, since high quality characters can be scaled up or down and placed in any desired position. The outline font method is particularly suitable for Kanji and Hansi characters, and is used in the embodiment of the present invention described below. Codes are used to fetch boundary data from font storage, and the outlines formed from the boundaries are filled by bits. The resulting bit fill is then sent to a printer for output. A Japanese Patent Application, laid open in JPO Official Gazette No. 50-14230, discloses a system in which the character pattern is stored in accordance with coordinate data, so that scaling of character sizes and of character forms is thereafter easily accomplished. The outline font method has an advantage of reduced memory needs, but not as much of a reduction as might be expected, because now border data must be kept for all the characters.
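The scale-then-fill sequence described above can be sketched as follows. This is a generic even-odd scanline fill over a straight-edged outline, chosen for illustration; it is not the method of the cited application or of the present invention, and practical outline fonts typically also use curved segments. The names `scale_outline` and `fill_outline` are hypothetical.

```python
def scale_outline(points, factor, dx=0.0, dy=0.0):
    """Scale outline coordinates by factor, then translate by (dx, dy)."""
    return [(x * factor + dx, y * factor + dy) for x, y in points]


def fill_outline(points, width, height):
    """Fill a closed polygonal outline into a width x height bit grid.

    Uses the even-odd rule, sampling each dot at its centre.
    """
    grid = [[0] * width for _ in range(height)]
    n = len(points)
    for row in range(height):
        y = row + 0.5
        crossings = []
        for i in range(n):
            (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
            # Half-open test skips horizontal edges and avoids counting
            # a shared vertex twice.
            if (y0 <= y < y1) or (y1 <= y < y0):
                crossings.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # Dots between successive pairs of crossings are inside the outline.
        for left, right in zip(crossings[0::2], crossings[1::2]):
            for col in range(width):
                if left <= col + 0.5 < right:
                    grid[row][col] = 1
    return grid


# One stored outline (a unit square as a stand-in for a character border)
# rendered at two sizes from the same coordinate data.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
small = fill_outline(scale_outline(unit_square, 4), 4, 4)
large = fill_outline(scale_outline(unit_square, 8), 8, 8)
```

The same coordinate list serves both sizes, which is why only one set of data per character is needed, whereas a bitmap font would store a separate matrix for each size.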
A Japanese Patent Application, laid open in JPO Official Gazette No. 49-129447, discloses a system in which the individual stroke components of a character are resolved into top, bottom, right, and left side boundary data and stroke data. A Kanji character is then reconstructed by assembling standardized stroke data using constituent strokes of a character. The system has the advantage of needing less memory for storage. However, since the pattern for each component has been standardized, the quality of the reconstructed characters is poor.
The prior art has failed to provide a solution that simultaneously provides high quality and modest memory storage space requirements. Outline data for the relatively straight portions of a character conveys very little in the way of useful artifact details, and yet requires just as much memory as more useful portions. The starts, ends, and bends of a character's constituent strokes are rich with artifact details. The present invention retains the details of the starts, ends, and bends of a character's constituent strokes using border data, and steps back to using imaginary baselines to describe the backbone of a character's component strokes.