As used herein, the term “font” refers to visible markings characterized by typeface and style. A typeface is a coordinated set of glyphs that establishes a consistent visual appearance for a family of characters, such as an alphabet of letters, numerals and punctuation marks. Common typefaces include Arial, Courier, Helvetica and Times New Roman. Style refers to the aesthetic representation of a typeface specified through one or more parameters, such as plain, boldface, italic, and underline. A typeface category is a schema for organizing typeface families. The schema may rely upon criteria such as serif, sans serif, proportional and mono-space. Serifs are end strokes on letters. Sans serif characters omit these end strokes. For example, Times New Roman is a serif font, while Helvetica is a sans serif font. A proportional typeface contains glyphs of varying widths. A mono-spaced (non-proportional) typeface uses a single standard width for all glyphs.
A HyperText Markup Language (HTML) document includes tag-delimited segments. A tag for a segment may characterize the font for the text in the segment. For example, an HTML file may have the following segment:
<pre style="font-family: Sans-Serif">English: Michael Jackson was the king of pop....</pre>
The tag, delimited by angle brackets (< >), specifies a typeface category or font-family of Sans-Serif. As a result, the text “English: Michael Jackson was the king of pop.” will be rendered in accordance with this font family. The exact mapping of this font family (Sans-Serif) to a specific font (e.g., Arial or Helvetica) may be done by default or by some other mapping. Such a mapping may result in a sub-optimal output, particularly if the output is contrasted with the fonts selected for other languages in a multi-lingual input file.
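By way of illustration only, the default mapping described above may be sketched in Python as follows. The table DEFAULT_FONTS, the class FontFamilyParser and the function resolve_font are hypothetical names introduced for this sketch, and the particular typefaces chosen as defaults are assumptions; an actual renderer may use a different mapping.

```python
from html.parser import HTMLParser

# Hypothetical default mapping from a typeface category (generic family)
# to a specific typeface. The choices here are assumptions for illustration.
DEFAULT_FONTS = {
    "serif": "Times New Roman",
    "sans-serif": "Arial",
    "monospace": "Courier",
}


class FontFamilyParser(HTMLParser):
    """Collects the font-family value of every tag that carries a style attribute."""

    def __init__(self):
        super().__init__()
        self.families = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        # A style attribute is a ';'-separated list of 'name: value' declarations.
        for declaration in style.split(";"):
            name, _, value = declaration.partition(":")
            if name.strip().lower() == "font-family":
                self.families.append(value.strip())


def resolve_font(family):
    """Map a typeface category to a specific typeface; pass other names through."""
    key = family.strip().strip("'\"").lower()
    return DEFAULT_FONTS.get(key, family)


parser = FontFamilyParser()
parser.feed('<pre style="font-family: Sans-Serif">English: Michael Jackson '
            'was the king of pop.</pre>')
print(parser.families)                   # ['Sans-Serif']
print(resolve_font(parser.families[0]))  # Arial
```

Because the table is fixed and takes no account of the languages present in the segment, the sketch exhibits the limitation noted above: the selected font is chosen without regard to fonts selected for other languages in the same file.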
Known techniques exist for converting an HTML file to a Portable Document Format (PDF) file. Similarly, known techniques may be used to convert text in an HTML file to a format suitable for a computer screen, print drivers or a publishing application. However, complications may arise in the event of multi-lingual text within an HTML file. For example, if an HTML file with multi-lingual text specifies a Times New Roman typeface, a resultant PDF file will accurately represent Latin character text, but not Chinese characters, since that typeface typically supplies no glyphs for them. Similar problems arise when the HTML file does not have clear font definitions. For example, the font definition may only specify a typeface category.
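The multi-lingual complication may be illustrated with the following Python sketch, which reports the scripts in a text that a specified typeface does not cover. The function names, the coverage table FONT_COVERAGE and the sample text are hypothetical, and the script test based on Unicode character names is a rough assumption made for brevity, not a complete script database.

```python
import unicodedata

# Hypothetical table of the scripts each typeface can render (an assumption).
FONT_COVERAGE = {
    "Times New Roman": {"Latin"},
}


def script_of(ch):
    """Rough script guess taken from the Unicode character name."""
    if not ch.isalpha():
        return "Common"  # digits, spaces and punctuation are shared by scripts
    name = unicodedata.name(ch, "")
    if name.startswith("CJK"):
        return "Han"
    if name.startswith("LATIN"):
        return "Latin"
    return "Other"


def uncovered_scripts(text, font):
    """Return the scripts used in text for which font has no listed coverage."""
    needed = {script_of(ch) for ch in text} - {"Common"}
    return needed - FONT_COVERAGE.get(font, set())


# Made-up multi-lingual sample containing Latin and Chinese text.
sample = "English: king of pop. 中文: 流行音乐之王"
print(uncovered_scripts(sample, "Times New Roman"))  # {'Han'}
```

A non-empty result indicates that a second font would be needed for part of the text, which is the situation in which the choice of that second font affects the appearance of the output.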
It is desirable to select fonts with a similar appearance when rendering multi-lingual text. If clashing fonts are used for different languages in a multi-lingual file, the resulting document is visually unpleasant.
In view of the foregoing, it would be desirable to provide improved techniques for rendering multi-lingual text.