The appearance and layout of a typical text document, e.g. a word processing document or a media presentation document, is determined by the selection of fonts used to display the characters which comprise the text document. A document can be rendered using native fonts, i.e. fonts stored on an end user's computer. However, to ensure faithful rendering on any computer system, including one where the fonts used in the document are not available, the fonts have to be embedded in the document itself. For example, font sets can be stored on a computer system as part of the computer's operating system, such as Microsoft Windows® or Macintosh®, and/or font sets can be embedded within a text document and/or transmitted with multimedia content for playback on a remote computer or mobile device. Although embedding the font set used in a document allows the document to be faithfully rendered on any computer system regardless of what fonts are stored on that system, font embedding increases the size of the document, which consequently requires more memory to store and more bandwidth to transmit electronically.
One previous method to reduce the size of an electronic document with embedded fonts is to subset a font in the document. Prior subsetting methods selectively store glyphs that represent the characters or character sets (e.g., all Latin characters) used in a document. Each character represents a unit of text content, while a glyph is a unit of text display that determines the appearance of a character, i.e. a specific symbol representing a semantic or phonic unit of definitive value in the writing system. In a font, a glyph refers to any symbol representing a character, whether it be a letter, number or punctuation mark. In digital fonts there may be multiple different glyphs representing the same character.
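The basic idea of character-driven subsetting can be illustrated with a minimal sketch. It assumes a toy character-to-glyph table (`TOY_CMAP`, with made-up glyph IDs) rather than a real font file, and the function name `subset_glyph_ids` is hypothetical; it is an illustration of the general approach, not any particular prior implementation.

```python
# Toy character-to-glyph table. The glyph IDs are hypothetical and do not
# come from any real font.
TOY_CMAP = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, " ": 6}


def subset_glyph_ids(text, cmap):
    """Return the sorted glyph IDs needed to display `text`.

    Assumes one glyph per character, so the subset is simply the set of
    glyphs mapped from the distinct characters in the document.
    """
    used_chars = set(text)
    return sorted(cmap[ch] for ch in used_chars if ch in cmap)


glyph_ids = subset_glyph_ids("a bad cab", TOY_CMAP)
print(glyph_ids)
```

In this sketch the glyph for "e" is left out of the subset because the character never occurs in the text.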
For many languages, there is a simple one-to-one character-to-glyph mapping, and the process of font subsetting is straightforward and easy to implement. However, for many complex language scripts, such as Arabic and Indic scripts, where the appearance of a character depends on its position in a word and/or on adjacent characters, font subsetting is complex. For example, fonts that support complex language scripts may contain multiple different glyphs mapped to the same character code, i.e. the Unicode or hexadecimal code which corresponds to the character in the font set. These glyphs usually represent different forms of a character, such as when the character is isolated (by itself), in the initial position of a word, in a medial position of the word, or in a final position of the word.
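The dependence of a glyph on position can be sketched as follows. The glyph names (`beh.isol` and so on) and the functions `positional_form` and `shape_word` are hypothetical, and the sketch makes the simplifying assumption that every character joins to both of its neighbours; real shaping also depends on the joining behaviour of the adjacent characters.

```python
BEH = "\u0628"  # ARABIC LETTER BEH, one character code

# Hypothetical glyph names: four glyphs mapped to the same character code.
FORMS = {
    BEH: {
        "isolated": "beh.isol",
        "initial": "beh.init",
        "medial": "beh.medi",
        "final": "beh.fina",
    },
}


def positional_form(word, index):
    """Pick a form name from the character's position in the word.

    Simplifying assumption: every character joins to both neighbours.
    """
    if len(word) == 1:
        return "isolated"
    if index == 0:
        return "initial"
    if index == len(word) - 1:
        return "final"
    return "medial"


def shape_word(word, forms):
    """Map each character of `word` to the glyph for its position."""
    return [forms[ch][positional_form(word, i)] for i, ch in enumerate(word)]


print(shape_word(BEH, FORMS))      # one character on its own
print(shape_word(BEH * 3, FORMS))  # the same character three times
```

The same character code yields a different glyph at each position, which is why a character list alone does not identify the glyphs a document needs.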
In addition, some language scripts have glyphs that represent different ligatures. For example, a combination of characters may create a ligature, which is defined as two or more letter forms written or printed as a unit, such as “fi” becoming “ﬁ” and “fl” becoming “ﬂ.” As a result, a single glyph (the ligature) may represent a combination of characters present in the document. In some scripts (such as Latin), the use of ligatures is optional, while in other language scripts, ligature support is mandatory.
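A short sketch may make the many-to-one relationship concrete. The ligature table `LIGATURES`, its glyph names, and the function `shape_with_ligatures` are hypothetical; only the `unicodedata.normalize` call reflects actual Unicode data, in which the single code points U+FB01 and U+FB02 decompose to the letter pairs "fi" and "fl".

```python
import unicodedata

# Unicode encodes these two ligatures as single code points whose
# compatibility decomposition is the underlying letter pair.
fi_pair = unicodedata.normalize("NFKD", "\ufb01")
fl_pair = unicodedata.normalize("NFKD", "\ufb02")

# Hypothetical ligature table: character pair -> single glyph name.
LIGATURES = {"fi": "f_i", "fl": "f_l"}


def shape_with_ligatures(text, ligatures):
    """Replace each listed character pair with one ligature glyph.

    Characters outside a ligature are passed through as their own glyph.
    """
    glyphs = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ligatures:
            glyphs.append(ligatures[pair])
            i += 2
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs


glyphs = shape_with_ligatures("fine flag", LIGATURES)
print(glyphs)
```

Here nine characters are displayed with seven glyphs, two of which do not correspond to any single character in the text.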
The prior subsetting methods accommodate all possible glyph forms of a character by storing all of the glyph forms for a particular character, regardless of whether the glyph forms are actually used in the document. Consequently, the prior processes are inefficient and require storing a significant number of glyph variants that are never used in the document.
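The overhead described above can be illustrated with a toy count. The font table `FONT`, its glyph names, and the function `subset_all_forms` are hypothetical, and the set of glyphs actually displayed is written out by hand on the assumption that a shaper would pick the initial form for the first letter and the final form for the second.

```python
BEH = "\u0628"   # ARABIC LETTER BEH
SEEN = "\u0633"  # ARABIC LETTER SEEN

# Hypothetical toy font: each character code maps to four positional glyphs.
FONT = {
    BEH: ["beh.isol", "beh.init", "beh.medi", "beh.fina"],
    SEEN: ["seen.isol", "seen.init", "seen.medi", "seen.fina"],
}


def subset_all_forms(text, font):
    """Keep every glyph form of every character that occurs in `text`."""
    kept = set()
    for ch in set(text):
        kept.update(font.get(ch, []))
    return kept


text = BEH + SEEN  # a two-letter word: beh followed by seen
kept = subset_all_forms(text, FONT)

# Glyphs assumed to be displayed for this word (initial beh, final seen).
displayed = {"beh.init", "seen.fina"}
unused = kept - displayed

print(len(kept), len(unused))
```

Under these assumptions the subset holds eight glyphs, of which six are never displayed.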
One disadvantage of prior font subsetting methods is that they are not well suited for font sets and/or complex language scripts which have multiple glyphs representing a single character, or single glyphs representing ligatures, i.e. combinations of characters present in a document. In an effort to ensure that all combinations of glyphs corresponding to each character are available, the prior subsetting methods typically include glyphs which are not used in the document and, therefore, result in a document with embedded fonts that is unnecessarily large. The larger document size requires more memory to store the document and more bandwidth to transmit it.
There is a need in the art for an improved font subsetting method which more effectively and efficiently embeds fonts used in a document.