As the use of computers and computer-based networks continues to expand, content providers are preparing and distributing an increasing amount of content in electronic form. This content includes traditional media that exist in print, such as books, magazines, newspapers, newsletters, manuals, guides, references, articles, reports, and documents, as well as electronic media in which such content exists in digital form or is transformed from print into digital form through the use of a scanning device. The Internet, in particular, has facilitated the wider publication of digital content through downloading and display of images of content. As data transmission speeds increase, a growing number of page images of content are becoming available online. A page image allows a reader to see the page of content as it would appear in print.
Despite the great appeal of providing digital images of content, many content providers face challenges when generating, storing, and transferring the images of content, particularly when the accuracy of recognizing text in images is important. For example, to enable users to read page images from a book or magazine on a computer screen, or to print them for later reading, the images must be sufficiently clear to present legible text, including when scaled to various sizes. Typically, the images are translated into computer-readable data using various character recognition techniques, such as optical character recognition (OCR), which includes digital character recognition. Whether or not OCR is used, a page image may be processed and stored with reference to various glyphs appearing in the page. Glyphs may represent, for example, marks, characters, symbols or other elements appearing in the page. These glyphs may be defined in various ways, including by storing contour or outline information, such as outlines defined by Bezier curves.
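By way of illustration only, a glyph whose outline is defined by Bezier curves might be represented as sketched below. This is a minimal sketch under stated assumptions, not a description of any particular glyph format: the names CubicBezier, Glyph, point_at, scaled, and flatten are hypothetical, and the sketch assumes that every outline segment is a cubic Bezier curve. It shows why outline data remains legible at various sizes: scaling a glyph only scales its control points.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class CubicBezier:
    """One outline segment (hypothetical type): four control points."""
    p0: Point
    p1: Point
    p2: Point
    p3: Point

    def point_at(self, t: float) -> Point:
        """Evaluate the curve at parameter t in [0, 1] (Bernstein form)."""
        u = 1.0 - t
        b0, b1, b2, b3 = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
        x = b0 * self.p0[0] + b1 * self.p1[0] + b2 * self.p2[0] + b3 * self.p3[0]
        y = b0 * self.p0[1] + b1 * self.p1[1] + b2 * self.p2[1] + b3 * self.p3[1]
        return (x, y)

    def scaled(self, factor: float) -> "CubicBezier":
        """Scale the segment by scaling each control point."""
        def s(p: Point) -> Point:
            return (p[0] * factor, p[1] * factor)
        return CubicBezier(s(self.p0), s(self.p1), s(self.p2), s(self.p3))


@dataclass(frozen=True)
class Glyph:
    """A glyph (hypothetical type): a label plus its outline segments."""
    label: str
    outline: Tuple[CubicBezier, ...]

    def scaled(self, factor: float) -> "Glyph":
        return Glyph(self.label, tuple(c.scaled(factor) for c in self.outline))

    def flatten(self, steps: int = 8) -> List[Point]:
        """Approximate the outline with sampled points, e.g. for display."""
        points: List[Point] = []
        for curve in self.outline:
            points.extend(curve.point_at(i / steps) for i in range(steps + 1))
        return points
```

For example, Glyph("arc", (CubicBezier((0, 0), (0, 1), (1, 1), (1, 0)),)).scaled(2.0) yields a glyph twice the size whose outline is still described by the same number of control points.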
One challenge faced by digital content providers is identifying individual glyphs from image data. This may be particularly difficult, for example, when an image includes cursive writing. Another challenge is the cost of storing and transferring glyph-based content. For works written in certain languages, a large number of glyphs are often defined and stored in association with a glyph-based file in order to represent each distinct character or symbol appearing in the work. In such cases, thousands of glyphs may be stored for a given glyph-based file.
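The storage cost noted above can be illustrated with a rough estimate. The following is a simplified sketch only: it assumes, purely for illustration, that one glyph is stored per distinct non-whitespace character in the text of a work, and the function names and the bytes_per_glyph parameter are hypothetical rather than drawn from any actual glyph-based file format.

```python
def distinct_glyph_count(text: str) -> int:
    """Count distinct non-whitespace characters in the text.

    Assumption: one stored glyph per distinct character, which ignores
    font variants, sizes, and ligatures.
    """
    return len({ch for ch in text if not ch.isspace()})


def estimated_outline_bytes(text: str, bytes_per_glyph: int) -> int:
    """Rough storage estimate: distinct glyphs times a hypothetical
    average outline size in bytes."""
    return distinct_glyph_count(text) * bytes_per_glyph
```

Under these assumptions, a work whose text uses several thousand distinct characters would require several thousand stored outlines, so the estimate grows with the size of the character repertoire rather than with the length of the work.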