As the use of computers and computer-based networks continues to expand, content providers are preparing and distributing more and more content in electronic form. This content includes traditional media such as books, magazines, newspapers, newsletters, manuals, guides, references, articles, reports, and documents that exist in print and may be transformed from print into digital form through the use of a scanning device or other available means. A page image rendered to a user in digital form allows the user to see the page of content as it would appear in print.
However, content providers may face challenges when generating images of content, particularly when the accuracy of recognizing text in the images is important. For example, to enable users to read page images from a book or magazine on a computer screen, or to print them for later reading, the images must be sufficiently clear to present legible and correctly recognized text. Currently, images of content may be converted into computer-readable data using various character recognition techniques, such as optical character recognition (OCR). Although the accuracy of OCR may be generally high, some characters, for example those belonging to East Asian languages, may be identified or interpreted incorrectly. The cost of manually correcting misidentified characters may be extremely high, especially when a large volume of pages is scanned.