As the use of computers and computer-based networks continues to expand, content providers are preparing and distributing more content in electronic form. This content includes traditional media that exist in print, such as books, magazines, newspapers, newsletters, manuals, guides, references, articles, reports, and documents, as well as electronic media in which the aforesaid content exists in digital form or is transformed from print into digital form through the use of a scanning device. The Internet, in particular, has facilitated the wider publication of digital content through the downloading and display of images of content. As data transmission speeds increase, more page images of content are becoming available electronically. A page image allows a reader to see the page of content as it would appear in print.
Despite the great appeal of providing digital images of content, many content providers face challenges when generating, storing, and transferring the images of content, particularly when the accuracy of recognizing text in images is important. For example, to enable users to read page images from a book or magazine on a computer screen, or to print them for later reading, the images must be sufficiently clear to present legible text. Currently, the images are translated into computer-readable data using various character recognition techniques, such as optical character recognition (OCR), which includes digital character recognition. Although the accuracy of optical character recognition is generally high, some page images, even after undergoing OCR processing, are simply unreadable due to various artifacts. While manual correction is possible, the cost of manually correcting misidentified characters or inserting missing characters is extremely high, especially when a large volume of pages is scanned.
Another challenge faced by digital content providers is the cost of storing and transferring images of content. To reduce storage and transfer costs, content providers desire to minimize the size of the files used to store and transfer the images. Digital images may be represented at a variety of resolutions, typically denoted by the number of pixels in an image in both the horizontal and vertical directions. Typically, though not always, higher-resolution images have a larger file size and require a greater amount of memory for storage and bandwidth for transfer. The cost of storing and/or transferring images of content multiplies when one considers the number of images it takes to capture, store, and transfer large volumes of media, such as books and magazines. While reducing the size and resolution of images often reduces the storage and/or transfer requirements, low-resolution images eventually reach a point where the images, and particularly any text contained therein, are difficult for readers to perceive when displayed. Content providers wishing to provide page images with text must ensure that the images can be rendered in sufficiently high resolution so that displayed text will be legible.
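By way of non-limiting illustration, the relationship between resolution and storage requirement described above may be sketched with simple arithmetic. The sketch below assumes uncompressed raster images, so that file size is the pixel count multiplied by the bit depth; compressed formats will generally differ, consistent with the "typically, though not always" qualification above. The function names, the 8.5 x 11 inch page, the 300 and 150 dots-per-inch settings, the 24-bit color depth, and the 300-page book are all hypothetical values chosen for the example and do not come from the description.

```python
def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page scanned at a given dots-per-inch setting."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_image_bytes(width_px, height_px, bits_per_pixel=24):
    """Size in bytes of an uncompressed raster image, rounded up to a whole byte."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8


def collection_bytes(page_count, width_in, height_in, dpi, bits_per_pixel=24):
    """Total uncompressed size of a collection of equally sized page images."""
    width_px, height_px = page_pixels(width_in, height_in, dpi)
    return page_count * uncompressed_image_bytes(width_px, height_px, bits_per_pixel)


if __name__ == "__main__":
    # Hypothetical example: a 300-page book of 8.5 x 11 inch pages in 24-bit color.
    for dpi in (300, 150):
        width_px, height_px = page_pixels(8.5, 11, dpi)
        per_page = uncompressed_image_bytes(width_px, height_px)
        total = collection_bytes(300, 8.5, 11, dpi)
        print(f"{dpi} dpi: {width_px} x {height_px} px, "
              f"{per_page:,} bytes per page, {total:,} bytes per book")
```

Under these assumptions, halving the resolution in each direction reduces the uncompressed size of each page image by a factor of four, and the saving scales with the number of pages. The sketch says nothing about legibility, which is the competing constraint noted above.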