As the use of computers and computer-based networks continues to expand, content providers are preparing and distributing more and more content in electronic form. This content includes traditional media such as books, magazines, newspapers, newsletters, manuals, guides, references, articles, reports, and documents that exist in print, as well as electronic media in which such content exists in digital form or has been converted from print into digital form by a scanning device. The Internet, in particular, has facilitated wider publication of digital content by enabling images of content to be downloaded and displayed. As data transmission speeds increase, more and more images of pages of content are becoming available online. A page image allows a reader to see a page of content as it would appear in print. Furthermore, graphics such as charts, drawings, and pictures, as well as their layout on the page, are preserved when the page of content is provided as a digital image.
Despite the great appeal of providing digital images of content, many content providers face challenges when generating and storing these images, particularly when the text in the content must be recognized accurately. For example, to enable users to perform text searches of the content of page images, the images must be translated into computer-readable text using one or more character recognition techniques, such as optical character recognition (which includes digital character recognition). Although optical character recognition is generally quite accurate, recognizing page numbers poses particular difficulties. For example, page numbers often appear in different locations on different pages, as is typically the case in a book, where the number appears in the right-most corner on the front of a double-sided page and in the left-most corner on its back. In some cases, different portions of a book, such as the table of contents, index, or epilogue, use different page numbering schemes. In yet other cases, pages are intentionally left unnumbered, such as inserts of graphics, advertisements, or other content. When large volumes of pages are scanned, the cost of manually assigning each page image the page number actually printed on that page is prohibitively high, making manual assignment impractical.