A. Field of the Invention
Systems and methods described herein relate to image scanning and, more particularly, to techniques for scanning and locating features in documents.
B. Description of Related Art
Modern computer networks, and in particular the Internet, have made large bodies of information widely and easily available. Free Internet search engines, for instance, index many millions of web documents that are linked to the Internet. A user connected to the Internet can enter a simple search query to quickly locate web documents relevant to the search query.
One category of content that is not widely available on the Internet, however, is the more traditional printed works of authorship, such as books and magazines. One impediment to making such works digitally available is that it can be difficult to convert printed versions of the works to digital form. Optical character recognition (OCR), which is the act of using an optical scanning device to generate images of text that are then converted to characters in a computer-readable format (e.g., an ASCII file), is a known technique for converting printed text to a useful digital form. OCR systems generally include an optical scanner for generating images of printed pages and software for analyzing the images.
When scanning printed documents, such as books, that are permanently bound, the spine of the document can cause a number of scanning problems. For example, although it is generally desirable to generate the images of the printed pages from flat, two-dimensional versions of the pages, the spine of the book may cause the page to have a more three-dimensional profile. Additionally, scanning each page may require a human operator to manually turn the pages of the book. Occasionally, the human operator may introduce errors into the scanned image, such as by placing a hand, or a portion of a hand, over the page while it is being scanned. Text occluded by a hand cannot be further processed using OCR techniques.