A. Field of the Invention
Systems and methods described herein relate to image scanning and, more particularly, to techniques for scanning and removing distortions in documents.
B. Description of Related Art
Modern computer networks, and in particular the Internet, have made large bodies of information widely and easily available. Free Internet search engines, for instance, index many millions of web documents that are linked to the Internet. A user connected to the Internet can enter a simple search query to quickly locate web documents relevant to the search query.
One category of content that is not widely available on the Internet, however, is the more traditional printed works of authorship, such as books and magazines. One impediment to making such works digitally available is that it can be difficult to convert printed versions of the works to digital form. Optical character recognition (OCR), which is the act of using an optical scanning device to generate images of text that are then converted to characters in a computer-readable format (e.g., an ASCII file), is a known technique for converting printed text to a useful digital form. OCR systems generally include an optical scanner for generating images of printed pages and software for analyzing the images.
One problem with using OCR in the context of printed documents such as books is that books are generally bound in a manner that can make it difficult to generate high-quality images of the pages. For OCR, it is desirable to generate the images of the printed pages from flat, two-dimensional versions of the pages. Books generally have spines, however, that can cause the pages to have a more three-dimensional profile. The three-dimensional profile of the page, when viewed as a two-dimensional image, will exhibit distortion (“warping”) of the printed text. This warping can reduce the accuracy of the text file output by the OCR system.
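By way of illustration only, the effect of a three-dimensional page profile on a two-dimensional image can be sketched with a simple geometric model. The sketch below assumes, purely for illustration, that the curved portion of a page follows a circular arc and that the page is imaged from directly overhead with an orthographic projection. The function names, the arc model, and the parameters (`pitch`, `radius`) are hypothetical and are not taken from any particular scanning system.

```python
import math


def projected_x(s, radius):
    """Horizontal image coordinate of a point on the page.

    s is the distance measured along the page surface, starting where
    the page lies flat and increasing toward the spine. The page is
    modeled (as an assumption) as a circular arc of the given radius,
    viewed from directly above.
    """
    return radius * math.sin(s / radius)


def projected_spacings(n_chars, pitch, radius):
    """Image-plane spacing between consecutive characters.

    Characters are printed at a uniform pitch along the page surface.
    The returned list holds the spacing between each adjacent pair as
    it would appear in the two-dimensional image.
    """
    xs = [projected_x(i * pitch, radius) for i in range(n_chars + 1)]
    return [b - a for a, b in zip(xs, xs[1:])]


if __name__ == "__main__":
    # Ten characters printed 1.0 unit apart on a page curving with radius 20.
    for i, gap in enumerate(projected_spacings(10, 1.0, 20.0)):
        print(f"gap {i}: {gap:.4f}")
```

Under these assumptions, characters that are evenly spaced on the printed page appear progressively closer together in the image as the page curves toward the spine, which is one simple form of the warping described above. A larger `radius` corresponds to a flatter page, and the projected spacing then approaches the printed pitch.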