The present invention relates to scanning documents and particularly to scanning books.
Scanned images of book pages often exhibit three types of distortion introduced by the scanning process. The severity of each depends on the book's orientation relative to the scanning direction when it lies on the scanning surface and on the elevation of the book spine area above that surface. As shown in FIG. 1, when the book spine is raised above the scanning surface, the scanned image usually has shadows in the area close to the spine. The other two types of distortion also result from the raised spine, but occur only when a book is scanned with its spine parallel to the scanner sensor bar, referred to as the “parallel scanning case.” In this case, the page image is squeezed toward the spine, and consequently the text closer to the spine becomes thin and difficult to recognize. Besides this “squeeze” distortion, the text close to the spine also bends toward the center of the page. This type of distortion is referred to as “curvature distortion” in the present specification. These distortions not only reduce the visual readability of the affected area but also cause failures in automatic Optical Character Recognition (OCR) methods, which are commonly used to transform scanned visual information into the corresponding text.
The present invention relates to digital document analysis. When applied to scanned books, such analysis can be used to detect aspects of the scanned document, such as page areas, page orientation, text areas, and the book spine.