The present invention relates to book digitization. More specifically, the present invention relates to correcting digital scan data of a page that is curled, such as in the vicinity of the book spine, or otherwise distorted.
There is a large amount of information contained in printed material. Printed material may include books, as well as newspapers, journals, magazines, pamphlets, and other periodical literature. However, access to such material, as well as storage space for such material, may often be limited. For example, due to the rarity and, often, the fragility of some material, such as some older books and publications, a holder of the material, such as a library, museum, or private owner, may be reluctant to lend the material to individuals or other institutions. Thus, a researcher or other interested individual who wishes to access such material may have to travel to the location of the material. Even so, access may be restricted to a limited period of time or to viewing under special conditions. In addition, some publications, such as newspapers and popular magazines, may deteriorate quickly. Furthermore, storage space at an institution may be limited.
Therefore, there has been much interest in digitizing the contents of rare books, as well as other printed material. A digitized version of the material may then be made available to a much larger segment of the population than had access to the original book. In addition, there is much interest in making available to the public in digital form a wide variety of books and publications that are out of print. (Hereinafter, printed material to be digitized will be referred to as a “book,” regardless of its actual form.)
In digitization, each page or pair of pages of the book is scanned to acquire a series of digitized images of the pages. The digitized images may then be saved in a digital format. The digitized images of the book may be made available to the public either in the form of a digital file, or as reprinted in the form of a facsimile edition of the book.
The acquired digitized images may be further processed to extract the textual contents of the book. For example, optical character recognition (OCR) technology may be applied to the scanned pages in order to create a text file of the textual contents of the book. The contents of the book may thus be made available to the public in the form of a text file.
A frequent obstacle to cost-effective digitization of an old book is the distortion of page images due to bending or curling of the pages. Depending on how a book is bound, the book may not open flat. In such a case, the ends of the pages near the binding may be curled or bent.
When scanned with a scanner designed primarily for scanning flat objects, a digitized image of a curled end of the page may appear distorted. Text on the curled portion of the page may be tilted with respect to the line of sight of the scanner. The symbols or letters of the text may be distorted such that they may be difficult to read. In addition, the distortion of the letters may render the letters unrecognizable by standard OCR technology.
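By way of illustration only, the foreshortening described above may be approximated with a simple geometric sketch. The sketch assumes an idealized scanner that views the page along parallel lines of sight (an orthographic view), so that a letter lying on a portion of the page tilted by a given angle appears narrower by the cosine of that angle. The function name `apparent_width`, the letter width of 2.0 mm, and the tilt angles are hypothetical values chosen for the example and do not appear in the description above.

```python
import math


def apparent_width(true_width_mm: float, tilt_deg: float) -> float:
    """Return the width of a letter as seen by an idealized scanner.

    Assumption: parallel lines of sight perpendicular to the scanner bed,
    so a surface tilted by tilt_deg is foreshortened by cos(tilt_deg).
    """
    return true_width_mm * math.cos(math.radians(tilt_deg))


# A hypothetical 2.0 mm wide letter at increasing page tilt near the spine.
for tilt in (0, 30, 60):
    print(f"tilt {tilt:2d} deg: apparent width {apparent_width(2.0, tilt):.2f} mm")
```

Under these assumptions, a letter on a portion of the page tilted by 60 degrees would appear at half its true width, which suggests why letters near the binding may become difficult to read or to recognize.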
Often, various considerations preclude disassembling the binding of the book, or applying pressure to the book, in order to cause the pages to lie flat. Using special cameras or scanning techniques in order to scan around the curvature of the page may significantly increase the time and expense required to digitize the book. Such an increase in time and expense may seriously impede progress in digitizing whole libraries and collections of rare books.