In the prior art, the flatbed scanner has become standard equipment in almost every office. It provides computers with scanned input of typed text, book pages and other types of documents, for example handwritten applications or partially handwritten forms, for further word processing, electronic storage, electronic distribution, etc. However, a scanned image of a page can still be deformed. This happens whenever a document or page is not properly aligned on the flatbed scanner, or when the thickness of a book causes the page next to the spine to curve up from the scanner glass. The image transferred to the computer then shows deformed text that a prior-art OCR program has difficulty recognizing.
In recent years, digital cameras have become an alternative to flatbed scanners because they are more flexible to use. However, with digital cameras the problem of deformed text images for OCR processing is aggravated. The camera may be misaligned relative to the page in three dimensions (perspective distortion), even when photographing flat pages. Lens faults such as aberration and geometric distortion may also reduce OCR performance.
The problem may be solved by a geometrical transformation of the deformed document image that produces corrected images suitable for OCR processing. U.S. Pat. No. 6,304,313 discloses a digital camera with an OCR function that divides a document page into blocks. Each block is photographed and then processed by the OCR function. When all the blocks have been processed, the recognized text blocks from the plurality of images are combined into one text data set corresponding to the whole document. However, the only geometrical transformation in this disclosure is to divide the page into blocks small enough that the deformation within each block is negligible. This solution may therefore require extensive processing once the deformation exceeds a certain level. Further, the division may make the text in each block unrecognizable when the blocks become too small to contain recognizable text.
U.S. patent application US 2003/0026482, dated Feb. 6, 2003, discloses a method for correcting perspective distortion in a digital document image, for example an image from a digital camera. The method identifies the perspective of the image using a mathematical model of how parallel lines, viewed in perspective, converge to a single point. In a preferred embodiment of that invention, the horizontal and vertical border lines of an image comprising text are used to identify the perspective. The text lines are then corrected using this model of the perspective distortion. This perspective-based method does not cope with the other types of distortion commonly encountered, for example, when a page in a book is photographed and then passed through an OCR function. Besides perspective distortions, structural distortions caused for example by bending or curving of book pages add significantly to the problem of correcting such camera images. Practical experience also shows that a camera capturing images of text is usually pointed straight down at the page. Perspective distortion will therefore usually contribute less to the total distortion in the image than structural distortions of the object itself (text page, bent book pages, curved pages, etc.).
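The following minimal Python sketch illustrates the perspective property the cited application relies on. It computes the point where two image lines, such as the top and bottom border lines of a photographed page, meet under perspective. It uses the standard homogeneous-coordinate construction, in which the cross product of two points gives the line through them and the cross product of two lines gives their intersection. The function names and the absolute tolerance `eps` are hypothetical, and this is not the correction method of US 2003/0026482 itself.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Homog = Tuple[float, float, float]

def cross(u: Homog, v: Homog) -> Homog:
    """3-D cross product: joins two points, or intersects two lines."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def line_through(p: Point, q: Point) -> Homog:
    """Homogeneous line (a, b, c), a*x + b*y + c = 0, through image points p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def vanishing_point(seg_a: Tuple[Point, Point], seg_b: Tuple[Point, Point],
                    eps: float = 1e-12) -> Optional[Point]:
    """Intersection of two image lines, e.g. two page border lines.

    Returns None when the lines are parallel in the image (vanishing point
    at infinity, i.e. no perspective convergence in that direction).
    eps is an absolute tolerance chosen for illustration only."""
    x, y, w = cross(line_through(*seg_a), line_through(*seg_b))
    if abs(w) < eps:
        return None
    return (x / w, y / w)
```

Two border lines that converge in the image give a finite point. The same construction applied to the left and right borders gives the second vanishing point.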
The paper “Correcting Document Warping based on regression of curved text lines” by Zhang and Tan, International Conference on Document Analysis and Recognition, ICDAR 2003, discloses a method that models text-line deformations as quadratic polynomial curves. This replaces the more common cylinder model for the deformation of a book near its spine, described above. The text lines are tracked with a connected-component clustering algorithm. The clustering runs within bounding boxes defined by the orientation of an already identified segment of each text line.
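The following minimal sketch shows the kind of regression a quadratic text-line model involves. It fits y = a*x^2 + b*x + c to the sample points of one tracked text line by ordinary least squares, solving the 3x3 normal equations directly. The function names and the choice of plain least squares are assumptions for illustration. The paper's exact fitting and line-tracking procedure is not reproduced here.

```python
from typing import List, Sequence, Tuple

def _solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: need at least 3 distinct x values")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(a[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (a[r][n] - tail) / a[r][r]
    return sol

def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of a text line's points to y = a*x^2 + b*x + c; returns (a, b, c)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least 3 (x, y) samples of equal length")
    s = [sum(x ** k for x in xs) for k in range(5)]                # power sums of x
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Normal equations for the unknowns in the order (c, b, a).
    normal = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    c, b, a = _solve3(normal, t)
    return a, b, c

def evaluate(coeffs: Tuple[float, float, float], x: float) -> float:
    """Height of the fitted text-line curve at horizontal position x."""
    a, b, c = coeffs
    return a * x * x + b * x + c
```

A single quadratic per line keeps the model small. The limitation noted below is that it can only represent smooth, single-bend curvature.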
The paper “Document image de-warping for text/graphics recognition” by Wu and Agam, International Symposium on Statistical Pattern Recognition, SSPR-2002, discloses a method in which lines are tracked using a locally adaptive cumulative projection at different angles. Because the algorithm is local, two starting points can produce two different search directions, and the tracked lines may then cross each other. The method therefore adds a second step that removes crossing lines based on the average orientation of the lines. This step limits the method to images with fairly regular lines and small perspective distortion. A rectangular mesh is fitted to the remaining lines for dewarping.
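The sketch below illustrates what a projection at different angles measures. It estimates the dominant text-line orientation of a set of foreground pixel coordinates, for example those inside one local window. For each candidate direction it projects the points onto the normal of that direction, then picks the angle with the sharpest histogram, scored as the sum of squared bin counts. This is a simplified, non-adaptive projection profile assumed for illustration. It is not the locally adaptive cumulative projection of Wu and Agam, and it omits line tracking, crossing-line removal and mesh fitting. The function names and the `bin_size` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]

def profile_energy(points: Sequence[Point], angle_deg: float, bin_size: float = 1.0) -> int:
    """Score the projection profile of points for lines running at angle_deg.

    Each point is projected onto the unit normal of the direction, so points on
    one line of that direction share an offset. The right angle concentrates
    them in few bins, which maximizes the sum of squared bin counts."""
    theta = math.radians(angle_deg)
    nx, ny = -math.sin(theta), math.cos(theta)  # unit normal to the line direction
    counts = Counter(math.floor((x * nx + y * ny) / bin_size) for x, y in points)
    return sum(c * c for c in counts.values())

def best_angle(points: Sequence[Point], angles: Iterable[float], bin_size: float = 1.0) -> float:
    """Return the candidate angle (degrees) whose projection profile is sharpest."""
    return max(angles, key=lambda a: profile_energy(points, a, bin_size))
```

Applied window by window, an estimator like this gives a local orientation at each position. This is also why two neighbouring starting points can choose different directions, as described above.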
The paper “Rectifying the bound document image captured by a camera: A model based approach” by Cao et al., International Conference on Document Analysis and Recognition, ICDAR 2003, discloses a method that applies a cylinder model to the deformation of the book near its spine. It adds a perspective model to compensate for differences in depth. The images are rectified using the best match between the cylinder model and a set of skeletons of the thresholded text lines.
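The core geometry of a cylinder model can be sketched as follows. The sketch makes simplifying assumptions: an orthographic view perpendicular to the cylinder axis, an axis parallel to the spine, and a known radius R. A point seen at horizontal distance x from the projected axis lies at angle asin(x/R) on the cylinder, so its distance along the flattened page is R*asin(x/R). The function names are hypothetical. The perspective model that Cao et al. add for depth differences is not included, and neither is the fitting of the model to the line skeletons.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def unroll_cylinder_x(x: float, radius: float) -> float:
    """Map a horizontal image offset x, measured from the projected cylinder axis,
    to arc length on the flattened page (orthographic view assumed)."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    if abs(x) > radius:
        raise ValueError("|x| cannot exceed the cylinder radius")
    # On the cylinder surface x = R*sin(phi), and the arc length is s = R*phi.
    return radius * math.asin(x / radius)

def unroll_points(points: Sequence[Point], radius: float) -> List[Point]:
    """Unroll the x coordinate of each (x, y) point; y runs along the axis and is unchanged."""
    return [(unroll_cylinder_x(x, radius), y) for x, y in points]
```

The mapping stretches points more the farther they are from the axis. This is the kind of compression near a curved spine that the model is meant to undo. It only fits curvature that is actually cylindrical.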
All of the papers referenced above disclose methods with clear limitations on the types of geometric deformation they can dewarp. The cylinder model and the quadratic polynomials only fit the type of geometric deformation found in books with stiff covers. The average-orientation filtering requires the text lines to be fairly regular, as they are in open books, and it also limits the method to small perspective deformations.
Therefore, there is a need for a method and system that provide a better geometrical transformation of distorted images comprising text before the images are processed by an OCR function. Such a transformation would enable more reliable and more complete text recognition of documents in a computer system or digital camera system.