Rectangular documents such as receipts, pages of text, and forms are often captured, e.g., photographed or scanned, to generate a digital version of the document. Unfortunately, the capture process often distorts the image, with the capture operation producing a non-rectangular quadrilateral image of the document. For viewing and subsequent processing of the image, e.g., for optical character recognition or information extraction purposes, it is often desirable to generate a rectangular image from the non-rectangular quadrilateral image produced by capturing, e.g., photographing or scanning, a rectangular object such as a rectangular paper document.
It is also often desirable to be able to identify the edges of the original captured document in an image produced by a photographing or scanning operation, e.g., to facilitate extraction of information from the image for other purposes such as use in forms and/or to facilitate cropping of the generated image. Cropping eliminates areas beyond the edges of the original document from the image, thereby better representing the original document and reducing data storage requirements, since portions of the output of the capture operation which do not correspond to the original document need not be stored.
Unfortunately, the technical problem of identifying lines that are potentially meaningful with regard to subsequent image processing operations can be difficult, given that stray marks and/or small pieces of document content may include what appear to be lines but which may not be significant with regard to a particular desired image processing operation. Furthermore, storage of information corresponding to identified lines of an image may require a significant amount of memory.
In addition, once lines are identified, there are still technical problems associated with attempts to automate the identification of lines of particular interest, e.g., with respect to identifying boundaries of a scanned document and/or for identifying other features of interest in a scanned document. There are also problems associated with knowing which detected lines should be used for purposes of making image corrections in at least some applications and/or for use in cropping a captured image.
From the above discussion it should be appreciated that there is a need for methods and apparatus which can address one or more of the technical problems associated with line identification in a scanned image, efficient storage of information relating to identified lines, determining borders or edges of an image based on identified lines and/or performing image correction operations or image cropping operations on scanned documents.