1. Field of the Invention
The present invention relates to processing digital pixel images.
In one aspect, the invention relates to processing an image of a document and, in particular, to resolving perspective distortion in the image. The invention is especially suitable for processing an image of a document captured using a digital camera, but the invention is not limited exclusively to this.
In another aspect, the invention relates to performing a summing function along lines through the image. This technique may advantageously be used in the first aspect of the invention, but may find application in any image processing requiring a line sum along a line inclined relative to the image coordinate axes (for example, skew detection).
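A line sum along an inclined line can be illustrated with a minimal Python sketch. It is not the technique of the invention. It assumes a grey-level image stored as a list of rows and samples the nearest pixel in each column along a straight line of a given angle. The function name `line_sum` and its parameters are hypothetical.

```python
import math


def line_sum(image, y0, angle_deg):
    """Sum pixel values along a straight line through a row-major image.

    The line passes through row y0 at column 0, and its row index changes by
    tan(angle) per column. At each column the nearest row is sampled
    (nearest-neighbour), which suits shallow lines (|angle| <= 45 degrees).
    Samples that fall outside the image are ignored.
    """
    height, width = len(image), len(image[0])
    slope = math.tan(math.radians(angle_deg))
    total = 0
    for x in range(width):
        y = round(y0 + slope * x)  # nearest row on the inclined line
        if 0 <= y < height:
            total += image[y][x]
    return total


# Synthetic image: a one-pixel "text line" whose row index grows 0.2 per column.
stripe = [[0] * 10 for _ in range(10)]
for x in range(10):
    stripe[round(2 + 0.2 * x)][x] = 1
print(line_sum(stripe, 2, math.degrees(math.atan(0.2))))  # prints 10
print(line_sum(stripe, 2, 0.0))                            # prints 3
```

The sum is largest when the line follows the inclined stripe, which is why line sums at varying angles can serve orientation measures such as skew detection. Interpolated sampling instead of nearest-neighbour sampling would be a natural refinement.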
2. Description of Related Art
There has been increasing interest in capturing a digital image of a document using a digital camera. A camera can be more convenient to use than a traditional device such as a flatbed scanner. For example, a camera may be portable, and offer the convenience of face-up, near-instantaneous image acquisition. However, this freedom of use means that the document is captured under less constrained conditions than with a scanner designed for high-quality document imaging. For example, when using a scanner, the document is typically scanned at a predetermined fixed position and under controlled conditions. In contrast, when using a camera, the camera may be at an arbitrary position relative to the document. This can degrade the captured image, in particular through perspective distortion in the captured image.
By way of illustration, FIGS. 1 and 2 provide a comparison of images of a document 10 captured using a flatbed scanner (FIG. 1) and using a digital camera (FIG. 2). In either figure, part (a) shows the original document 10 and part (b) shows the captured image 12. Part (c) shows the appearance of the word “the” at two different positions 14 and 16 in the captured image.
In FIG. 1(b), the document image 12 captured by the scanner may be slightly skewed (in the worst case, if the user did not place the document on the scanner correctly), but the document is “flat” in the image. Any skew in the image is uniform and does not vary across the image. Such uniform skew can be detected and corrected using any well-known skew detection algorithm, such as that described in U.S. Pat. No. 5,335,420. Other suitable skew detection techniques are described in W. Postl, “Detection of linear oblique structures and skew scan in digitized documents”, In Proc. 8th International Conference on Pattern Recognition, pages 687-689, 1986; and D. Bloomberg, G. Kopec and L. Dasari, “Measuring document image skew and orientation”, In Proc. SPIE: Document Recognition II, pages 302-316, 1995.
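For comparison, uniform skew of the kind shown in FIG. 1(b) can be estimated with a generic projection-profile search. The Python sketch below illustrates that general idea only; it is not the method of U.S. Pat. No. 5,335,420 or of the cited papers. The function `estimate_skew`, its angle range and its step are hypothetical, and the input is assumed to be a list of foreground (text) pixel coordinates.

```python
import math
from collections import Counter


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Estimate a uniform skew angle (degrees) from foreground pixel coordinates.

    For each candidate angle t, every (x, y) point is projected onto the
    normal of lines inclined at t, i.e. offset = y*cos(t) - x*sin(t), and the
    offsets are binned to whole pixels. When t matches the skew of the text
    lines, each line collapses into few bins, so the sum of squared bin
    counts (a measure of profile sharpness) is largest.
    """
    best_angle, best_score = 0.0, -1
    for i in range(int(round(2 * max_angle / step)) + 1):
        angle = -max_angle + i * step
        t = math.radians(angle)
        profile = Counter(round(y * math.cos(t) - x * math.sin(t)) for x, y in points)
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Synthetic "text": three lines of dots skewed by 3 degrees
# (y grows with x, as in image coordinates where rows increase downwards).
pts = [(x, c + x * math.tan(math.radians(3.0)))
       for c in (10, 30, 50) for x in range(200)]
print(estimate_skew(pts))  # prints 3.0
```

Because one angle is applied to the whole point set, this kind of search presumes that the skew is the same everywhere in the image.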
In FIG. 2(b), the image 12 suffers from perspective distortion caused by the camera being held at a slight angle to the optimum axis, which is normal to the document plane. Such mis-positioning of the camera relative to the document plane is typical, as a user will often be unable to judge this accurately, or will avoid holding the camera in an awkward position. An inexperienced user might not even be aware of the importance of camera position. In FIGS. 2(b) and 2(c), the angles of inclination of the document text are not uniform, but vary across the image. For the first (upper) occurrence 14 of the word “the”, the text has a slight downward slant, and each character is twisted clockwise relative to the horizontal. For the second (lower) occurrence 16, the text has a slight upward slant and is noticeably larger, and each character is twisted anticlockwise relative to the horizontal.
It is not possible to apply the usual skew-correction technique to detect and correct for perspective distortion because the skew-correction technique relies on there being a uniform angle of skew across the image. In the case of perspective distortion, the angle changes continuously across the image.
The existence of perspective distortion in an image is highly undesirable. Firstly, the human eye is highly sensitive to such distortion, which makes the image unappealing and distracting to human readers. Perspective distortion can also create serious problems for automatic image processing techniques, which typically assume a “flat” document image without perspective, and which may become complicated, slow and even unreliable in the presence of perspective distortion. For instance, template-based text recognition would have to match many more templates to compensate for perspective-induced variations in character shape and orientation. Automatic page layout analysis usually assumes the presence of horizontal and vertical white space, and so might not be effective if perspective distortion is present. Also, document image compression schemes may assume a non-perspective image, and might not be optimized for perspective-distorted images.
The specific problem of correcting perspective distortion in document images is relatively new and unresolved, since document capture with cameras is itself a new and emerging field. Document images can be described in terms of two orthogonal families of lines: one parallel to the text lines (X-lines) and one parallel to the borders of formatted text columns (Y-lines). The perspective transform of a family of parallel lines is a pencil: that is, a family of lines through a common point, known as a vanishing point. Therefore, under perspective, the X-lines and Y-lines of the document image become two pencils, referred to as the X-pencil and the Y-pencil.
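The pencil property can be checked numerically with homogeneous coordinates. This Python sketch uses a hypothetical 3x3 perspective matrix `H` (an assumption, not taken from any figure) to map three parallel document X-lines into the image. Each image line is the cross product of two mapped points, and two lines meet at the cross product of the lines. Every pair meets at the same vanishing point, namely the image of the point at infinity (1, 0, 0), while the inclination of the lines differs from one line to the next.

```python
import math


def cross(a, b):
    """Cross product: joins two homogeneous points, or meets two lines."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def apply(H, p):
    """Map a homogeneous point p through the 3x3 matrix H (list of rows)."""
    return tuple(sum(H[r][c] * p[c] for c in range(3)) for r in range(3))


def dehomogenize(p):
    return (p[0] / p[2], p[1] / p[2])


def line_angle_deg(line):
    """Inclination of the image line a*x + b*y + c = 0 to the x axis."""
    a, b, _ = line
    return math.degrees(math.atan2(-a, b))


# Hypothetical perspective transform; the non-zero entries in the bottom row
# make it a true perspective (non-affine) mapping.
H = [[1.0, 0.2, 0.0],
     [0.1, 1.0, 0.0],
     [0.001, 0.002, 1.0]]

# Images of the parallel document X-lines y = 0, 50 and 100.
x_lines = [cross(apply(H, (0.0, y, 1.0)), apply(H, (100.0, y, 1.0)))
           for y in (0.0, 50.0, 100.0)]

vp_01 = dehomogenize(cross(x_lines[0], x_lines[1]))
vp_12 = dehomogenize(cross(x_lines[1], x_lines[2]))
vp_expected = dehomogenize(apply(H, (1.0, 0.0, 0.0)))
angles = [line_angle_deg(l) for l in x_lines]
print(vp_01, vp_12, vp_expected)
print(angles)
```

Both intersections come out at (1000, 100) up to floating-point rounding, matching the image of (1, 0, 0), while the three line angles are about 5.7, 3.2 and 1.0 degrees. This is the continuously varying inclination that a single global skew angle cannot describe.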
Techniques are known for perspective estimation in natural images, and such techniques fall generally into two categories. One approach estimates a texture gradient, whose magnitude describes the slant angle of a textured plane and whose direction describes the tilt direction of the plane. This may be achieved by analyzing segmented texture primitives (see for example, K. Ikeuchi, “Shape from regular patterns”, Artificial Intelligence, 22:49-75, 1984; and K. Kanatani and T. Chou, “Shape from texture: general principle”, Artificial Intelligence, 38:1-48, 1989). Alternatively, local frequency domain approaches may be used (see for example, B. Super and A. Bovik, “Planar surface orientation from texture spatial frequencies”, Pattern Recognition 28(5): 729-743, 1995).
The second approach attempts to determine vanishing points. Most such techniques employ the Gaussian sphere as an accumulator for local edge orientation votes (see for example S. T. Barnard, “Interpreting perspective images”, Artificial Intelligence, 21:435-462, 1983; and J. Shufelt, “Performance evaluation and analysis of vanishing point detection techniques”, IEEE Trans. Pattern Analysis and Machine Intelligence, 21(3):282-288, 1999). More recently, local frequency domain methods for vanishing points have also been proposed (see for example E. Ribeiro and E. Hancock, “Improved orientation estimation from texture planes using multiple vanishing points”, Pattern Recognition, 33:1599-1610, 2000).
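The Gaussian-sphere voting idea can be sketched in Python as follows. This is a simplified illustration, not the procedure of Barnard or Shufelt. The focal length, the centring of image coordinates on the principal point, the 1-degree grid and the vote tolerance are all assumptions, and the function `gaussian_sphere_vp` is hypothetical. Each line segment defines a plane through the optical centre, and any vanishing direction for that segment lies in this plane. The segment therefore votes for every sphere cell near the plane's great circle, and the cell with the most votes is taken as the vanishing direction.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)


def gaussian_sphere_vp(segments, focal, step_deg=1.0, tol_deg=1.0):
    """Return (unit vanishing direction, vote count) for a set of segments.

    segments: ((x1, y1), (x2, y2)) pairs in image coordinates centred on the
    principal point. Each segment spans a plane through the optical centre
    with unit normal n; a vanishing direction d satisfies n . d = 0, so the
    segment votes for every hemisphere cell within tol_deg of that great
    circle. Ties in the vote count go to the smallest sum of (n . d)^2.
    """
    normals = [normalize(cross((x1, y1, focal), (x2, y2, focal)))
               for (x1, y1), (x2, y2) in segments]
    tol = math.sin(math.radians(tol_deg))
    best_key, best_dir = None, None
    for i in range(int(round(90 / step_deg)) + 1):       # polar angle
        theta = math.radians(i * step_deg)
        for j in range(int(round(360 / step_deg))):      # azimuth
            phi = math.radians(j * step_deg)
            d = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            residuals = [dot(n, d) for n in normals]
            votes = sum(1 for r in residuals if abs(r) < tol)
            key = (votes, -sum(r * r for r in residuals))
            if best_key is None or key > best_key:
                best_key, best_dir = key, d
    return best_dir, best_key[0]


# Synthetic check: four segments pointing towards an image vanishing point.
focal = 500.0
vp = (200.0, -300.0)
starts = [(-300.0, 200.0), (0.0, 250.0), (300.0, 200.0), (500.0, -100.0)]
segs = [(a, (a[0] + 0.3 * (vp[0] - a[0]), a[1] + 0.3 * (vp[1] - a[1])))
        for a in starts]
direction, votes = gaussian_sphere_vp(segs, focal)
# Project the winning direction back onto the image plane.
estimate = (focal * direction[0] / direction[2], focal * direction[1] / direction[2])
print(votes, estimate)
```

With these synthetic segments all four great circles pass through the true direction, so the winning cell collects four votes and projects back close to (200, -300). A finer grid or a local refinement around the winning cell would reduce the remaining quantization error.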
However, document images differ significantly from natural images. A natural image contains continuous “hard” edges around the boundary of a defined object in the image, from which perspective information may be obtained. In contrast, an arbitrary document image typically contains only textual or graphical information with no continuous hard edges (unless all of the edges of the document page happen to be included in the image and are clearly distinguished from the background). Therefore, although document perspective estimation techniques based on hard edges have been proposed (see P. Clark and M. Mirmehdi, “Location and recovery of text on oriented surfaces”, In Proc. SPIE: Document Recognition and Retrieval VII, pages 267-277, 2000), such techniques do not perform reliably for general document images.
Accordingly, it would be desirable to provide a technique that can be applied to any generic document image for estimating the perspective in the image, even if the image does not contain continuous hard edges at the boundary of the document page. It would also be desirable for such a technique to be relatively fast (for example, fast compared with operations such as OCR).