1. Field of the Invention
The present invention relates to the field of image processing and, more particularly, to the detection or estimation of skew in document images.
2. Discussion of the Background Art
The automatic processing of document images, typically by computers, is now widespread and is performed for a variety of reasons including, for example, optical character recognition. Often there are problems in the automatic processing because the document image is skewed. Thus, it is advisable to detect or estimate the skew angle, and correct the skew, before applying any further image processing.
Incidentally, in the present document the expressions “skew detection” and “skew estimation” are both used to designate the process of determining a value for skew angle. The term “estimation” does not denote a lower level of accuracy in determining such a value.
Various techniques have been proposed for automatic skew detection in document images. These are usually methods based on clustering of nearest neighbors, methods based on the Hough transform, or methods involving determination of projection profiles. However, these methods suffer from a number of drawbacks. Often the skew estimation/detection process is slow. Also, few methods are applicable to gray-scale images or to images containing drawings. Moreover, most known methods can give inaccurate results when applied to analysis of documents with text in non-Western scripts (for example, in Devanagari and Bangla scripts).
It has been proposed to use techniques derived from mathematical morphology in an algorithm for skew detection in a document image; see, for example, the paper entitled “A fast algorithm for skew detection of document images using morphology” by A. K. Das and B. Chanda, IJDAR, International Journal on Document Analysis and Recognition, (2001) 4, pages 109–114. According to this proposal, the morphological operations of “closing” and “opening” (or “dilation” and “erosion”) are applied to a document image in order to convert text lines into black bands. Subsequently, the black bands are analyzed in order to find the baseline pixels of each text line, lines of a certain length are extracted, and their orientation angles are computed. The median angle is then taken to represent the skew angle.
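The final step described above (taking the median of the orientation angles of the extracted lines) can be illustrated with a minimal sketch. It is not the published Das and Chanda implementation: the segment representation, the function names and the sample coordinates are all assumptions made for illustration, and the morphological band extraction that would produce the segments is omitted.

```python
import math
import statistics

def segment_angle_deg(p0, p1):
    """Orientation of the segment p0 -> p1 in degrees; points are (x, y)."""
    (x0, y0), (x1, y1) = p0, p1
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

def median_skew_deg(segments):
    """Median orientation over a list of ((x0, y0), (x1, y1)) segments."""
    return statistics.median(segment_angle_deg(p0, p1) for p0, p1 in segments)

# Hypothetical baseline segments: two agree, the third is an outlier
# (for example, a line picked up from a drawing).
baselines = [((0, 0), (100, 2)), ((0, 10), (100, 12)), ((0, 20), (100, 50))]
skew = median_skew_deg(baselines)
```

Using the median rather than the mean means that a minority of stray lines does not shift the estimate, which is presumably why the median is the chosen statistic.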
Although the algorithm proposed by Das and Chanda is fast and may be applicable to a variety of script forms, it is not well-suited to processing documents containing drawings as well as text. Special steps must be included in the Das and Chanda algorithm in an attempt to minimize the effect of drawings on the skew-angle-estimation process.
The present invention seeks to provide a new technique for skew estimation based on mathematical morphology.
The principles of mathematical morphology were laid down in the 1960s by G. Matheron and J. Serra. When applied to image analysis, mathematical morphology provides a framework for analyzing the shape and form of structures present in the image. Many mathematical morphological operations make use of a probe, or “structuring element”, to investigate the structure of the image under analysis. The shape and size of the structuring element must be adapted to the geometric properties of the image objects to be processed. For example, linear structuring elements are suited to the extraction of linear objects in an image.
Set notation is often used to express mathematical morphological operations. The structuring element is often denoted by the set of points B, which constitutes it. When the structuring element is translated onto a point x, then it is written as Bx. For a black-and-white image, the set of all white pixels in the image describes the image (the same is true for the set of all black pixels in the image). Such a set can be considered to be an image object F. A corresponding image object f can be defined for a gray-scale image. There is no formal difference between morphological operations whether applied to binary or gray-scale images.
For mathematical morphology on gray-scale images, different equivalent approaches can be taken. A simple idea is to look at the “umbra” of the function, that is, the set {(y,x)|y<f(x)}, and to apply the usual set operators to this set. Generally, for gray-scale images, planar structuring elements are used (for instance, a disk would be used in place of a sphere). Thus, the function is considered level set by level set.
Another approach is to define morphological operators using a generalized expression which applies to gray-scale images. For example, the expression for a dilation operation would become:
$$(f \oplus B)(x) = \sup_{y \in B} f(x + y) \qquad (1)$$
and a binary image would then correspond to the special case where f(x)=1 if x∈X and f(x)=0 elsewhere.
In the following description, when a binary image is involved, the symbol F will be used to designate the image object. When a gray-scale image is involved, then symbol f will be used, and when the image object can be either gray-scale or binary, the symbol A will be used.
It may be helpful to recall some of the basic operations used in mathematical morphology, notably the operations of dilation, erosion, opening and closing.
Dilation
The operation of “dilation” seeks to answer the question “When a structuring element B is translated onto a point x, does it intersect with the set defining the image object A?” The dilation of an image object A using a structuring element B can be written δ1,B(A). An image object can be repeatedly dilated. If dilation is repeated n times, then it is said that a dilation of size n has been performed, and the result is written as δn,B(A).
In set notation, the dilation of an image can be expressed in terms of Minkowski addition which, for a binary image F, gives:
$$\delta_{1,B}(F) = F \oplus B = \{x \mid B_x \cap F \neq \emptyset\} \qquad (2)$$
In other words, the dilated image δ1,B(F) will contain image points (typically, black pixels) at all points x for which there is an intersection between the original image F and the structuring element when translated onto x (Bx).
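As an illustration of equation (2), the following minimal sketch represents a binary image object and a structuring element as sets of (row, col) tuples. That representation, the function name `dilate` and the sample structuring element are assumptions made for this example only.

```python
def dilate(F, B):
    """Binary dilation per equation (2): all x whose translate B_x meets F.

    F and B are sets of (row, col) tuples; B_x = {x + b for b in B}.
    """
    # B_x intersects F exactly when x + b = p for some p in F and b in B,
    # so every such x has the form p - b.
    return {(pr - br, pc - bc) for (pr, pc) in F for (br, bc) in B}

F = {(0, 0)}                      # a single image point
B = {(0, -1), (0, 0), (0, 1)}     # hypothetical 3-pixel horizontal line
dilated = dilate(F, B)            # the point grows into a 3-pixel line
```

For a structuring element that is symmetric about its origin, as here, this result coincides with the set of all sums p + b.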
For a gray-scale image f, the dilation of the image by the structuring element B can be expressed, in a similar way, as:
$$\delta_{1,B}(f) = (f \oplus B)(x) = \max_{b \in B} f(x + b) \qquad (3)$$
In other words, for a point x, the value of this point in the dilated image will be the maximum of the values taken at the points (x+b) in the original gray-scale image f, b representing the vectors defining the points in the structuring element B.
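Equation (3) can be sketched directly as a maximum over the offsets of a planar structuring element. The list-of-rows image representation, the function name `dilate_gray` and the border handling (offsets that fall outside the image are simply skipped) are assumptions of this sketch, not something specified in the text.

```python
def dilate_gray(f, B):
    """Gray-scale dilation per equation (3): out[x] = max of f[x + b], b in B.

    f is a list of equal-length rows; B is a set of (d_row, d_col) offsets.
    Assumption: offsets falling outside the image are skipped, and B
    contains (0, 0) so that at least one sample is always available.
    """
    rows, cols = len(f), len(f[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            samples = [f[r + dr][c + dc] for (dr, dc) in B
                       if 0 <= r + dr < rows and 0 <= c + dc < cols]
            out_row.append(max(samples))
        out.append(out_row)
    return out

f = [[0, 0, 0, 0],
     [0, 5, 0, 0],
     [0, 0, 0, 0]]
B = {(0, -1), (0, 0), (0, 1)}     # hypothetical horizontal structuring element
g = dilate_gray(f, B)             # the bright pixel spreads to its row neighbors
```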
Considered visually, dilation can be likened to adding a layer to objects represented in the image. A dilation of size n adds n layers to the objects.
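The "layers" picture can be checked with a small sketch of a dilation of size n, obtained by applying the basic dilation n times. The set-of-tuples representation, the function names and the cross-shaped structuring element are assumptions chosen for illustration.

```python
def dilate(F, B):
    """Binary dilation: all x whose translate B_x intersects F."""
    return {(pr - br, pc - bc) for (pr, pc) in F for (br, bc) in B}

def dilate_n(F, B, n):
    """Dilation of size n: apply the basic dilation n times."""
    for _ in range(n):
        F = dilate(F, B)
    return F

# Hypothetical cross-shaped structuring element centered on its origin.
cross = {(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)}
seed = {(0, 0)}

# Each extra dilation adds one more layer around the seed pixel.
sizes = [len(dilate_n(seed, cross, n)) for n in range(4)]
```

With this structuring element, the result after n dilations is the set of points whose city-block distance from the seed is at most n.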
Erosion
Erosion is the complement to dilation. The operation of “erosion” seeks to answer the question “When a structuring element B is translated onto a point x, is the structuring element completely contained in the set defining the image object A?” The erosion of an image object A using a structuring element B can be written as ε1,B(A). An image object can be repeatedly eroded and εn,B(A) denotes an image A that has been eroded n times.
In set notation, the erosion of an image can be expressed in terms of Minkowski subtraction which, for a binary image F, gives:
$$\varepsilon_{1,B}(F) = F \ominus B = \{x \mid B_x \subset F\} \qquad (4)$$
In other words, the eroded image ε1,B(F) will contain image points at all points x for which, when the structuring element is translated onto x, it is completely contained within the original image object.
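Equation (4) can be sketched in the same set-based style. The representation of F and B as sets of (row, col) tuples, the function name `erode` and the sample shapes are assumptions of this example; B is assumed to be non-empty.

```python
def erode(F, B):
    """Binary erosion per equation (4): all x whose translate B_x lies in F."""
    # Any valid x satisfies x + b in F for every b in B, so in particular
    # x = p - b0 for some p in F, where b0 is an arbitrary element of B.
    b0r, b0c = next(iter(B))
    candidates = {(pr - b0r, pc - b0c) for (pr, pc) in F}
    return {(xr, xc) for (xr, xc) in candidates
            if all((xr + br, xc + bc) in F for (br, bc) in B)}

# A 3x3 block of image points and a hypothetical cross-shaped element.
block = {(r, c) for r in range(3) for c in range(3)}
cross = {(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)}
eroded = erode(block, cross)      # only the center of the block survives
```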
For a gray-scale image f, the erosion of the image by the structuring element B can be expressed, in a similar way, as:
$$\varepsilon_{1,B}(f) = (f \ominus B)(x) = \min_{b \in B} f(x + b) \qquad (5)$$
In other words, for a point x, the value of this point in the eroded image will be the minimum of the values taken at the points (x+b) in the original gray-scale image f, b representing the vectors defining the points in the structuring element B.
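Equation (5) can be sketched as a minimum over the offsets of a planar structuring element. As with any such sketch, the list-of-rows representation, the function name `erode_gray` and the border handling (out-of-image offsets are skipped) are assumptions rather than part of the description.

```python
def erode_gray(f, B):
    """Gray-scale erosion per equation (5): out[x] = min of f[x + b], b in B.

    f is a list of equal-length rows; B is a set of (d_row, d_col) offsets.
    Assumption: offsets falling outside the image are skipped, and B
    contains (0, 0) so that at least one sample is always available.
    """
    rows, cols = len(f), len(f[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            samples = [f[r + dr][c + dc] for (dr, dc) in B
                       if 0 <= r + dr < rows and 0 <= c + dc < cols]
            out_row.append(min(samples))
        out.append(out_row)
    return out

f = [[9, 9, 9, 9],
     [9, 1, 9, 9],
     [9, 9, 9, 9]]
B = {(0, -1), (0, 0), (0, 1)}     # hypothetical horizontal structuring element
g = erode_gray(f, B)              # the dark pixel spreads to its row neighbors
```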
Considered visually, erosion can be likened to stripping off a layer from objects represented in the image.
Opening
The opening operation includes an erosion followed by a dilation (this is not equivalent to a dilation followed by an erosion; see “Closing” below). If an image A is opened by a structuring element B, then the result γ1,B(A) can be expressed in a variety of ways:
$$\gamma_{1,B}(A) = A \circ B = A_B = (A \ominus B) \oplus B \qquad (6)$$
The first three expressions are just different symbolic representations of “A opened by B”; the final expression indicates an erosion followed by a dilation.
Application of the opening operator to an image tends to smooth the contours of objects in the image, to separate an “isthmus” in the image from the “mainland” (if the link between the two is smaller than the structuring element), and to remove objects (or their parts) which are smaller than the structuring element.
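The removal of objects smaller than the structuring element can be seen in a minimal sketch of opening as an erosion followed by a dilation. The set-of-tuples representation, the function names and the sample image are assumptions; the structuring element is assumed to be symmetric about its origin, so that Minkowski addition and the intersection form of dilation give the same result.

```python
def erode(F, B):
    """Binary erosion: all x whose translate B_x lies entirely inside F."""
    b0r, b0c = next(iter(B))
    candidates = {(pr - b0r, pc - b0c) for (pr, pc) in F}
    return {(xr, xc) for (xr, xc) in candidates
            if all((xr + br, xc + bc) in F for (br, bc) in B)}

def dilate(F, B):
    """Binary dilation as Minkowski addition (B assumed symmetric)."""
    return {(pr + br, pc + bc) for (pr, pc) in F for (br, bc) in B}

def opening(F, B):
    """Opening: erosion followed by dilation with the same element."""
    return dilate(erode(F, B), B)

# Hypothetical 3x3 square structuring element centered on its origin.
square = {(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)}

# A 3x3 block plus an isolated pixel that is smaller than the element.
block = {(r, c) for r in range(3) for c in range(3)}
image = block | {(0, 5)}

opened = opening(image, square)   # the isolated pixel is removed
```

Opening the result a second time changes nothing, which matches the usual description of opening as an idempotent operation.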
Closing
The closing operation includes a dilation followed by an erosion. The closing operation is the dual operation (not the inverse) of the opening operation. If an image A is closed by a structuring element B, then the result φ1,B(A) can be expressed in a variety of ways:
$$\varphi_{1,B}(A) = A \bullet B = A^B = (A \oplus B) \ominus B \qquad (7)$$
Application of the closing operator to an image tends to close holes or slits in the image if they are smaller than the structuring element and to cause the union of “islands” to the “mainland” when the distance between them is shorter than the structuring element.
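The gap-filling behavior can be seen in a minimal sketch of closing as a dilation followed by an erosion. The set-of-tuples representation, the function names and the sample image are assumptions; the structuring element is assumed to be symmetric about its origin.

```python
def dilate(F, B):
    """Binary dilation as Minkowski addition (B assumed symmetric)."""
    return {(pr + br, pc + bc) for (pr, pc) in F for (br, bc) in B}

def erode(F, B):
    """Binary erosion: all x whose translate B_x lies entirely inside F."""
    b0r, b0c = next(iter(B))
    candidates = {(pr - b0r, pc - b0c) for (pr, pc) in F}
    return {(xr, xc) for (xr, xc) in candidates
            if all((xr + br, xc + bc) in F for (br, bc) in B)}

def closing(F, B):
    """Closing: dilation followed by erosion with the same element."""
    return erode(dilate(F, B), B)

# Hypothetical 3-pixel horizontal structuring element.
line = {(0, -1), (0, 0), (0, 1)}

# Two short runs on one row, separated by a one-pixel gap at column 2.
image = {(0, 0), (0, 1), (0, 3), (0, 4)}

closed = closing(image, line)     # the gap is filled, nothing else is added
```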