The present invention relates generally to a method for detecting skew present in a scanned image and more specifically to a method of detecting skew of an image which is based on a count of certain features of the image and in which said image may be represented in compressed form.
The human eye can detect very small deviations from orthogonality, particularly in digital images of simple structures where the discontinuities caused by aliasing errors draw attention to these deviations. There is less information available about human sensitivity to skew in text and complex graphic images. When information is presented in a well-defined coordinate frame, skew angle is consistently and significantly over-estimated. Presence of skew is not aesthetically pleasing and has pragmatic effects as well. Presence of skew may result in failure to capture all of the source area of the image because one or more corners may fall outside of the field of view of the scanner due to the skew. Skewed images do not compress as quickly or as compactly as images correctly registered to the page coordinate system. Skewed fields are more difficult to utilize in standard page layout and composition operations such as cropping and inserting.
Skew may arise in several ways including difficulties in paper feeding either at the time of digitization or prior to digitization at the time of photocopying, among others. In an electronic reprographics environment, where reproduction is effected by digital scanning and printing of the resulting bitmaps, early and accurate detection of skew angle can result in a significant reduction in the need for resources to correct for and store the skewed image.
Skew angle determination is traditionally a two-stage process. First, the feature on which alignment quality will be based is determined, and second, various tests are applied to determine whether the proposed alignment is a good one relative to a prior sample alignment or other standard. An "alignment" as used herein is an orientation of the components of an image.
To date, determination of skew angle has been accomplished on image data in uncompressed form. Herein, "image" is taken to mean a pattern or collection of distinguishable regions, whether a likeness, representation or neither, typically but not exclusively that which might be found on a printed page. "Image data" means herein data which is typically, though not exclusively, digital in format which may be used by an appropriate system to reproduce an image. Data in "compressed" format means herein data which has been reduced in extent, e.g., the amount of memory space, in bits, required to store the data, from data in "uncompressed" format, no matter what method is used to compress the data. Determination of skew angle from compressed image data has not been contemplated to date or, if it has, it has been dismissed under the general belief that performing skew detection on compressed image data would require the extra, and potentially unnecessary step of transforming the image into compressed data format.
A technique for the detection of skew angle in uncompressed images is described by Henry S. Baird in The Skew Angle of Printed Documents, Proceedings of SPSE Symposium on Hybrid Imaging Systems, pp. 21-24, 1987 (hereafter "Baird"). Initially, Baird must distinguish between a mark or connected component that is text and one that is not text. In this regard, "text" marks or connected components for the purposes of the present disclosure are those comprised of letters, numbers, punctuation and related marks. This is as distinguished from "non-text" marks or connected components which are those not being text, typically graphics, symbols or illustration-type portions of an image. Due to the method used for locating fiducial points, Baird's technique will only yield meaningful results for text images. This classification as text is made by Baird on the basis that the maximum dimension of a text mark is less than or equal to the "em" in a predetermined maximum font size, e.g., 24 point.
The alignment algorithm of Baird's technique operates on the basis of the alignment of selected features associated with each mark, or connected component. "Mark" shall be used herein to refer to a connected component. A "connected component" is defined for the purposes of this disclosure as a group of color bearing units, typically picture elements ("pixels"), of like color which touch one another. A feature commonly used for this purpose is the "fiducial point." A "fiducial point" as used herein is a point on or associated with a mark which is located by a predetermined set of rules (the same rules for each mark), and which may be used for selected purposes as a representation of a feature of the mark. For example, Baird uses the bottom center of a bounding box around each mark for the fiducial point. Baird's fiducial points 10 are shown as cross-hairs on bounding boxes 12 around marks 14 in FIG. 1a.
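By way of illustration only, the location of a bottom-center fiducial point for each connected component may be sketched as follows. The sketch is not taken from Baird; the function name fiducial_points, the representation of the image as rows of 0/1 values, the use of 8-connectivity to decide which pixels "touch," and the convention that the row index increases downward are all assumptions made for the example.

```python
from collections import deque


def fiducial_points(bitmap):
    """Return one (x, y) fiducial point per connected component.

    bitmap is a list of rows of 0/1 values, where 1 is a foreground pixel.
    Assumptions for this sketch: 8-connectivity, and y grows downward, so
    the bottom of a bounding box is its largest row index.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    seen = [[False] * cols for _ in range(rows)]
    points = []
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one component while tracking its bounding box.
            queue = deque([(r, c)])
            seen[r][c] = True
            bottom, left, right = r, c, c
            while queue:
                y, x = queue.popleft()
                bottom = max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Bottom center of the bounding box is the fiducial point.
            points.append(((left + right) / 2, bottom))
    return points
```

In this sketch a dotted character whose dot does not touch its stem yields two components and therefore two fiducial points, one of which lies well above the baseline.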
In a perfectly aligned horizontal line of English text, for example, each fiducial point will lie on the same horizontal line 16 at the base of the characters (called the "baseline"), with the exceptions of dotted characters, for example the letter "i" of FIG. 1a, certain punctuation marks, and characters with descenders, which are portions of the mark which descend below the baseline, such as the lower case characters "g", "p", etc. Since a fiducial point is assigned to each connected component, the dotted characters "i" and "j" and many punctuation marks (;?!:") give rise to more than one fiducial point per glyph, with at least one of these points significantly misaligned with respect to the baseline. Characters with descenders will generate fiducial points away from, but near to, the baseline. Descenders are statistically uncommon enough that the method of Baird (and of the present invention) provides a means for recognizing and compensating for their effect on determination of the true baseline. Note that there is exactly one fiducial point per connected component in the image.
In essence, Baird calculates skew by determining the number of fiducial points per "line" for a variety of rotational alignments. Line as used herein means, for example, one of a plurality of imaginary parallel scan lines traversing the document and oriented perpendicular to a selected feature such as a margin or page edge. The rotational alignments are calculated by trigonometric translation of the fiducial points.
Referring to FIGS. 1b and 1c, counting of the number of fiducial points per line is accomplished by projecting the locations of the fiducial points 10 onto an accumulator line 18 which is perpendicular to the projection direction, as indicated by arrow p. Accumulator line 18 is partitioned into "bins" 20 of a uniform predetermined height, h, for example equal to 1/3 of the height of a six point character. Height h may be varied as appropriate, and may be as small as 2 pixels. However, as h decreases, computation time increases. Importantly, as h approaches the character height, skew angle determination performance deteriorates. Returning to the bins, there is exactly one bin per line. The number of fiducial points for a selected line is then equal to the number of fiducial points projected into the bin corresponding to that line.
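The projection and binning just described may be sketched, for illustration only, as shown below. The function name bin_counts, the use of radians for the angle, and the particular formula for the position along the accumulator line (y cos θ - x sin θ, which is constant for all points lying on a line inclined at angle θ) are assumptions of this sketch and are not drawn from Baird.

```python
import math
from collections import Counter


def bin_counts(points, theta, h):
    """Count fiducial points per bin for projection angle theta (radians).

    Each (x, y) point is projected onto an accumulator line perpendicular
    to the projection direction; the accumulator is divided into bins of
    uniform height h. Returns a Counter mapping bin index to point count.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    counts = Counter()
    for x, y in points:
        # Position along the accumulator line (assumed formula, see lead-in).
        offset = y * cos_t - x * sin_t
        counts[math.floor(offset / h)] += 1
    return counts
```

When theta matches the inclination of a row of fiducial points, every point of that row receives the same offset and falls into a single bin; at other angles the same points are spread over several bins.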
Since this method results in a relatively small number of fiducial points (depending on the nature of the image), the alignment is made efficient by calculating the alignment on the basis of the sum, over all bins, of the count of fiducial points in each rotationally aligned bin raised to a positive power greater than 1, e.g., 2 (sum of squares). Baird (at page 22, lines 20-22) refers to this sum of squares as "a real-valued energy alignment measure function" defined as $$A(\theta) = \sum_{i=1}^{m} C_i^{2}(\theta)$$ where $C_i(\theta)$ denotes the number of points projected into the $i$-th bin at angle $\theta$, and $m$ denotes the number of bins. The sum of the counts raised to the positive power for each angle, e.g., $A(\theta)$ for the power equal to two, is referred to herein as the "power" for that angle. An index of all such powers will contain a global maximum whose angle $\theta$ is approximately equal to the skew angle. Baird states that the real-valued energy alignment measure function displays a global maximum at the correct skew angle and that experiments suggest that any positive superlinear function of $C_i$ within the summation will perform correctly. FIG. 1b shows the positions of the fiducial points 10 and the relative size of each bin 20 on a skewed text sample in which the bins are unaligned with the skewed text. FIG. 1c shows the distribution of the fiducial points of the same skewed image of FIG. 1b into aligned bins.
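A small worked example may help show why a superlinear function of the bin counts is used. In the sketch below, which is illustrative only, the function name alignment_power and the two sets of bin counts are hypothetical. Both sets contain six fiducial points in total, so a plain sum of counts cannot tell them apart, whereas the sum of squares is larger when the points are concentrated in one bin.

```python
def alignment_power(counts, exponent=2):
    """Sum of bin counts, each raised to a power greater than 1."""
    if exponent <= 1:
        raise ValueError("exponent must be greater than 1")
    return sum(c ** exponent for c in counts)


# Hypothetical bin counts for the same six fiducial points.
aligned = [0, 6, 0, 0]      # bins aligned with the text line
unaligned = [2, 2, 1, 1]    # bins cutting across the skewed text line

print(sum(aligned), sum(unaligned))   # 6 6 (a linear sum does not discriminate)
print(alignment_power(aligned))       # 36
print(alignment_power(unaligned))     # 10
```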
Calculation of the power of each of a variety of alignments requires that the position of each fiducial point be known. Specifically, the coordinates of a fiducial point are used to mathematically translate the fiducial point, by an angle and a displacement from an origin, to a new set of coordinates. This process is done for the complete collection of fiducial points, and the powers of the alignments before and after translation are compared. From each comparison, the angle corresponding to the alignment with the greater power is retained. After all angular alignments within a selected range have been compared in this manner, the skew angle may be assumed to be the angle corresponding to the alignment with the greatest power. In the event of weak alignment or multiple alignments, however, this assumption may need to be otherwise verified.
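The comparison of powers over a range of angles may be sketched as a simple exhaustive search, shown below for illustration only. The function names power_at and estimate_skew, the search range of plus or minus 5 degrees, the step of 0.25 degree, and the projection formula are assumptions of this sketch; the sketch does not reproduce Baird's search strategy and performs no verification for weak or multiple alignments.

```python
import math
from collections import Counter


def power_at(points, theta, h):
    """Sum of squared bin counts for projection angle theta (radians)."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    counts = Counter(math.floor((y * cos_t - x * sin_t) / h) for x, y in points)
    return sum(c * c for c in counts.values())


def estimate_skew(points, h, max_angle_deg=5.0, step_deg=0.25):
    """Return (angle in degrees, power) for the strongest alignment found."""
    best_angle, best_power = 0.0, -1
    steps = int(round(2 * max_angle_deg / step_deg))
    for k in range(steps + 1):
        angle = -max_angle_deg + k * step_deg
        power = power_at(points, math.radians(angle), h)
        # Retain the angle whose alignment has the greater power.
        if power > best_power:
            best_angle, best_power = angle, power
    return best_angle, best_power
```

The angular resolution of such a search depends on both the bin height h and the horizontal extent of the fiducial points: a short text line with tall bins will produce the same power over a band of neighboring angles, which is one way a weak alignment can arise.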