U.S. Patent application Ser. No. 07/454,339 filed Dec. 21, 1989, assigned to the present assignee, discloses a method for the detection of the predominant alignment of a page containing text and/or graphics. The contents of this copending application are incorporated herein by reference. The technique described in the copending application computes the "power" of an alignment angle based upon the locations of the pass codes in the CCITT G4 image. This technique uses, as fiducial marks, the locations of the pass codes which result in the output of runs of white pixels. A large power at a given angle signifies the alignment of the pass codes at that orientation.
Since an understanding of the techniques employed in the method of said copending application is of value in the understanding of the present invention, the technique thereof will now be described with reference to FIGS. 1-6 of the drawings of the present application.
FIG. 1 is a block diagram illustrating the format of an environment in which the method of the copending application, as well as that of the present invention may operate. This block diagram illustrates a portion of a computer system 50 that includes or is connected to receive output signals from a scanner 52 capable of scanning an image and producing digital data which represents that image. This digital data is communicated to a processor 54. The processor controls input and output operations and calls to program memory 56 and data memory 58 via bus 60.
Program memory 56 may include, inter alia, a routine 62 for controlling the scanning of an image by scanner 52, a routine 64 for converting the digital data representing the image into a compressed data format, and a routine 66 for determining skew angle from the compressed data. Associated with program memory 56 is a data memory 58 which stores, at location 68, the digital data structure produced by scanner 52 under control of the scanning control routine 62, at location 70 the data structure of the compressed representation of the scanned image produced by compression routine 64, and at location 72 the data structure containing selected point data, for example fiducial point locations, produced by skew angle determination routine 66. To facilitate the communication between program memory 56 and data memory 58 necessary for operation, each is connected to bus 60 such that input and output operations may be performed. It is of course apparent that memories 56 and 58 may constitute a single memory block.
Under control of the processor, skew detection routine 66 accesses various parts of the data memory 58 to acquire data needed to calculate skew angle. Once calculated, the skew angle may then be applied to the output at 74, which may comprise means for displaying the results such as a CRT display, hard copy printer or the like, or may comprise a means for utilizing the results to perform further operations, such as modification of the image data to compensate for skew, etc.
It has been assumed that the image data has been compressed according to the Group 4 standard, although the technique can be modified to render similar results using other compression techniques, such as CCITT 2-dimensional Group 3 format, etc. The coding scheme of Group 4 relies upon the existence and relative spacing between pixel color transitions found on pairs of succeeding scan lines. In Group 4 coding, each line in turn becomes a "coding line" and is coded with respect to its predecessor, the "reference line". The first line is coded with respect to an artificially defined all white reference line. The Group 4 compression standard is explained in greater detail in "International Digital Facsimile Coding Standards", Hunter et al, Proceedings of the IEEE, Vol 68, No. 7 July 1980, pp 854-867 and Int'l Telecommunications Union, CCITT (Int'l Telegraph and Telephone Consultative Committee) Blue Book, Geneva 1989 (I 92-61-03611-2).
Encoding in the Group 4 format has 3 modes--vertical, horizontal and pass. In order to determine the current mode, adjacent scan lines are compared to determine whether, given a first pixel color transition on the reference line, such as black to white, there exists a corresponding pixel color transition (i.e. also black to white) on the coding line. The existence and relative spacing of the transition on the coding line from the transition on the reference line is employed to determine the mode.
Thus, FIG. 2a illustrates a vertical mode, wherein the black to white or white to black transition positions on adjacent scan lines are horizontally close (equal to or less than three pixels, i.e., a, b ≤ 3). FIG. 2b illustrates a horizontal mode, wherein the transition positions are further apart than 3 pixels. The pass mode is illustrated in FIG. 2c, wherein a transition on the reference line does not correspond to any transition on the coding line. The compressed data includes, inter alia, a mode code together with a displacement which implies a displacement measured on the reference line as opposed to the coding line, i.e., a1 to the right of b2.
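By way of illustration only, the selection among the three modes may be sketched as follows. The sketch assumes that the transition positions are already known as pixel indices. The function name classify_mode and the parameter b1 (the reference line transition that is compared with a1) are not part of the foregoing description and are introduced here as assumptions; a1 and b2 are used in the sense described above.

```python
def classify_mode(a1, b1, b2, max_vertical_offset=3):
    """Choose a Group 4 coding mode from transition positions.

    a1 -- position of the next transition on the coding line
    b1 -- position of the corresponding transition on the reference
          line (assumed name, not taken from the description)
    b2 -- position of the following transition on the reference line
    """
    # Pass mode: the reference line transition has no counterpart on
    # the coding line, i.e. a1 lies to the right of b2.
    if b2 < a1:
        return "pass"
    # Vertical mode: the two transitions are within three pixels.
    if abs(a1 - b1) <= max_vertical_offset:
        return "vertical"
    # Horizontal mode: the transitions are further apart.
    return "horizontal"
```

For example, classify_mode(10, 9, 20) yields the vertical mode, while classify_mode(30, 9, 20) yields the pass mode, since a1 lies to the right of b2.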
The coding can be explained more clearly with respect to FIGS. 3 and 4. In FIGS. 3a and 3b, the fiducial points 76 are located on the basis of topographic features of the different marks. These topographic features are always located on the marks themselves. Specifically, skew is determined from the locations of the pass codes in the Group 4 compressed representation of the image. The positions of the pass code fiducial points 76 on unskewed and skewed text are shown by the X marks in FIGS. 3a and 3b, respectively.
Since all of the pass codes (i.e. codes corresponding to the pass mode) are defined relative to a point on the respective mark, all fiducial points are located at some point on the mark, regardless of the extent of the skew. In addition, since there may be more than one pass code in the compressed data representing a mark, there may be more than one fiducial point per mark. For example, for typical font styles, passes will be generated in two places along the baseline of many characters including upper and lower case "A", "H", "K" etc., and in three places along the baseline of an upper and lower case "M".
Passes may also be generated as a result of aliasing errors, for example as shown on the underside of the crossbar of the unskewed "G" and on the right leg of the unskewed "K", in FIG. 3. Distinguishing such aliasing errors is not of importance to the present discussion.
There are two types of passes, i.e. white passes which represent a passage from black pixels to white pixels, and black passes which represent a passage from white pixels to black pixels. White passes are thus indicative of the bottoms of black structures, and are hence somewhat analogous to the bottoms of connected components in the raw bitmap, such as line ends. It is thus guaranteed that there is at least one white pass at the bottom of each connected component. It is accordingly advantageous to use white passes as the fiducial points in the scanning of text or characters, although it will be apparent that black passes may alternatively be employed to determine skew angle. The positions of the white pass code fiducial points 78 on unskewed and skewed text are shown as arrows in FIGS. 4a and 4b respectively.
The Group 4 encoding of passes does not distinguish between white passes and black passes. This may be determined, however, by maintaining the color state. Color state can be maintained by a binary state bit which is initialized to white. Subsequent events including a pass code occurrence may cause the state bit to invert, thereby keeping a running track of the desired pass color.
Comparing FIGS. 3a, 3b and FIGS. 4a, 4b, it is seen that fewer fiducial points are generated off the baseline of the text in FIG. 4 than in FIG. 3. Thus, white passes are advantageous in providing fiducial points on which to base skew measurements by alignment.
FIG. 5 is a flow diagram of a skew detection routine 66 that may be employed in order to determine the skew in a document. This diagram assumes that an image has been scanned, that digital data has been produced corresponding to the scanned image, and that digital data has undergone compression according to a selected data compression method such as that producing Group 4 compressed data.
Initially (box 92), the white pass codes in the data structure of compressed image data are located. Once a white pass code is located, its location in an appropriate coordinate system is determined (box 94). The data may be stored as x,y coordinates. A test is then made (box 98) to determine whether the end of the scanned page has been reached. If so, the skew angle determination proceeds. Otherwise, a search is made for the next, if any, white pass code on the given page.
The steps of boxes 92-98 are collectively referred to as a coordinate determination routine which is disclosed in greater detail with respect to FIG. 6. In this flow diagram, box 101 illustrates the input of data in the Group 4 compressed format. Using x,y coordinate pairs, x and y are first initialized to 0 to indicate the start of each new page (box 102).
The Group 4 codes are detected (box 103), and tests are made to detect horizontal codes (box 104) and vertical codes (box 112). It is assumed that all other codes are pass codes. The detection of the different codes may be implemented by character string recognition, as above discussed. If the detected code is a horizontal code, the x value is increased by the x displacement value associated with the horizontal code (box 106). That is, the horizontal mode of Group 4 includes a code indicating the mode and a displacement indicating the number of pixels between the reference pixel color transition and the current pixel color transition. In the case of a horizontal code, the displacement is the number of pixels between a pixel color transition on the particular line and the next pixel color transition on that same line.
The new value of x does not become an abscissa value used to determine alignment, but is a running value of the displacement from the first pixel position on a scan line. In the method of the copending application, only white pass codes are used for alignment determination.
Assuming that a horizontal code is detected, the binary pixel color state bit is then incremented (box 122). Once the new value of x has been calculated, it is checked (box 108) to determine if the line end has been reached, for example by comparing x to the known length of the scan line. If the line end has not been reached, code detection continues for that line (box 103). If the line end has been reached, x is set to 0 (box 110) to correspond to the beginning of the next line and y, which keeps a running count of the line number, is incremented by one and checked (box 111) to determine if the page end has been reached. The page end may be detected by comparing the y value with the known number of lines on the page. If a page end has been reached, power is then determined for various alignments swept through a number of alignment angles (box 126), as will be discussed. If the page end has not been reached, code detection resumes (box 103).
If the detected code is not a horizontal code, it is tested (box 112) to determine if it is a vertical code. If a vertical code is found, the x value is determined and the program proceeds in a manner similar to that when a horizontal code was found.
If the code is neither a horizontal code nor a vertical code, it is assumed to be a pass code. Group 4 does not distinguish between black and white pass codes, but the type of pass code may be distinguished by keeping track of the binary pixel color state bit at box 118. Initially the state bit has been set to 0 (box 102). Arbitrarily, 0 has been chosen to correspond to white pass codes. Each time a code is detected, the state bit is checked. If the state bit is not equal to 0, i.e. if the pass code is not a white pass code, the new value of x is set to equal the old value of x (box 120). Assuming that the next code encountered is not a pass code, the next code will have associated with it the requisite information needed to properly calculate the next value of x. If the next code encountered is a pass code, the process is repeated until a code is encountered which is not a pass code. This is the essence of a Group 4 pass code. Continuing, the new value of x has been set (box 120), and the state bit is incremented (box 122) for the next encountered pass code.
If the state bit is 0, a white pass code has been encountered. The location of the white pass is maintained in order to calculate power of the alignment, and for the transformation steps that will be discussed below. This may be done at selected point data location 72 in the data memory 58 of FIG. 1. The maintenance of the locations of the white pass codes is performed at box 124. Next, the value of x is set, the state bit incremented, and the program tests for line and page ends, as discussed above.
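By way of illustration only, the coordinate determination flow of FIG. 6 may be sketched as follows. The sketch follows the simplified flow described above and is not a bit-level Group 4 decoder. It assumes, hypothetically, that the codes have already been recognized and are supplied as (kind, advance) pairs, where kind is "H", "V" or "P" and advance is the amount by which x moves for a horizontal or vertical code. The function name white_pass_locations and the resetting of the state bit to white at each new line are likewise assumptions not stated in the foregoing description.

```python
def white_pass_locations(codes, line_length, num_lines):
    """Collect (x, y) locations of white pass codes on one page.

    codes -- iterable of (kind, advance) pairs, a hypothetical
             pre-decoded form of the compressed data
    """
    x, y = 0, 0          # box 102: start of a new page
    state = 0            # 0 = white, 1 = black (box 102)
    points = []          # selected point data (location 72)

    for kind, advance in codes:
        if kind == "P":
            # Pass code: record it only if it is a white pass (box 124).
            if state == 0:
                points.append((x, y))
            # x keeps its old value (box 120).
        else:
            # Horizontal or vertical code: advance the running x value.
            x += advance
        state ^= 1       # box 122: invert the color state bit

        if x >= line_length:      # box 108: line end reached
            x = 0                 # box 110
            y += 1
            state = 0             # assumption: each line starts white
            if y >= num_lines:    # box 111: page end reached
                break
    return points
```

For the code sequence [("H", 3), ("V", 2), ("P", 0), ("H", 5), ("P", 0), ("V", 10)] with a line length of 10 and two lines, the sketch records white passes at (5, 0) and (0, 1).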
Returning now to FIG. 5, assuming that line and page ends have been found, the program section 126 determines power for a plurality of alignments. Initially the alignment angle is set to 0 (box 128). This alignment corresponds to the alignment at which the image was initially scanned. The power of this alignment is calculated, for example by summing the number of passes detected at each of a plurality of different heights (e.g. each corresponding to 1/3 of the height of a six point character), the heights extending along lines perpendicular to the alignment direction being tested. The calculation of power is made more efficient by basing it on the sum of the counts of the passes which appear in each of the rotationally aligned height increments, each count being raised to a positive power greater than 1 (e.g. 2, giving a sum of squares). The variance of the distribution is maximized by maximizing the sum of squares of the counts, resulting in an index of the "power" of the alignment from which the skew angle is determined. Such power calculation is discussed, for example, in "The Skew Angle of Printed Documents" Henry S. Baird, Proceedings of SPSE Symposium on Hybrid Imaging Systems, 1987, pp 21-24, the contents of which are incorporated herein by reference.
In accordance with the copending application, in the determination of power, a call may be made to memory location 72 of the data memory 58 and the number of x values stored therein determined for each line. The square of the number of x values for each line is accumulated in an array (box 130) representing the power of the alignment at the current alignment angle. The array of squares is stored, together with the current alignment angle (box 132), which may be a part of the data memory 58.
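By way of illustration only, the sum of squares power at a single alignment angle may be sketched as follows. The function name alignment_power, the bin_height parameter and the sign convention of the angle are assumptions introduced for this sketch. Each fiducial point is assigned to a height increment measured perpendicular to the alignment direction, and the squared counts of the increments are summed.

```python
import math
from collections import Counter


def alignment_power(points, angle_deg, bin_height=1.0):
    """Sum of squared pass counts per height increment at one angle.

    points     -- iterable of (x, y) fiducial point locations
    angle_deg  -- alignment angle being tested, in degrees
    bin_height -- height of one increment (assumed parameter)
    """
    theta = math.radians(angle_deg)
    s, c = math.sin(theta), math.cos(theta)
    # Height of each point measured perpendicular to the alignment
    # direction, quantized into increments of bin_height.
    counts = Counter(
        math.floor((y * c - x * s) / bin_height) for x, y in points
    )
    return sum(n * n for n in counts.values())
```

Six points lying on two horizontal lines, three on each, give a power of 3*3 + 3*3 = 18 at an angle of 0. At an angle at which each point falls in a different increment, the same six points give a power of only 6, which is why a large power signifies alignment.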
The alignment angle is now incremented by a selected amount, for example one degree (box 134). The power of the alignment is determined for alignments within a range of alignment angles. Selection of the range of alignment angles depends upon a number of factors, such as the expected range of alignment angles, the expected strength of the alignments, the expected number of alignments, etc. The greater the range of alignment angles, the greater the computation time for a given angle increment. For example, the range of skew angles tested may be +40 degrees to -40 degrees. Once incremented, the current alignment angle is tested to determine whether it falls within the selected range (box 136). If the current alignment is within the selected range, the locations of the white pass codes are translated (box 138). Several methods of translating the locations of the pass codes exist, and their applicability depends on the coordinate system used, the memory size available, the speed of calculation required, etc.
If the current alignment angle falls outside of the selected range, the maximum power may be determined (box 140) by comparing the powers of the various alignments previously stored. The maximum power may then be output (box 142) in a wide variety of formats, for example in the form of the absolute angle, a spectrum of angles together with their powers, etc. The format of the output depends on the intended use of the results.
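By way of illustration only, the sweep through the alignment angles may be sketched as follows. This is one of several possible ways of translating the pass code locations, namely rotating them by the angle under test and counting the translated points on each row. The function names, the sweep running from the negative to the positive limit in whole degrees, and the sign convention of the angle are assumptions made for this sketch.

```python
import math
from collections import Counter


def rotate(points, angle_deg):
    """Translate point locations so as to undo a skew of angle_deg."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(x * c + y * s, y * c - x * s) for x, y in points]


def row_power(points):
    """Sum of the squares of the number of points on each row."""
    counts = Counter(round(y) for _, y in points)
    return sum(n * n for n in counts.values())


def estimate_skew(points, max_angle=40, step=1):
    """Return the angle of maximum power and the stored spectrum."""
    spectrum = []
    for angle in range(-max_angle, max_angle + 1, step):
        # Translate the locations (box 138), then accumulate and
        # store the power with the current angle (boxes 130, 132).
        spectrum.append((angle, row_power(rotate(points, angle))))
    # Compare the stored powers to find the maximum (box 140).
    best_angle, _ = max(spectrum, key=lambda item: item[1])
    return best_angle, spectrum
```

The function returns both the angle of maximum power and the full list of angles with their powers, corresponding to the alternative output formats mentioned above.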
U.S. Pat. No. 5,001,766 discloses a method and apparatus for distributing and correcting rotational error (skew) between the dominant orientation of an image and a reference line by generating a file of picture elements representing the image with respect to the reference line, projecting the picture elements into contiguous segments of imaginary lines at selected angles across the file, counting the number of picture elements that fall into the segments and finding the projection that generates the largest value of an enhancement function applied to the segment counts.