Optical character recognition systems and image processing systems are generally well known in the art. One problem with such systems is that the printed material in a document being read may lie at an angle with respect to the orientation of the optical character recognition system. This problem is called skew. It is not cost effective to rotate the document in the system scanner to eliminate the skew. The preferred method is to process the video data generated from viewing the document in such a manner as to remove the skew prior to image processing. One such data processing method is disclosed in U.S. Pat. No. 3,967,243, in which the skew is removed by normalizing vertical and horizontal second moments computed from the video data. This sort of computation is fairly sophisticated and potentially burdensome.
A simpler method is to organize the lines (rows) of video data generated by the document reader into successive columns at predetermined positions. In this method, disclosed in U.S. Pat. No. 4,558,461, whenever a given line of data contains a sufficient number (e.g., one) of black pixels within a particular column, the portion of that line lying within that column (and only that portion) is transformed to all black pixels. As a result, if the video data is skewed, a smeared staircase pattern appears. The skew angle may be easily computed from the shape of the staircase.
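By way of illustration only, the column smearing described above may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the cited patent: the image is assumed to be a list of rows of 0 (white) and 1 (black) pixels, the names smear_columns, estimate_skew_degrees, col_width and min_black are hypothetical, and the way the angle is read off the staircase (rise between the first and last occupied columns) is only one plausible choice, since the text above does not specify it.

```python
import math

def smear_columns(image, col_width, min_black=1):
    """Return a copy of image in which each col_width-wide segment of a row
    is set to all black (1) when it holds at least min_black black pixels."""
    smeared = []
    for row in image:
        out = list(row)
        for start in range(0, len(row), col_width):
            segment = row[start:start + col_width]
            if sum(segment) >= min_black:
                out[start:start + len(segment)] = [1] * len(segment)
        smeared.append(out)
    return smeared

def estimate_skew_degrees(smeared, col_width):
    """Assumed approach: take the topmost black row in each full column and
    measure the rise between the first and last occupied columns.
    A positive result means the staircase descends toward the right."""
    if not smeared:
        return 0.0
    n_cols = len(smeared[0]) // col_width  # a trailing partial column is ignored
    tops = []
    for c in range(n_cols):
        x = c * col_width  # after smearing, any pixel of the segment represents it
        top = next((y for y, row in enumerate(smeared) if row[x]), None)
        if top is not None:
            tops.append((c, top))
    if len(tops) < 2:
        return 0.0
    (c0, y0), (c1, y1) = tops[0], tops[-1]
    return math.degrees(math.atan2(y1 - y0, (c1 - c0) * col_width))
```

For example, an image in which one black pixel per ten-pixel column drops by one row from each column to the next produces a staircase with a slope of 1 in 10, for which this sketch returns approximately 5.7 degrees.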
This latter method suffers from the disadvantage that the start position of each step in the staircase is fixed by the predetermined positions of the columns into which the rows of video data are organized and then smeared. This introduces a type of error which may be thought of heuristically as "quantization" error. More specifically, the edge of each step in the staircase pattern coincides with a boundary of one of the columns, the location of which is predetermined without regard to the contents of the video data. The accuracy of this process is therefore limited by the minimum width of the columns into which the rows of video data may be organized. This is analogous to the quantization error typically encountered in analog-to-digital conversion, in which the accuracy is limited by the minimum step size of the digital quantization.
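The effect may be illustrated with a short sketch. The function names snapped_edge and worst_case_error are hypothetical, and columns are assumed to begin at pixel 0 and repeat every col_width pixels. The sketch shows only that the reported step edge can lie as much as one column width, less one pixel, away from the pixel at which the black data actually begins, so that narrower columns reduce the error.

```python
def snapped_edge(true_start, col_width):
    """Left boundary of the column containing true_start, i.e. the place
    where the smeared step begins regardless of the true start position."""
    return (true_start // col_width) * col_width

def worst_case_error(col_width, image_width):
    """Largest gap, in pixels, between a true start position and the
    column boundary to which it is snapped."""
    return max(x - snapped_edge(x, col_width) for x in range(image_width))
```

Under these assumptions, a column width of 16 pixels gives a worst-case edge error of 15 pixels, whereas a width of 4 pixels gives 3.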