1. Field of the Invention
The present invention relates to a document image recognition apparatus and a computer-readable storage medium storing a document image recognition program for recognizing a document image by detecting the tilt of a document image in a document, etc. read by an image scanner or received from a facsimile device, amending the tilt, and extracting a character line and column.
To read a larger volume of documents through an optical character reader (OCR) engine, it is necessary to provide a function for analyzing the layout of document text containing both vertical and horizontal character lines, such as Japanese newspaper text. As technologies required to analyze the layout of text having vertical and horizontal character lines, the present invention provides new technologies for detecting the tilt of text so that the tilt of a document image can be correctly amended, and for extracting lines and columns so that document images can be correctly recognized.
2. Description of the Related Art
(1) Detecting the tilt of a document image
To read a common printed document, it is necessary to first obtain a document image using an image input device such as an image scanner. At this time, the original document is normally placed on the device with some tilt. To use the document in electronic filing or document recognition, the tilt of the document image should be detected and amended.
In the conventional tilt detecting technology, it is assumed that characters are regularly arranged in a text area which forms an important part of a document image.
For example, the first system is suggested in `A Fast Algorithm for the Skew Normalization of Document Images` by Nakano, et al., in the Transactions of the Institute of Electronics and Communication Engineers of Japan, D, vol. J69-D, No. 11, pp. 1833-1834. In this system, the tilt of a character string is estimated by assuming that the reference line of the character string is almost regularly provided, performing the Hough transformation on the coordinate values of the lower ends of character blocks, and detecting the peak value in the Hough space.
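The idea behind the first system can be sketched as follows. This is a minimal illustration only, not the algorithm of Nakano, et al.: the function name `hough_skew`, the angle range, the step sizes, and the sign convention of the angle are all assumptions made for the example. Each lower-end point votes for a (tilt angle, offset) cell, and the angle whose accumulator holds the highest peak is taken as the tilt.

```python
import math
from collections import Counter


def hough_skew(points, angle_range_deg=10.0, angle_step_deg=0.5, rho_step=1.0):
    """Estimate the tilt (in degrees) of text lines from lower-end points.

    `points` is a list of (x, y) coordinates, one per character block.
    All parameter names and defaults are hypothetical choices for this sketch.
    """
    n_steps = int(round(2 * angle_range_deg / angle_step_deg)) + 1
    best_angle, best_votes = 0.0, -1
    for i in range(n_steps):
        angle = -angle_range_deg + i * angle_step_deg
        t = math.radians(angle)
        # Points on a line tilted by `angle` share the same offset
        # rho = y*cos(t) - x*sin(t), so they vote for the same cell.
        votes = Counter(
            round((y * math.cos(t) - x * math.sin(t)) / rho_step)
            for x, y in points
        )
        peak = max(votes.values())
        if peak > best_votes:
            best_angle, best_votes = angle, peak
    return best_angle


# Three synthetic reference lines tilted by 3 degrees.
slope = math.tan(math.radians(3.0))
sample = [(x, x * slope + c) for c in (0, 40, 80) for x in range(0, 200, 10)]
print(hough_skew(sample))
```

Even in this reduced form, every point is evaluated once per candidate angle, which gives a sense of why the computation grows quickly with image size and angular resolution.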
The second system is suggested by the `Document Image Tilt Detection Apparatus` by Mizuno, et al. in Tokukaihei 7-192085. That is, the tilt of a character string is estimated by extracting the connected components of characters, generating a provisional character line by combining vicinal connected components, and obtaining a straight line touching the provisional character line.
The third system is suggested by the `Document Tilt Amendment Apparatus` by Saito, et al. in Tokukaihei 2-170280. That is, a document image is provisionally amended by sequentially changing the tilt angle θ, and the angle θ that gives the smallest area of the enclosing rectangle containing all black pixels in the amended image is obtained.
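The search performed by the third system can be sketched as follows. This is a simplified illustration under stated assumptions: instead of rotating the image itself, the sketch rotates the coordinates of the black pixels, and the names `bounding_area` and `min_area_skew`, the angle range, and the step size are hypothetical.

```python
import math


def bounding_area(pixels, angle_deg):
    """Area of the axis-aligned rectangle enclosing `pixels` after
    rotating them by -angle_deg (a provisional tilt amendment)."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    xs = [x * c + y * s for x, y in pixels]
    ys = [-x * s + y * c for x, y in pixels]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def min_area_skew(pixels, angle_range_deg=10.0, angle_step_deg=0.5):
    """Return the candidate angle whose amendment gives the smallest
    enclosing rectangle. Range and step are assumptions of this sketch."""
    n_steps = int(round(2 * angle_range_deg / angle_step_deg)) + 1
    candidates = [-angle_range_deg + i * angle_step_deg for i in range(n_steps)]
    return min(candidates, key=lambda a: bounding_area(pixels, a))


# Corners of a 100 x 40 text block tilted by 4 degrees.
t = math.radians(4.0)
block = [(u * math.cos(t) - v * math.sin(t), u * math.sin(t) + v * math.cos(t))
         for u, v in [(0, 0), (100, 0), (100, 40), (0, 40)]]
print(min_area_skew(block))
```

The estimate depends on a single scalar per candidate angle, the rectangle area, and every candidate requires a pass over all black pixels.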
(2) Layout Analysis (extracting lines and columns)
Conventionally, the following methods have been suggested for extracting lines and columns of character strings in a document image containing vertical and horizontal arrangements of characters.
For example, the fourth system is suggested by the `Document Image Processing Apparatus` by Tsujimoto, et al. in Tokukaihei 1-183783. That is, the column of an input document can be automatically determined by projecting a character line of an input document in a specific direction, and generating a projective distribution.
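Column determination from a projective distribution can be sketched as follows. This is an illustrative sketch only, not the procedure of the cited publication: the box format `(x0, y0, x1, y1)` with half-open extents, the function names, and the `min_gap` threshold are assumptions. Character-line boxes are projected onto one axis, and runs of the profile separated by sufficiently wide empty gaps are treated as columns.

```python
def projection_profile(boxes, length, axis=0):
    """Count how many character-line boxes cover each position on one axis.

    `boxes` holds (x0, y0, x1, y1) tuples with half-open extents;
    axis=0 projects onto x, axis=1 projects onto y.
    """
    profile = [0] * length
    for box in boxes:
        lo, hi = (box[0], box[2]) if axis == 0 else (box[1], box[3])
        for i in range(max(lo, 0), min(hi, length)):
            profile[i] += 1
    return profile


def split_at_gaps(profile, min_gap=1):
    """Return half-open (start, end) runs of non-zero profile values that
    are separated by at least `min_gap` consecutive zeros."""
    runs, start, gap = [], None, 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs


# Two columns of horizontal character lines, 10 pixels apart.
lines = [(0, y, 40, y + 8) for y in (0, 12, 24)]
lines += [(50, y, 90, y + 8) for y in (0, 12, 24)]
columns = split_at_gaps(projection_profile(lines, 100, axis=0), min_gap=5)
print(columns)
```

Because the profile is built from character lines that were extracted beforehand, any error in that preliminary line extraction carries over directly into the column boundaries.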
Furthermore, the fifth system is suggested by the `Document Image Processing Apparatus` by Mizutani, et al. in Tokukaihei 5-174179. That is, columns are extracted using an area in which no components are arranged in an input document.
The sixth system is suggested by the `Character String Extracting Method and Apparatus` by Hiramoto, et al. in Tokukaihei 10-31716. That is, character lines arranged in different directions are extracted from a document containing areas whose characters differ in size and pitch.
For example, a number of Japanese printed documents have vertical and horizontal arrangements of characters. Therefore, it is necessary to appropriately extract character lines and columns when document text is recognized.
However, there are the following problems with the above described conventional systems.
(1) Problems in detecting the tilt of a document image
Since the above described first system assumes that the lines are arranged in a fixed direction, the system cannot be applied to a document containing both horizontal and vertical character lines, as in a Japanese newspaper. Furthermore, since not all characters are arranged on a reference line even in a document having character lines in a fixed direction, errors cannot be avoided. Additionally, there is another problem that the Hough transformation process requires a large volume of computation.
In the above described second system, there is the possibility that a large error may occur because, as in a Japanese newspaper, a horizontal character line can be mistakenly extracted from a column having vertical character lines.
Although the above described third system is designed to detect the tilt of a document text containing both horizontal and vertical character lines, the tilt angle is detected from only a small amount of information, namely the area of an enclosing rectangle containing the black pixels of a document image. Therefore, there is the problem that the precision of the detected tilt is unstable. Furthermore, since it is necessary to repeatedly perform the process of extracting a rectangular area by rotating the image itself, a large volume of computation is required.
(2) Problem with layout analysis
Since the above described fourth system preliminarily extracts character lines and performs the column extracting process based on that preliminary extraction, a non-uniform column whose character lines have been divided into a number of small portions can itself be divided into small portions.
Since the fifth system extracts a column using a blank area, there is the possibility that a column can be mistakenly extracted when a document contains a space between lines larger than a space between columns.
This is a serious problem with a document image of text formed by closely arranged vertical and horizontal character lines. For example, suppose a document image contains only a small space between a vertically written article and the caption of a photograph, as shown by the rectangular box below the photograph area at the upper left corner of the newspaper shown in FIG. 1. In that case, the article and the caption are mistakenly recognized as one column, and the characters in each line of the horizontally written caption are mistakenly recognized as the leading two characters of the vertically written article.
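The weakness of blank-area-based column extraction noted for the fifth system can be illustrated with a small numeric sketch. The function `widest_gap` and the interval values below are hypothetical and chosen only for illustration. When the blank space between lines is wider than the blank space between columns, no single gap threshold separates the columns without also separating the lines.

```python
def widest_gap(intervals):
    """Return the widest empty gap between 1-D (start, end) intervals."""
    ordered = sorted(intervals)
    widest = 0
    reach = ordered[0][1]
    for start, end in ordered[1:]:
        if start > reach:
            widest = max(widest, start - reach)
        reach = max(reach, end)
    return widest


# Hypothetical layout: lines are 12 units apart, columns only 8 units apart.
line_spans = [(0, 20), (32, 52), (64, 84)]
column_spans = [(0, 100), (108, 208)]

line_gap = widest_gap(line_spans)
column_gap = widest_gap(column_spans)

# Any threshold small enough to detect the column gap also fires on line gaps.
print(line_gap, column_gap, column_gap < line_gap)
```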
Since the column area is extracted in the sixth system as a preprocess performed before a very precise line extracting process, a non-uniform column whose character lines are divided into a number of small portions can itself be divided into small portions, resulting in an incorrect line extracting process.
That is, in the above described technologies, either (1) (basic element set) → line extracting process → column extracting process → (layout analysis result) or (2) (basic element set) → column extracting process → line extracting process → (layout analysis result) is followed, based on the bottom-up process or the top-down process. In the above described technologies, it is assumed that the line extracting process and the column extracting process are independent of each other, and lines and columns are extracted by performing the processes sequentially, which causes the problems with these technologies.