In specifying the position of a character frame or the like on an image, as part of a process for identifying a “form” (a document having prescribed formal matters, such as a written application or a written contract) and recognizing characters, it is possible to adopt (1) a scheme wherein page marks or reference marks are employed as a reference, (2) a scheme wherein, when a scanner has the function of detecting the edges of a form against a black background, the edges are employed as a reference, and (3) a scheme wherein black character frames are individually detected and then matched with a predefined frame.
With scheme (1), however, the page marks or reference marks are indispensable to the form, which imposes so many limitations that the region usable by a user becomes smaller in area. Another drawback is that this scheme cannot cope with a form that has no page marks or reference marks. Scheme (2) has the drawback that it is not applicable without an expensive scanner dedicated to OCR and furnished with the special function. Scheme (3) has the drawback that, since not only horizontal segments but also vertical segments need to be extracted as features in order to detect the black character frame, the processing speed is lowered.
A further drawback is that, since the image itself is preprocessed by a skew correction so as to establish a state having no skew at all, the processing speed is lowered still further. Especially with a scheme wherein the form is identified using its contents, when a skew or a positional deviation exists, the analysis logic is liable to become complicated, and the processing speed tends to decrease.
One method for solving these drawbacks is, for example, a technique according to the inventors' invention disclosed in Japanese Patent Laid-Open No. 143986/1999. With the technique, the following processing steps are executed: (1) First, horizontal segments are extracted from a bitmap image. (2) A rectangle circumscribing the horizontal segments (a circumscribed rectangle) is generated. (3) A skew is theoretically corrected. (4) Subsequently, by referring to form definition structures registered beforehand, candidates for a form are narrowed down on the basis of the circumscribed rectangle, and the candidates are narrowed down further on the basis of the information of the horizontal segments, thereby identifying the form. (5) Thereafter, characters in a region corresponding to the input field of the identified form are recognized, and the recognized characters are generated from the bitmap image. The information items of the segments and circumscribed rectangles of the respective forms are registered in each form definition structure beforehand.
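Steps (1) and (2) above can be illustrated with a minimal sketch. It assumes the bitmap is given as rows of 0/1 pixels and that a "horizontal segment" is simply a run of black pixels of at least `min_len` within one row. The function names, the `min_len` threshold, and the data layout are hypothetical choices made for illustration and are not taken from the cited publication.

```python
from typing import List, Sequence, Tuple

# (row, x_start, x_end) with x_end inclusive; a hypothetical representation.
Segment = Tuple[int, int, int]


def extract_horizontal_segments(bitmap: Sequence[Sequence[int]],
                                min_len: int) -> List[Segment]:
    """Return every horizontal run of black (non-zero) pixels of length >= min_len."""
    segments: List[Segment] = []
    for y, row in enumerate(bitmap):
        run_start = None
        # A trailing 0 acts as a sentinel so a run touching the right edge is closed.
        for x, pixel in enumerate(list(row) + [0]):
            if pixel and run_start is None:
                run_start = x
            elif not pixel and run_start is not None:
                if x - run_start >= min_len:
                    segments.append((y, run_start, x - 1))
                run_start = None
    return segments


def circumscribed_rectangle(segments: Sequence[Segment]) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the rectangle enclosing all segments."""
    if not segments:
        raise ValueError("no horizontal segments detected")
    left = min(x0 for _, x0, _ in segments)
    right = max(x1 for _, _, x1 in segments)
    top = min(y for y, _, _ in segments)
    bottom = max(y for y, _, _ in segments)
    return left, top, right, bottom
```

Only rows are scanned, so no vertical features are needed. The four corners of the returned rectangle are what the text refers to as virtual page marks.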
Owing to the technique, the advantages stated below have been obtained. Even in a case where the form has no page marks or reference marks, and where a scanner is incapable of detecting the edges of the form, it is possible to specify a character frame (input field) and to recognize characters. Since the bitmap images can be compared with reference to the circumscribed rectangle formed only of the horizontal segments, which are recognizable at high speed, the speed of the process for identifying the bitmap image can be increased. Further, since mapping to an ideal image is realized without subjecting the image itself to a skew correction, the speed of the process for identifying the bitmap image can be increased. Still further, since the four corners of the circumscribed rectangle can be set as virtual page marks, the prior-art logic for detecting the character frame with reference to page marks can be reused. Yet further, since the definition information items of circumscribed rectangles and horizontal segments can be added to existing form definition structures, the information items of the prior-art form definition structures can be utilized as they are, and the burden on an operator who creates the form definition structures can be lightened.
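One way to picture mapping to an ideal image without correcting the image itself is to transform coordinates rather than pixels. The sketch below is an assumption made for illustration, not the method of the cited publication: it estimates the skew angle from the two end points of one detected horizontal segment and maps a point defined on the unskewed form into scanned-image coordinates by a rotation about one virtual page mark. The names `skew_angle` and `ideal_to_scanned` are hypothetical, and the image y axis is assumed to point downward.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def skew_angle(seg_start: Point, seg_end: Point) -> float:
    """Angle (radians) of a nominally horizontal segment in the scanned image."""
    dx = seg_end[0] - seg_start[0]
    dy = seg_end[1] - seg_start[1]
    return math.atan2(dy, dx)


def ideal_to_scanned(point: Point, origin: Point, angle: float) -> Point:
    """Map a point given relative to the ideal form's top-left corner into the
    scanned image, where `origin` is the detected top-left virtual page mark."""
    u, v = point
    c, s = math.cos(angle), math.sin(angle)
    return (origin[0] + u * c - v * s, origin[1] + u * s + v * c)
```

Under these assumptions only the few points that delimit an input field are transformed, which is far cheaper than resampling every pixel of the bitmap.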
Meanwhile, as a premise for applying the aforecited technique disclosed in Japanese Patent Laid-Open No. 143986/1999, the direction of the bitmap image needs to be a direction in which character recognition is possible. More specifically, in a case where the bitmap image is rotated, e.g., 90 degrees or 270 degrees relative to the recognizing direction, an operation of rotating the image into the direction in which the characters can be normally recognized needs to be considered. By way of example, the following processing is executed in a system practiced by the inventors: (1) First, the original form is rotated 90 degrees. (2) Horizontal segments are detected, and the form is identified by utilizing the information of the segments. (3) When the form identification succeeds, it is followed by recognition processing. (4) When the form identification fails, the original form is rotated 270 degrees. (5) Horizontal segments are detected, and the form is identified by utilizing the information of the segments. (6) When the form identification succeeds, it is followed by recognition processing, and when it fails, a form error is judged. Incidentally, the process assumes a form in which characters are written laterally on an oblong sheet of paper of A4 format or the like, and data obtained when the form is scanned in parallel with the shorter side of the sheet of paper by a facsimile or the like. Therefore, the example corresponds to a case where the rotational direction of the form is limited to 90 degrees or 270 degrees.
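The trial-and-error procedure in steps (1) to (6) can be sketched as follows. This is an illustrative outline only: `identify` stands for a caller-supplied function (hypothetical) that performs segment detection and form identification on a bitmap and returns a form identifier or `None`, and the rotation is assumed to be clockwise because the text does not state the direction.

```python
from typing import Callable, List, Optional, Tuple

Bitmap = List[List[int]]


class FormError(Exception):
    """Raised when the form cannot be identified in either orientation."""


def rotate90_cw(bitmap: Bitmap) -> Bitmap:
    """Rotate a row-major bitmap 90 degrees clockwise (assumed direction)."""
    return [list(row) for row in zip(*bitmap[::-1])]


def rotate(bitmap: Bitmap, degrees: int) -> Bitmap:
    """Rotate by a multiple of 90 degrees using repeated quarter turns."""
    for _ in range((degrees // 90) % 4):
        bitmap = rotate90_cw(bitmap)
    return bitmap


def identify_with_rotation(
    original: Bitmap,
    identify: Callable[[Bitmap], Optional[str]],
) -> Tuple[str, int]:
    """Try the 90-degree rotation first, then 270 degrees, as in steps (1)-(6)."""
    for degrees in (90, 270):
        candidate = rotate(original, degrees)  # whole-image rotation
        form_id = identify(candidate)          # segment detection + identification
        if form_id is not None:
            return form_id, degrees
    raise FormError("form not identified at 90 or 270 degrees")
```

In this outline every form costs at least one whole-image rotation and one identification pass, and a form that needs the 270-degree orientation costs two of each.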
In this manner, the drawbacks attendant upon the schemes (1)–(3) stated before can be solved by the aforecited technique or by the image rotation process practiced by the inventors. Both of these, however, involve the problems explained below.
The aforecited technique is premised on the chief use of forms designed for the OCR process (forms for OCR). In the form for OCR, the character input frame (input field) is designated by thick lines, and characters are ordinarily enclosed with thick rectangles one by one. In the horizontal segment detection of the technique, the segments can be stably detected as long as the form for OCR is employed. On the premise that the horizontal segments can be stably detected, it is reasonable, for stabilizing the generation of the virtual page marks (circumscribed rectangle), to select as key segments the outermost horizontal segments, which constitute a circumscribed rectangle of larger area. Therefore, the technique is premised on the employment of the form for OCR, and it adopts an algorithm in which the circumscribed rectangle (virtual page marks) is generated using the outermost horizontal segments as the key segments. That is, the technique does not take into consideration an algorithm based on the horizontal segments that are stably detectable within the form.
Therefore, when a non-OCR form, which is not designed for OCR, is employed, the technique involves the following problem: in a case where the key segments for determining the circumscribed rectangle cannot be detected on account of a blur, a skew, a missing or stained form end, a fold or the like, or where part of a key segment (particularly on an outer side) is missing, stable virtual page marks (circumscribed rectangle) cannot be generated, and erroneous virtual page marks are generated instead. Especially in a case where a clear area, which is an empty area for coping with the skew or expansion/contraction of the form, is nonexistent or slight at the peripheral part of the form, a necessary horizontal segment is lost due to a facsimile header in the received data of an image transmitted by a facsimile or the like. In consequence, the form that should be recognized is not recognized, and a form identification error occurs.
Moreover, a long time period is generally expended on the process for detecting the horizontal segments over the whole form and on the rotation operation of the whole image. With the image rotation technique explained before, one rotation operation and one segment detection process must inevitably be executed each time a form is dealt with, and two rotation operations and two segment detection processes must be executed in the worst case. A higher processing speed is always required, and it is desired to adopt an algorithm which can realize a higher processing speed by omitting any wasteful operation.
A non-limiting object of the present invention is to provide a technique which can generate stable virtual page marks even in a non-OCR form so as to stably identify the form.
Another non-limiting object of the present invention is to provide a technique which bestows redundancy on the detection of horizontal segments and can generate stable virtual page marks even when the detection of the horizontal segments is difficult.
Further non-limiting objects of the present invention are to provide a technique which detects the direction of a form rotation operation beforehand, and to provide a technique which can enhance a processing speed by minimizing the number of rotation operations and segment detection processes performed on the whole form.