1. Field of the Invention
The present invention relates to a recognition method and apparatus in which various types of recognition processes are performed according to attributes assigned to areas of characters, areas of tables, areas of figures and areas of rules, the areas being automatically extracted from a printed document or a hand-written document in which the characters, the tables, the figures having many lateral and longitudinal lines, and the rules are drawn.
2. Description of the Related Art
Electronic documents have recently come into use to make documents easier to consult and to reduce the space required to store them. To store a printed document or a hand-written document in a memory of a computer by means of an input device, various types of table and figure recognition methods have been developed in which a character (such as a Chinese character, a letter, or the like), a table and a figure are directly converted into pieces of electronic data without any intermediate process.
2.1. Previously Proposed Art
In a conventional table and figure recognition method, characters, tables, figures and rules drawn in a document to be recognized, such as a printed document or a hand-written document, are initially read with an input device of a computer such as a scanner and are converted into binary-valued images of the characters, the tables, the figures and the rules. The binary-valued images are stored in a memory of the computer. Thereafter, the binary-valued images are compressed to reduce their resolution to about 100 dots per inch (dpi), and pieces of compressed binary-valued image data, each corresponding to a pixel, are output. As a result, the time required to perform an area extraction process is shortened. Thereafter, combined black pixel portions are detected, each of which is composed of a plurality of black pixels adjacent to each other in a lateral direction, a longitudinal direction or a diagonal direction of the document so as to form a combined group. A rectangle circumscribed about each of the combined black pixel portions is called a circumscribed rectangle. Thereafter, pieces of positional information and pieces of size information of the circumscribed rectangles corresponding to the combined black pixel portions are stored.
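The resolution reduction and the detection of circumscribed rectangles described above can be sketched as follows. This is a minimal illustration, not the conventional method itself: the function names, the list-of-rows image representation (1 for a black pixel, 0 for a white pixel) and the rule that a reduced pixel is black if any pixel in its block is black are all assumptions made for the example. Adjacency in the lateral, longitudinal and diagonal directions is modelled as 8-connectivity.

```python
from collections import deque


def reduce_resolution(image, factor):
    """Shrink a binary image by `factor` in both directions.

    Assumption: a reduced pixel is black (1) if any pixel of the
    corresponding factor x factor block is black.
    """
    height, width = len(image), len(image[0])
    return [
        [
            1 if any(image[yy][xx]
                     for yy in range(y, min(y + factor, height))
                     for xx in range(x, min(x + factor, width))) else 0
            for x in range(0, width, factor)
        ]
        for y in range(0, height, factor)
    ]


def circumscribed_rectangles(image):
    """Return (left, top, right, bottom) for each 8-connected black group."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill over the 8 neighbours of each pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            left = right = x
            top = bottom = y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes
```

Each returned tuple carries both the positional information (left, top) and, by subtraction, the size information of one circumscribed rectangle.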
Thereafter, a length ratio of a longitudinal side to a lateral side in each of the circumscribed rectangles is calculated, and one or more circumscribed rectangles in which the ratios are respectively larger than a prescribed value are selected as one or more circumscribed rule-rectangles from among the circumscribed rectangles. Each of the circumscribed rule-rectangles is judged to be circumscribed about a rule. Thereafter, a rule attribute is set to each of the circumscribed rule-rectangles.
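The rule judgement above can be sketched as a simple elongation test. The text specifies the ratio of the longitudinal side to the lateral side; the sketch below instead divides the longer side by the shorter side so that lateral rules are caught as well, which is an assumption of this example. The function name and the threshold value of 10 are likewise hypothetical.

```python
def is_rule(box, ratio_threshold=10.0):
    """Judge whether a circumscribed rectangle is a rule-rectangle.

    `box` is (left, top, right, bottom) in pixel coordinates, inclusive.
    Assumption: elongation is measured as long side / short side.
    """
    left, top, right, bottom = box
    width = right - left + 1
    height = bottom - top + 1
    return max(width, height) / min(width, height) > ratio_threshold
```

For instance, a rectangle 100 pixels wide and 2 pixels high has an elongation of 50 and would receive the rule attribute under this threshold, whereas a 10 by 10 rectangle would not.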
Thereafter, for each of the remaining circumscribed rectangles not selected as the circumscribed rule-rectangles, the length of the longitudinal side and the length of the lateral side are compared with each other, and the shorter of the two lengths is selected. Thereafter, one or more circumscribed rectangles in which the shorter lengths are respectively smaller than a prescribed value are selected as one or more circumscribed character-rectangles from among the remaining circumscribed rectangles. Each of the circumscribed character-rectangles is judged to be circumscribed about a character. Thereafter, a character attribute is set to each of the circumscribed character-rectangles. Thereafter, the direction in which the circumscribed character-rectangles are arranged in a line is determined according to the positional information of the circumscribed character-rectangles, so that character lines are extracted.
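The selection of character-rectangles by their shorter side can be sketched as below. The function names and the threshold of 12 pixels are assumptions chosen for illustration; the determination of the character line direction is not covered by this sketch.

```python
def short_side(box):
    """Length of the shorter side of a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return min(right - left + 1, bottom - top + 1)


def select_characters(boxes, max_short_side=12):
    """Split boxes into character-rectangles and the remaining rectangles.

    Assumption: `boxes` no longer contains the rule-rectangles.
    """
    characters = [b for b in boxes if short_side(b) < max_short_side]
    remaining = [b for b in boxes if short_side(b) >= max_short_side]
    return characters, remaining
```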
Thereafter, one or more lateral lines and one or more longitudinal lines existing in each of the remaining circumscribed rectangles not selected as the circumscribed rule-rectangles or the circumscribed character-rectangles are extracted. Each of the lateral lines is composed of a series of black pixels adjacent to each other in the lateral direction, and each of the longitudinal lines is composed of a series of black pixels adjacent to each other in the longitudinal direction. Thereafter, the number of lateral lines, the number of longitudinal lines, and intersection numbers, each of which corresponds to the number of intersections formed on one of the lateral and longitudinal lines, are counted for each of the remaining circumscribed rectangles. In cases where the number of lateral lines or the number of longitudinal lines in a circumscribed rectangle is larger than a prescribed value, on condition that one of the intersection numbers in the circumscribed rectangle is larger than another prescribed value, the circumscribed rectangle is judged to be a circumscribed table-rectangle circumscribed about a table. Thereafter, a table attribute is set to each of the circumscribed table-rectangles.
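The line extraction, intersection counting and table judgement above can be sketched as follows. This is an illustration under stated assumptions: lines are taken to be one pixel thick, a run of black pixels counts as a line only if it is at least `min_len` pixels long, and all function names and threshold values are hypothetical.

```python
def runs(pixels, min_len):
    """Yield (start, end) of runs of black pixels at least min_len long."""
    start = None
    for i, v in enumerate(list(pixels) + [0]):  # sentinel closes a final run
        if v == 1 and start is None:
            start = i
        elif v != 1 and start is not None:
            if i - start >= min_len:
                yield (start, i - 1)
            start = None


def extract_lines(image, min_len):
    """Return lateral lines as (y, x0, x1) and longitudinal lines as (x, y0, y1)."""
    lateral = [(y, s, e) for y, row in enumerate(image)
               for s, e in runs(row, min_len)]
    longitudinal = [(x, s, e) for x, col in enumerate(zip(*image))
                    for s, e in runs(col, min_len)]
    return lateral, longitudinal


def intersection_numbers(lateral, longitudinal):
    """Number of crossings on each lateral line, then on each longitudinal line."""
    on_lateral = [sum(1 for x, y0, y1 in longitudinal
                      if x0 <= x <= x1 and y0 <= y <= y1)
                  for y, x0, x1 in lateral]
    on_longitudinal = [sum(1 for y, x0, x1 in lateral
                           if x0 <= x <= x1 and y0 <= y <= y1)
                       for x, y0, y1 in longitudinal]
    return on_lateral + on_longitudinal


def looks_like_table(image, min_len=3, line_threshold=2, intersection_threshold=2):
    """Apply the count-based table judgement to the pixels of one rectangle."""
    lateral, longitudinal = extract_lines(image, min_len)
    counts = intersection_numbers(lateral, longitudinal)
    enough_lines = (len(lateral) > line_threshold
                    or len(longitudinal) > line_threshold)
    enough_crossings = any(c > intersection_threshold for c in counts)
    return enough_lines and enough_crossings
```

A 7 by 7 grid with black rows and columns at positions 0, 3 and 6 yields three lateral lines and three longitudinal lines with three intersections each, and is therefore judged to be a table under these thresholds.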
Thereafter, one or more remaining circumscribed rectangles not selected as the circumscribed rule-rectangles, the circumscribed character-rectangles or the circumscribed table-rectangles are judged to be one or more circumscribed figure-rectangles, each circumscribed about a figure. Thereafter, a figure attribute is set to each of the circumscribed figure-rectangles.
Thereafter, rule areas occupied by the circumscribed rule-rectangles, character areas occupied by the circumscribed character-rectangles, table areas occupied by the circumscribed table-rectangles and figure areas occupied by the circumscribed figure-rectangles are extracted according to the area extraction process. Thereafter, various types of recognition processes are performed for the rule areas, the character areas, the table areas and the figure areas according to the attributes set to the circumscribed rule-rectangles, the circumscribed character-rectangles, the circumscribed table-rectangles and the circumscribed figure-rectangles, so that the recognition of the rules, the characters, the tables and the figures drawn in the printed document or the hand-written document is performed.
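Taken together, the conventional judgements form a cascade in which each circumscribed rectangle receives the first attribute whose condition it satisfies, with the figure attribute as the fallback. The sketch below illustrates that ordering only; the function name, the parameter names and every threshold value are assumptions, and the rule test uses the long side divided by the short side as an interpretation of the length ratio.

```python
def assign_attribute(width, height, lateral_lines, longitudinal_lines,
                     max_intersections, *, rule_ratio=10.0, char_short_side=12,
                     line_count=2, intersection_count=2):
    """Return 'rule', 'character', 'table' or 'figure' for one rectangle.

    width, height      -- side lengths of the circumscribed rectangle, in pixels
    lateral_lines      -- number of lateral lines found inside the rectangle
    longitudinal_lines -- number of longitudinal lines found inside it
    max_intersections  -- largest intersection number among those lines
    """
    long_side, short_side = max(width, height), min(width, height)
    if long_side / short_side > rule_ratio:
        return "rule"
    if short_side < char_short_side:
        return "character"
    if ((lateral_lines > line_count or longitudinal_lines > line_count)
            and max_intersections > intersection_count):
        return "table"
    return "figure"
```

The attribute returned for each rectangle would then select which recognition process is applied to the corresponding area.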
2.2. Problems to be Solved by the Invention
However, in the conventional table and figure recognition method, the judgement as to whether a circumscribed rectangle is a circumscribed table-rectangle or a circumscribed figure-rectangle depends merely on the number of lateral lines existing in the circumscribed rectangle, the number of longitudinal lines existing in the circumscribed rectangle, and the intersection numbers, each of which corresponds to the number of intersections formed on one of the lateral and longitudinal lines. Therefore, in cases where a figure in which many intersections are formed on each of the lateral and longitudinal lines is drawn in a document to be recognized, the circumscribed rectangle circumscribed about the figure is erroneously judged to be a circumscribed table-rectangle. That is, there is a drawback that the recognition processes cannot be reliably performed because the figure cannot be correctly distinguished from a table.
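The drawback can be illustrated with a small sketch of the count-based judgement. The function name, the thresholds and the line and intersection counts below are hypothetical values chosen only to show that a lattice-like figure and a genuine table are indistinguishable when nothing but these counts is examined.

```python
def judged_as_table(lateral_lines, longitudinal_lines, max_intersections,
                    line_count=2, intersection_count=2):
    """Count-based judgement: enough lines and enough intersections on one line."""
    return ((lateral_lines > line_count or longitudinal_lines > line_count)
            and max_intersections > intersection_count)


# Hypothetical counts for a genuine table and for a figure drawn as a lattice.
table = dict(lateral_lines=5, longitudinal_lines=4, max_intersections=4)
lattice_figure = dict(lateral_lines=6, longitudinal_lines=6, max_intersections=6)

table_result = judged_as_table(**table)
figure_result = judged_as_table(**lattice_figure)  # also judged to be a table
```

Both calls return the same result, so the figure would be passed to the table recognition process.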