Character recognition apparatuses are known that scan a form (such as a slip) as a color image with an image scanner or the like and recognize the characters in the scanned color image data, so that characters and the like printed or handwritten on the form can be saved as data.
A form has a plurality of items in which characters are printed (or written), and the size and position of the character box of each item differ from form to form. Conventionally, an operator has needed to prepare, in advance, definition information that specifies the shape, position, color, and the like of each character box, so that characters in forms of various formats can be recognized.
However, the method in which definition information is prepared in advance has a problem in that it imposes a large number of work steps on the operator.
Patent Document 1 describes that connected components of black pixels in a binary image are sorted into horizontal lines, vertical lines, characters, and broken line elements. When the white ratio, that is, the ratio of the space between two horizontally adjacent broken line elements to the total length including the space, is equal to or below a threshold value, it is determined that the broken line elements may be connected.
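The white-ratio test described for Patent Document 1 can be illustrated with a minimal sketch. It is not taken from that document: the `Segment` type, the function names, the default threshold of 0.5, and the reading of "total length including the space" as the span from the left end of the first element to the right end of the second are all assumptions made for illustration.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Horizontal extent of one broken line element, in pixels (end exclusive).

    Hypothetical type introduced only for this sketch.
    """
    start: int
    end: int


def white_ratio(left: Segment, right: Segment) -> float:
    """Ratio of the gap between two elements to the total span including the gap.

    Assumption: the total length runs from the left end of `left`
    to the right end of `right`.
    """
    gap = right.start - left.end
    total = right.end - left.start
    if gap < 0 or total <= 0:
        raise ValueError("segments must be ordered left to right without overlap")
    return gap / total


def may_connect(left: Segment, right: Segment, threshold: float = 0.5) -> bool:
    """Return True when the white ratio is equal to or below the threshold.

    The default threshold is an arbitrary illustrative value.
    """
    return white_ratio(left, right) <= threshold
```

Under these assumptions, two 4-pixel elements separated by a 2-pixel space give a gap of 2 over a total span of 10, so they would be judged connectable at the illustrative threshold, whereas two 2-pixel elements separated by 16 pixels would not.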
Patent Document 2 describes that a rectangle is extracted from a binary image, the number of black pixels in the rectangle is counted, and the ratio of the number of black pixels to the area of the rectangle is calculated as the black pixel occupancy. Then, based on the black pixel occupancy, whether or not the rectangle forms a dotted line is determined.
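The black pixel occupancy described for Patent Document 2 can likewise be sketched as follows. The image representation (a list of rows in which 1 denotes a black pixel), the function names, and the occupancy band used to decide "dotted" are assumptions for illustration only; the source does not state how the occupancy is mapped to the decision.

```python
from typing import Sequence


def black_pixel_occupancy(
    image: Sequence[Sequence[int]], top: int, left: int, height: int, width: int
) -> float:
    """Number of black pixels in the rectangle divided by the rectangle's area.

    Assumption: `image` is a binary image stored as rows of 0/1 values,
    with 1 meaning black.
    """
    if height <= 0 or width <= 0:
        raise ValueError("rectangle must have positive height and width")
    black = sum(
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )
    return black / (height * width)


def looks_dotted(occupancy: float, low: float = 0.2, high: float = 0.8) -> bool:
    """Hypothetical decision rule: a solid line is nearly all black and an
    empty region nearly all white, so an intermediate occupancy suggests
    a dotted line. The band limits are arbitrary illustrative values.
    """
    return low <= occupancy <= high
```

For a one-pixel-high rectangle whose pixels alternate in runs of two black and two white, the occupancy is 0.5 and the rule above would classify it as dotted, whereas a fully black rectangle has an occupancy of 1.0 and would not be.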
Patent Document 3 describes that a rectangle of black pixel continuous elements is extracted from a binary image, and a solid line box corresponding to a cell of a table is extracted from the extracted rectangle. Then, a rectangle corresponding to a dotted line element within the solid line box is extracted, and a dotted ruler is extracted by combining dotted line elements within a predetermined distance.
[Patent Document 1] Japanese Laid-open Patent Publication No. S61-175880
[Patent Document 2] Japanese Laid-open Patent Publication No. H10-97588
[Patent Document 3] Japanese Laid-open Patent Publication No. H11-242716