In conventional document recognition, a line is extracted using rectangle information based on “a set of black pixels that seem to be characters” or “a circumscribed rectangle of pixels that seem to be characters.” Because the estimate of the circumscribed rectangle substantially influences whether a line extraction is appropriate, a failure in estimating the size of the circumscribed rectangle may lead to a failure in the line extraction. Thus, technologies that use character recognition to evaluate how likely it is that an object is a character have been developed.
Japanese Patent No. 3913985 (corresponding English publication is U.S. Pat. No. 6,701,015), for instance, discloses a technology that judges whether or not objects in an area that includes noise are character components, using the number of character components or character recognition indicia. Moreover, Japanese Laid-open Patent Publication No. H. 11-219407 (corresponding English publication is U.S. Pat. No. 6,332,046) discloses a technology in which character strings are extracted based on uniformity of characters, and the extracted character strings are evaluated using character recognition. Furthermore, Japanese Laid-open Patent Publication No. H. 04-211884 discloses a technology that segments a character from a contact character when a line is known.