In general, extracting written characters that overlap entry boxes is a technically important task for document processing apparatuses. To make the entry boxes and written characters easy to separate, the entry boxes are conventionally printed either in a dropout color, which can be removed at reading time provided that a reading apparatus capable of distinguishing colors is used, or in gray, provided that a reading apparatus capable of distinguishing gradation is used. However, owing to the cost of such reading apparatuses, the cost of document printing, and the need to continue using existing single-color documents, demand for recognizing binary (in many cases, black-and-white) documents has been increasing. In these cases, to recognize characters that overlap a box, a small portion lying outside the line of the box is detected and used as a key for processing. This approach, however, does not readily solve the problem at a fundamental level.
There is also a need to read documents sent by binary facsimile transmission. Demand has likewise been increasing for an improved method of reading black-and-white documents on which characters are handwritten with ball-point pens or pencils, that is, a method that can also be applied to documents sent by facsimile transmission.
FIG. 37 shows a structural view of a conventional document processing apparatus (see Japanese Examined Patent Publication No. Sho-63-18786).
On a document 101, character entry boxes are printed in a lighter gray than the characters to be written in them. Photoelectric conversion means 102, which can distinguish various levels of darkness, performs photoelectric conversion on a one-line area of the document 101 such that the lighter gray of the character entry boxes is converted to a smaller number and the darker gray of the characters is converted to a larger number. The number indicating the degree of darkness of each pixel in the one-line area obtained by the photoelectric conversion (hereafter called the gray level) is stored in storage means 103. The content stored in the storage means 103 is sent to character-entry-box position detection means 104. The character-entry-box position detection means 104 counts the number of pixels having a predetermined gray level along the row direction and along the column direction. When the counts exceed predetermined values for the row direction and the column direction, the character-entry-box position detection means 104 determines that character entry boxes are disposed at that position, and sends the character-entry-box position information to character extraction means 105. The character extraction means 105 uses the character-entry-box position information and the stored content sent from the storage means 103 to extract characters.
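The counting step performed by the character-entry-box position detection means 104 can be illustrated with a minimal sketch. It is not taken from the cited publication. The function names, the `tolerance` parameter, the sample gray levels, and the thresholds are all hypothetical. The extraction step in particular is only an assumption, since the text above does not state how the character extraction means 105 operates.

```python
def detect_box_lines(gray, box_level, tolerance, row_min, col_min):
    """Return (rows, cols) whose count of box-gray pixels exceeds the thresholds.

    gray      : list of rows, each a list of integer gray levels
                (smaller = lighter, larger = darker, as in the text above)
    box_level : the predetermined gray level of the entry boxes (assumed known)
    tolerance : allowed deviation from box_level (hypothetical parameter)
    """
    height = len(gray)
    width = len(gray[0]) if height else 0

    def is_box(value):
        return abs(value - box_level) <= tolerance

    # Count box-gray pixels along the row direction and the column direction.
    row_counts = [sum(1 for v in row if is_box(v)) for row in gray]
    col_counts = [
        sum(1 for y in range(height) if is_box(gray[y][x])) for x in range(width)
    ]

    # A row or column is taken to hold a box line when its count exceeds
    # the predetermined value for that direction.
    rows = [y for y, count in enumerate(row_counts) if count > row_min]
    cols = [x for x, count in enumerate(col_counts) if count > col_min]
    return rows, cols


def extract_characters(gray, char_min):
    """Assumed extraction rule: keep only pixels at least as dark as char_min."""
    return [[1 if v >= char_min else 0 for v in row] for row in gray]


# Toy image: 0 = paper, 3 = light-gray box line, 9 = dark handwriting.
image = [
    [3, 3, 3, 3, 3, 3],
    [3, 0, 0, 0, 0, 3],
    [3, 0, 9, 9, 0, 3],
    [3, 0, 0, 9, 0, 3],
    [3, 3, 3, 3, 3, 3],
]

box_rows, box_cols = detect_box_lines(
    image, box_level=3, tolerance=0, row_min=4, col_min=3
)
char_mask = extract_characters(image, char_min=5)
```

In this toy image the box lines are found at rows 0 and 4 and at columns 0 and 5, and the mask keeps only the three dark pixels. The sketch relies on the box and the characters having distinguishable gray levels, which a binary document cannot provide.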
As described above, the conventional apparatus detects the character entry boxes and extracts characters by using the fact that the gray level of the character entry boxes is low, so a mark for extracting characters is not required.