The present invention generally relates to data reading apparatuses, and more particularly to a data reading apparatus which reads data from a form sheet and is applicable to a so-called optical character reader (hereinafter simply referred to as OCR) which optically reads image information.
In the present specification, the term "form sheet" is used to refer to a sheet which has a fixed form with spaces in which data are to be filled in or entered. Form sheets include slips, tickets, debit notes, questionnaires, and various kinds of sheets printed with a frame (fixed or standard form), headings and the like. The data are entered into predetermined spaces of the form sheet which are identified by a line, a box or the like.
Various OCRs have been developed. The OCR scans a document by use of an image scanner, and reads image information from the document as image data. The image information may include printed or hand-written characters on a sheet of paper. The characters are recognized from the image data, and the image data corresponding to the recognized characters are converted into character code data.
When the OCR is used as input means for entering character information and the like into processing systems which process such information, or into communication systems such as data communication systems which transmit character data, the data are entered more efficiently than when they are entered from a keyboard. Such processing systems include word processing systems, automatic translating systems, systems for totalling form sheets and systems for producing data files for searches.
The OCR is provided with a dictionary for recognizing characters, and image data of character fonts are pre-registered in the dictionary as reference image information. A character recognition means compares image data of an entered character with image data in the dictionary and finds a pattern which matches that of the entered character. When a matching pattern is found, the character recognition means recognizes the entered character as a predetermined character and generates character code data corresponding to the predetermined character.
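The dictionary comparison described above may be illustrated by the following minimal sketch. It is not the recognition method of any particular OCR. The 3x3 bitmaps, the names `DICTIONARY`, `mismatch` and `recognize`, and the pixel-count tolerance `max_mismatch` are all assumptions introduced solely for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: a bitmap is a tuple of rows of '0'/'1' pixels.
Bitmap = Tuple[str, ...]

# Hypothetical dictionary: reference image data keyed by character code.
DICTIONARY: Dict[int, Bitmap] = {
    ord("I"): ("010", "010", "010"),
    ord("L"): ("100", "100", "111"),
    ord("T"): ("111", "010", "010"),
}


def mismatch(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(entered: Bitmap, max_mismatch: int = 1) -> Optional[int]:
    """Return the character code of the closest reference pattern.

    Returns None when no reference pattern lies within max_mismatch
    pixels of the entered image (an assumed tolerance).
    """
    best_code = min(DICTIONARY, key=lambda code: mismatch(entered, DICTIONARY[code]))
    if mismatch(entered, DICTIONARY[best_code]) <= max_mismatch:
        return best_code
    return None


if __name__ == "__main__":
    noisy_l = ("100", "100", "110")  # an "L" with one pixel missing
    code = recognize(noisy_l)
    print(None if code is None else chr(code))
```

In this sketch an entered image which matches no registered pattern within the tolerance yields no character code, which corresponds to the case where no matching pattern is found.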
Generally, many kinds of character fonts, that is, many kinds of character designs such as typefaces, are in use. For this reason, a dictionary for recognizing characters must be provided for each kind of character design.
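The need for one dictionary per character design may be sketched as follows, under the simplifying assumption of exact pattern matching. The design names `design_a` and `design_b`, the 3x3 bitmaps and the function `recognize_any_design` are hypothetical and serve only to illustrate the point.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: a bitmap is a tuple of rows of '0'/'1' pixels.
Bitmap = Tuple[str, ...]

# One hypothetical dictionary per character design. The same character "I"
# has different reference image data in each design.
FONT_DICTIONARIES: Dict[str, Dict[Bitmap, int]] = {
    "design_a": {("010", "010", "010"): ord("I")},  # plain stroke
    "design_b": {("111", "010", "111"): ord("I")},  # stroke with serifs
}


def recognize_any_design(entered: Bitmap) -> Optional[Tuple[str, int]]:
    """Search every design's dictionary for an exact match.

    Returns (design name, character code), or None when the entered
    image matches no registered design.
    """
    for design, dictionary in FONT_DICTIONARIES.items():
        code = dictionary.get(entered)
        if code is not None:
            return design, code
    return None


if __name__ == "__main__":
    print(recognize_any_design(("111", "010", "111")))
```

If only the dictionary for `design_a` were registered, the serifed "I" in this sketch would not be recognized at all, which is why a dictionary is required for each design.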
However, when a document is read on the OCR, the characters cannot be recognized when characters and image information other than characters coexist in one document, when characters of different character designs coexist in one document, or when no existing format is available for the writing style or the like.
In addition, when the document contains both necessary data which need to be recognized and unwanted data which require no recognition, the character recognition means recognizes the unwanted data as well. For this reason, there is a problem in that time is wasted on recognition which does not actually need to be carried out, and it is difficult to increase the reading speed.
In particular, when the OCR is used to total form sheets, that is, to process the data on the form sheets by reading the characters and the like entered on them, the conventional OCR requires a long reading time because it also reads the fixed form of the form sheet. Furthermore, there is a problem in that the processing of the read information becomes complex.