1. Field of the Invention
The present invention relates to a character reading system for a document or the like, that verifies or corrects character information read from a document or the like and outputs the correct character information, and more particularly to a character reading system for a document or the like that verifies or corrects, at a central station, data related to the characters of a document or the like sent from a terminal through a communication line and outputs this data as correctly read character information.
2. Description of Related Art
In the past this type of character reading system was disclosed in the publication "OKI DENKI KENKYU KAIHATSU" Vol. 59, No. 4, pp. 23-26, 1992, "OCR Application Systems for Financial Industry Information Systems." FIG. 6 is a block diagram schematically illustrating an example of this conventional character reading system.
This system is a character reading system applied to a centralized exchange system for centrally managing exchange. This character reading system 10 comprises reading terminals 14 that read and output information pertaining to the characters of a document 12 or the like, and a central station 20 that centrally processes the information sent from the reading terminals 14 and outputs it as correctly read character information. Usually, the reading terminals 14 are installed at various remote local stations 16 such as a "place of business," and are constituted by an OCR (hereinafter this OCR will be referred to as a remote OCR) or a facsimile device, for example. At the central station 20, the various information pertaining to characters sent from these reading terminals 14 via a communication line 18 is centrally processed and outputted as correct character information.
With the conventional structural example shown in FIG. 6, a remote OCR 14 for reading characters printed or recorded on a document 12, such as a bank transfer request, is installed at each local station 16. This remote OCR 14 is a device that reads the image data IMG of the document 12, converts this into character data DATA, and transmits the image data IMG and the character data DATA to the central station. The remote OCR 14 is connected to a communication network 18 such as an ISDN (Integrated Services Digital Network). Specifically, the character data include so-called "character" data and "numerical" data.
Meanwhile, the central station 20 is furnished with a storage device 22 for temporarily storing information, a correction terminal 24, a verification terminal 26, a gateway 28, a LAN 30 that is connected between these constituent elements 22, 24, 26, and 28 in order to allow information to be passed back and forth between these constituent elements, and a host computer 32 that is connected to the gateway 28. The correction terminal 24 and the verification terminal 26 are each constituted by a separate microcomputer. The storage device 22 is also controlled by a separate microcomputer.
The communication network 18 is connected to the storage device 22. This storage device 22 temporarily stores the image data IMG and character data DATA sent from the remote OCR 14 through the communication network 18 in order to correct and verify the character data.
The correction terminal 24 is used to decide whether or not the character data DATA stored in the storage device 22 is correct, and if the character data DATA is incorrect, an operator of this device corrects it. To this end, a conventional correction terminal 24 simultaneously displays on a monitor screen all of the image data IMG of the document 12 along with all of the character data DATA that has been read, including both correct character data and incorrect character data. The operator in charge of correction then makes a visual comparison of the image data with the various character data, and if an error is discovered in the character data, the operator uses the keyboard of the correction terminal 24 to change this incorrect character data to correct character data. The corrected character data DATA is transferred along with the image data IMG to the verification terminal 26 via the LAN 30.
The verification terminal 26 is a device that verifies that the correction has been properly carried out at the correction terminal 24. To this end, this verification terminal 26 simultaneously displays on a monitor screen all of the image data IMG of the document 12 along with all of the character data DATA that has been read, including both correct character data and incorrect character data, just as with the correction terminal 24. The operator in charge of verification then makes a visual comparison of the displayed character data and image data and decides whether the corrected character data is in fact correct. If it is decided that the corrected character data is correct, the operator uses the keyboard of the verification terminal to enter this decision, and the correct character data is sent to the gateway 28 via the LAN 30 and outputted from the gateway 28 to the host computer 32.
Meanwhile, if it is decided at the verification terminal 26 that the character data is incorrect, this is entered by the operator using the keyboard of the verification terminal, and the image data IMG of the document 12 and the character data DATA are returned to the correction terminal 24 via the LAN 30.
With a conventional character reading system structured as above, the following operational steps are taken in order to output correct character data. Character data is recognized and read from image data by a single recognition on the basis of a certain character recognition method. Then, regardless of whether the character data that has been read is correct or not, all of the read character data is displayed along with the image data on the monitor screen of the correction terminal 24, and an operator looks at the character portion of the image data while making a direct visual comparison with the corresponding read character data, and decides whether the character data is correct. Next, after the incorrect character data has been corrected, the correct character data that was not corrected and the corrected character data are transferred along with the image data to the verification terminal 26, and these sets of data are simultaneously displayed on the monitor screen of the verification terminal 26. The operator again makes a direct visual comparison of all of the character data displayed on the monitor, and decides whether each set of character data is correct. Only when it has been decided that all of the character data displayed on the monitor screen of the verification terminal 26 is correct is the correct character data outputted as read character data to the host computer 32 via the gateway 28.
The accuracy of the character recognition itself, performed on the document 12 or the like using an OCR or the like, has been improving. However, this has not led to a reduction in the correction and verification work that has to be performed by operators at the correction terminal 24 and the verification terminal 26. The reason for this is that, since all of the read character data is displayed on the monitor screen, the operator has to decide the correctness of every item of character data, and correct whatever needs correcting, regardless of whether a given item actually needs correction.
Therefore, an object of the present invention is to provide a character reading system for a document or the like with which the work of correcting and verifying character data read from a document or the like is reduced as much as possible.
The inventors of this application conducted various research and experimentation in an effort to achieve this object. First, the inventors focused on the fact that there is a variety of character recognition methods. For example, it is known that recognition algorithms for recognizing characters can be broadly grouped into pattern matching methods and structural analysis methods. There is also a variety of recognition methods among these pattern matching methods, depending on how the pattern is taken, how the dictionary is used, and so on. In view of this, it was thought that character recognition could be performed more reliably if two different character recognition methods were employed for a single recognition object. As a result, the inventors arrived at the conclusion that if the recognition of character data from image data were carried out individually by two different character recognition methods, a mutual comparison made of the character data sets obtained as a result, and the character data that matched in this comparison outputted directly as correct character data, then not all of the read character data would have to be displayed on the correction terminal and the verification terminal.
According to the first aspect of this invention, there is provided a character reading system in which first and second character data are separately read out by two different character recognition methods, a decision is made as to whether the character data that has been read is correct according to whether these first and second character data match or not, and correct character data is outputted just as it is, without being displayed on a correction terminal or verification terminal.
Therefore, this character reading system comprises: a first data reading component that reads image data from a recording medium such as a document in which the characters to be read are stored, recognizes first character data from the image data on the basis of a first character recognition method, and outputs this image data and first character data; a second data reading component that checks whether the above-mentioned first character data matches second character data recognized from the above-mentioned image data on the basis of a second character recognition method different from the above-mentioned first character recognition method, and outputs the above-mentioned first or second character data as correctly read character data if there is a match, but outputs the above-mentioned first or second character data as incorrect data if there is no match; a correction component having a display that receives and displays the above-mentioned image data and incorrect data, for correcting the above-mentioned incorrect data into correct character data while the operator compares the displayed image data and incorrect data; and a memory component that readably stores image data and first character data from the above-mentioned first data reading component, the above-mentioned second character data, and correct character data corrected as above.
With the constitution of the present invention, in the first and second data reading components, first and second character data are respectively recognized and read, on the basis of mutually different character recognition methods, from image data that has been read from a single document. If there is a match between the first and second character data respectively read on the basis of mutually different character recognition methods, then the character data that is the result of this reading can be deemed to be correct character data. Accordingly, the correct character data for which the reading results have been deemed to be matching in the second data reading component is outputted as character data for the required post-processing without being sent to a correction component. If it is decided that the read first and second character data do not match in the second data reading component, this means that the first and/or the second character data is incorrect character data, so only in this case is the first or second character data sent to the correction component for the correction of the character data, and this incorrect character data is changed to correct character data at the correction component. Image data is displayed as an image on the monitor of the correction component, while the data to be corrected is displayed in a character font.
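The match decision described above can be illustrated with a minimal Python sketch. It is not the implementation of the invention. The names `Recognizer`, `Decision`, `read_and_decide`, and `route` are hypothetical, and each recognition method is assumed to be a function that maps image data to a character string.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: a recognition method maps image data to a character string.
Recognizer = Callable[[bytes], str]


@dataclass
class Decision:
    character_data: str  # the first character data (identical to the second on a match)
    is_correct: bool     # True only when the two recognition results match


def read_and_decide(image: bytes, first: Recognizer, second: Recognizer) -> Decision:
    """Recognize the same image data by two methods and compare the results."""
    first_data = first(image)
    second_data = second(image)
    return Decision(character_data=first_data, is_correct=(first_data == second_data))


def route(decision: Decision) -> str:
    """Matching data is outputted directly; mismatched data goes to correction."""
    return "output" if decision.is_correct else "correction"
```

In this sketch the first character data is the set forwarded on a mismatch; the text permits either the first or the second, so that choice is an assumption.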
Thus, one of the two sets of character data is sent to the correction component as data to be corrected only when there is a mismatch between the first and second character data that have been individually read by two different character recognition methods for the same image data, and therefore the data to be corrected is the only data that is displayed simultaneously with the image data on the display component (monitor screen) of the correction component. Therefore, the operator only needs to compare the image display of the image data with the font display of character data that is the data to be corrected and to perform correction, verification, or the like for this data, so much less work is entailed by correction, verification, and the like than in the past.
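The reduction in displayed data can be sketched for a document with several fields. The example below is illustrative only: the field names, the sample values, and the function `fields_needing_correction` are hypothetical and do not come from the specification.

```python
from typing import Dict, List


def fields_needing_correction(first: Dict[str, str], second: Dict[str, str]) -> List[str]:
    """Return the names of fields whose two readings disagree, in the order of `first`."""
    return [name for name, value in first.items() if second.get(name) != value]


# Hypothetical readings of one document by two recognition methods.
first = {"payee": "YAMADA", "account": "1234567", "amount": "50000"}
second = {"payee": "YAMADA", "account": "1284567", "amount": "50000"}

to_display = fields_needing_correction(first, second)
print(to_display)                   # ['account']
print(len(first), len(to_display))  # 3 1
```

Under these assumed values a conventional terminal would display all three fields, whereas only the one mismatched field would be presented to the operator alongside the image data.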
In the implementation of the present invention, it is preferable for the first data reading component to be constituted by a document image data reading component and a first recognition component that recognizes character data as first character data from this image data on the basis of a first character recognition method. For example, it is good to use an OCR (Optical Character Reader) as the first data reading component. The OCRs may be installed separately as reading terminals at the respective remote local stations and coupled to a memory component at a central station through communication lines.
Alternatively, in the implementation of the present invention, the first data reading component may be constituted by a facsimile device as the image data reading component and an OCR as the first recognition component. In this case, a facsimile device is installed as a terminal at each local station, and an OCR is installed at the central station and linked to a memory component, allowing the OCR at the central station to be linked to these facsimile devices via communication lines.
Also, in the implementation of the present invention, it is good for the second data reading component to be constituted by a second recognition component that recognizes character data as second character data from image data on the basis of a second character recognition method, and a decision component that checks the first character data and second character data and decides whether the two sets of character data match or not. It is favorable, for example, for this second data reading component to be constituted by a second OCR.
It is also favorable for the memory component, the second recognition component, the decision component, the correction component, and, in some cases, the first recognition component to be linked together via a LAN. The verification component for verifying whether the corrected character data is correct or not may also be linked via this LAN as required. A gateway may also be linked via the LAN as needed, with this gateway linked to a host computer such that the uncorrected correct character data or the correct character data obtained by correction can be outputted to this host computer.
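One possible flow of character data among the components linked via the LAN is sketched below as a small state transition function. This is an assumption-laden illustration: the `Stage` names and the `next_stage` function are hypothetical, and the return to correction after a rejected verification is borrowed from the conventional system described earlier rather than stated for the present invention.

```python
from enum import Enum, auto


class Stage(Enum):
    DECISION = auto()      # decision component comparing first and second character data
    CORRECTION = auto()    # correction component operated by an operator
    VERIFICATION = auto()  # optional verification component
    HOST = auto()          # host computer reached through the gateway


def next_stage(stage: Stage, *, matched: bool = False,
               has_verifier: bool = True, accepted: bool = False) -> Stage:
    """Return the component that receives the character data next."""
    if stage is Stage.DECISION:
        # Matching data bypasses correction and verification entirely.
        return Stage.HOST if matched else Stage.CORRECTION
    if stage is Stage.CORRECTION:
        # The verification component is linked only "as required".
        return Stage.VERIFICATION if has_verifier else Stage.HOST
    if stage is Stage.VERIFICATION:
        # Assumption: rejected data returns to correction, as in the conventional system.
        return Stage.HOST if accepted else Stage.CORRECTION
    raise ValueError("no stage follows HOST")
```

For example, `next_stage(Stage.DECISION, matched=True)` yields `Stage.HOST`, reflecting that uncorrected correct character data can be outputted to the host computer without operator involvement.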