The present invention relates to a technology for enhancing the efficiency of proofreading text data generated by an automatic character recognition system.
With the spread of the Internet in recent years, technology for digitizing existing paper documents into electronic documents has become important. A paper document is normally digitized by running OCR (Optical Character Reader) programs on a computer.
The OCR program automatically recognizes the characters recorded on the paper. Mis-recognition may, however, occur in this character recognizing process. It is therefore indispensable, for a document digitized by the OCR program, to detect and correct the mis-recognized characters (which will hereinafter be called proofreading). The efficiency of this proofreading operation largely depends on the ability of the proofreader. Accordingly, to enhance the efficiency of the proofreading operation, it is essential first to grasp the proofreader ability accurately.
No technology for objectively grasping the proofreader ability has, however, been proposed so far. This is because proofreading target documents (which will hereinafter be referred to as manuscripts) vary widely in category, and no criteria exist for defining the proofreader ability; in particular, the degree of difficulty of a manuscript is not clear. Accordingly, the ability evaluation fluctuates greatly depending on the manuscript being proofread.
It is a primary object of the present invention, which was devised to obviate the problems inherent in the prior art, to provide a technology capable of objectively judging the ability of a proofreader of a document digitized by use of OCR programs.
The present invention also aims at dynamically evaluating the proofreader ability, detecting a decrease in operation efficiency, setting a proper operation time, and thus judging an operation exchange timing.
To accomplish the above object, according to one aspect of the present invention, there is provided a method of managing an ability of a proofreader who proofreads an electronic document generated from a recognition target document by executing character auto recognition programs, the method comprising a step of estimating a character count of potential mis-recognized characters contained in the electronic document, a step of detecting a mis-recognized character discover count, that is, the number of mis-recognized characters that the proofreader discovers in the electronic document, a step of detecting a processing time spent for proofreading the electronic document, and a step of calculating a score relative to a proofreader ability based on a ratio of the potential mis-recognized character count to the mis-recognized character discover count per unit time.
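The score calculation described above can be illustrated with a minimal Python sketch. The text states only that the score is based on a ratio between the potential mis-recognized character count and the discover count per unit time; the exact formula used here (discoveries per minute divided by the potential count, so that a higher value indicates a more capable proofreader) is an assumption, as are the function name `proofreader_score` and the choice of minutes as the unit time.

```python
def proofreader_score(potential_count: int,
                      discovered_count: int,
                      processing_time_min: float) -> float:
    """Return an illustrative ability score for one proofread document.

    Assumed formula (not fixed by the text):
        score = (discovered_count / processing_time_min) / potential_count
    """
    if potential_count <= 0:
        raise ValueError("potential_count must be positive")
    if processing_time_min <= 0:
        raise ValueError("processing_time_min must be positive")

    # Mis-recognized characters discovered per unit time (here: per minute).
    discovery_rate = discovered_count / processing_time_min

    # Normalize by the estimated number of potential mis-recognitions so that
    # documents with many or few errors can be compared on one scale.
    return discovery_rate / potential_count


if __name__ == "__main__":
    # Hypothetical figures: 40 potential errors, 30 found in 15 minutes.
    print(proofreader_score(40, 30, 15.0))
```

Normalizing by the potential count is one way to keep the score from depending on how error-prone a particular manuscript happens to be; other readings of the ratio are equally compatible with the text.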
The step of estimating the potential mis-recognized character count may include a step of counting the non-coincident characters between the electronic documents generated by executing plural types of character auto recognition programs on the same recognition target document, or a step of counting the characters for which a degree of coincidence, which indicates how precisely the character auto recognition program recognized each character, is at or below a predetermined value.
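The two estimation approaches above can be sketched as follows. This is only an illustration under stated assumptions: the outputs of two recognition programs are taken to be plain strings, the alignment is done with Python's standard `difflib.SequenceMatcher` (the text does not name an alignment method), and the per-character degree of coincidence is assumed to be available as a list of floats. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Sequence


def non_coincident_count(text_a: str, text_b: str) -> int:
    """Count characters on which two recognition results disagree."""
    # autojunk=False keeps the matcher from discarding frequent characters
    # in long documents.
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    count = 0
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            # For a replaced, inserted, or deleted run, count the longer side.
            count += max(a_end - a_start, b_end - b_start)
    return count


def low_coincidence_count(degrees: Sequence[float], threshold: float) -> int:
    """Count characters whose degree of coincidence is at or below threshold."""
    return sum(1 for degree in degrees if degree <= threshold)


if __name__ == "__main__":
    # Hypothetical outputs of two recognition programs for the same page.
    print(non_coincident_count("abcd", "abxd"))
    # Hypothetical per-character degrees of coincidence from one program.
    print(low_coincidence_count([0.99, 0.42, 0.80, 0.95], threshold=0.80))
```

Either count serves as the potential mis-recognized character count; they need not agree, since one compares programs with each other and the other relies on a single program's own confidence.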
The method of managing the ability of the proofreader may further comprise a step of calculating a degree of difficulty of a proofreading target electronic document on the basis of a ratio of the potential mis-recognized character count to a total character count of the electronic document, a step of calculating a proofreader ability level by averaging the scores with respect to the plurality of proofreading target electronic documents per predetermined range of the degree of difficulty, and a step of selecting an optimal proofreader corresponding to the degree of difficulty of the proofreading target electronic document.
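The difficulty calculation, per-range averaging, and proofreader selection described above might look like the following sketch. The range boundaries in `DIFFICULTY_BOUNDS`, the record layout, and all function names are assumptions introduced for illustration; the text specifies only that scores are averaged per predetermined range of the degree of difficulty.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical upper boundaries of the difficulty ranges (ratio of potential
# mis-recognized characters to total characters).
DIFFICULTY_BOUNDS = (0.02, 0.05, 0.10)


def degree_of_difficulty(potential_count: int, total_chars: int) -> float:
    """Ratio of the potential mis-recognized count to the total count."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return potential_count / total_chars


def difficulty_range(difficulty: float) -> int:
    """Map a degree of difficulty to the index of its range (0 = easiest)."""
    return bisect_right(DIFFICULTY_BOUNDS, difficulty)


def ability_levels(
    records: Iterable[Tuple[str, float, float]]
) -> Dict[Tuple[str, int], float]:
    """Average scores per (proofreader, difficulty range).

    Each record is (proofreader name, degree of difficulty, score).
    """
    grouped = defaultdict(list)
    for name, difficulty, score in records:
        grouped[(name, difficulty_range(difficulty))].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


def select_proofreader(levels: Dict[Tuple[str, int], float],
                       difficulty: float) -> Optional[str]:
    """Pick the proofreader with the highest ability level in this range."""
    target = difficulty_range(difficulty)
    candidates = {name: level for (name, rng), level in levels.items()
                  if rng == target}
    if not candidates:
        return None  # nobody has been evaluated at this difficulty yet
    return max(candidates, key=candidates.get)
```

With hypothetical records in which proofreader "A" averages higher on easy documents and "B" on harder ones, `select_proofreader` returns "A" for a low degree of difficulty and "B" for a high one.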
The method of managing the ability of the proofreader may further comprise a step of calculating a change in the score relative to the proofreader ability with respect to the operation time for consecutively proofreading the plurality of proofreading target electronic documents, and a step of setting the operation time based on the change in the score relative to the proofreader ability.
The method of managing the ability of the proofreader may further comprise a step of evaluating the proofreader ability for every predetermined operation time, and a step of setting again the operation time on the basis of the change in the proofreader ability.
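One possible way to set the operation time from the change in score is sketched below. The rule used (end the session at the first point where the score falls below a fixed fraction of the score at the start of the session) is an assumption; the text does not prescribe a specific criterion. The name `recommend_operation_time` and the `drop_ratio` parameter are hypothetical.

```python
from typing import Sequence, Tuple


def recommend_operation_time(samples: Sequence[Tuple[float, float]],
                             drop_ratio: float = 0.8) -> float:
    """Suggest an operation time from (elapsed minutes, score) samples.

    The samples are scores for consecutively proofread documents, ordered by
    elapsed operation time. The suggested operation time is the elapsed time
    of the first sample whose score is below drop_ratio times the first score;
    if no such sample exists, the last observed elapsed time is returned.
    """
    if not samples:
        raise ValueError("at least one sample is required")

    baseline = samples[0][1]
    for elapsed, score in samples:
        if score < baseline * drop_ratio:
            return elapsed
    return samples[-1][0]


if __name__ == "__main__":
    # Hypothetical session: the score declines as the proofreader tires.
    session = [(15, 0.050), (30, 0.048), (45, 0.045), (60, 0.038), (75, 0.030)]
    print(recommend_operation_time(session))
```

Running the same function again on the samples of each later session would give the re-setting of the operation time at every predetermined interval, since the recommendation follows whatever change in ability the new samples show.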
According to another aspect of the present invention, a system for managing an ability of a proofreader who proofreads an electronic document generated from a recognition target document by executing character auto recognition programs comprises: an information input/output unit (6, 7, 8) for detecting a mis-recognized character discover count, that is, the number of mis-recognized characters that the proofreader discovers in the electronic document, and a processing time spent for proofreading the electronic document; an information recording unit (3, 4); an information display unit (5); and a control unit (2) for executing a step of counting a character count of potential mis-recognized characters contained in the electronic document, and a step of calculating a score relative to a proofreader ability based on a ratio of the potential mis-recognized character count to the mis-recognized character discover count per unit time.
The system for managing the ability of the proofreader may further comprise a timer. The information input/output unit may detect a start of the proofreading operation and an end of the proofreading operation, and the timer may count a period of time from the start of the proofreading operation to the end of the proofreading operation.
According to a further aspect of the present invention, a computer-readable recording medium may be recorded with a program that is executed by a computer and comprises the processes described above.
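The timer behaviour described above can be sketched in Python as follows. The class name `ProofreadingTimer` and its methods are hypothetical; the sketch assumes that the input/output unit calls `start()` and `stop()` when it detects the start and the end of the proofreading operation. The clock is injectable so that the timing logic can be checked without waiting in real time.

```python
import time
from typing import Callable, Optional


class ProofreadingTimer:
    """Measure the time from the start to the end of a proofreading operation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # time.monotonic is unaffected by system clock adjustments.
        self._clock = clock
        self._started_at: Optional[float] = None

    def start(self) -> None:
        """Record the moment the proofreading operation starts."""
        self._started_at = self._clock()

    def stop(self) -> float:
        """Return the seconds elapsed since start() and reset the timer."""
        if self._started_at is None:
            raise RuntimeError("stop() called before start()")
        elapsed = self._clock() - self._started_at
        self._started_at = None
        return elapsed


if __name__ == "__main__":
    timer = ProofreadingTimer()
    timer.start()
    # ... the proofreader works on the electronic document here ...
    print(f"processing time: {timer.stop():.3f} s")
```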
As described above, according to the present invention, the ability of the proofreader of the document digitized by the OCR programs can be objectively judged.
Further, according to the present invention, it is feasible to judge an operation exchange timing by dynamically evaluating the proofreader ability, detecting a decrease in operation efficiency and setting a proper operation time.