The present invention relates to apparatus and methods for facilitating OCR verification processes.
Many key-in applications are automated by scanning the hard-copy pages and using Optical Character Recognition (OCR) techniques to recognize the text written in various fields on the pages. No OCR technique is immune to errors, and hence the automatic OCR phase is typically followed by a verification phase in which the OCRed characters are verified automatically and/or manually. For most of these applications the scanned pages are forms in which characters may be interrelated, within a field or across fields, by some arithmetic, dictionary-based and/or other logical relationship. This interrelation between characters can be utilized to automatically verify and/or increase the confidence level of some of the characters involved in the relationship. However, basing the verification solely on automatic methods is seldom sufficient. For the vast majority of applications, the intervention of a human operator to manually verify characters is normally needed in order to achieve a high accuracy level of final recognition. A useful technique for implementing this manual key-in verification is to display characters to the operator on a computer screen and to provide a navigation application with which the operator marks and/or corrects the erroneous characters.
In many of the key-in verification techniques, fields are extracted from the scanned images and displayed to the operator along with the OCR results. The operator uses a keyboard or mouse to point at the erroneous characters and mark and/or correct them. This technique is called a video coding technique. U.S. Pat. No. 5,455,875 to Chevion et al. describes a method, called SmartKey, for organizing the data on the screen and for utilizing a mouse in a way that yields an improvement factor of 3-6 in productivity relative to ordinary video coding schemes. That is, each operator can key in 3-6 times more characters than with conventional video coding techniques or, equivalently, fewer human operators are required to key in the data.
The disclosures of all publications mentioned in the specification and of the publications cited therein are hereby incorporated by reference.
The present invention seeks to provide improved apparatus and methods for facilitating OCR verification processes.
To date, two complementary methods are used in order to compose an application: a manual phase, in which new information is obtained from a human operator, generally via a keyboard, and an automatic phase, in which new information may be concluded from the logical interrelationships between characters. Both methods are stochastic in nature since both are subject to errors. Also, there is no guarantee regarding the error rate of the final data. Moreover, these methods cannot be dynamically tuned to approach a desired error rate.
The present invention seeks to provide a new key-in method that interactively combines manual key-in with automatic logical phases such that the manual and automatic phases are data driven. Multiple phases are typically interleaved in such a way that the data acquired in each phase is fed into the next phase, typically with the aim of minimizing the number of manual key-strokes. Preferably, a goal of the present system is to optimally use the logical interrelationships between the characters in order to achieve the lowest possible number of key-strokes by the human operators, and thus to further increase the overall productivity, measured by the number of characters verified per man-hour. The stochastic characteristics of both the automatic logical phase and the manual human operator performance are typically measured, on-line, and the application can be tuned to approach any desired accuracy level, for any type of data, at minimal manual labor time.
The method of the present invention typically achieves an additional productivity factor of 3-10 relative to the aforementioned prior art SmartKey technique. The improvement factor with respect to the prior art SmartKey method depends on the quantity of logical relationships present in the application forms.
The method of the present invention is also termed herein the KIM (Key-In-Management) method. It typically manages OCRed data to/from various key-in stations, where each key-in station (KIS) is a physical or virtual station in which new information is added to the characters sent to these stations, generally via a human operator who supplies feedback regarding data displayed on a computer screen.
A preferred embodiment of the present invention seeks to obtain final OCRed data that best complies with the various logical rules of the form with minimum human labor cost (or maximum efficiency). This goal is typically accomplished by using a set of key-in stations that may supply information on characters at a certain accuracy and a certain cost. Examples of such key-in stations are human operators keying in data by video coding techniques using SmartKey carpets, triplets, and/or fields, where SmartKey is a video coding technique described in U.S. Pat. No. 5,455,875 to Chevion. The method of the present invention uses a dispatcher that sends characters to the key-in stations, e.g. in order to optimize some predefined cost (of system effectiveness), a logic module that supplies services regarding the inter-field and intra-field logic rules, and an OCR engine that supplies the probabilities of the OCRed characters. The dispatcher obtains services from the key-in stations and reports to the manager level of the system of the present invention.
Many document processing applications involve extracting ASCII information from images of scanned papers. The basic tool to supply this information is an OCR (Optical Character Recognition) engine that is capable of identifying characters and/or other symbols and marks in a given image. Some OCR engines supply only the most probable guess for each of the characters out of some alphabet of characters applicable for the field containing the character. This type of OCR output is called a hard decision output in which the OCR supplies the final decision for that character. Generally, a confidence level attribute is supplied with the output guess, which is, in most cases, normalized to the range [0,1]. Other OCR engines may supply more information, such as the probability vector of the character image to be one of each of the alphabet characters. OCR engines of this kind are said to supply a soft decision where the calling application may use these probabilities to generate a more global decision for the character images.
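The distinction between a soft decision and a hard decision can be illustrated with a minimal sketch. The names HardDecision and to_hard_decision and the sample probabilities are hypothetical and are not taken from any particular OCR engine; the sketch merely assumes that a soft decision is available as a mapping from alphabet characters to probabilities.

```python
from dataclasses import dataclass


@dataclass
class HardDecision:
    """Single best guess plus a confidence level in [0, 1]."""
    char: str
    confidence: float


def to_hard_decision(soft: dict[str, float]) -> HardDecision:
    """Collapse a soft decision (one probability per alphabet character)
    into a hard decision by taking the most probable character."""
    total = sum(soft.values())
    if total <= 0:
        raise ValueError("probability vector must have positive mass")
    best = max(soft, key=soft.get)
    # Normalize so the confidence lies in [0, 1] even if the engine's
    # scores do not sum exactly to one.
    return HardDecision(char=best, confidence=soft[best] / total)


# Hypothetical soft output for one digit image.
soft_output = {"3": 0.70, "8": 0.25, "5": 0.05}
hard_output = to_hard_decision(soft_output)
```

A calling application that keeps the full soft_output, rather than only hard_output, retains the alternatives "8" and "5" for use in a more global decision.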
The OCR used by the present invention may even include a combination of several types of OCR classifiers via some voting scheme such that the OCR output may be more complex and/or data dependent.
No matter how the OCR of the application is implemented internally it will make errors. In most cases, the error level of the OCR is above the desired error rate required by the application. In order to supply the ASCII data at the error level requested by the application, some of the errors must be corrected. The application may thus use the OCR engine as a tool within a more complex system architecture that eventually supplies the ASCII data at the desired error rate.
In order to decrease the error rate beyond that supplied by the OCR engine, other information regarding the characters is incorporated. A major source of information in many document processing applications is the logical relationships between the application characters. There are many possible types of logical relationships such as: arithmetic (e.g., summation, multiplication, equality, inequality, etc.), syntax (e.g., dates), mathematical formulas (e.g., check-sum digits), and dictionaries (in which words or even phrases are taken from an a priori known dictionary). These logic rules are known a priori, and are defined specifically for each application. In most cases, the application characters comply with the logic rules and these rules may be used to decrease the overall error rate. Of course, one should be aware of the fact that it is possible for some characters to disobey these logic rules (for example when the writer of the character(s) intentionally or unintentionally wrote a logically erroneous formula). Sometimes the logic rules may be somewhat vague, as for example in a multiplication logical relationship where rounding is used at different accuracy levels.
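As an illustration of an arithmetic logic rule, the following minimal sketch checks a summation relationship between fields. The function names, the field values and the tolerance parameter are assumptions introduced for this example only; the tolerance stands in for the vague case in which rounding causes small mismatches.

```python
def sum_rule_holds(addends, total, tolerance=0.0):
    """Return True if the addend fields sum to the total field.

    `tolerance` models the vague case mentioned in the text, where
    rounding at different accuracy levels causes small mismatches.
    """
    return abs(sum(addends) - total) <= tolerance


def parse_field(chars):
    """Join recognized digit characters into an integer field value."""
    return int("".join(chars))


# Hypothetical OCR output for three form fields: two items and a total.
item_a = parse_field(["1", "2", "0"])
item_b = parse_field(["3", "5"])
total_ok = parse_field(["1", "5", "5"])
total_bad = parse_field(["1", "8", "5"])  # middle "5" misread as "8"

consistent = sum_rule_holds([item_a, item_b], total_ok)
inconsistent = sum_rule_holds([item_a, item_b], total_bad)
```

A failed rule indicates only that at least one participating character is suspect; it does not by itself identify which one, and a satisfied rule does not exclude errors that cancel each other out.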
The combination of logic rules and OCR may increase the system accuracy but generally it is insufficient for most practical applications. Another source of information that is generally used is manual correction of errors by human operators. Characters are gathered in some form (or in several different forms, depending on the data) and are displayed to human operators who key in the data. Keyed-in information supplied by human operators is also susceptible to errors, depending on human capability, expertise, mental and physical condition, and the way the data is displayed. In many cases, human accuracy is time dependent, since an operator may make more errors after several hours of work. However, sending characters to human operators increases the information gained on the characters and thus may be used to further decrease the system error rate.
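One way in which keyed-in information could be combined with OCR probabilities is a Bayesian update, sketched below. This is an illustrative assumption and not a formula stated in the present text: the function name update_with_key_in is hypothetical, and the operator model (the true character is keyed with probability 1 - station_error, otherwise any other alphabet character with equal probability) is chosen only for simplicity.

```python
def update_with_key_in(prior, keyed, station_error):
    """Return the posterior character probabilities after one key-in.

    prior         -- dict mapping each alphabet character to its OCR probability
    keyed         -- the character returned by the key-in station
    station_error -- assumed probability that the station keys a wrong value
    """
    if keyed not in prior or len(prior) < 2:
        raise ValueError("keyed value must belong to an alphabet of size >= 2")
    # Assumed error model: a wrong key-in is uniform over the other characters.
    wrong = station_error / (len(prior) - 1)
    unnormalized = {
        c: p * ((1.0 - station_error) if c == keyed else wrong)
        for c, p in prior.items()
    }
    norm = sum(unnormalized.values())
    if norm <= 0:
        raise ValueError("key-in is impossible under the prior and error model")
    return {c: v / norm for c, v in unnormalized.items()}


# Hypothetical OCR probabilities for one character image.
prior = {"3": 0.70, "8": 0.25, "5": 0.05}
confirmed = update_with_key_in(prior, keyed="3", station_error=0.02)
corrected = update_with_key_in(prior, keyed="8", station_error=0.02)
```

Under these assumptions, a key-in that agrees with the OCR guess raises its probability, while a key-in that disagrees shifts most of the probability to the keyed character without reducing the others to zero.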
Using human operators to key-in characters is costly. The cost is specific to each application but can be defined in terms of labor, space, system response time, etc. Generally, increasing the key-in cost increases the system accuracy. The present invention seeks to minimize the key-in cost of supplying an ASCII representation of the data at a desired accuracy level. Typically, some or all of the following factors are substantially optimized:
What are the characters to be sent to the key-in stations.
In what order should these characters be sent.
How characters are organized and displayed.
To which operator should each character be sent.
When should the key-in process terminate.
The system typically performs a scheduling function, dynamically determining the characters regarding which more information should be requested, the order in which these characters should be sent to the key-in stations, and the key-in station to which each character should be submitted. These routing decisions are preferably made dynamically since the information acquired regarding the characters depends on all of the following: the data, the OCR accuracy, the type of logic, and the returned values from the human operators. The system preferably dynamically incorporates the on-line acquired information so as to optimize the key-in cost. It also preferably combines all its information sources (OCR, Logic and Key-in stations) at each time instant to make the optimal routing decision.
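The scheduling function described above can be sketched with a simple greedy heuristic. The heuristic is an assumption made for illustration and is not the optimization of the present invention: it selects the least confident character that is still below a per-character target and routes it to the station with the best assumed ratio of accuracy to cost. The Station fields and the name next_dispatch are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Station:
    name: str
    error_rate: float  # assumed probability that a keyed value is wrong
    cost: float        # assumed cost per character, e.g. seconds


def next_dispatch(confidences, stations, target):
    """Pick the next (character index, station) pair, or None when done.

    confidences -- current confidence of each character, in [0, 1]
    target      -- per-character confidence at which key-in stops
    """
    pending = [i for i, c in enumerate(confidences) if c < target]
    if not pending:
        return None  # termination: every character has reached the target
    # Greedy choice: the least confident character goes first.
    index = min(pending, key=lambda i: confidences[i])
    # Greedy choice: the station with the best accuracy per unit cost.
    station = max(stations, key=lambda s: (1.0 - s.error_rate) / s.cost)
    return index, station


stations = [Station("A", error_rate=0.02, cost=2.0),
            Station("B", error_rate=0.05, cost=1.0)]
choice = next_dispatch([0.99, 0.60, 0.85], stations, target=0.95)
done = next_dispatch([0.99, 0.97], stations, target=0.95)
```

Calling next_dispatch again after each key-in result has updated the confidences makes the routing depend on the data acquired so far, which is the dynamic behavior the paragraph above describes.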
Since there are many types of OCR engines, many types of Logic rules, and many ways in which data can be gathered and displayed to a human operator, the system of the present invention is preferably as generic as possible in order to enable its application to a wide variety of document processing systems regardless of the individual elements comprising these systems. The method of the present invention is useful for a general model of a document processing system involving OCR engines, Logic rules and generic key-in stations. The system of the present invention is preferably generic, dynamic and cost effective, and is applicable to various types of document processing systems.
According to a preferred embodiment of the present invention, the system of the present invention has some or all of the following features:
Introduction of an overall system model for verification of OCRed data that combines computerized computation techniques with key-in stations manned by human operators in order to optimize a desired efficiency cost of the overall system. The computation methods and the ways data is processed, distributed, displayed and collected to/from the key-in stations are optimized together, in contrast to just optimizing each unit separately.
The system can be asked to supply any desired accuracy level. The improvement of the accuracy is monotonic, and the accuracy is continuously assessed and monitored by the system automatically (via a predictor) to yield the desired accuracy at the minimal key-in cost. Assessment of the accuracy level is carried out on-line based on the specific data being processed, rather than relying only on the off-line statistical average assessment used by contemporary key-in methods.
Logical interrelationships between the form characters are applied in a closed-loop method being verified by human operators in an optimal verification process that verifies the minimal number of characters to achieve compliance with the logic rules applied to the form fields. This is in contrast to current open-loop methods used to apply logic. Moreover, character probabilities are used to optimize the hypotheses suggested by the Logic.
The system can overcome key-in errors of the human operators and/or logically incorrect formulas and yet supply any desired accuracy level of the final ASCII data. This is achieved by both the closed-loop method connecting the logic hypotheses with the results of the human keyed-in information, and by the on-line monitoring of the achieved accuracy level.
The system can correct errors even for True-LRSs (true logical relationships which are logically correct either because their OCR accuracy was perfect or because OCR errors occurred which canceled each other out). The system can also correct errors for characters that do not participate in any logical relationship.
The model is generic in the sense that it may be applied to the vast majority of form processing and key-in applications. It can use any OCR engine. Any desired accuracy level may be requested by the user. There is ample flexibility in defining system parameters such as the key-in cost that is being optimized, key-in stations and the way they operate and manipulate their data, and the trade-offs between manual labor cost and computation consumption cost.
On-line modification of key-in station characteristics: Key-in station characteristics may be learned and tuned on-line to comply with different human operators, different key-in methods, different time of day and/or other working condition parameters. These newly adapted characteristics are readily used by the selector and dispatcher automatically to optimize the usage of the key-in stations.
The probability of making an error in a key-in station can be measured on-line if the true values of the characters are known. Hence, a sub-process can be used inside the key-in process that presents deliberate errors, whose true values are known, to the key-in station operator. Character images are presented among the application characters as if they were erroneously recognized, and the response of the key-in station operator is monitored for these characters. Since the true values and the keyed-in values are known, the error probability can be assessed at any desired time instant. The key-in cost can immediately be measured from the application characters themselves as the average key-in time per character.
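The on-line measurement just described can be sketched as follows. The function name estimate_station_stats and the sample data are hypothetical; the sketch assumes that each planted character is recorded as a pair of its known true value and the value keyed by the operator, and that key-in times are recorded in seconds per application character.

```python
def estimate_station_stats(planted, key_times):
    """Estimate a key-in station's error probability and key-in cost.

    planted   -- list of (true_char, keyed_char) pairs for characters whose
                 true values are known in advance
    key_times -- list of key-in times, one per application character
    """
    if not planted or not key_times:
        raise ValueError("need at least one planted character and one timing")
    # An error is any planted character whose keyed value is not the truth.
    errors = sum(1 for true_char, keyed_char in planted if true_char != keyed_char)
    error_probability = errors / len(planted)
    average_cost = sum(key_times) / len(key_times)
    return error_probability, average_cost


# Hypothetical monitoring data for one operator.
planted = [("3", "3"), ("8", "3"), ("5", "5"), ("7", "7")]
key_times = [1.0, 2.0, 3.0]
error_probability, average_cost = estimate_station_stats(planted, key_times)
```

Recomputing these two values over a recent window of responses would allow the station characteristics to follow changes in operator, key-in method or time of day.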
There is thus provided, in accordance with a preferred embodiment of the present invention, a key-in method including a manual keying in phase, and an automatic accuracy enhancement phase, wherein at least one of the manual phase, the automatic phase and the interaction therebetween is dependent, at at least one dataflow point, on data generated previous to the dataflow point.
Also provided, in accordance with another preferred embodiment of the present invention, is an OCR-based system for supplying machine readable text at low key-in cost, the system including a plurality of key-in stations manned by human operators, and a manager dispatching OCR-generated characters to the key-in stations iteratively to monotonically increase character confidence until a desired accuracy level is achieved, wherein the manager is operative to decide which characters are sent to which key-in stations by optimizing to minimize overall key-in cost.
Further in accordance with a preferred embodiment of the present invention, the system also includes an accuracy assessor operative to assess a current accuracy level of the system, on-line, based on data being processed.
Still further in accordance with a preferred embodiment of the present invention, logical interrelationships between input characters are applied in a closed loop interconnecting hypotheses derived from the logical interrelationships with information arriving from the key-in stations.
Also provided, in accordance with still another preferred embodiment of the present invention, is a method for supplying machine readable text at low key-in cost, the method including applying logical interrelationships between input characters in a closed loop interconnecting hypotheses derived from logical interrelationships with keyed-in information, and correcting errors for at least one of the following: true logical relationships, false logical relationships, and characters not participating in any logical relationship.
Further in accordance with a preferred embodiment of the present invention, the method also includes prompting a user to request a desired accuracy level, and performing the applying and correcting steps until the desired accuracy level is achieved.
Further provided, in accordance with another preferred embodiment of the present invention, is a method for supplying machine readable text at low key-in cost, the method including using key-in stations to improve accuracy of machine-generated machine readable text, including dispatching machine-generated machine readable text to various key-in stations using stored key-in station characteristic parameters to reduce key-in cost, and learning the key-in stations' characteristics and tuning the key-in station characteristic parameters, on-line, thereby to adjust for variation in working conditions.
Further in accordance with a preferred embodiment of the present invention, the variation in working conditions includes changes in at least one of the following: human operators operating the key-in stations, key-in methods, time of day, and working conditions.
Also provided, in accordance with another preferred embodiment of the present invention, is a system for supplying machine readable text at low key-in cost, the system including a plurality of key-in stations, and a manager dispatching characters to the key-in stations iteratively to monotonically increase character confidence until a desired accuracy level is achieved, wherein the manager is operative to decide which characters are sent to which key-in stations by optimizing to minimize overall key-in cost.
Further in accordance with a preferred embodiment of the present invention, the input characters include OCR characters.
Also provided, in accordance with another preferred embodiment of the present invention, is a key-in system including a data driven manual unit for accepting input which is manually keyed in, and an accuracy enhancer for automatically processing the manually keyed in input in order to enhance the accuracy thereof including performing a plurality of accuracy enhancing iterations wherein the data flow in each iteration depends on the output of a previous iteration.
Further provided, in accordance with another preferred embodiment of the present invention, is an OCR-based method for supplying machine readable text at low key-in cost, the method including dispatching OCR-generated characters to a plurality of key-in stations iteratively to monotonically increase character confidence until a desired accuracy level is achieved, wherein the manager is operative to decide which characters are sent to which key-in stations by optimizing to minimize overall key-in cost.
Further in accordance with a preferred embodiment of the present invention, the method also includes assessing a current accuracy level of the system, on-line, based on data being processed.
Still further in accordance with a preferred embodiment of the present invention, logical interrelationships between input characters are applied in a closed loop interconnecting hypotheses derived from the logical interrelationships with keyed-in information.
Also provided, in accordance with another preferred embodiment of the present invention, is a system for supplying machine readable text at low key-in cost, the system including a logic unit operative to apply logical interrelationships between input characters in a closed loop interconnecting hypotheses derived from logical interrelationships with keyed-in information, and an error correction unit operative to correct errors for at least one of the following: true logical relationships, and characters not participating in any logical relationship.
Further in accordance with a preferred embodiment of the present invention, the error correction unit is operative to correct errors until a user-selected accuracy level is achieved.
Also provided, in accordance with another preferred embodiment of the present invention, is a system for supplying machine readable text at low key-in cost, the system including a dispatcher operative to dispatch machine-generated machine readable text to various key-in stations using stored key-in station characteristic parameters to reduce key-in cost, and a key-in station monitor operative to learn the key-in stations' characteristics and to tune the key-in station characteristic parameters, on-line, thereby to adjust for variation in working conditions.
Additionally in accordance with a preferred embodiment of the present invention, the variation in working conditions includes changes in at least one of the following: human operators operating the key-in stations, key-in methods, time of day, and working conditions.
Also provided, in accordance with another preferred embodiment of the present invention, is a method for supplying machine readable text at low key-in cost, the method including dispatching characters to a plurality of key-in stations iteratively to monotonically increase character confidence until a desired accuracy level is achieved, wherein the manager is operative to decide which characters are sent to which key-in stations by optimizing to minimize overall key-in cost.
Further in accordance with a preferred embodiment of the present invention, the input characters include OCR characters.
Also provided, in accordance with a preferred embodiment of the present invention, is an OCR system including an OCR unit, at least one manually operable OCR verification station receiving output from the OCR unit for verification, and an accuracy monitor operative to instantaneously estimate current accuracy of OCR output generated by the OCR unit in conjunction with the at least one OCR verification station and to terminate operation of the OCR unit and the at least one OCR verification station when the estimated current accuracy reaches a threshold value.
Further in accordance with a preferred embodiment of the present invention, the accuracy monitor uses character probabilities to estimate current accuracy.
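A minimal sketch of such an accuracy monitor is given below. It assumes, for illustration only, that the current accuracy is estimated as the mean probability of the currently selected value of each character; the function names and sample values are hypothetical and the estimator actually used by the accuracy monitor may differ.

```python
def estimated_accuracy(char_probabilities):
    """Estimate current accuracy as the expected fraction of correct
    characters, i.e. the mean probability of each selected value."""
    if not char_probabilities:
        raise ValueError("at least one character probability is required")
    return sum(char_probabilities) / len(char_probabilities)


def should_terminate(char_probabilities, threshold):
    """Return True once the estimated accuracy reaches the threshold."""
    return estimated_accuracy(char_probabilities) >= threshold


# Hypothetical probabilities before and after further verification.
before = [0.99, 0.95, 0.90]
after = [0.99, 0.99, 0.98]
keep_going = not should_terminate(before, threshold=0.95)
finished = should_terminate(after, threshold=0.95)
```

Because the estimate is computed from the probabilities of the data actually being processed, it can be re-evaluated after every verification step.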
Still further in accordance with a preferred embodiment of the present invention, the hypotheses are based on character probabilities.
Additionally in accordance with a preferred embodiment of the present invention, the machine-readable text includes ASCII output.
Further in accordance with a preferred embodiment of the present invention, the machine readable text includes ASCII.