1. Problem Solved by the Invention
Optical character recognition systems are useful for automatically reading the contents of a document for storage in a computer memory. An image sensor scans the document and generates image data, which the optical character recognition system transforms into text. The data representing the text is then immediately stored in a computer memory for instant access and processing by the user.
An important requirement is that the optical character recognition system either be able to distinguish between image data representing text characters and image data representing non-text matter (e.g., printed lines), or else that the data representing printed lines or other non-text matter be deleted from the image data before it is received by the optical character recognition system.
When processing a plurality of different business forms, the optical character recognition system may be more efficient if it knows the locations of the various fields containing text characters in a given business form. For example, if the business form is a sales order form, the data may be used more quickly if the system already knows the location on the form of certain critical information such as the price, quantity, type, delivery address, etc. Knowing the location of the various fields on the form may also help the system orient the document image correctly in memory, or determine the boundary in the image data between one document and the next document.
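By way of illustration only, the benefit of knowing field locations in advance may be sketched as follows. The sketch assumes that the document image is held as a list of pixel rows and that each known form is described by a list of named rectangular fields; the names Field and crop_fields, and the coordinates used, are hypothetical and are not taken from any system described herein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical description of one text field on a known business form."""
    name: str
    top: int      # first pixel row of the field
    left: int     # first pixel column of the field
    height: int   # number of pixel rows
    width: int    # number of pixel columns


def crop_fields(image, fields):
    """Return a dict mapping each field name to its sub-image.

    `image` is assumed to be a list of rows, each row a list of pixel values.
    Only the regions named in `fields` are extracted, so a recognizer would
    not need to search the rest of the page for text.
    """
    return {
        f.name: [row[f.left:f.left + f.width]
                 for row in image[f.top:f.top + f.height]]
        for f in fields
    }


# Illustrative layout for a sales order form (coordinates are made up).
SALES_ORDER_FIELDS = [
    Field("quantity", top=0, left=0, height=1, width=2),
    Field("price", top=1, left=2, height=2, width=3),
]
```

With such a table available for each known form, only the cropped regions need be passed to the character recognizer, which is the efficiency gain referred to above.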
Thus, the optical character recognition system needs to know to which one of a plurality of known business forms a particular document (represented by incoming image data) corresponds if it is to operate at the highest efficiency. Therefore, for maximum efficiency, the incoming documents must first be grouped according to type of business form before they are processed by the optical character recognition system. As each group of documents is fed to the system, the user must inform the system of the type of business form to which the current group corresponds. The sorting or grouping function may require an unacceptably large amount of the user's time.
Thus, the problem is how to permit the optical character recognition system to operate at maximum efficiency without requiring the user to sort the incoming documents according to type of business form or to inform the system of the type of document about to be received.
2. Prior Attempts to Solve Related Problems
The necessity of first informing an image processing system of the type of form of an incoming document (i.e., the location of all of the printed lines characteristic of a business form) is illustrated, in the case of an image compression/de-compression system, in U.S. Pat. No. 4,020,462 to Morrin. According to the Morrin patent, once the user informs the system of the form to which the incoming document corresponds, the system uses the known locations of the various printed lines on that form to cull out the text character data. U.S. Pat. No. 4,504,969 to Suzuki et al. illustrates how an image processing system can recognize rectangular patterns on a document stored as data in a memory, but only if the user first defines the patterns. The problem with such prior techniques is that the user's time and effort are required to provide information about the documents to the image processing system.