1. Field of the Invention
The present invention relates generally to optical character recognition of machine-readable forms, and in particular to methods of pre-recognition analysis, especially where the locations of fields and other elements are not strictly fixed.
2. Prior Art
A widespread method of document pre-recognition processing comprises parsing the image into regions containing text and regions containing non-text objects.
Developers of known systems tend to restrict and simplify the document structure so that existing methods of structural image identification can be applied.
U.S. Pat. No. 5,864,629 (Jan. 26, 1999, Wustmann) discloses a method of marking out a particular part of the logical structure of a document for the case in which special characters are present in some document regions and absent in others. The method is illustrated by the example of regions containing the character “$”.
The method is not sufficiently universal and is applicable only to special cases.
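By way of illustration only, the general idea of marking regions by the presence of a special character can be sketched as follows. The sketch is not the method of the cited patent; it assumes that the document has already been divided into named regions and that the text of each region is available, and the function name `mark_regions_with_char` and the sample region names are hypothetical.

```python
def mark_regions_with_char(regions, special="$"):
    """Return the names of regions whose text contains the special character.

    `regions` is a hypothetical mapping of region name to recognized text.
    """
    return [name for name, text in regions.items() if special in text]


# Hypothetical sample document, already split into regions.
regions = {
    "header": "ACME Supplies Ltd.",
    "line_items": "Widget  2 x $4.50",
    "total": "Total due: $9.00",
    "footer": "Thank you for your business",
}

marked = mark_regions_with_char(regions)
```

Such a criterion distinguishes regions only when the chosen character happens to occur in the regions of interest and nowhere else, which is consistent with the narrow applicability noted above.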
U.S. Pat. No. 6,507,671 (Jan. 14, 2003, Kagan, et al.) discloses a method of marking out filled-in data input fields of a document. The method is illustrated by the example of a standard machine-readable form of fixed layout.
The method can be applied only to machine-readable forms of fixed layout. It is also rather complicated, since the step of marking out the logical structure is combined with the step of intelligent text recognition in the data input fields.
Methods that rely on the fixed layout of the fields of a form are often used in form identification, but they have an important drawback: each is suitable only for a single, specially designed machine-readable form.
For example, U.S. Pat. No. 5,822,454 (Oct. 13, 1998, Rangarajan) discloses a way of pre-processing a document of a single fixed form, the “INVOICE”. The form of the document is strictly fixed with respect to the list of fields as well as their properties and layout.
The shortcomings of the method lie in its unsuitability for processing any form of document other than the invoice form, and in its inability to accommodate changes to the list of fields or spatial displacement of the fields.
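The sensitivity of a fixed-layout approach to spatial displacement can be illustrated by a minimal sketch. It is not taken from any of the cited patents; it assumes that each field is read from a fixed rectangle of a template and that recognized words are given by their centre coordinates. The names `Box`, `TEMPLATE` and `read_fields`, and all coordinates, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


# Hypothetical fixed template: each field is expected at a fixed rectangle.
TEMPLATE = {
    "invoice_number": Box(400, 50, 600, 80),
    "total": Box(400, 700, 600, 730),
}


def read_fields(words, template=TEMPLATE):
    """Assign each word (x, y, text) to the template field whose box contains it."""
    fields = {name: [] for name in template}
    for x, y, text in words:
        for name, box in template.items():
            if box.contains(x, y):
                fields[name].append(text)
    return {name: " ".join(parts) for name, parts in fields.items()}


# Words located exactly where the template expects them.
aligned = [(450, 65, "INV-001"), (450, 715, "9.00")]
# The same words displaced 40 units downward, e.g. by a shifted scan.
shifted = [(x, y + 40, text) for x, y, text in aligned]

aligned_result = read_fields(aligned)
shifted_result = read_fields(shifted)
```

In this sketch the aligned words are assigned to their fields, whereas the displaced words fall outside every rectangle and both fields are left empty, although the content of the document is unchanged.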