1. Field of the Invention
The present invention relates generally to the field of bit-mapped image character recognition and, more particularly, to a method of parsing and analysis performed as a pre-process that assists character and text recognition of printed text in bit-mapped binary images, or in other binary or raster images input from a scanning device or the like or obtained in another way.
2. Prior Art
Segmentation and parsing methods are known in the art. Typically, such methods divide an image into parcels containing homogeneous objects and analyze each object with a plurality of special computing procedures, each of which depends on a plurality of parameters.
The best-known methods initially divide the image into a plurality of regions, and each region is further divided into smaller objects: paragraphs, lines, words, characters, non-text objects, etc. After that, all prior methods perform various pre-recognition analyses and corrections of the image to improve its quality for subsequent text recognition. Corrections may include removing distortions of various types, such as geometric distortion, skew, inverted text, cursive (italic) characters, and undesired extra or missing dots.
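As an illustration only, the hierarchical decomposition described above (image, then lines, then words, then characters) can be sketched with recursive projection-profile cuts, a classic segmentation technique swapped in here for clarity; it is not the method of any cited patent. The sketch assumes a binary raster given as a list of rows (1 = ink), skips the paragraph and non-text levels, and uses hypothetical names (`Node`, `runs`, `segment`, `parse_page`) and hypothetical gap thresholds.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Image = List[List[int]]  # binary raster: 1 = ink, 0 = background

# Hypothetical hierarchy: (level name, axis to cut along, min blank gap that separates objects).
# axis 0 = cut between rows (lines), axis 1 = cut between columns (words, characters).
LEVELS = [("line", 0, 1), ("word", 1, 2), ("char", 1, 1)]


@dataclass
class Node:
    level: str
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive
    children: List["Node"] = field(default_factory=list)


def runs(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans of non-zero entries; blank gaps shorter than min_gap are bridged."""
    spans: List[List[int]] = []
    for i, v in enumerate(profile):
        if not v:
            continue
        if spans and i - spans[-1][1] < min_gap:
            spans[-1][1] = i + 1  # short blank gap: extend the current span
        else:
            spans.append([i, i + 1])  # gap is wide enough: start a new object
    return [(s, e) for s, e in spans]


def segment(img: Image, top: int, left: int, bottom: int, right: int, depth: int = 0) -> List[Node]:
    """Split the box into objects of LEVELS[depth] and recurse into each object."""
    level, axis, min_gap = LEVELS[depth]
    if axis == 0:  # horizontal projection: ink count per row
        profile = [sum(img[r][left:right]) for r in range(top, bottom)]
    else:  # vertical projection: ink count per column
        profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(left, right)]
    nodes = []
    for s, e in runs(profile, min_gap):
        if axis == 0:
            box = (top + s, left, top + e, right)
        else:
            box = (top, left + s, bottom, left + e)
        node = Node(level, *box)
        if depth + 1 < len(LEVELS):
            node.children = segment(img, *box, depth + 1)
        nodes.append(node)
    return nodes


def parse_page(img: Image) -> Node:
    """Build the region -> line -> word -> char tree for a whole (non-empty) image."""
    root = Node("region", 0, 0, len(img), len(img[0]))
    root.children = segment(img, 0, 0, root.bottom, root.right)
    return root
```

In a full system, the corrections mentioned above (skew, inversion, and so on) would normally be applied before such cuts, since projection profiles degrade quickly on skewed text.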
Many pre-processing analysis methods are known that remove distortions from a raster or bit-mapped image (e.g., U.S. Pat. No. 5,594,815, Jan. 14, 1997). Their performance depends mainly on the amount of distortion in the image. Each type of distortion is corrected by only one means during the analysis session.
Other prior art uses new methods for analyzing parsed regions and objects, which differ in the amount of computing resources they require (e.g., U.S. Pat. No. 6,205,261, Mar. 20, 2001). Said methods are not universal enough: they sometimes cannot vary the computing volume in accordance with the complexity or simplicity of the document structure.
Another known method provides a single stage of image pre-recognition analysis as the ordinary case, and adds one or more in-depth stages of analysis if errors occurred at the first stage (U.S. Pat. No. 5,717,794, Feb. 10, 1998). The incorrectness criterion in said method is the difference between the length of the character string most likely comprising a line and the length of the line resulting from the first analysis session. However, no supplemental data is collected or used; only a repeated session with the same set of analysis means is performed.
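The escalation scheme just described can be sketched as a loop over progressively more thorough stages that stops as soon as the length criterion is satisfied. This is a minimal illustration, not the implementation of U.S. Pat. No. 5,717,794: the stage callables, the `tolerance` parameter, and the estimate of the expected string length from line width and average character width are all hypothetical assumptions.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical stage: takes a line image (any representation) and returns recognized text.
Stage = Callable[[Any], str]


def estimate_char_count(line_width: float, avg_char_width: float) -> int:
    """Hypothetical estimate of the string length most likely comprising the line."""
    return round(line_width / avg_char_width)


def staged_analysis(line: Any, expected_len: int, stages: List[Stage],
                    tolerance: int = 0) -> Tuple[str, int]:
    """Run stages in order; stop at the first result whose length is within
    `tolerance` of expected_len. Returns (text, number of stages run)."""
    result = ""
    for depth, stage in enumerate(stages, start=1):
        result = stage(line)
        # Incorrectness criterion: mismatch between expected and resulting line length.
        if abs(len(result) - expected_len) <= tolerance:
            return result, depth
    return result, len(stages)  # no stage met the criterion; keep the last result
```

Note that in this scheme each deeper stage sees only the line itself; nothing learned from the failed first pass is carried forward, which is the limitation pointed out above.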
Yet another related method detects distorted regions in the analyzed image and removes the distortion in said regions by special means (U.S. Pat. No. 5,590,224, Dec. 31, 1996). No supplemental data is collected or utilized, which considerably reduces the accuracy of the analysis. An object of the present invention is to speed up the raster (bit-mapped) image analysis procedure without any loss of quality or accuracy.