1. Field of the Invention
The present invention relates to a method and apparatus for automatic document recognition and, more particularly, to a method for automatically determining the spatial features of the text lines, words and character cells within a document.
2. Description of Related Art
Optical character recognition, and its use to convert scanned image data into text data suitable for use in a digital computer, is well known. In addition, methods for converting scanned image data into text data, and the types of errors such methods generate, are well known. One problem in converting scanned image data into text data lies in distinguishing the individual lines of text from each other, in distinguishing word groupings within a single line of text, in determining the vertical and horizontal extents of the character cells within a single line of text, and in properly separating ligatures or kerns between connected components in a single line of text.