Most optical character recognition (OCR) systems and handwriting recognition methods treat each character as a single connected object, especially when character segmentation is involved. The text may be printed or written on a form or label whose guidelines touch the characters, or the writer may underline portions of the text with a line that intersects the characters. A line may also be superimposed unintentionally onto a label, for example, during handling of a package. Studies have found that over 35% of handwritten addresses on parcels have at least one character touching a guideline. When such lines connect characters, they make it difficult for recognition systems to identify the individual characters as separate objects.
Some prior systems simply detect the whole guideline and remove it from the original image of the text. This often removes parts of the character strokes and, therefore, breaks the connectivity of the characters. For example, if an underline touches a "U" at its bottom curve and the system removes the entire underline, the "U" may be broken into two vertical strokes and recognized as two "I"s.
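The failure described above can be illustrated with a minimal sketch. It is not taken from any particular prior system. The tiny bitmap, the function names `count_components` and `remove_full_rows`, and the `min_fill` threshold are all hypothetical, chosen only to show how blanking an entire line can split a character whose stroke coincides with that line.

```python
# Illustrative sketch only: a toy "U" whose bottom stroke lies on an underline.
# The bitmap, function names, and threshold are assumptions for this example.
IMAGE = [
    "..#...#..",
    "..#...#..",
    "..#...#..",
    "#########",  # underline row; it also carries the bottom of the "U"
]


def count_components(grid):
    """Count 4-connected groups of ink pixels (True cells)."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or (r, c) in seen:
                continue
            count += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:  # depth-first flood fill over neighboring ink pixels
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


def remove_full_rows(grid, min_fill=0.9):
    """Naive guideline removal: blank every row whose ink fraction >= min_fill."""
    return [
        [False] * len(row) if sum(row) / len(row) >= min_fill else list(row)
        for row in grid
    ]


grid = [[ch == "#" for ch in line] for line in IMAGE]
before = count_components(grid)                   # "U" plus underline: 1 object
after = count_components(remove_full_rows(grid))  # two separate strokes: 2 objects
print(before, after)
```

In this toy case the character and the underline form one connected object before removal, and two disconnected vertical strokes remain afterward, which is the situation in which a "U" may be read as two "I"s.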
Other prior systems have attempted to separate characters in an image from ruled lines that contact the characters. An example of such a system, which examines prospective contact positions of a character and a ruled line, is described in Japanese Kokai Patent Application No. Hei 3[1991]-160582. Such systems may have difficulty when the ruled line varies in thickness, when its direction is not very close to horizontal, when it branches into multiple lines, or when it bends significantly. Such systems may also be better suited to machine-printed characters and less capable of dealing with handwritten characters.
Thus, there has been a need in the art for a system and method for reliably identifying guidelines, ruled lines, frames, and the like, in images of text, even when such lines vary in thickness and direction, and when the lines touch text that is machine-formed or hand-formed. There has also been a need in the art for removing such lines while preserving all of the character strokes of the text.