In a typical optical recognition system, a document, including text and graphics, is processed by a scanner to form a digital representation of the document. Illustratively, the digital image is bitonal with a logic "1" representing pixels which are black and a logic "0" representing pixels which are white. The digital image is then processed by a recognition engine. The recognition engine converts the digitized image into symbolic information about the contents of the image. This symbolic information is then stored in a memory. The recognition engine may be implemented using specially dedicated electronic circuitry or through use of a programmable machine such as a general purpose computer.
One function of the recognition engine is to recognize the characters contained in the digital image so that these characters can be converted to symbolic form. Before the characters can be recognized, it is usually desirable to separate the graphical and text regions of a digital image, to identify lines of text within the text regions, and to locate character boundaries within the lines of text. A variety of techniques have previously been used to carry out this character extraction process, including 1) histogram techniques, 2) expansion and shrinking techniques (see, e.g., O. Nakamora et al., "A Character Segmentation Algorithm for Mixed Mode Communication" and T. Akiyama et al., "A Method of Character Extraction From Horizontally/Vertically Printed Document Images"), 3) the constraint run length method (see, e.g., F. M. Wahl et al., "Block Segmentation and Text Extraction in Mixed Text/Image Documents", Computer Graphics and Image Processing, 20, pp. 375-390 (1982)), and 4) small area segmentation (see, e.g., N. H. Yeh et al., "Character Recognition by 1 Board OCR", Proceedings of International Computer Symposium, 1986, December 17-19, Tainan, Taiwan ROC, pages 129-137). These four character extraction techniques locate the character boundaries by scanning each column of pixels to identify the highest and lowest boundary of each character and scanning each line of pixels to identify the left-most and right-most boundary of each character.
Each of these four techniques has its own advantages and disadvantages. The advantage of the histogram technique is high speed. However, the histogram technique cannot handle images in which the lines of text are slanted, nor can it handle complex documents in which text and graphics are mixed together. The expansion and shrinking method, on the other hand, is capable of processing complex images comprising graphics and text. However, this method is slow, requires excessive computation capacity, and is sensitive to noise. The small area segmentation method is fast and is capable of processing slanted lines, but it cannot handle complex documents including mixed text and graphics.
Efforts have been made to improve these techniques. For example, ROC Pat. No. 30987, entitled "Block Segmentation Labeling During One Scanning", mitigates the low speed and excessive use of computation capacity of the constraint run length method. However, this technique is unable to process complex images comprising text and graphics and is sensitive to noise.
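For context, the run-length smoothing idea on which the constraint run length method is generally understood to rest may be sketched as follows. This is an illustrative sketch under stated assumptions and is not taken from the cited patent or paper: short runs of white pixels lying between two black pixels are filled with black, once along rows and once along columns, and the two results are combined with a logical AND so that nearby characters merge into blocks. The function names and the thresholds are hypothetical, and the choice to leave white runs touching the image border unfilled is an assumption of this sketch.

```python
def smooth_row(row, threshold):
    """Fill white gaps of length <= threshold that lie between two black pixels.

    Assumption of this sketch: white runs touching either end of the row
    are left unchanged.
    """
    out = list(row)
    last_black = None
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None:
                gap = i - last_black - 1
                if 0 < gap <= threshold:
                    for j in range(last_black + 1, i):
                        out[j] = 1         # fill the short white gap
            last_black = i
    return out


def smooth_image(image, h_threshold, v_threshold):
    """Smooth along rows and along columns, then AND the two results."""
    horizontal = [smooth_row(row, h_threshold) for row in image]
    # Transpose, smooth each column as if it were a row, transpose back.
    smoothed_columns = [smooth_row(list(col), v_threshold) for col in zip(*image)]
    vertical = [list(row) for row in zip(*smoothed_columns)]
    return [
        [h & v for h, v in zip(h_row, v_row)]
        for h_row, v_row in zip(horizontal, vertical)
    ]


if __name__ == "__main__":
    print(smooth_row([1, 0, 0, 1, 0, 0, 0, 0, 1, 0], 2))
```

The smoothing passes touch every pixel of the image more than once, which gives some indication of the computational cost that the improvement described above sought to reduce.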
In view of the foregoing, it is an object of the present invention to provide an image processing method which can receive a digital image of a complex document, separate the digital image into areas comprising lines of text and areas comprising graphics, and determine character boundaries within the lines of text. It is a further object to provide such an image processing technique which operates at high speed, utilizes a minimum of computation capacity, is relatively insensitive to noise, is capable of handling complex documents comprising mixtures of textual and graphical material, and is capable of processing slanted lines of text.