1. Field of the Invention
The present invention relates to a method of and an apparatus for separating adjoining image regions, suitable for use in recognizing discrete characters from optically input image information in the form of a pattern composed of images of characters adjoining or connected to each other.
2. Description of the Related Art
In recent years, apparatuses have been proposed which are capable of optically and directly inputting character information by using optical means such as image scanners. Such apparatuses are very useful as they enable entry of character information from handwritten, printed, or typed characters. In the use of these apparatuses, however, recognition of discrete characters is often hampered by the fact that the optically read images of two or more successive characters on the original undesirably form a continuous image pattern composed of character images adjoining and connected to each other. This occurs for various reasons, such as a practical limit on the resolution of the image scanner, stains on the original document, and so forth.
Methods have been proposed to separate such adjoining images or patterns into discrete character images to enable correct character recognition. Among the most popular of these methods is one which utilizes a histogram obtained by projecting the image onto an axis extending along the direction of the train of characters (the direction of the line of writing).
This known character separation method will be described with reference to FIGS. 7 and 8. It is assumed here that characters "a" and "c" on an original were optically read and that a composite image composed of images of the characters "a" and "c" adjoining each other has been obtained as shown in FIG. 7. When the height-to-length ratio of an envelope rectangle 201 of this composite image is equal to or below a certain value, i.e., when this composite image is horizontally very much elongated, the image is determined to be a composite image composed of images of successive characters adjoining each other and, hence, must be separated into images of the discrete characters "a" and "c". To this end, a region 202 in which the separation is to be done is determined on the basis of information concerning the envelope rectangle 201, and the black pixels forming the composite image are projected onto the X-axis of a coordinate system so that a histogram 203 indicative of the frequency of appearance of black pixels along the train of characters is obtained as shown in FIG. 8. Then, an X coordinate Xc, which is within the above-mentioned region 202 and which has a histogram value d(x) not greater than a predetermined threshold d0, is determined as shown in FIG. 8. When there are a plurality of such X coordinates, the mean value of these coordinates is taken as the coordinate Xc.
The composite image is then "cut" into separate character images at the portion thereof corresponding to the X coordinate Xc. Thus, when the height-to-length ratio is equal to or below a predetermined value, the read composite image is determined to be a combination of images of successive characters, and the region 202 within which the composite image is to be cut is set on the basis of the configuration of the composite image. Then, the composite image is cut into discrete character images at a portion thereof where the frequency of occurrence of black pixels is lowest within the set region, i.e., at the portion where the connection between a plurality of groups of pixels is weakest.
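The known procedure described above may be illustrated by the following minimal sketch in Python. It is offered only as an aid to understanding and is not taken from any actual apparatus. The function name `find_cut_column`, the ratio limit `max_ratio`, and the choice of the middle half of the envelope rectangle as the region 202 are assumptions made for illustration, since the manner in which the region 202 is derived is not specified here.

```python
def find_cut_column(image, d0, max_ratio=0.8):
    """Return the X coordinate Xc at which to cut, or None if no cut is found.

    image     -- list of rows of 0 (white) / 1 (black) pixels, assumed to be
                 already cropped to the envelope rectangle 201
    d0        -- predetermined histogram threshold
    max_ratio -- assumed height-to-length limit (hypothetical value)
    """
    height = len(image)
    width = len(image[0])

    # Only a horizontally elongated image is treated as a composite image.
    if height / width > max_ratio:
        return None

    # Histogram d(x): number of black pixels projected onto each X coordinate.
    d = [sum(row[x] for row in image) for x in range(width)]

    # Assumed region 202: the middle half of the envelope rectangle.
    lo, hi = width // 4, (3 * width) // 4

    # X coordinates inside the region whose histogram value is not above d0.
    candidates = [x for x in range(lo, hi + 1) if d[x] <= d0]
    if not candidates:
        return None

    # With a plurality of candidates, their mean value is taken as Xc.
    return sum(candidates) / len(candidates)


# Two pixel groups joined by a thin one-pixel bridge in columns 3 and 4.
sample = [
    [1, 1, 1, 0, 0, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 1],
]
xc = find_cut_column(sample, d0=1)
```

In this sample the histogram falls to 1 only at columns 3 and 4, so the mean of those two coordinates is returned as the cut position. A square image, whose height-to-length ratio exceeds the assumed limit, is not treated as a composite image and yields no cut position.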
This known method, however, suffers from the disadvantage that the separation of character images is often hampered by the influence of noise, because the position where the composite image is "cut" is determined on the basis of the black pixel histogram.
For instance, a histogram as shown in FIG. 10 is obtained when a stain on the original sheet is optically read to form an image 204 shown in FIG. 9, or when characters on the line immediately preceding the line being read extend far enough downward on the scanned manuscript or the like that they happen to be read concurrently.
The image 204 of the stain increases the number of black pixels at the X coordinate Xc to a value greater than the threshold value d0. In other words, there is no X coordinate where the projection of the black pixels is equal to or below the threshold value d0 (see FIG. 10). As a consequence, the composite image composed of images of characters adjoining each other cannot be correctly cut into discrete character images, making it impossible to recognize these characters.
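The effect of such noise on the histogram may be reproduced with the following short Python sketch. The pixel arrays, the helper names `column_histogram` and `columns_at_or_below`, the threshold value, and the bounds of the search region are all hypothetical values chosen for illustration and do not correspond to the actual images of FIGS. 9 and 10.

```python
def column_histogram(image):
    """d(x): number of black pixels (1s) in each column of the image."""
    return [sum(column) for column in zip(*image)]


def columns_at_or_below(hist, d0, lo, hi):
    """X coordinates in [lo, hi] whose histogram value does not exceed d0."""
    return [x for x in range(lo, hi + 1) if hist[x] <= d0]


# Composite image without noise: a thin bridge at columns 3 and 4.
clean = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 1, 1],
]

# Same image with a stain (compare image 204) lying above the bridge.
stained = [row[:] for row in clean]
stained[0][2:6] = [1, 1, 1, 1]

d0 = 1          # assumed threshold
lo, hi = 2, 5   # assumed bounds of the search region

clean_hits = columns_at_or_below(column_histogram(clean), d0, lo, hi)
stained_hits = columns_at_or_below(column_histogram(stained), d0, lo, hi)
```

Without the stain, columns 3 and 4 satisfy the threshold and a cut position can be determined. With the stain, every column in the region has a histogram value above the threshold, so no candidate coordinate remains, even though the two character images are still joined only by the same thin bridge.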