1. Field of the Invention
The invention relates to a method for extracting individual characters from raster images of a read-in character sequence, in particular a handwritten or typed character sequence having a free pitch.
2. Description of the Related Art
In automatic character recognition, raster image conditioning requires, inter alia, isolating from the read-in character sequence the segments that are each associated with an individual character. As long as the raster images of individual characters are intrinsically cohesive and are bounded on both sides by white regions, such as white columns or white paths, extracting the individual characters presents no particular difficulty. However, this "ideal case" does not exist in closely written handwriting and typing, where the individual characters often overlap and/or touch one another. Because there are then no longer any white columns or white paths between the letters, separating the characters becomes considerably more difficult, or even completely impossible.
If the writing is merely set closer together but has a fixed pitch (i.e. normal typing, or handwriting in small pre-printed raster boxes), so-called comb segmenting methods (as disclosed in Wissenschaftliche Berichte [Scientific Reports] AEG-TELEFUNKEN, Volume 47, Number 3/4, March 1974, pages 90-99, Berlin, Dr. J. Schurman: "Bildvorbereitung für die automatische Zeichenerkennung" [Image processing for automatic character recognition]) can often be used successfully to estimate the pitch and to find the segmenting columns. In principle, however, this is not possible for printed documents produced using typewriters having proportional spacing or composing machines, nor for free handwriting, for which reason previous character recognition methods cannot process such character strings.
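By way of illustration only, the white-column criterion referred to above can be sketched as follows. This is a minimal sketch and not the method of the cited publications or of the invention; the binary raster representation (1 for a black pixel, 0 for white) and the names `column_is_white` and `split_on_white_columns` are assumptions introduced here.

```python
# Illustrative sketch: split a binary raster image at white columns.
# Assumption: the image is a list of equal-length rows, 1 = ink, 0 = white.

def column_is_white(image, x):
    """Return True if column x contains no ink in any row."""
    return all(row[x] == 0 for row in image)


def split_on_white_columns(image):
    """Return (start, end) column ranges, end exclusive, of cohesive ink runs."""
    if not image:
        return []
    width = len(image[0])
    segments = []
    start = None  # first column of the ink run currently being scanned
    for x in range(width):
        if column_is_white(image, x):
            if start is not None:
                segments.append((start, x))  # a white column closes the run
                start = None
        elif start is None:
            start = x  # an ink column opens a new run
    if start is not None:
        segments.append((start, width))  # run extends to the right edge
    return segments


# Three characters bounded by white columns: the "ideal case".
separated = [
    [1, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1, 1],
]

# The same characters joined along the bottom row: no white column remains.
touching = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

separated_segments = split_on_white_columns(separated)
touching_segments = split_on_white_columns(touching)
```

For the `separated` raster the sketch yields three segments, one per character, whereas for the `touching` raster it yields a single segment spanning the whole width, which is the failure described above for characters that are in contact with one another.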
The statistical method disclosed in European Patent Application 0 047 512 admittedly allows such character strings to be separated in principle, but its results are not of sufficient quality in the case of free handwriting.