The field of optical character recognition (OCR) has evolved to the point where the individual characters on a relatively "clean" document can be automatically classified and recognized with a high degree of confidence. The reliability of OCR software in this context is due, at least in part, to the fact that the individual characters are well segmented relative to one another. In other words, the spaces between characters are sufficiently clear that the individual characters can be readily distinguished from each other. In such a case, the pattern for each character can be separately classified and grouped into recognizable words without great difficulty.
A more problematic situation is presented by lower quality documents in which the distinctions between individual characters may not be clear. For example, documents which have undergone multiple generations of photocopying, particularly with varying degrees of enlargement and reduction, as well as facsimile copies of documents, often do not have sharply defined characters. In these situations, the lines of one character may merge with those of adjacent characters, so that the distinct patterns for the individual characters cannot be readily ascertained.
In such a situation, the character recognition device is forced to impose segmentations based on features of the word image other than clear inter-character spacing. In some cases, the segmentation can be based on distinguishing aspects of the image, to increase the reliability of the process. As the quality of the image decreases, however, the reliability of such segmentation diminishes as well.
Ideally, it would be desirable to analyze all possible (or all reasonable) segmentation paths of a word image. Moreover, for each segmentation path, it would be desirable to analyze all possible (or all reasonable) symbolic interpretations of that segmentation path. Finally, it would be desirable to identify the segmentation path and symbolic interpretation thereof that is most likely to be the correct interpretation of a word. However, there are certain difficulties associated with attempts to practice this ideal recognition process. If all reasonable segmentations are considered, the number of possible combinations of segments that must be analyzed for potential character patterns grows exponentially. Furthermore, as the number of segments increases, so too does the number of symbolic interpretations, thereby adding another complexity multiplier. For a word of any appreciable length, examining all the reasonable segmentation paths and their symbolic interpretations means that billions or trillions of possibilities must be analyzed to determine the word represented by the image.
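The scale of this combinatorial explosion can be made concrete with a short calculation. Assuming, purely for illustration, a word image with n candidate cut points (any subset of which may be used) and an alphabet of k symbols, a path that uses j cuts yields j + 1 segments, so the total number of (segmentation path, symbolic interpretation) pairs is the sum over j of C(n, j) · k^(j+1), which equals k(1 + k)^n:

```python
import math

def count_interpretations(n_cuts: int, alphabet_size: int) -> int:
    """Count all (segmentation path, symbolic interpretation) pairs.

    A word image with n_cuts candidate cut points can be split by any
    subset of those cuts; a path using j cuts yields j + 1 segments,
    each of which may be read as any of alphabet_size symbols.
    """
    total = 0
    for j in range(n_cuts + 1):
        n_paths = math.comb(n_cuts, j)            # paths using exactly j cuts
        total += n_paths * alphabet_size ** (j + 1)
    return total                                  # equals k * (1 + k) ** n

# Just eight candidate cuts and a 26-letter alphabet already give
# trillions of possibilities:
print(count_interpretations(8, 26))   # 26 * 27**8 = 7343167948506
```

Even for these modest illustrative numbers, exhaustive enumeration is clearly impractical, which motivates the search for a more efficient examination strategy.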
Even assuming that a technique could be found which permits examination of the billions or trillions of possibilities, the issue remains of how to pick the "best" possibility. This can be conceptualized as finding a needle in a haystack. In addition to the complexity issues raised by the combinatorial explosion described above, it is not currently known in the art how to assign a probability to an arbitrary segmentation path/symbolic interpretation pair that reflects, with sufficiently high accuracy, its likelihood of being correct among the billions or trillions of possibilities. Yet another problem is providing sufficient flexibility to deal with different types of documents, and portions of documents, whose solutions to these complexity issues may themselves differ.
Accordingly, it is desirable to provide a technique for recognizing images in a manner which permits a pattern to be divided into an arbitrary number of pieces that are sufficiently small to avoid the likelihood of overlapping two features, and yet which permits the examination of the possible features represented by the various combinations of the pieces to be performed in an efficient manner.
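One standard way to make such an examination efficient (offered only as a sketch, not necessarily the technique contemplated here) is dynamic programming over the candidate cut positions: rather than enumerating every segmentation path, the best-scoring interpretation ending at each boundary position is computed once and reused, reducing the work from exponential to quadratic in the number of cuts. The `segment_score` function below is a hypothetical stand-in for whatever character classifier scores the image region between two cuts:

```python
def best_interpretation(n_cuts, segment_score):
    """Highest-scoring (segmentation, reading) via dynamic programming.

    segment_score(i, j) -> (symbol, score) is a hypothetical classifier
    that reads the image region between boundary positions i and j as
    its best single character.  Position 0 is the left edge of the word
    and position n_cuts + 1 is the right edge.
    """
    right = n_cuts + 1
    # best[p] = (score, text) of the best reading of the image up to
    # boundary position p.
    best = [(float("-inf"), "")] * (right + 1)
    best[0] = (0.0, "")
    for j in range(1, right + 1):
        for i in range(j):                  # try every possible last segment (i, j)
            symbol, score = segment_score(i, j)
            candidate = best[i][0] + score
            if candidate > best[j][0]:
                best[j] = (candidate, best[i][1] + symbol)
    return best[right]

# Toy example: a three-letter word with cuts at positions 1 and 2, where
# single-position segments read cleanly and merged segments score badly.
def toy_score(i, j):
    clean = {(0, 1): ("c", 1.0), (1, 2): ("a", 1.0), (2, 3): ("t", 1.0)}
    return clean.get((i, j), ("?", -5.0))

print(best_interpretation(2, toy_score))   # (3.0, 'cat')
```

The quadratic number of segment scorings, versus the exponential count of full paths, is what makes examining the combinations of pieces tractable under these assumptions.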