1. Field of the Invention
The present invention pertains to the field of image analysis, and more particularly to bi-level image segmentation, characterization and recognition. Accordingly, the general objects of the invention are to provide novel methods, apparatus and data structures of such character.
2. Description of Related Art
In the field of image analysis, image recognition requires segmentation and interpretation of connected component objects found within an image. For bi-level images, a connected component object is a group of connected pixels of a given binary value (e.g., 1 or black) completely surrounded by a group of pixels of the alternative binary value (e.g., 0 or white). Methods and apparatus for identifying connected component objects within an image, e.g., by performing line-by-line connected component analysis, are known in the art. One such example is disclosed in U.S. patent application Ser. No. 09/149,732, entitled "Segmenting and Recognizing Bi-level Images," filed on Sep. 8, 1998 and assigned to the same assignee as the present invention; this patent application is incorporated by reference in its entirety in the present application.
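For readers unfamiliar with line-by-line connected component analysis, the following is a minimal sketch, not the method of the incorporated application. It assumes 8-connectivity, extracts horizontal runs of 1-pixels row by row, and groups runs that touch on vertically adjacent rows using a union-find structure. The function names are hypothetical, and the pairwise run comparison is quadratic, chosen for clarity rather than speed.

```python
from typing import Dict, List, Tuple

Run = Tuple[int, int, int]  # (row, start_col, end_col), end column inclusive


def find_runs(image: List[List[int]]) -> List[Run]:
    """Scan the image line by line and return every horizontal run of 1-pixels."""
    runs: List[Run] = []
    for r, row in enumerate(image):
        c = 0
        while c < len(row):
            if row[c] == 1:
                start = c
                while c < len(row) and row[c] == 1:
                    c += 1
                runs.append((r, start, c - 1))
            else:
                c += 1
    return runs


def label_components(image: List[List[int]]) -> List[List[Run]]:
    """Group runs into 8-connected components (assumed connectivity rule)."""
    runs = find_runs(image)
    parent = list(range(len(runs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    for i, (r, s, e) in enumerate(runs):
        for j in range(i):
            pr, ps, pe = runs[j]
            # Runs on the previous row touch if their column spans overlap
            # once each span is widened by one pixel (diagonal neighbours).
            if pr == r - 1 and ps <= e + 1 and pe >= s - 1:
                union(i, j)

    groups: Dict[int, List[Run]] = {}
    for i, run in enumerate(runs):
        groups.setdefault(find(i), []).append(run)
    return list(groups.values())
```

Each returned component is a list of runs, which is also the raw material from which run connectivity relationships can later be derived.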
Following connected component segmentation (or identification), image recognition typically proceeds by extracting, for each connected component object, the set of features that a given classification method needs to uniquely recognize a targeted object. Once these object data have been extracted, well-known character recognition methods such as Bayesian, nearest-neighbor, and neural network classifiers may be used to classify each object by comparing its features with features obtained from a list of reference objects. When the features are sufficiently similar, the unknown object is recognized, and the known reference object can be substituted for the previously unknown object during further document manipulation.
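As an illustration of the conventional comparison step, the sketch below implements a plain nearest-neighbor classifier with a rejection threshold. It assumes that features are fixed-length numeric vectors, that similarity is measured by Euclidean distance, and that "similar enough" means a distance at or below a chosen `max_distance`; all of these, and the function name, are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def classify_nearest_neighbor(
    features: Sequence[float],
    references: Dict[str, List[Sequence[float]]],
    max_distance: float,
) -> Tuple[Optional[str], float]:
    """Return (label, distance) of the closest reference vector, or
    (None, distance) when even the closest reference is too far away."""
    best_label: Optional[str] = None
    best_dist = math.inf
    for label, vectors in references.items():
        for ref in vectors:
            d = math.dist(features, ref)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best_label, best_dist = label, d
    if best_dist <= max_distance:
        return best_label, best_dist
    return None, best_dist
```

A `None` label corresponds to an object that the reference list cannot account for.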
While various character recognition methods that rely on feature extraction are known in the art, an intuitive, easy-to-implement, and comprehensive method for feature extraction is notably absent. Additionally, improvements in object recognition speed and accuracy are still desirable. What is needed, therefore, are systematic and comprehensive procedures, apparatus and data structures that capture more of the information inherent in a given connected component object and make it available for subsequent manipulation (e.g., to improve character recognition).
Another deficiency of the related art is that conventional image recognition spends inordinate effort attempting to correct transmission or scanning errors that produce connected component objects which are unrecognizable derivatives of known objects. One common error of this kind is the cutting of a recognizable connected component object into two or more unrecognizable objects. For example, a character cutting error may result in the detection of three unrecognizable objects, /, -, and \, rather than the single recognizable object A. Another common error of this nature is the connecting of two or more separately recognizable connected component objects into a single, unrecognizable object.
With a list of known objects that is limited to the letters of the English alphabet, a typical character recognition method would fail to recognize the sequence of unrecognized objects /, -, and \ as the letter A. What is also needed, therefore, are methods, apparatus and data structures for automatically recognizing future instances of currently unrecognizable connected component objects by adding those objects to a dictionary of reference objects. One of the many advantages of such a development would be that repetitive cutting and connecting errors would no longer impede character recognition.
One aspect of the present invention is directed to simple and effective methods, apparatus and data structures to identify bi-level connected component objects found within document images. This aspect of the invention includes intuitive and easily implemented methods for generating novel graph representations having topographical feature vectors which correspond to connected component objects. In accordance with this aspect of the invention, these graphs can be used to recognize each object as a known object in a reference dictionary with a high degree of efficiency and accuracy.
In particular, this aspect of the present invention identifies individual connected component objects (3000), extracts pixel data (3100-3430) for each object, determines the connectivity relationships between the pixel runs, and records them in novel data structures (4200). The resulting data correspond to simplified lumped graphs (or L-graphs) (4100) which use three or more types of nodes to capture characteristic feature vectors of the objects. In such embodiments, the feature vectors of the lumped graphs are compared directly to the feature vectors of the objects (11160) in a reference dictionary (10010). When the feature vectors are sufficiently similar, the object comparator (2070) recognizes or interprets the object as a known object and uses the simplified representation (11190) of the known object for future data processing. Conversely, when the feature vectors are sufficiently dissimilar, the object comparator (2070) preferably sends a message requesting additional assistance in recognizing the object. If the object is recognized with additional assistance, a dictionary updater can add the object to the dictionary of known objects so that it is recognized automatically in the future. As a result of the novel lumped graphs and data structures, the invention enables novel object recognition techniques which offer more flexibility, higher processing speed and greater accuracy in the recognition process.
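The comparator and dictionary-updater flow described above can be sketched as follows. This is a hypothetical illustration, not the patented comparator (2070): it assumes feature vectors are numeric tuples compared by Euclidean distance against a fixed `threshold`, and it models the "request for additional assistance" as a caller-supplied callback (`ask_for_help`) that returns a label or `None`. `ReferenceDictionary` and `compare_object` are invented names.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

FeatureVector = Tuple[float, ...]


class ReferenceDictionary:
    """Hypothetical reference dictionary: label -> list of known feature vectors."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[FeatureVector]] = {}

    def add(self, label: str, fv: Sequence[float]) -> None:
        # Dictionary updater: store a newly confirmed example of `label`.
        self.entries.setdefault(label, []).append(tuple(fv))

    def closest(self, fv: Sequence[float]) -> Tuple[Optional[str], float]:
        best: Tuple[Optional[str], float] = (None, math.inf)
        for label, vectors in self.entries.items():
            for ref in vectors:
                d = math.dist(fv, ref)
                if d < best[1]:
                    best = (label, d)
        return best


def compare_object(
    fv: Sequence[float],
    dictionary: ReferenceDictionary,
    threshold: float,
    ask_for_help: Callable[[Sequence[float]], Optional[str]],
) -> Optional[str]:
    """Recognize `fv` if it is close enough to a known object; otherwise ask
    for help and, if a label comes back, learn it for future lookups."""
    label, dist = dictionary.closest(fv)
    if label is not None and dist <= threshold:
        return label
    label = ask_for_help(fv)  # stands in for "requesting additional assistance"
    if label is not None:
        dictionary.add(label, fv)
    return label
```

Once a previously unknown variant has been labeled with assistance, later occurrences of the same feature vector are recognized without the callback, which is the behavior the dictionary updater is meant to provide.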
A particularly preferred aspect of the present invention is directed to more sophisticated methods, apparatus and data structures for interpreting bi-level connected component objects found within documents. Such methods use the same simplified lumped graphs as the embodiments described above. However, once lumped graphs of plural objects have been generated, their feature vectors are compared with one another and objects which meet a minimum level of similarity are clustered together. Additionally, objects which are representative of each cluster are selected and compiled into a library (2050) of cluster representatives. These representative objects are then compared to the known objects in the reference dictionary (10010).
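The clustering step can be illustrated with a simple leader-style clustering, which is a swapped-in technique and not necessarily the one the invention uses. In this sketch, an object joins the first cluster whose founding member lies within `max_distance` (the assumed "minimum level of similarity"); otherwise it starts a new cluster. Each cluster's representative is then chosen as its medoid, the member with the smallest total distance to the other members. Function names and the distance measure are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cluster_objects(features: List[Sequence[float]], max_distance: float) -> List[List[int]]:
    """Leader clustering: returns clusters as lists of object indices."""
    clusters: List[List[int]] = []
    for i, fv in enumerate(features):
        for members in clusters:
            if math.dist(fv, features[members[0]]) <= max_distance:
                members.append(i)
                break
        else:  # no existing cluster was similar enough
            clusters.append([i])
    return clusters


def medoid(members: List[int], features: List[Sequence[float]]) -> int:
    """Member with the smallest summed distance to all other members."""
    return min(members, key=lambda i: sum(math.dist(features[i], features[j]) for j in members))


def build_representative_library(
    features: List[Sequence[float]], max_distance: float
) -> List[Tuple[int, List[int]]]:
    """Return (representative_index, member_indices) for every cluster."""
    return [(medoid(m, features), m) for m in cluster_objects(features, max_distance)]
```

Only the representatives then need to be compared with the reference dictionary, so the number of dictionary lookups scales with the number of clusters rather than the number of objects.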
In particular, these embodiments of the invention preferably match each cluster representative to a dictionary object by comparing its feature vector with the feature vectors of known objects stored in the dictionary of known objects. Once the cluster representatives have been interpreted as known objects, the remaining members of each cluster can also be equated with the corresponding known objects in the dictionary. The data from the image, now interpreted, can then be output as a series of known objects.
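Propagating a representative's interpretation to the rest of its cluster might look like the sketch below. It assumes clusters are given as `(representative_index, member_indices)` pairs, that the dictionary maps each label to a single feature vector, that matching uses Euclidean distance with a rejection threshold, and that object indices follow reading order so the labels can be joined into output text. All of these, and the function name, are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def interpret_clusters(
    clusters: List[Tuple[int, List[int]]],
    features: List[Sequence[float]],
    dictionary: Dict[str, Sequence[float]],
    max_distance: float,
) -> List[Optional[str]]:
    """Match each cluster representative to a dictionary object and give every
    member of the cluster the same label (None if the match is rejected)."""
    labels: List[Optional[str]] = [None] * len(features)
    for rep, members in clusters:
        best_label, best_dist = None, math.inf
        for label, ref in dictionary.items():
            d = math.dist(features[rep], ref)
            if d < best_dist:
                best_label, best_dist = label, d
        if best_dist <= max_distance:
            for i in members:
                labels[i] = best_label
    return labels
```

For example, if objects 0 and 2 share a cluster represented by object 0, both receive that representative's label, and joining the labels in index order yields the interpreted text.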
Still other aspects of the present invention are directed to generating the lumped graph representations of the targeted objects and to storing data for these objects in an advantageous form. In particular, the lumped graphs are generated by capturing the connectivity relationships between pixel runs in the connected component. These data can then be used to create the simplified lumped graphs. The lumped graphs are preferably simplified by identifying adjacent pixel runs which have identical connectivity relationships and then replacing those runs with a single edge or stroke.
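One plausible reading of the lumping step is sketched below; it is an interpretation, not the patent's procedure. It assumes 8-connectivity between runs on adjacent rows and treats "identical connectivity relationships" as a one-to-one link: a run and the run directly below it are merged into the same stroke when the upper run touches only that lower run and the lower run touches only that upper run. Forks and joins therefore end one stroke and start others. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Run = Tuple[int, int, int]  # (row, start_col, end_col), end column inclusive


def runs_touch(a: Run, b: Run) -> bool:
    """8-connectivity between runs on vertically adjacent rows."""
    return abs(a[0] - b[0]) == 1 and a[1] <= b[2] + 1 and b[1] <= a[2] + 1


def lump_runs(runs: List[Run]) -> List[List[Run]]:
    """Merge vertical chains of one-to-one connected runs into strokes."""
    above: Dict[Run, List[Run]] = {r: [] for r in runs}
    below: Dict[Run, List[Run]] = {r: [] for r in runs}
    for a in runs:
        for b in runs:
            if b[0] == a[0] + 1 and runs_touch(a, b):
                below[a].append(b)
                above[b].append(a)

    strokes: List[List[Run]] = []
    for r in sorted(runs):
        # Skip runs that continue a one-to-one chain; they join the stroke above.
        if len(above[r]) == 1 and len(below[above[r][0]]) == 1:
            continue
        stroke = [r]
        cur = r
        while len(below[cur]) == 1 and len(above[below[cur][0]]) == 1:
            cur = below[cur][0]
            stroke.append(cur)
        strokes.append(stroke)
    return strokes
```

For a "V" drawn with five runs, the two arms each collapse into a two-run stroke, and the bottom run, which touches both arms, remains a separate stroke.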
The data structure embodiments of the invention are particularly well suited to generating and storing object data which correspond to the lumped graph representations of connected component objects. For example, these data structures can capture pixel run connectivity data, which are used to define the various nodes of each lumped graph. Additionally, they can capture simplified object data in the form of strokes or edges, each of which represents a plurality of pixel runs having identical connectivity relationships. Thus, the data structures of the invention make more of the information which defines connected component objects available for manipulation (e.g., during object interpretation).
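A minimal sketch of such a data structure, under stated assumptions, follows. Runs are stored once; each stroke refers to its runs by index and records which strokes touch its top and bottom. The class names and the `signature_counts` feature (a count of strokes by their number of upper and lower neighbors) are hypothetical and are not the node types or feature vectors defined by the invention.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Run:
    row: int
    start: int
    end: int  # inclusive


@dataclass
class Stroke:
    """A lumped edge: consecutive runs sharing the same connectivity."""
    run_ids: List[int]
    above: List[int] = field(default_factory=list)  # strokes touching the top run
    below: List[int] = field(default_factory=list)  # strokes touching the bottom run


@dataclass
class LumpedGraph:
    runs: List[Run]
    strokes: List[Stroke]

    def link(self, upper: int, lower: int) -> None:
        """Record that stroke `upper` connects downward to stroke `lower`."""
        self.strokes[upper].below.append(lower)
        self.strokes[lower].above.append(upper)

    def pixel_count(self) -> int:
        return sum(r.end - r.start + 1 for r in self.runs)

    def signature_counts(self) -> Counter:
        """Hypothetical feature: number of strokes per (n_above, n_below) pattern."""
        return Counter((len(s.above), len(s.below)) for s in self.strokes)
```

Because each stroke holds run indices rather than copies, the original pixel-level detail stays available while the stroke-level view stays compact.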