1. Technical Field
The embodiments disclosed in this application generally relate to Graph-Theory based pattern recognition technologies used for recognizing objects within content such as images.
2. Background
Pictographic Recognition (PR) technology is a term used herein to describe a Graph-Theory based method for locating specific words or groups of words within handwritten and machine printed document collections. This technique converts written and printed text forms into mathematical graphs and draws upon certain features of the graphs (e.g., topology, geometric features, etc.) to locate graphs of interest based upon specified search terms or to convert the graphs into text.
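As an illustrative sketch of the graph-based idea (not the PR method itself), the following hypothetical example assumes a written form has already been reduced to a graph whose nodes are stroke endpoints and junctions and whose edges are stroke segments. Two stylized renderings of the same form can then be matched on topological features alone; the function name and the particular feature set are assumptions for illustration.

```python
# Hypothetical sketch: a written form as a graph of stroke junctions/
# endpoints (nodes) and stroke segments (edges), compared on topology.
from collections import defaultdict

def topological_signature(edges):
    """Return (node count, edge count, degree sequence, cycle rank)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    # Count connected components with a simple depth-first search.
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            stack.extend(adj[n] - seen)
    degrees = tuple(sorted(len(adj[n]) for n in nodes))
    cycle_rank = len(edges) - len(nodes) + components  # independent loops
    return (len(nodes), len(edges), degrees, cycle_rank)

# Two stylized renderings of the same character: a closed loop plus a tail.
form_a = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
form_b = [("p", "q"), ("q", "r"), ("r", "p"), ("p", "s")]
assert topological_signature(form_a) == topological_signature(form_b)
```

Because the signature depends only on connectivity, it is insensitive to the geometric stylization that varies between writers; richer geometric features would be layered on top in practice.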
PR has been successfully used in the past as a search and recognition tool by identifying individual characters in strings of cursive handwritten English and Arabic script. However, the free-flowing structure of handwritten text, especially Arabic, has posed some unique challenges for PR-based methodologies. First, Arabic is written in a cursive form, so there is no clear separation between characters within words. Writers often take considerable license in writing Arabic strings, so that characters are either skipped or highly stylized. This makes it difficult to parse the string automatically into separate characters and to identify the individual characters within an Arabic word using computer-based recognition methodologies. Second, Arabic characters change their form depending on their word position (e.g., initial, middle, final, standalone). Third, Arabic words incorporate external characteristics such as diacritical markings. Lastly, Arabic writers often add a second “dimension” to writing by stacking characters on top of each other, and the Arabic language is heavily reliant on ligatures (i.e., multiple characters combined into a single form). All of these characteristics contribute to considerable dissimilarities between handwritten and machine printed forms of Arabic.
These dissimilarities make it difficult to achieve satisfactory results using existing PR techniques. Moreover, there is little or no ability to extend such techniques to, e.g., images, primarily because it is very difficult using conventional techniques to convert an image into a suitable graph. Satellite imagery, photographs, and other types of remote sensing images rarely surrender their information readily to computer algorithms; the information usually has to be coaxed out of the images through a sophisticated series of processing steps. By their nature, these images contain background clutter, superfluous information, atmospheric effects, and many other flaws that degrade the image quality or create a confusing “field” of information surrounding an object of interest. Often, these defects must first be eliminated, or at least attenuated, before objects of interest within the images can be detected, extracted, and/or identified. Alternatively, a method must be applied that can distinguish items of interest within noisy backgrounds and “surgically” extract them from their surroundings.
The practice of imagery analysis dates back to the dawn of aerial reconnaissance during World War I. Although new technologies such as multi-spectral imagery have been perfected and numerous image analysis techniques have been developed during the intervening years, image understanding remains firmly in the domain of human experts to this day. The expert must still sift through exceedingly large amounts of data before he is able to apply the full power of his modern tools to the problem.
Some of the most successful approaches to image content analysis rely heavily on human judgment. One such approach requires the human expert to analyze an image and annotate and extract the key features from it in accordance with his interests and skills. This method came to the fore during World War II and the early part of the Cold War. While it met the needs of its users, it is labor-intensive, expensive, error-prone, and inefficient.
Another, more modern, approach is the one adopted by the current crop of web search engines. This method relies on the textual information that was inserted into the images by their creators. It is highly effective and requires no new technologies, since it leverages the power of the text search engine; however, such techniques rely implicitly on the judgment of the content creators. If the images were tagged with incorrect or incomplete textual information, the results are entirely useless. And such methods cannot handle untagged images at all.
A similar but slightly more sophisticated approach examines not only the image tags, but also the text that surrounds the image in a document, as well as the text surrounding the URL in other documents that link to the image. Such approaches suffer from the same ailment as the tag-based approach, because the results are entirely dependent on the content creators' judgments.
Content-based image retrieval methods and systems employ algorithms that actually analyze the content of the image. Some of the well-known content-based image retrieval systems use a combination of simple ideas including color histograms, Gaussian descriptors, Fourier descriptors, and wavelet signatures. In spite of their higher level of sophistication, however, these techniques typically cannot handle the rich set of low-level structural details, because they work only with abstract feature sets such as color blobs, shape edges, and straight lines of specific orientations.
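A minimal sketch of one such abstraction, the color histogram, illustrates the limitation described above. The binning scheme and function names below are assumptions for illustration: pixel colors are quantized into coarse bins, so all spatial structure is discarded, and two images with the same colors in different arrangements become indistinguishable.

```python
# Illustrative color-histogram descriptor of the kind used by classic
# content-based image retrieval systems (binning scheme is assumed).

def color_histogram(pixels, bins_per_channel=4):
    """Quantize (r, g, b) pixels (values 0-255) into a normalized histogram."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Two "images" with the same colors in different spatial arrangements
# are indistinguishable to the descriptor.
img1 = [(255, 0, 0)] * 8 + [(0, 0, 255)] * 8
img2 = [(0, 0, 255)] * 8 + [(255, 0, 0)] * 8
assert histogram_intersection(color_histogram(img1), color_histogram(img2)) == 1.0
```

The final assertion makes the point concrete: the descriptor reports a perfect match even though the two images differ structurally.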
Biometric face identification (facial recognition) systems employ some of the most complex techniques capable of dealing with the minute, detailed features of the human face. Though highly sophisticated, these techniques cannot cope adequately with background clutter, poor lighting, partial occlusion, and angular distortions.
In sum, the existing techniques for recognizing objects within imagery and for comparing and searching images are limited by the nature of the feature sets they employ and the levels of abstraction they apply to those feature sets. Images by nature present feature vectors of very high dimensionality, requiring solutions that reduce feature dimensions down to a manageable size. Often this requires a substantial amount of abstraction computation. For instance, such abstraction can involve distilling the content of the image into a distribution of pixel colors, edge crossings, or similar measures that yield very efficient computations but sacrifice large amounts of significant information.
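The scale of the dimensionality gap can be sketched with simple arithmetic; the image size and bin counts below are assumed purely for illustration, not drawn from any particular system.

```python
# Illustrative arithmetic for the dimensionality reduction the text
# describes (all figures are assumptions chosen for illustration).

width, height, channels = 640, 480, 3
raw_dimensions = width * height * channels  # one dimension per pixel value
histogram_dimensions = 4 ** 3               # e.g., 4 bins per RGB channel

reduction_factor = raw_dimensions // histogram_dimensions
# A ~900,000-dimensional raw image collapses to a 64-dimensional
# descriptor, discarding the corresponding structural information.
```

The reduction makes search computationally efficient precisely because so much of the image's information is thrown away.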