Image content-based retrieval is becoming a powerful alternative or addition to conventional text annotation-based retrieval. Even so, it has yet to reach the robustness and computational effectiveness of text-based retrieval. Text-based retrieval, on the other hand, is notoriously lacking in precision, even when boolean combinations of keywords are allowed. Users of popular conventional search tools commonly observe that full-text indexing of documents (scanned or electronic) causes a large number of irrelevant documents to be retrieved.
Text-based querying is more productive when it is combined with image content-based querying. A special case of this occurs when the text strings relevant for indexing documents occur within image structures, such as text in special regions of a news video or text within region fields of a form. Retrieval based on such structured text can yield fewer but more relevant matching documents.
An example of the above-mentioned special case arises in the area of processing engineering drawing documents, a large number of which still exist in paper form. Converting such documents to electronic form is an important business for large format scanner makers. As is known, large format scanners can scan engineering drawing documents at a relatively fast rate of 25 sheets/minute, and are quickly giving rise to very large databases (in excess of 100,000 objects) of large-sized drawing images (e.g., 14000×9000 pixels). Currently, indexing of such documents is done manually by skilled keyboard operators, and is considered a highly labor-intensive activity constituting a significant cost in the digitizing of scanned images. Manual indexing by a keyboard operator can also be unreliable, since the keywords employed by a user may not match the ones attached to the documents during database creation.
In contrast to full-text indexing of pure text documents, automatic full-text indexing using conventional OCR algorithms will not yield useful results for drawing images. Fortunately, useful text information for indexing such drawing images is found in specific image structures called "title blocks". Typically, a title block will include information pertinent for indexing a corresponding drawing, such as part number, name of the unit being depicted, date of design, and architect name. Indexing keyword extraction from such image structures requires that the image structures themselves be first identified.
As will appear from the Detailed Description below, the present invention employs some of the principles underlying a solution for a model indexing problem, namely the principles underlying "Geometric Hashing". Referring to articles by Y. Lamdan and H. J. Wolfson (entitled "Geometric hashing: A general and efficient model-based recognition scheme", in Proceedings of the International Conference on Computer Vision, pages 238-249, 1988, and "Transformation invariant indexing" in Geometric Invariants in Computer Vision, MIT Press, pages 334-352, 1992), Geometric Hashing has been used to identify objects in pre-segmented image regions. Another work, extending the basic geometric hashing scheme for use with line features, is an article by F. C. D. Tsai entitled "Geometric hashing with line features" in Pattern Recognition, Vol. 27, No. 3, pages 377-389, 1994. An extensive analysis of the geometric hashing scheme is provided in an article by W. E. L. Grimson and D. Huttenlocher entitled "On the sensitivity of geometric hashing", in Proceedings International Conference on Computer Vision, pages 334-339, 1990.
Obtaining suitable geometric hash functions has also been explored in an article by G. Bebis, M. Georgiopoulos and N. Lobo entitled "Learning geometric hashing functions for model-based object recognition" in Proceedings International Conference on Computer Vision, pages 543-548, 1995, and a discussion of using the concept of "rehashing" in the context of geometric hashing is provided in an article by I. Rigoutsos and R. Hummel entitled "Massively parallel model matching: Geometric hashing on the connection machine" in IEEE Computer, pages 33-41, February 1992.
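By way of illustration only, the basic geometric hashing idea discussed in the articles cited above can be sketched as follows. The sketch assumes point features and invariance to similarity transformations (translation, rotation, uniform scale). The function names `basis_coords`, `build_table` and `vote`, the cell size of 0.1, and the sample coordinates are hypothetical choices made for this example. It is not the method of the present invention, nor a reproduction of any cited implementation.

```python
import math
from collections import defaultdict
from itertools import permutations


def basis_coords(p, b0, b1):
    """Coordinates of p in the frame that maps b0 to (0, 0) and b1 to (1, 0).

    These coordinates do not change when all points undergo the same
    translation, rotation and uniform scaling.
    """
    ex, ey = b1[0] - b0[0], b1[1] - b0[1]
    norm2 = ex * ex + ey * ey
    dx, dy = p[0] - b0[0], p[1] - b0[1]
    u = (dx * ex + dy * ey) / norm2   # component along the basis direction
    v = (-dx * ey + dy * ex) / norm2  # component perpendicular to it
    return u, v


def quantize(uv, cell=0.1):
    """Map invariant coordinates to a discrete hash-table key (assumed cell size)."""
    return (round(uv[0] / cell), round(uv[1] / cell))


def build_table(models, cell=0.1):
    """Offline stage: for every ordered basis pair of every model, hash all
    remaining points and record (model name, basis indices) in that bucket."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k in (i, j):
                    continue
                key = quantize(basis_coords(p, pts[i], pts[j]), cell)
                table[key].append((name, i, j))
    return table


def vote(table, scene, basis=(0, 1), cell=0.1):
    """Online stage: pick one scene basis pair, hash the remaining scene
    points, and count one vote per table entry found in each bucket hit."""
    b0, b1 = scene[basis[0]], scene[basis[1]]
    votes = defaultdict(int)
    for k, p in enumerate(scene):
        if k in basis:
            continue
        for entry in table.get(quantize(basis_coords(p, b0, b1), cell), ()):
            votes[entry] += 1
    return votes


if __name__ == "__main__":
    # Hypothetical five-point model and a rotated, scaled, shifted copy of it.
    model = [(0, 0), (2, 0), (2, 1), (0, 3), (1, 1)]
    table = build_table({"a": model})
    c, s = 2.5 * math.cos(0.7), 2.5 * math.sin(0.7)
    scene = [(c * x - s * y + 40, s * x + c * y - 15) for x, y in model]
    votes = vote(table, scene)
    print(sorted(votes.items(), key=lambda kv: -kv[1])[:3])
```

In this sketch every ordered pair of model points serves as a basis, so the table holds one entry per (basis, remaining point) combination. That exhaustive enumeration is what makes lookup independent of the unknown transformation, and it is also why the table grows quickly with the number of features.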
As taught by now-allowed U.S. patent application Ser. No. 08/878,512 to Syeda-Mahmood (the disclosure of which is incorporated herein by reference), a data structure known as the "geometric hash table" can be used effectively to index handwritten words in a handwriting localization scheme. While the handwriting technique is believed to provide fast search and retrieval in the context of locating and recognizing handwritten word queries in handwritten documents, the same speed for search and retrieval is not obtainable when locating and recognizing two-dimensional patterns in a relatively large document (e.g., an engineering drawing document). This degradation of search and retrieval speed is attributable, in substantial part, to the size of the geometric hash table. It would be desirable to provide a system in which localization of two-dimensional patterns could be achieved with a data structure that is considerably more compact than the geometric hash table.
For several types of document images, such as the type of document image associated with typical engineering drawing documents, the size of a corresponding geometric hash table can be quite large. It has been found that a geometric hash table for a group of images from a typical engineering drawing document set can be as large as 40 Gbytes, a size that far exceeds the size of main memory for most computer systems. Thus a database cannot readily be formed from a geometric hash table developed for such document images. It would be desirable to provide a relatively compact structure that both exploits the principles underlying the geometric hash table and lends itself readily to searching in databases having images of all sizes.
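The growth in table size can be illustrated with a rough count. The sketch below assumes the classical point-feature scheme under similarity transformations, in which every ordered pair of n features is a basis and each of the remaining n - 2 features contributes one entry. The names `table_entries` and `table_bytes` and the figure of 8 bytes per entry are assumptions made for illustration. The sketch is not intended to reproduce the 40 Gbyte figure noted above, which depends on the features and images actually used.

```python
def table_entries(n_features: int) -> int:
    """Entries for one image: n * (n - 1) ordered bases, each with n - 2 entries."""
    return n_features * (n_features - 1) * (n_features - 2)


def table_bytes(n_features: int, bytes_per_entry: int = 8) -> int:
    """Approximate storage, given an assumed (hypothetical) size per entry."""
    return table_entries(n_features) * bytes_per_entry


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, table_entries(n), table_bytes(n))
```

Because the count grows roughly as the cube of the number of features per image, a modest increase in image complexity can move a table from fitting comfortably in main memory to exceeding it.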