The present invention relates to document matching. More particularly, the present invention relates to document matching using structural information.
Many schemes exist to retrieve documents meeting parameters defined by a person searching for the documents. The most common method of searching is based on text. For example, a person searching documents can define a text string of one or more keywords that appear in the desired document, or the person can define a Boolean search to find documents containing specified content.
Schemes also exist to search document images. For instance, image searching schemes exist that utilize the line structure of documents. Such searching schemes are typically used with schematics, maps, flowcharts, etc. Another example of an image searching technique is shown in Niblack, W., et al., “The QBIC Project: Querying Images By Content Using Color, Texture and Shape,” SPIE Proceedings, Vol. 1908, pp. 173-187 (February, 1993).
People often search physical documents because they can easily recognize documents based on their visual appearance. For example, if a person knows that a particular diagram was included in a set of presentation slides, the person can quickly search stored documents for the diagram and retrieve the related slides. However, if a person cannot remember the particular document that contains a desired diagram, then more documents may have to be examined. For documents stored on a computer system, opening and thoroughly reviewing each file may be extremely time consuming, especially where the number of files that must be searched is large.
What is needed is a scheme for automatically searching for an electronic document based on the visual appearance of the document.
A method and apparatus for document matching using structural information is described. A target document is analyzed to generate structural information that describes the target document. The structural information describing the target document is compared to structural information describing a set of stored electronic documents. One or more of the stored electronic documents are retrieved based on a match between the structural information describing the target document and the structural information describing the stored electronic documents.
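The compare-and-retrieve flow summarized above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions, not as the claimed implementation: the feature set (`num_text_blocks`, `num_columns`, `num_images`), the Euclidean distance measure, the `max_distance` threshold, and all file names are hypothetical and do not come from this description.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class StructuralInfo:
    # Hypothetical structural features; the description does not say
    # which features make up the structural information.
    num_text_blocks: int
    num_columns: int
    num_images: int

    def as_vector(self):
        return (self.num_text_blocks, self.num_columns, self.num_images)


def distance(a, b):
    # Assumed similarity measure: Euclidean distance between feature vectors.
    return sqrt(sum((x - y) ** 2 for x, y in zip(a.as_vector(), b.as_vector())))


def retrieve(target, stored, max_distance=1.0):
    # Compare the target's structural information with that of every stored
    # document and return the names of those within max_distance, closest first.
    scored = [(distance(target, info), name) for name, info in stored.items()]
    return [name for d, name in sorted(scored) if d <= max_distance]


# Hypothetical stored documents, with structural information computed in advance.
stored = {
    "slides.pdf": StructuralInfo(3, 1, 1),
    "report.pdf": StructuralInfo(12, 2, 0),
    "memo.pdf": StructuralInfo(3, 1, 0),
}
target = StructuralInfo(3, 1, 1)
matches = retrieve(target, stored)
```

In this sketch, the analysis step that would produce a `StructuralInfo` from a document image is left out, since it depends on features this passage does not specify.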