1. Field of the Invention
This invention relates to improvements in text retrieval systems. More particularly, this invention relates to a text retrieval system for retrieving a document based on a comparison of the signature of a word in the text and the signature of a query term.
2. Description of Related Art
Considerable effort has been devoted to improving text retrieval systems. Text retrieval systems generally record the location of individual words within the documents of a collection, or corpus. This location information is generally kept in an inverted index. The location can be, for example, a word offset, that is, the number of words from the beginning of the document at which the word occurs. Alternatively, the location may be an offset from the beginning of a section or paragraph, a section number, a sentence number, or another such location-indicating index.
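By way of illustration only, an inverted index carrying word-offset location information might be sketched as follows. The function name `build_positional_index`, the whitespace tokenization, and the sample documents are assumptions made for this example and are not taken from any particular prior art system.

```python
from collections import defaultdict


def build_positional_index(documents):
    """Map each word to {document id: [word offsets from the document start]}.

    Hypothetical helper: words are split on whitespace and lowercased,
    which is a simplification of real tokenization.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for offset, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(offset)
    return index


# Illustrative corpus of two short documents.
docs = {
    "d1": "text retrieval systems index text",
    "d2": "retrieval of text",
}
index = build_positional_index(docs)
```

In this sketch, each entry for a word lists every offset at which the word occurs, which is why location information can grow to be a large part of the index.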
In a combined image and text system, the location information can be a page number with x,y coordinates and a height and length. Location information serves two purposes. First, it makes searches more efficient where there is a constraint or value associated with the proximity of two or more terms to each other. Without proximity information in the index, the text of the document would have to be examined from the beginning to find where the two words occur. Second, it facilitates providing feedback to the user on why a particular document was selected by a search. A small segment of the document might be shown to the user, perhaps with the terms that caused the document to be selected highlighted. The location information may make it possible to display and highlight the relevant text without reading the whole document.
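As a minimal sketch of how location information supports a proximity constraint, the following compares two sorted lists of word offsets for the same document without reading the document text. The function name `within_distance` and its parameters are hypothetical, and the approach shown is a conventional two-pointer merge rather than the technique of the present invention.

```python
def within_distance(positions_a, positions_b, max_distance):
    """Return True if some offset in positions_a is within max_distance
    words of some offset in positions_b.

    Assumes both lists are sorted in ascending order, as they would be
    if offsets were appended while scanning a document front to back.
    """
    i = j = 0
    while i < len(positions_a) and j < len(positions_b):
        if abs(positions_a[i] - positions_b[j]) <= max_distance:
            return True
        # Advance whichever list currently points at the smaller offset.
        if positions_a[i] < positions_b[j]:
            i += 1
        else:
            j += 1
    return False


# Offsets of two query terms within one document (illustrative values).
adjacent = within_distance([0, 4], [1], 1)
far_apart = within_distance([0, 10], [5], 2)
```

The merge visits each offset at most once, so its cost grows with the combined length of the two location lists, which is the computation that becomes expensive when the lists are long.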
Location information can be one of the largest components of an inverted index. However, it is often desirable in text retrieval systems to keep the index overhead to a minimum. In addition, the computation involved in merging possibly long lists of location information can be extensive. The present invention presents a technique that decreases the computation necessary to perform a proximity search without unduly increasing the indexing overhead.