The invention relates to information retrieval and, more specifically, to a novel method and apparatus for retrieving information using sub-documents comporting with user queries.
Given the plethora of information that is accessible by computer systems, particularly on distributed databases, many information retrieval systems provide sophisticated search tools. A search tool allows a user to specify a query to operate on a set of target documents. Often, a user builds a query by combining one or more search terms with logical operators such as AND, OR and NOT. Then, the query is submitted to a search process, sometimes referred to as a “search engine,” which processes the query and causes the query to operate on the set of target documents that are typically stored on a database. Once the query is processed, any documents that satisfy the query, sometimes referred to as “hits,” are identified by the search engine and presented to the user. In situations where a large number of documents satisfy the query, additional terms are typically added to a query to reduce the number of hits to a manageable number. A user then selects one or more of the identified documents to be retrieved.
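By way of illustration only, the document-level Boolean search described above may be sketched as follows. The nested-tuple query form, the `matches` and `search` helpers, and the word-based tokenization are hypothetical choices made for this sketch and are not part of the original description.

```python
import re
from typing import Tuple, Union

# Hypothetical query representation: a bare string is a search term;
# ("AND", q1, q2, ...), ("OR", q1, q2, ...) and ("NOT", q) combine sub-queries.
Query = Union[str, Tuple]


def matches(query: Query, words: set) -> bool:
    """Return True if a document's set of words satisfies the Boolean query."""
    if isinstance(query, str):
        return query.lower() in words
    op, *args = query
    if op == "AND":
        return all(matches(q, words) for q in args)
    if op == "OR":
        return any(matches(q, words) for q in args)
    if op == "NOT":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op}")


def search(query: Query, documents: dict) -> list:
    """Return the names of the documents ("hits") that satisfy the query."""
    hits = []
    for name, text in documents.items():
        words = set(re.findall(r"\w+", text.lower()))
        if matches(query, words):
            hits.append(name)
    return hits
```

For example, with documents `{"a.txt": "search engines index documents", "b.txt": "databases store documents", "c.txt": "search databases quickly"}`, the query `("AND", "search", ("NOT", "engines"))` identifies only `c.txt`. Note that the result is a list of whole documents; the user must still locate the hits within them.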
Once the selected documents have been retrieved, the user must review the documents to locate the information specified in the search query. When many documents are selected, or when one or more of the selected documents is large, locating the hits within the documents can be an arduous task. To address this problem, some information retrieval systems provide a local search utility that re-executes the query to locate the portions of the selected documents containing the hits. However, this requires an extra search of the selected documents.
Consequently, in view of the need to retrieve information automatically and the limitations of prior approaches that retrieve information at the document level, an alternative approach for automatically retrieving information is highly desirable.
An approach for retrieving information using sub-documents is described. First, a set of sub-documents is established based upon a set of documents. Then, a query that operates on the set of sub-documents is processed, causing a score to be generated for each sub-document.
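A minimal sketch of these two steps follows, under stated assumptions: the description does not specify how sub-documents are delimited or how scores are computed, so this sketch assumes that each paragraph (text separated by blank lines) is a sub-document and uses a simple term-frequency count as a stand-in relevance score. The `SubDocument` class and all function names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class SubDocument:
    doc_id: str  # identifier of the source document
    index: int   # position of this sub-document within its source document
    text: str


def split_into_subdocuments(doc_id: str, text: str) -> list:
    """Assumption: each paragraph (separated by blank lines) is one sub-document."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [SubDocument(doc_id, i, p) for i, p in enumerate(paragraphs)]


def score(text: str, terms: list) -> int:
    """Assumed relevance measure: total occurrences of the query terms (higher = more relevant)."""
    words = re.findall(r"\w+", text.lower())
    return sum(words.count(term.lower()) for term in terms)


def score_subdocuments(documents: dict, terms: list) -> list:
    """Establish sub-documents from every document, then score each one against the query."""
    scored = []
    for doc_id, text in documents.items():
        for sub in split_into_subdocuments(doc_id, text):
            scored.append((sub, score(sub.text, terms)))
    return scored
```

Because the query operates on sub-documents rather than whole documents, each score already points to a specific portion of a document, so no second local search is needed to find where the hits are.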
The score for each sub-document is indicative of the relevance of the corresponding sub-document to the query. The scores are reviewed, and the sub-document having a score that indicates the highest relevance between the sub-document and the query is retrieved.
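The review step may be sketched as below. The description says only that a score "indicates" relevance, so this sketch takes a hypothetical `higher_is_better` flag to cover both conventions (for example, a similarity score versus a distance). Sub-documents are identified here by hypothetical string ids, and ties are resolved in favor of the first id encountered.

```python
from typing import Optional


def most_relevant(scores: dict, higher_is_better: bool = True) -> Optional[str]:
    """Return the id of the sub-document whose score indicates the highest relevance.

    Both max() and min() return the first extreme element they encounter, so
    ties are resolved in favor of the earliest id in the dictionary.
    """
    if not scores:
        return None
    pick = max if higher_is_better else min
    return pick(scores, key=scores.__getitem__)


def retrieve_most_relevant(subdocs: dict, scores: dict) -> Optional[str]:
    """Retrieve the text of the most relevant sub-document (hypothetical id -> text store)."""
    best = most_relevant(scores)
    return None if best is None else subdocs[best]
```

For example, with scores `{"d1#0": 1, "d1#1": 3, "d2#0": 3}`, the sub-document `d1#1` is retrieved.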
According to another aspect of the invention, in response to a user selection, the sub-document having a score that indicates the next highest relevance between the sub-document and the query is retrieved. The sub-documents may be presented to the user in an order based upon the scores.
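Ranked presentation and stepping to the next most relevant sub-document can be sketched with a sorted list and a cursor. The `RankedSubDocuments` class and its method names are hypothetical, and the sketch assumes that higher scores indicate higher relevance.

```python
from typing import Optional


class RankedSubDocuments:
    """Hypothetical helper: orders sub-documents by score and, on each user
    selection, steps to the next most relevant one."""

    def __init__(self, scores: dict):
        # sorted() is stable even with reverse=True, so equal scores keep
        # their original relative order.
        self.order = sorted(scores, key=lambda sid: scores[sid], reverse=True)
        self.position = -1  # nothing shown yet

    def next_most_relevant(self) -> Optional[str]:
        """Advance to and return the next sub-document id, or None when exhausted."""
        if self.position + 1 >= len(self.order):
            return None
        self.position += 1
        return self.order[self.position]
```

The first call returns the most relevant sub-document; each later call, triggered by a user selection, returns the sub-document with the next highest relevance, and `order` gives the full presentation order.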
According to another aspect of the invention, the document containing the sub-document whose score indicates the highest relevance between the sub-document and the query is displayed and automatically scrolled to the location of that sub-document. In response to user input, the document is automatically scrolled to other sub-documents in an order determined by their scores. If any of those sub-documents is contained in another document, that document is automatically loaded.
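The display behavior may be sketched as follows, for illustration only. A real viewer would render and scroll a window; here "loading" and "scrolling" are simulated by recording the loaded document id and a character offset. The `Hit` and `DocumentViewer` names, the offset-based location, and the load counter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    doc_id: str   # document containing the sub-document
    offset: int   # character offset of the sub-document within that document
    score: float  # assumed: higher means more relevant


class DocumentViewer:
    """Hypothetical viewer: shows sub-documents in score order, loading a new
    document only when the next sub-document lives in a different one."""

    def __init__(self, documents: dict, hits: list):
        self.documents = documents
        self.ranked = sorted(hits, key=lambda h: h.score, reverse=True)
        self.cursor = -1
        self.loaded_doc: Optional[str] = None
        self.scroll_offset = 0
        self.loads = 0  # number of times a document had to be loaded

    def show_next(self) -> Optional[Hit]:
        """Scroll to the next most relevant sub-document, loading its document if needed."""
        if self.cursor + 1 >= len(self.ranked):
            return None
        self.cursor += 1
        hit = self.ranked[self.cursor]
        if hit.doc_id != self.loaded_doc:
            self.loaded_doc = hit.doc_id  # sub-document is in another document: load it
            self.loads += 1
        self.scroll_offset = hit.offset   # "scroll" to the sub-document's location
        return hit

    def visible_text(self, length: int = 40) -> str:
        """Text currently at the top of the simulated viewport."""
        if self.loaded_doc is None:
            return ""
        text = self.documents[self.loaded_doc]
        return text[self.scroll_offset:self.scroll_offset + length]
```

Calling `show_next()` once displays the most relevant sub-document; each later call, driven by user input, scrolls within the current document or loads another one only when required.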