Information retrieval systems, generally called search engines, are now an essential tool for finding information in large-scale, diverse, and growing corpora such as the World Wide Web. Typically, a search engine creates an index that relates documents (or “pages”) to the individual words present in each document. A document is retrieved in response to a query containing a number of query terms, typically based on having some number of the query terms present in the document. The retrieved documents are then ranked according to other statistical measures, such as frequency of occurrence of the query terms, host domain, link analysis, and the like. The ranked documents are presented to the user, typically in ranked order and without any further grouping or imposed hierarchy. In some cases, a selected portion of a document's text, or snippet, is presented to give the user a preview of the document's content. Depending on the query terms and the document, however, the snippet may not provide information that helps the user assess the relevance of the document to the query.
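The conventional pipeline described above can be illustrated with a minimal sketch. It is not taken from any particular search engine: the function names (`build_index`, `search`, `naive_snippet`), the sample documents, the use of term frequency as the only ranking measure, and the choice of a document's leading words as its snippet are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an index mapping each word to {doc_id: occurrence count}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word][doc_id] += 1
    return index


def search(index, query):
    """Retrieve documents containing any query term, ranked by the
    total frequency of the query terms (ties broken by doc_id)."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked]


def naive_snippet(text, max_words=8):
    """Assumed snippet policy: the first few words of the document,
    chosen without regard to the query."""
    return " ".join(text.split()[:max_words])


# Hypothetical sample corpus.
docs = {
    "d1": "Welcome to our site. Home About Contact. "
          "We review hiking boots and hiking trails.",
    "d2": "Boots for every season, including rain boots.",
    "d3": "A recipe for apple pie.",
}

index = build_index(docs)
results = search(index, "hiking boots")

for doc_id in results:
    print(doc_id, "|", naive_snippet(docs[doc_id]))
```

In this sketch the top-ranked document, `d1`, is retrieved because it contains both query terms, yet its snippet consists of a greeting and navigation text and includes neither term. A user looking only at that snippet has little basis for judging whether the document is relevant.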
There is a need for an information retrieval system and methodology that can provide more meaningful snippets.