1. Field of the Invention
The present invention relates to generation of annotations for documents and for search engine output results as a means for assisting a user in selecting relevant search results.
2. Description of the Related Art
The World Wide Web (“web”) contains a vast amount of information. Locating a desired portion of the information, however, can be challenging. This problem is compounded because the amount of information on the web and the number of new users inexperienced at web searching are growing rapidly.
Search engines attempt to return hyperlinks to web pages in which a user is interested. Generally, search engines base their determination of the user's interest on search terms (called a search query) entered by the user. The goal of the search engine is to provide links to high quality, relevant results to the user based on the search query. Typically, the search engine accomplishes this by matching the terms in the search query to a corpus of pre-stored web pages. Web pages that contain the user's search terms are “hits” and are returned to the user.
The overriding goal of a search engine is to return the most desirable set of links for any particular search query. Annotation generation is one aspect of providing search results and managing the search process. Annotations are meant to summarize what the documents are “about.” Conventionally, they are sentences that appear in the documents themselves and that supposedly capture the meaning of the document.
Conventional search engines have two ways to specify the number of sentences in the annotations that they provide: one is a percentage of the total sentences in the document (e.g., 5% or 10%), and the other is a maximum number of sentences that the user can specify (or, alternatively, a certain number of words before and after some word in the found text). Neither of these approaches is, in fact, satisfactory. For example, with very large documents (e.g., 50 or 100 pages of text), specifying what appears to be a relatively low percentage (such as 10%) would still result in five or ten pages of text that the reader has to “digest.”
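The arithmetic behind the percentage problem can be illustrated with a minimal sketch. The function names and the figure of 30 sentences per page are hypothetical assumptions chosen only for illustration; they do not come from any particular search engine.

```python
import math

# Hypothetical assumption for illustration only: an average page of text
# holds about 30 sentences.
SENTENCES_PER_PAGE = 30


def annotation_size_by_percentage(total_sentences: int, percent: float) -> int:
    """Annotation length when sized as a percentage of the document."""
    return math.ceil(total_sentences * percent / 100)


def annotation_size_by_cap(total_sentences: int, max_sentences: int) -> int:
    """Annotation length when limited to a fixed maximum number of sentences."""
    return min(total_sentences, max_sentences)


def pages_of_text(sentence_count: int) -> float:
    """Convert a sentence count into an approximate page count."""
    return sentence_count / SENTENCES_PER_PAGE


# A 100-page document under the assumption above.
document_sentences = 100 * SENTENCES_PER_PAGE

by_percentage = annotation_size_by_percentage(document_sentences, 10)
by_cap = annotation_size_by_cap(document_sentences, 3)

print(by_percentage, pages_of_text(by_percentage))  # sentences and pages at 10%
print(by_cap)                                       # sentences with a cap of 3
```

Under these assumptions, the 10% setting yields an annotation roughly ten pages long, while the fixed cap stays at three sentences regardless of document size.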
The conventional solution to this is to give the user the flexibility to adjust the parameters of annotation, e.g., giving the user the ability to switch between percentages and a fixed maximum number of sentences presented, and giving him the ability to adjust the actual values of the percentages or numbers of sentences. The disadvantage of this approach is that the user, instead of concentrating on the substance of his search, has to constantly manipulate parameters that are not directly related to the subject matter of his search. In other words, he has to manipulate the parameters of what is displayed on the screen, rather than adjusting the search query itself.
Conventional search engines typically annotate their search results by producing a few (typically between one and three) sentences in which the words of the query are found. This does not necessarily produce the most relevant annotations. For example, a user searching for documents relating to the Boeing 787 Dreamliner can input, as his query, “Boeing 787 Dreamliner.” One of the hits in response to such a query might be an article in a magazine about a completely unrelated subject, with a paragraph at the end of the article saying something to the effect of “and in our next issue, look for a detailed discussion of the design process of the Boeing 787 Dreamliner.” This sentence will be picked up by the search engine, and the document presented to the user (possibly with a relatively high ranking), even though the actual “meaning” of the document has nothing to do with the subject matter of the query.
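The conventional behavior described above can be sketched as follows. This is a simplified illustration, not the implementation of any actual search engine: the function names, the naive sentence splitter, the rule that a sentence must contain every query word, and the sample article text are all assumptions made for the example.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def naive_annotation(text: str, query: str, max_sentences: int = 3) -> list[str]:
    """Return up to max_sentences sentences that contain every query word."""
    terms = query.lower().split()
    hits = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"\w+", sentence.lower()))
        if all(term in words for term in terms):
            hits.append(sentence)
            if len(hits) == max_sentences:
                break
    return hits


# Hypothetical article on an unrelated subject that ends with a teaser.
article = (
    "Tomatoes grow best in full sun. "
    "Water them deeply twice a week. "
    "In our next issue, look for a detailed discussion of the design "
    "process of the Boeing 787 Dreamliner."
)

annotation = naive_annotation(article, "Boeing 787 Dreamliner")
print(annotation)
```

In this sketch the only sentence returned is the closing teaser, so the annotation suggests the article concerns the aircraft even though the body of the text is about an unrelated subject.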
Accordingly, there is a need in the art for a system and method for generating contextually relevant annotations.