Computers are well-suited for searching vast amounts of information. One type of computer system used to search for information stored in computers is an “information retrieval system”. Generally, in operation, an information retrieval system accepts as input a statement of an information need (i.e., a query) and provides as output a search result identifying a set of one or more documents that the information retrieval system determined were relevant to the query. The documents may include text documents, multi-media documents, web pages, images, audio tracks, videos, and other types of information. An Internet search engine is an example of one type of information retrieval system.
A search result provided by an information retrieval system in response to a query often identifies more than one document as being relevant to the query. In such a case, the search result may provide a textual summary of each identified document in lieu of providing the actual documents themselves. The textual summaries can then be reviewed by a human user who, based on the summaries, decides which documents identified in the search result appear to be most relevant to the query. For example, in the context of web search engines, a search result may comprise a web page presenting web search result summaries listed in order of decreasing relevance. Often a web search result summary is presented in Title-Abstract-URL (TAU) format. FIG. 1 depicts an example web search result summary in TAU format. As shown in FIG. 1, the search result summary 10 comprises a title 11, a short keywords-in-context extractive summary or abstract 12, and a Uniform Resource Locator (URL) 13. In this example, the summarized document, retrieved in response to the query “burning man”, is a web page containing content about an annual event called “Burning Man” that takes place in a Nevada desert.
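For illustration only, the TAU format and the ordered listing described above can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not a description of any particular search engine: the names `SearchResultSummary`, `order_by_relevance`, and `render_tau`, as well as the numeric `relevance` field, are hypothetical and do not appear in FIG. 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchResultSummary:
    """One search result summary in Title-Abstract-URL (TAU) format."""
    title: str        # corresponds to title 11 in FIG. 1
    abstract: str     # corresponds to abstract 12 in FIG. 1
    url: str          # corresponds to URL 13 in FIG. 1
    relevance: float  # hypothetical score; higher means more relevant


def order_by_relevance(summaries: List[SearchResultSummary]) -> List[SearchResultSummary]:
    """Return the summaries listed in order of decreasing relevance."""
    return sorted(summaries, key=lambda s: s.relevance, reverse=True)


def render_tau(summary: SearchResultSummary) -> str:
    """Render one summary as three lines: title, abstract, URL."""
    return f"{summary.title}\n{summary.abstract}\n{summary.url}"
```

In this sketch the title is rendered first, which mirrors its position as the heading of the summary.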
The title of a search result summary is of particular significance to the user in efficiently and accurately assessing the relevance of a summarized document. First, the title often appears before other summary information as a heading for the summary. Thus, a user is most likely to read the title before reading any other summary information. Second, the user would ideally be required to read no more than the title to accurately determine how relevant the document is to the inputted query. Given the significance of the title to the user, it is desirable for information retrieval systems to present good quality titles in search result summaries of documents.
One possible approach for providing a good quality title for a search result summary of a document is to provide the title assigned by the creator or author of the document. For example, a web search engine could select, for a web page document, the Hypertext Markup Language (HTML) title given to the web page to use as the title in the search result summary of the web page. However, not all documents are given titles by their creator or author. Even where a title is given, the given title may be uninformative, irrelevant, not presentable, or otherwise sub-optimal.
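As a hedged illustration of the approach just described, the following sketch reads the author-assigned HTML title from a web page using Python's standard `html.parser` module. The names `_TitleParser` and `author_title` are hypothetical. The sketch returns `None` when the page has no title or an empty one, and it makes no attempt to judge whether a given title is informative or presentable.

```python
from html.parser import HTMLParser
from typing import Optional


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> element is considered.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def author_title(html: str) -> Optional[str]:
    """Return the author-assigned title, or None if missing or empty."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the title fits on one line.
    title = " ".join("".join(parser.parts).split())
    return title or None
```

A caller would need some fallback when `author_title` returns `None`, which is exactly the case where this approach alone does not yield a title.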
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.