A search engine is a computer program that helps a user to locate information. To locate information on a particular topic, a user can submit to a search engine one or more search query terms related to the topic. In response, the search engine executes the search query and generates information about the results of the search. The information about the results of the search, referred to herein as the “search results”, usually contains a list of the resources that satisfy the search query. The resources identified in the search results are referred to herein as “matching resources”.
While search engines may be applied in a variety of contexts, search engines are especially useful for locating resources that are accessible through the Internet. Resources may include files whose content is composed in a page description language such as Hypertext Markup Language (HTML). Such files are typically called pages. Using a web browser, pages may be retrieved by selecting HTML links that contain the Uniform Resource Locators (URLs) of the pages.
Depending on the query terms used and the number of pages that contain those query terms, search results may contain so many matching resources that a user may be overwhelmed when trying to determine which matching resources to investigate further. To assist a user in selecting one or more matching resources from a list, the search results may include a short description, or abstract, for each matching resource. By reading the abstract for a given matching resource, a user should be able to better determine whether the matching resource merits further investigation. Abstracts should be relatively short, so that a user may quickly judge the relevance of matching resources listed in the search results.
Unfortunately, abstracts that are displayed by existing search engines frequently fail to provide a user with the most useful information that is contained on a page. The search results generated by existing search engines typically include abstracts that have been generated based on the words that are contained in the matching resources, but without taking into account the kind, quality, or relevance of the various sources of information about the matching resources.
For example, if a user submits a search query for “Chinese food”, some matching resources may contain information about restaurants, and some matching resources may contain information about cookbooks. A particular matching resource may be associated with multiple sources of information about that particular matching resource. For example, one matching resource that contains information about a restaurant might be associated with a database that contains a telephone number and/or an address of the restaurant, and another matching resource that contains information about a restaurant might not be associated with such a database. One matching resource that contains information about a cookbook might be associated with a database that contains a price of, and/or a number of pages in, the cookbook, and another matching resource that contains information about a cookbook might not be associated with such a database. Existing search engines usually do not account for this difference in available sources of information about matching resources. Rather, the process that existing search engines use to generate abstracts for pages about restaurants is usually the same process that existing search engines use to generate abstracts for pages about cookbooks, regardless of the differences in available sources of information about the pages. As a result, existing search engines display abstracts that often fail to capture the most useful information that is contained in the matching resources that the abstracts describe.
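The limitation described above can be illustrated with a minimal sketch. The sketch is only an assumption about how a uniform abstract process might behave; it is not taken from any particular existing search engine. The names MatchingResource, associated_data, and uniform_abstract, as well as the example URLs, page text, telephone number, and address, are hypothetical and are introduced solely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MatchingResource:
    """A page returned for a search query (hypothetical structure)."""
    url: str
    page_text: str
    # Facts from an associated database, if any (hypothetical field).
    associated_data: dict = field(default_factory=dict)


def uniform_abstract(resource: MatchingResource, max_words: int = 12) -> str:
    """Build an abstract from the words on the page only.

    The same process is applied to every resource, and associated_data
    is never consulted, which is the shortcoming described above.
    """
    words = resource.page_text.split()
    abstract = " ".join(words[:max_words])
    # Mark the abstract as truncated when the page has more words.
    return abstract + ("..." if len(words) > max_words else "")


# A restaurant page that is associated with a database of contact details.
restaurant = MatchingResource(
    url="http://example.com/restaurant",
    page_text=(
        "Welcome to our family restaurant serving Chinese food "
        "since 1985 in a cozy setting downtown"
    ),
    associated_data={"phone": "555-0100", "address": "1 Example St"},
)

# A cookbook page that has no associated database.
cookbook = MatchingResource(
    url="http://example.com/cookbook",
    page_text="A cookbook of classic Chinese food recipes",
)

if __name__ == "__main__":
    for resource in (restaurant, cookbook):
        print(resource.url, "->", uniform_abstract(resource))
```

In this sketch, the abstract for the restaurant page omits the telephone number and address even though they are available, because the abstract is built from the page text alone and both pages are handled identically.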
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.