The present invention relates in general to searching a corpus of documents, and in particular to search systems and methods that leverage user annotations of documents, including annotations provided by the querying user as well as annotations provided by other users who have a trust relationship to the querying user.
The World Wide Web (Web) provides a large collection of interlinked information sources (in various formats including text, images, and media content) relating to virtually every subject imaginable. As the Web has grown, the ability of users to search this collection and identify content relevant to a particular subject has become increasingly important, and a number of search service providers now exist to meet this need. In general, a search service provider publishes a Web page via which a user can submit a query indicating what the user is interested in. In response to the query, the search service provider generates and transmits to the user a list of links to Web pages or sites considered relevant to that query, typically in the form of a “search results” page.
Query response generally involves the following steps. First, a pre-created index or database of Web pages or sites is searched using one or more search terms extracted from the query to generate a list of hits (usually target pages or sites, or references to target pages or sites, that contain the search terms or are otherwise identified as being relevant to the query). Next, the hits are ranked according to predefined criteria, and the best results (according to these criteria) are given the most prominent placement, e.g., at the top of the list. The ranked list of hits is transmitted to the user, usually in the form of a “results” page (or a set of interconnected pages) containing a list of links to the hit pages or sites. Other features, such as sponsored links or advertisements, may also be included on the results page.
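By way of illustration only, the three steps above (index lookup, ranking, and return of the best hits first) can be sketched as follows. This is a minimal sketch under stated assumptions, not any search service provider's actual implementation: the names `build_index` and `respond_to_query` are hypothetical, the index is a simple in-memory inverted index, and the "predefined criteria" are stood in for by a count of query-term occurrences.

```python
from collections import defaultdict


def build_index(pages):
    """Hypothetical pre-created index: maps each term to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def respond_to_query(query, index, pages, max_results=10):
    """Sketch of query response: look up hits, rank them, return the best first."""
    terms = query.lower().split()

    # Step 1: search the pre-created index with the extracted terms to collect hits.
    hits = set()
    for term in terms:
        hits |= index.get(term, set())

    # Step 2: rank hits by a predefined criterion. Assumed here: total
    # occurrences of the query terms on the page (a placeholder criterion).
    def score(url):
        words = pages[url].lower().split()
        return sum(words.count(t) for t in terms)

    ranked = sorted(hits, key=lambda url: (-score(url), url))

    # Step 3: the best results get the most prominent placement (top of the list).
    return ranked[:max_results]
```

For example, with `pages = {"a.example": "search engines rank pages", "b.example": "rank rank pages"}`, the query `"rank pages"` places `b.example` ahead of `a.example` because its page contains more occurrences of the search terms.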
Ranking of hits is often an important factor in whether a user's search ends in success or frustration. Frequently, a query will return such a large number of hits that it is impossible for a user to explore all of the hits in a reasonable time. If the first few links a user follows fail to lead to relevant content, the user will often give up on the search and possibly on the search service provider, even though relevant content might have been available farther down the list.
To maximize the likelihood that relevant content will be prominently placed, search service providers have developed increasingly sophisticated page ranking criteria and algorithms. In the early days of Web search, rankings were usually based on the number of occurrences and/or proximity of search terms on a given page. This proved inadequate, and algorithms in use today typically incorporate other information, such as the number of other sites on the Web that link to a given hit page (which reflects how useful other content providers think the hit page is), in addition to the presence of search terms on the hit page itself. One algorithm allows querying users to provide feedback by rating the hits that are returned. The ratings are stored in association with the query, and previous positive ratings are used as a factor in ranking hits the next time the same query is entered by any user.
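As a non-authoritative illustration of the feedback-based approach just described, the sketch below combines a link-count signal with positive ratings stored in association with the query. The class `FeedbackRanker`, the `rating_weight` parameter, and the normalization of queries by case and whitespace are all hypothetical assumptions for this example; no particular weighting or notion of "the same query" is given above. Note that the stored ratings are keyed only by query and hit, so a rating entered by one user affects the ranking seen by every later user entering that query.

```python
from collections import defaultdict


class FeedbackRanker:
    """Hypothetical ranker: inbound-link count plus stored positive ratings per query."""

    def __init__(self, inbound_links, rating_weight=1.0):
        # url -> number of other sites linking to that url
        self.inbound_links = inbound_links
        # Assumed weighting of ratings relative to links; not specified in the text.
        self.rating_weight = rating_weight
        # (normalized query, url) -> count of positive ratings from any user
        self.positive_ratings = defaultdict(int)

    @staticmethod
    def _normalize(query):
        # Assumption: queries differing only in case or spacing count as the same query.
        return " ".join(query.lower().split())

    def record_rating(self, query, url, positive):
        # The rating is stored against the query, not against the user who gave it.
        if positive:
            self.positive_ratings[(self._normalize(query), url)] += 1

    def rank(self, query, hits):
        q = self._normalize(query)

        def score(url):
            links = self.inbound_links.get(url, 0)
            ratings = self.positive_ratings.get((q, url), 0)
            return links + self.rating_weight * ratings

        # Highest score first; ties broken by URL so the order is deterministic.
        return sorted(hits, key=lambda url: (-score(url), url))
```

With `FeedbackRanker({"a.example": 5, "b.example": 3}, rating_weight=2.0)`, `a.example` initially ranks first for the query `"rank pages"`; after two positive ratings of `b.example` for that query, `b.example` ranks first for every user who enters it, while rankings for other queries are unchanged.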
Existing algorithms, however, generally do not take into account preferences of individual users. For example, two users who enter the same query could actually be interested in different things; a page or site that is relevant to one user might not be relevant to another. In addition, different users may have different preferences in areas such as how content is organized and displayed, which content providers they trust, and so on, that will affect how they evaluate or rate a given site. Thus, a site that satisfies one user (or many users) might not satisfy the next user who enters the same query, and that user might still give up in frustration.
Another tool for helping individual users find content of interest to them is “bookmarking.” Traditionally, bookmarking has been implemented in Web browser programs: while viewing any page, the user can elect to save a bookmark for that page. The bookmark usually includes the URL (uniform resource locator) for the page, a title, and possibly other information such as when the user visited the page or when the user created the bookmark. The Web browser program maintains a list of bookmarks, and the user can navigate to a bookmarked page by finding the page in his list of bookmarks. To simplify the task of navigating a list of bookmarks, most bookmarking tools allow users to organize their bookmarks into folders. More recently, some Internet-based information services have implemented bookmarking tools that allow a registered user to create and access a personal list of bookmarks from any computer connected to the Internet.
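For illustration only, a conventional bookmark list of the kind described above might be modeled as shown below: each bookmark carries a URL, a title, and optional timestamps, and bookmarks are organized into nested folders. The classes `Bookmark` and `Folder` and their methods are hypothetical and are not drawn from any particular Web browser program or information service.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Bookmark:
    url: str  # uniform resource locator of the bookmarked page
    title: str
    # When the bookmark was created (defaults to the current UTC time).
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # When the user last visited the page, if recorded.
    last_visited: Optional[datetime] = None


@dataclass
class Folder:
    name: str
    bookmarks: list[Bookmark] = field(default_factory=list)
    subfolders: list[Folder] = field(default_factory=list)

    def add(self, bookmark, path=()):
        """Add a bookmark under a nested folder path, creating folders as needed."""
        folder = self
        for name in path:
            match = next((f for f in folder.subfolders if f.name == name), None)
            if match is None:
                match = Folder(name)
                folder.subfolders.append(match)
            folder = match
        folder.bookmarks.append(bookmark)

    def walk(self, prefix=()):
        """Yield (folder path, bookmark) pairs in the order a user would browse them."""
        for bookmark in self.bookmarks:
            yield prefix, bookmark
        for sub in self.subfolders:
            yield from sub.walk(prefix + (sub.name,))
```

In this model, the only way to locate a bookmark is to traverse the folders and compare titles or URLs; there is no facility for searching the content of the bookmarked pages.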
While bookmarking can be helpful, this tool also has its limitations. For instance, even with folders it can be difficult for a user to remember which bookmarked page had a particular item of information that the user might be looking for at a given time. Also, existing bookmarking tools generally do not help the user identify whether he (or she) has already bookmarked a given page, nor do they provide any facilities for searching bookmarked information. Further, existing bookmarking technologies do not provide easy ways for users to share their bookmarks with other users.
Thus, it would be desirable to provide improved tools for helping individual users collect and search content that is of interest to them.