I. Technical Field
This disclosure generally relates to the field of computer software. More particularly, and without limitation, the disclosure relates to systems and methods for ranking documents.
II. Background Information
Existing search tools allow users to search for documents in a database by specifying queries. For example, in a Boolean query, the user specifies operators such as AND and OR to connect search terms. As another example, in a natural language query, the user frames the query as a question. The search tools then execute the user's queries against a database of documents. Following the search, the search tools provide a result set of documents (“candidate documents”) to the user for each query. The user then examines the result set to determine whether to view one or more of the candidate documents.
A user who is searching for documents related to a particular topic of interest will often execute a number of related queries to find as many candidate documents for the topic as possible. As a result, users may spend a significant amount of time and effort executing queries on a single topic before deciding that their search has found a sufficient number of relevant candidate documents. This can be expensive where users pay to use the search tools on a per-query basis. Although users generally gain proficiency as they learn to use the search tools more effectively, even experienced users typically execute a substantial number of queries on a single topic before being satisfied that the topic has been thoroughly searched.
Because a large number of queries are typically executed for a given search topic, and each individual query on the topic may return a large result set of candidate documents, the total number of candidate documents found for a given topic may be large. Users will often keep track of the candidate documents in the result set for each query, and may tend to look more closely at candidate documents that appear in the result sets for multiple queries, or that appear at or near the top of a list of candidate documents. However, the user must manually identify the candidate documents that are returned by multiple queries.
Moreover, users often tend to disregard certain candidate documents that may actually be relevant, because these candidate documents are ranked relatively low for each query the user executed. For example, the user may enter five queries on a given topic, and a particular candidate document may appear in the result set for each of these five queries, but be ranked relatively low in each result set. In such a situation, the user will often not take the time to review all of the candidate documents in each result set. Instead, the user may focus on the higher-ranked candidate documents. If so, the user would not notice that a candidate document matched each executed query, because the candidate document was ranked too low in each result set for the user to notice. However, the fact that the candidate document matched all or most of the queries suggests that the candidate document may actually be relevant to the topic being searched by the user, even though the candidate document was not ranked highly for any particular query the user entered while searching the topic.
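The five-query scenario described above can be illustrated with a minimal sketch. It is offered only as an illustration of the problem, not as the disclosed ranking method. The function name `cross_query_summary`, the document identifiers, and the result sets are hypothetical, and the sketch assumes that each result set is an ordered list in which a document appears at most once.

```python
from collections import defaultdict


def cross_query_summary(result_sets):
    """Map each document ID to (number of queries matched, best rank seen).

    Each element of result_sets is assumed to be one query's ranked list of
    document IDs, best match first, with no duplicates within a list.
    """
    ranks_by_doc = defaultdict(list)
    for ranked_docs in result_sets:
        for rank, doc_id in enumerate(ranked_docs, start=1):
            ranks_by_doc[doc_id].append(rank)
    return {
        doc_id: (len(ranks), min(ranks))
        for doc_id, ranks in ranks_by_doc.items()
    }


# Hypothetical result sets for five related queries on one topic.
# Document "X" matches every query but is always ranked last.
result_sets = [
    ["A", "B", "C", "X"],
    ["D", "A", "E", "X"],
    ["B", "F", "G", "X"],
    ["H", "I", "A", "X"],
    ["J", "K", "L", "X"],
]

summary = cross_query_summary(result_sets)

# Documents returned by every query, regardless of their rank in any one query.
matched_all = [
    doc_id
    for doc_id, (match_count, _best_rank) in summary.items()
    if match_count == len(result_sets)
]
```

In this hypothetical data, document "X" is the only document returned by all five queries, yet its best rank in any single result set is fourth, so a user who reviews only the top entries of each list would be unlikely to notice it.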