It has become commonplace for people to use computer systems to search large collections of electronically indexed content. Typically, as part of a search, a person (or, more generally, a “searcher”) interacts with a computer system user interface (e.g., a graphical user interface or an application programming interface) to submit one or more search requests and view corresponding sets of search results (“search result sets”). The following description will be concerned with search requests that, at least in part, use strings of text (e.g., strings of Unicode characters) to indicate searcher interest in portions of a collection of content. Such strings of text are typically interpreted by a search engine as one or more search terms. For example, search terms may include words of a language such as English and logical operators such as ‘and’ and ‘or’.
The collection of content searched by a search engine is typically large, and a typical goal of the search engine is to present to the searcher the content most relevant to a particular search request. However, determining the relevance of content (for example, with respect to a set of search terms) may involve many tradeoffs, and conventional search engines that incorporate such tradeoffs may be sub-optimal in one or more of a variety of contexts. Such sub-optimality with respect to relevance is significant: at a minimum, it may reduce searching efficiency, and in commercial contexts, for example, sub-optimal surfacing of relevant content may result in significant commercial penalties such as lost sales.
One such tradeoff typically involves deciding how to index the collection of content and/or how to parse search terms from search requests. For example, a collection index may include information corresponding to a matrix associating search terms (e.g., each row may correspond to a particular search term) with content in the collection (e.g., each column may correspond to a particular item of content in the collection), with each position in the matrix including a relevance score quantifying the relevance of the search term to the item of content. A smaller collection index may reduce computing resource requirements; on the other hand, additional search terms (and thus a larger index) may enhance the relevance of search results to the searcher. Determining which search terms to include in the collection index and/or parse from search requests may be further complicated by the fact that a phrase (i.e., multiple words in a particular order) may indicate an interest different from the interests indicated by its component words. For example, a search request using “Newton Baker” may indicate a subject matter interest different from that of search requests using “Newton” or “Baker” alone.
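As a rough illustration of the index-size tradeoff and of phrase terms, the following Python sketch builds a sparse term-by-item matrix in which a known phrase is indexed as its own row alongside its component words, and parses search requests with the same term extraction. All function names, the whitespace tokenizer, the phrase list, and the toy scoring rule (term count divided by item length) are hypothetical assumptions chosen for illustration; they are not presented as the method of this disclosure.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


def extract_terms(tokens: list[str], phrases: set[tuple[str, ...]]) -> list[str]:
    """Return every single word plus each occurrence of a known phrase.

    Phrases are emitted as space-joined terms, so "newton baker" becomes its
    own row in the index in addition to the rows for "newton" and "baker".
    """
    terms = list(tokens)
    for phrase in phrases:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                terms.append(" ".join(phrase))
    return terms


def build_index(items: dict[str, str],
                phrases: set[tuple[str, ...]]) -> dict[str, dict[str, float]]:
    """Sparse term-by-item matrix: index[term][item_id] = relevance score.

    The score is a toy assumption: occurrences of the term in the item
    divided by the item's length in tokens. Empty items are skipped.
    """
    index: dict[str, dict[str, float]] = {}
    for item_id, text in items.items():
        tokens = tokenize(text)
        if not tokens:
            continue
        counts = Counter(extract_terms(tokens, phrases))
        for term, count in counts.items():
            index.setdefault(term, {})[item_id] = count / len(tokens)
    return index


def search(index: dict[str, dict[str, float]], request: str,
           phrases: set[tuple[str, ...]]) -> list[tuple[str, float]]:
    # Parse the request with the same term extraction used at indexing time,
    # then sum each item's scores across all parsed terms.
    scores: Counter = Counter()
    for term in extract_terms(tokenize(request), phrases):
        for item_id, score in index.get(term, {}).items():
            scores[item_id] += score
    # Highest total score first.
    return scores.most_common()


if __name__ == "__main__":
    items = {
        "d1": "Newton Baker served as secretary of war",
        "d2": "Isaac Newton studied optics",
        "d3": "the baker sold bread",
    }
    phrases = {("newton", "baker")}
    with_phrase = build_index(items, phrases)
    words_only = build_index(items, set())
    print(len(with_phrase), len(words_only))
    print(search(with_phrase, "Newton Baker", phrases))
```

In this toy run, indexing the phrase adds one row to the matrix (14 terms instead of 13), and the item containing “Newton Baker” gains an extra score contribution from the phrase row; omitting the phrase yields a smaller index at the cost of that signal.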
The same numbers are used throughout the disclosure and figures to reference like components and features. Such repetition of numbers is for simplicity of explanation and understanding and should not be viewed as a limitation on the various embodiments.