The amount of available information and digital content on the Internet and other electronic sources continues to grow rapidly. Given the vast amount of information, search engines have been developed to facilitate searching for electronic documents. In particular, users or computers may search for information and documents by submitting search queries, which may include, for instance, one or more words. After receiving a search query, a search engine identifies documents that are relevant based on the search query.
At a high level, search engines identify search results by ranking documents according to their relevance to a search query. Ranking is often based on a large number of document features. Given a large set of documents, it is not feasible to rank every document for a search query, as doing so would take an unacceptable amount of time. Therefore, search engines typically employ a pipeline that includes preliminary operations to remove documents from consideration before a final ranking process. This pipeline traditionally includes a matcher that filters out documents that do not contain terms from the search query. The matcher operates using a search index that includes information gathered by crawling documents or otherwise analyzing documents to collect information regarding the documents. Search indexes are often composed of posting lists (sometimes called an inverted index) for the various terms found in the documents. The posting list for a particular term consists of a list of the documents containing the term. When a search query is received, the matcher employs the search index to identify documents containing terms identified from the search query. The matching documents may then be considered by one or more downstream processes in the pipeline that further remove documents and ultimately return a set of ranked search results.
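The posting-list structure and matcher described above can be illustrated with a minimal sketch. The function names (`tokenize`, `build_index`, `match`), the whitespace tokenizer, the sample documents, and the `require_all` switch are all assumptions made for illustration; they are not taken from any particular search engine. In particular, whether a matcher requires every query term or any query term is a design choice, so the sketch exposes both behaviors.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase terms (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(documents):
    """Build an inverted index: each term maps to the set of document ids
    (its posting list) whose text contains that term."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def match(index, query, require_all=True):
    """Return ids of documents that survive the matcher stage.

    require_all=True keeps documents containing every query term;
    require_all=False keeps documents containing at least one query term.
    """
    terms = tokenize(query)
    if not terms:
        return set()
    # index.get avoids creating empty posting lists for unseen terms.
    postings = [index.get(term, set()) for term in terms]
    if require_all:
        return set.intersection(*postings)
    return set.union(*postings)


# Hypothetical three-document corpus used only to exercise the sketch.
documents = {
    1: "search engines rank documents",
    2: "posting lists map terms to documents",
    3: "crawling collects information about documents",
}
index = build_index(documents)
candidates = match(index, "posting documents")
```

Only the documents returned by `match` would be passed to the downstream stages of the pipeline, so the more expensive ranking work is limited to a small candidate set rather than the whole corpus.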