With information widely available over networks, such as the Internet, search engines have come into widespread use. A search engine receives a user query and finds content matching the query to return to the user. A common approach to implementing a search engine is through a page index. The page index relates terms that may appear in a search query to units of content on the network, frequently called web pages.
Various approaches exist for constructing and applying the page index. Constructing the index frequently entails “crawling” the network, such as the Internet, that contains the body of data that will eventually be searched. Crawling entails following links from one web page to the next and analyzing each page. As part of the analysis, terms characterizing the web page may be identified and added to the page index in a way that associates that web page with those terms. These terms may be terms actually used within the content displayed by the web page or may be tags added specifically to influence how the crawler indexes the web page. Other information, such as the number of links pointing to a web page, may also be captured and used to prioritize the web pages.
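The crawling process described above can be illustrated with a minimal sketch. It assumes a small in-memory stand-in for the network rather than real HTTP fetching. The names `PAGES` and `crawl`, the example URLs, and the choice of a breadth-first traversal are all hypothetical and are not drawn from any particular search engine.

```python
from collections import defaultdict, deque
import re

# Hypothetical in-memory "network": URL -> (displayed text, tags, outgoing links).
PAGES = {
    "a.example/home": ("welcome to the bicycle shop", ["bikes"],
                       ["a.example/repair", "a.example/parts"]),
    "a.example/repair": ("bicycle repair and tuning", [],
                         ["a.example/parts"]),
    "a.example/parts": ("bicycle parts and tires", ["tyres"],
                        ["a.example/home"]),
}


def crawl(seed, pages):
    """Follow links from the seed page, building a term -> pages index
    and counting the links that point to each page."""
    page_index = defaultdict(set)     # term -> set of URLs containing it
    inbound_links = defaultdict(int)  # URL -> number of links pointing to it
    queue = deque([seed])
    seen = {seed}
    while queue:
        url = queue.popleft()
        text, tags, links = pages[url]
        # Index both the displayed terms and the tags added for the crawler.
        terms = re.findall(r"[a-z0-9]+", text.lower()) + [t.lower() for t in tags]
        for term in terms:
            page_index[term].add(url)
        for target in links:
            inbound_links[target] += 1
            if target in pages and target not in seen:
                seen.add(target)
                queue.append(target)
    return page_index, inbound_links


page_index, inbound_links = crawl("a.example/home", PAGES)
```

In this sketch the inbound link count is only collected; how such a count would be weighted when prioritizing pages is left open.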
The page index is applied as part of a search stack. When a user submits a search query, a search engine matches terms in the query to web pages based on the page index. The search stack may include components that modify the search query before the index is consulted, such as to correct misspellings of search terms or to attach terms that can be inferred from a user profile. The search stack may also include components that filter search results. For example, web pages identified using the page index may be filtered, such as by ranking the web pages based on a metric indicating relevance to the query.
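The stages of such a search stack can be sketched as three small functions chained together. This is an illustrative sketch only: the index contents, the spelling table, the profile terms, and the relevance metric (number of matched terms, then inbound link count) are assumptions chosen for the example, and all names are hypothetical.

```python
# Hypothetical data standing in for a real page index and link statistics.
PAGE_INDEX = {
    "bicycle": {"p1", "p2", "p3"},
    "repair": {"p2"},
    "tires": {"p3"},
    "seattle": {"p2"},
}
INBOUND_LINKS = {"p1": 5, "p2": 2, "p3": 9}
SPELLING_FIXES = {"bycicle": "bicycle", "repiar": "repair"}


def rewrite_query(query, profile_terms=()):
    """Pre-index stage: correct known misspellings, then attach profile terms."""
    terms = [SPELLING_FIXES.get(t, t) for t in query.lower().split()]
    return terms + [t for t in profile_terms if t not in terms]


def lookup(terms):
    """Index stage: map each matching page to the number of query terms it matched."""
    matches = {}
    for term in terms:
        for page in PAGE_INDEX.get(term, ()):
            matches[page] = matches.get(page, 0) + 1
    return matches


def rank(matches, limit=10):
    """Post-index stage: order by matched terms, then inbound links, and truncate."""
    ordered = sorted(
        matches,
        key=lambda p: (-matches[p], -INBOUND_LINKS.get(p, 0), p),
    )
    return ordered[:limit]


def search(query, profile_terms=(), limit=10):
    return rank(lookup(rewrite_query(query, profile_terms)), limit)


results = search("bycicle repiar", profile_terms=["seattle"])
```

Keeping the stages as separate functions mirrors the description of the stack as a series of components, so that a query-rewriting or ranking component could be replaced without touching the index lookup.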
Where search queries are anticipated to seek information that may not appear directly on any web page, that information may be pre-computed. An entry may then be made in the page index that points to the pre-computed information rather than to a web page.
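One way to picture an index that holds both kinds of entry is sketched below. The entry types `PageRef` and `Precomputed`, the index contents, and the `resolve` helper are hypothetical, and the pre-computed value is a placeholder rather than real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRef:
    """Index entry pointing to a web page."""
    url: str


@dataclass(frozen=True)
class Precomputed:
    """Index entry pointing to information computed ahead of time."""
    value: str


# Hypothetical index mixing ordinary page entries with a pre-computed entry.
INDEX = {
    "bicycle": [PageRef("a.example/home")],
    "shops nearby": [Precomputed("placeholder: pre-computed answer text")],
}


def resolve(term):
    """Return (kind, payload) pairs for a term, distinguishing pages from answers."""
    results = []
    for entry in INDEX.get(term, []):
        if isinstance(entry, Precomputed):
            results.append(("answer", entry.value))
        else:
            results.append(("page", entry.url))
    return results
```

Under these assumptions, `resolve("bicycle")` yields a page reference, while `resolve("shops nearby")` yields the stored answer directly, without requiring that any single web page contain it.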