Search engines are often used to locate information of interest in a network, whether across the entire Internet or within a more focused collection such as an enterprise intranet. In response to a user's query, a typical search engine provides a rank-ordered list that includes brief descriptions of the uncovered content, as well as text links to the associated network pages. The rank ordering of the list is typically based on a match between words appearing in the query and words appearing in the content. Limitations in present search methodology often cause irrelevant content to be returned in response to a query. In particular, the wealth of available content can impair search engine efficacy, since it is difficult to separate irrelevant content from relevant content.
A typical engine selects pages based, in part, on the number of appearances of query keywords in the searched pages. A page can be assigned a relevance corresponding to the number of occurrences of a search term on the page, normalized to the length of the page. Some engines seek to improve search results by giving greater significance to Web pages that are linked to by a greater number of other pages, taking the number of inbound links as an indicator of significance.
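The two signals described above can be illustrated with a minimal Python sketch. It is not drawn from any particular engine: the function names, the tokenization rule, and the choice to measure page length in words are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenization: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency_score(page_text, query):
    """Occurrences of the query terms on the page, divided by page length in words."""
    words = tokenize(page_text)
    if not words:
        return 0.0
    counts = Counter(words)
    hits = sum(counts[term] for term in tokenize(query))
    return hits / len(words)


def inbound_link_counts(links):
    """Count how many distinct other pages link to each page.

    `links` maps a page identifier to the identifiers it links to
    (a hypothetical in-memory representation of the link structure).
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts
```

For example, `term_frequency_score("search engine search", "search")` counts two occurrences in a three-word page. How an engine would combine this score with the inbound link count is left open here, since the text does not specify it.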
Most search engines follow the same basic procedure for processing information in a network-based collection of pages. The engine uses crawling and parsing techniques to form an index of terms found in the pages of the network. The index includes data that is used by the search system to process queries and identify relevant pages. After the index is built, queries may be submitted to the search engine. A query represents the user's information request, and is expressed using a query language and syntax defined by the search engine. The search engine processes the query using the index data for the network, and returns a hit-list of objects that the search engine identifies as topically relevant. The user may then select relevant objects from the hit-list for viewing and processing. A user of the engine may also use a page from the hit-list as a starting point for further navigation through the network.
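The index-then-query procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: crawling is replaced by a hypothetical in-memory dictionary of page text, the query language is reduced to a plain list of words, and the hit-list is ordered by raw term counts. The names `build_index` and `search` are illustrative only.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumed parsing step: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: term -> {page identifier: occurrence count}.

    `pages` maps a page identifier to its text and stands in for the
    output of a crawler.
    """
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term][page_id] = index[term].get(page_id, 0) + 1
    return index


def search(index, query):
    """Return a hit-list of page identifiers, best match first."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for page_id, count in index.get(term, {}).items():
            scores[page_id] += count
    # Sort by descending score, breaking ties by identifier for stable output.
    return sorted(scores, key=lambda page_id: (-scores[page_id], page_id))


pages = {
    "p1": "cats and dogs",
    "p2": "cats cats birds",
    "p3": "fish",
}
index = build_index(pages)
hits = search(index, "cats")
```

Building the index once and answering each query from it mirrors the ordering in the text: the pages are processed in advance, and only the index is consulted at query time.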