The Internet and World Wide Web continue to expand rapidly with respect to both volume of information and number of users. The Internet is a collection of interconnected computer networks. The World Wide Web, or simply the web, is a service that connects numerous Internet accessible sites via hyperlinks and uniform resource locators (URLs). As a whole, the web provides a global space for accumulation, exchange and dissemination of all types of information. For instance, information can be provided by way of online newspapers, magazines, advertisements, books, pictures, audio, video and the like. The increase in usage is largely driven by the increase in available information pertinent to user needs. By way of example, the web and Internet were initially utilized solely by researchers to exchange information. At present, people of all occupations and lifestyles utilize the web to manage their bank accounts, complete their taxes, view product information, sell and purchase products, download music, take classes, research topics, and find directions, among other things. Further, usage will continue to flourish as additional relevant information becomes available over the web.
To maximize the likelihood of locating relevant information amongst an abundance of data, search engines are often employed over the web or a subset of pages thereof. In some instances, a user is aware of the name of a site, server or URL to the site that the user desires to access. In such situations, the user can access the site by simply entering the URL in an address bar of a browser and connecting to the site. However, in most instances, the user does not know the URL or site name that includes the desired information. To locate a site or corresponding URL of interest, users often employ a search engine to facilitate locating and accessing sites based on keywords and operators.
A web search engine, or simply a search engine, is a tool that facilitates web navigation based on entry of a search query comprising one or more keywords. Upon receipt of a query, the search engine retrieves a list of websites, typically ranked based on relevance to the query. To enable this functionality, the search engine must generate and maintain a supporting infrastructure.
Search engine agents, often referred to as spiders or crawlers, navigate websites in a methodical manner and retrieve information about sites visited. For example, a crawler can make a copy of all or a portion of websites and related information. The search engine subsequently analyzes the content captured by one or more crawlers to determine how a page will be indexed. Indexing transforms website data into a form, the index, which can be employed at search time to facilitate identification of content. Some engines will index all words on a website while others may index only terms associated with particular tags (e.g., title, header or meta-tag). Crawlers must also periodically revisit web pages to detect and capture changes thereto since the last indexing.
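By way of illustration only, the indexing step described above can be sketched as a small inverted index that maps each term to the pages containing it. The page data, the field names (title, body) and the function names below are hypothetical and chosen for the example; an actual engine would operate on crawled content and would likely store far more per term. The optional fields argument models the choice between indexing all words and indexing only terms associated with particular tags.

```python
import re
from collections import defaultdict

# Hypothetical crawler output: URL -> fields captured from the page.
PAGES = {
    "http://example.com/a": {
        "title": "Bank accounts",
        "body": "Manage your bank accounts online.",
    },
    "http://example.com/b": {
        "title": "Tax help",
        "body": "Complete your taxes and manage accounts.",
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages, fields=None):
    """Build an inverted index mapping each term to the set of URLs containing it.

    If fields is None, every captured field is indexed; otherwise only the
    named fields (e.g. {"title"}) contribute terms.
    """
    index = defaultdict(set)
    for url, page in pages.items():
        for name, text in page.items():
            if fields is not None and name not in fields:
                continue
            for term in tokenize(text):
                index[term].add(url)
    return dict(index)


# Index every word versus only words appearing in the title field.
full_index = build_index(PAGES)
title_index = build_index(PAGES, fields={"title"})
```

In this sketch, a term that appears only in page bodies is present in full_index but absent from title_index, which is the trade-off between the two indexing policies mentioned above. Revisiting a changed page would amount to rebuilding or updating its entries.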
Upon entry of one or more keywords as a search query, the search engine retrieves information that matches the query from the index, ranks the sites that match the query, generates a snippet of text associated with matching sites and displays the results to a user. Furthermore, advertisements relating to the search terms can also be displayed together with the results. The user can thereafter scroll through a plurality of returned sites, ads and the like in an attempt to identify information of interest. However, this can be an extremely time-consuming and frustrating process as search engines can return a substantial number of sites. More often than not, the user is forced to narrow the search iteratively by altering and/or adding keywords and operators to obtain the identity of websites including relevant information.
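The query-time flow described above (retrieve matches from the index, rank them, generate a snippet) can likewise be sketched in a few lines. This is a minimal illustration under stated assumptions: the documents are hypothetical, relevance is approximated by a raw count of query-term occurrences, and the snippet is simply a fixed-width window around the first matching term. Production engines use considerably more elaborate ranking and snippet generation.

```python
import re

# Hypothetical indexed documents: URL -> page text.
DOCS = {
    "http://example.com/a": "Manage your bank accounts online with our bank tools.",
    "http://example.com/b": "Complete your taxes and review bank statements.",
    "http://example.com/c": "Download music and take classes.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to a dict of {url: number of occurrences on that page}."""
    index = {}
    for url, text in docs.items():
        for term in tokenize(text):
            counts = index.setdefault(term, {})
            counts[url] = counts.get(url, 0) + 1
    return index


def make_snippet(text, terms, width=40):
    """Return a window of about `width` characters around the first matching term."""
    pattern = "|".join(r"\b%s\b" % re.escape(t) for t in sorted(terms))
    match = re.search(pattern, text, re.IGNORECASE)
    start = max(0, match.start() - width // 2) if match else 0
    return text[start:start + width]


def search(query, docs, index):
    """Return (url, score, snippet) tuples, highest score first."""
    terms = set(tokenize(query))
    scores = {}
    for term in terms:
        # Look up matches in the index rather than scanning every page.
        for url, count in index.get(term, {}).items():
            scores[url] = scores.get(url, 0) + count
    # Sort by descending score, breaking ties by URL for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(url, score, make_snippet(docs[url], terms)) for url, score in ranked]


index = build_index(DOCS)
results = search("bank", DOCS, index)
```

Adding a second keyword to the query changes both which pages match and how they are ordered, which mirrors the iterative narrowing that users perform by altering or adding keywords.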