Conventional computer networking environments support the exchange of information and data between many interconnected computer systems using a variety of mechanisms. In an example computer-networking environment such as the Internet, one or more client computer systems can operate client software applications that transmit data access requests using one or more data communications protocols over the computer network to server computer systems for receipt by server software application(s) executing on those servers. The server software application(s) receive and process the client data access requests and can prepare and transmit one or more server responses back to the client computer systems for receipt by the client software applications. In this manner, client/server software applications can effectively exchange data over a network using agreed-upon data formats.
One example of a conventional information exchange system that operates between computer systems over a computer network such as the Internet is provided by a set of applications and data communications protocols collectively referred to as the World Wide Web. In a typical conventional implementation of the World Wide Web, client computer systems operate a client software application referred to as a web browser. A typical web browser sends Hypertext Transfer Protocol (HTTP) requests for documents, referred to as “web pages,” over the computer network to web server computer systems. A web server software application operating in the web server computer system can receive and process an HTTP web page request and can return, or “serve,” the web page document or file specified (i.e., requested) in the client request back to the requesting client computer system over the computer network for receipt by the client's web browser. The web page is typically formatted in a markup language such as the Hypertext Markup Language (HTML) or the Extensible Markup Language (XML).
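The request/response exchange described above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not how any particular browser or web server is built: the `PageHandler` class, the `PAGE` document, the `fetch_page` helper, and the `/index.html` path are all hypothetical. Both sides run on the local machine, with an `http.client` connection standing in for the web browser and an `http.server` instance standing in for the web server software application.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical web page document returned for every request.
PAGE = b"<html><body><h1>Example page</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical web server handler: answers every GET request with PAGE."""

    def do_GET(self):
        # Status line, headers, blank line, then the HTML body.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        # Suppress the default per-request log line on stderr.
        pass


def fetch_page(path="/index.html"):
    """Start a local server, request `path` over HTTP, and return the reply."""
    # Port 0 asks the operating system for any free port on localhost.
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        # Client ("browser") side: send "GET <path>" and read the response.
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", path)
        resp = conn.getresponse()
        result = (resp.status, resp.getheader("Content-Type"), resp.read())
        conn.close()
        return result
    finally:
        # Stop the serve_forever loop and release the listening socket.
        server.shutdown()
        server.server_close()


status, content_type, body = fetch_page()
print(status, content_type)
```

A real browser would go on to parse and render the returned HTML; the sketch stops once the document bytes have been received.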
The World Wide Web contains billions of static web pages, and it is growing rapidly, with many hundreds or thousands of web pages being created and made available on the Internet each day. To help people using web browsers efficiently find web pages of interest, software developers have created web sites that operate as web search engines or portals.
Today's web search engines have their roots in a research field called information retrieval, a computing topic dating back nearly 50 years. In the mid-1960s, the most advanced information technologies of the day could handle only routine or clerical tasks. Since the 1990s, web search engines have completely changed how people gather information. No longer must we run to a library to look something up; instead, we can pull up relevant documents with a few keystrokes and clicks by entering a search query into a web search engine. Most web search engines provide an interface to a collection of items, such as indexed web pages, that enables users to submit a search query and have the search engine find information resources related to the words in that query.
A conventional web search engine includes one or more web crawler processes that continually discover new web pages, typically by following hyperlinks from existing web pages to the newly discovered ones. Upon discovering a new web page, the search engine uses an indexer to process the page's content, such as its text, and to record that content in a searchable database in the form of an inverted index. Generally, an inverted index maps each word occurring in a set of texts to the texts that contain it. A searcher then processes user search requests against the inverted index.
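The crawl-then-index pipeline described above can be sketched in a few lines. This is a simplified illustration, not any particular search engine's implementation: the in-memory `WEB` link graph, its example URLs, and the `crawl`, `tokenize`, and `build_inverted_index` helpers are hypothetical, and a real crawler would fetch pages over HTTP and parse hyperlinks out of HTML rather than read them from a dictionary.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing hyperlinks).
WEB = {
    "http://example.com/": ("Welcome to the example home page.",
                            ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("Web search engines index web pages.",
                             ["http://example.com/b"]),
    "http://example.com/b": ("A crawler follows hyperlinks to new pages.",
                             ["http://example.com/c"]),
    "http://example.com/c": ("Newly discovered pages are indexed too.", []),
}


def crawl(seed):
    """Breadth-first traversal of hyperlinks from `seed`; returns URL -> text."""
    seen = {seed}
    queue = deque([seed])
    discovered = {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        discovered[url] = text
        for link in links:
            # Follow each hyperlink to a page not yet seen.
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return discovered


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each word to the sorted list of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}


pages = crawl("http://example.com/")
index = build_inverted_index(pages)
print(index["pages"])
print(index["crawler"])
```

Page `c` is reachable only through the hyperlink on page `b`, so it enters the index solely because the crawler followed that link.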
When a user visits the search engine web site with his or her browser, the search engine web page allows the user to enter one or more textual search keywords describing the content the user wants to find among the indexed web pages in the search engine database. The search engine uses the searcher to match the user-supplied keywords against the inverted index of web page content in its database, and returns a results page to the user's browser that identifies (typically by hyperlink) the web pages on the World Wide Web that contain the user-supplied keywords. Popular conventional web search engines in use today include Google, Yahoo!, Lycos, Bing and many others.
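The searcher's matching step can be sketched as an intersection of posting lists drawn from the inverted index. This sketch makes two assumptions. First, it uses boolean AND semantics, so a page must contain every keyword. Second, it queries a small hand-written, hypothetical `INDEX`. Production search engines also rank the matching pages, which is not shown here.

```python
import re

# Hypothetical inverted index: word -> sorted list of URLs containing it.
INDEX = {
    "web": ["http://example.com/a", "http://example.com/d"],
    "search": ["http://example.com/a", "http://example.com/c"],
    "engines": ["http://example.com/a"],
    "crawler": ["http://example.com/b"],
}


def search(query, index):
    """Return the sorted URLs of pages containing every keyword in `query`."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    # Look up each keyword's posting list; unknown words match no pages.
    # Starting from the shortest list keeps the running intersection small.
    postings = sorted((set(index.get(w, [])) for w in words), key=len)
    result = postings[0]
    for p in postings[1:]:
        result &= p
    return sorted(result)


print(search("Web Search", INDEX))
```

A results page would then list these URLs as hyperlinks in the user's browser.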