Conventional computer networking environments support the exchange of information and data between many interconnected computer systems using a variety of mechanisms. In an example computer-networking environment such as the Internet, one or more client computer systems can operate client software applications that transmit data access requests using one or more data communications protocols over the computer network to server computer systems for receipt by server software application(s) executing on those servers. The server software application(s) receive and process the client data access requests and can prepare and transmit one or more server responses back to the client computer systems for receipt by the client software applications. In this manner, client/server software applications can effectively exchange data over a network using agreed-upon data formats.
One example of a conventional information exchange system that operates between computer systems over a computer network such as the Internet is provided by a set of applications and data communications protocols collectively referred to as the World Wide Web. In a typical conventional implementation of the World Wide Web, client computer systems operate a client software application referred to as a web browser. A typical web browser sends Hypertext Transfer Protocol (HTTP) requests for documents, referred to as “web pages,” over the computer network to web server computer systems. A web server software application operating in the web server computer system can receive and process an HTTP web page request and can return or “serve” the web page document or file specified in the client request back to the requesting client computer system over the computer network for receipt by the client's web browser. The web page is typically formatted in a markup language such as the Hypertext Markup Language (HTML) or the Extensible Markup Language (XML).
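The request and response exchange described above can be illustrated with a minimal sketch using only the Python standard library. The page content, the handler name PageHandler, and the helper fetch_once are hypothetical and chosen for illustration; the server listens only on the local loopback address, and the sketch does not represent any particular web server or browser.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical web page served by the toy web server.
PAGE = b"<html><body><h1>Hello</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Serves the same HTML document for every HTTP GET request."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        # Suppress the default per-request logging to stderr.
        pass


def fetch_once():
    """Start a local server, issue one HTTP request as a client, return the response."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: pick any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/index.html"
        # An empty ProxyHandler bypasses any system proxy for this local request.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            return response.status, response.headers["Content-Type"], response.read()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, body = fetch_once()
    print(status, content_type, len(body))
```

Here the urllib client stands in for the web browser and PageHandler stands in for the web server software application; a real browser would additionally parse and render the returned markup.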
The World Wide Web contains billions of static web pages and is growing rapidly, with many hundreds or thousands of web pages being created and made accessible on the Internet each day. To allow people using web browsers to efficiently access web pages of interest, software developers have created web sites that operate as search engines or portals.
Conventional search engines operate algorithmically and are most often implemented as web search engines that locate and rank information on the public Web. Other kinds of search engines include enterprise search engines, which search intranets; personal search engines; and mobile search engines. Some conventional search engines also mine data available in newsgroups, databases, or open directories.
Most conventional search engines operate according to the following phases: (1) web crawling, (2) indexing, and (3) searching. Indexing (or “Internet indexing”) includes back-of-book-style indexes to individual websites or an intranet, and the creation of keyword metadata to provide a more useful vocabulary for Internet or onsite search engines. With the increase in the number of periodicals that have articles online, web indexing is also becoming important for periodical websites. Metadata web indexing involves assigning keywords or phrases to web pages or web sites within a meta-tag field, so that the web page or web site can be retrieved with a search engine that is customized to search the keywords field. This may or may not involve using keywords restricted to a controlled vocabulary list.
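Metadata web indexing of the kind described above can be sketched as follows, assuming keywords are stored in a meta tag of the form <meta name="keywords" content="...">. The page contents, URLs, and the names MetaKeywordParser, extract_keywords, and build_keyword_index are hypothetical; the optional vocabulary argument illustrates restricting keywords to a controlled vocabulary list.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects keywords from <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            # Keywords are assumed to be comma-separated.
            self.keywords.extend(
                k.strip().lower() for k in content.split(",") if k.strip()
            )


def extract_keywords(html):
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords


def build_keyword_index(pages, vocabulary=None):
    """Map each keyword to the set of URLs whose meta tag lists it.

    If a controlled vocabulary is given, keywords outside it are ignored.
    """
    index = {}
    for url, html in pages.items():
        for keyword in extract_keywords(html):
            if vocabulary is not None and keyword not in vocabulary:
                continue
            index.setdefault(keyword, set()).add(url)
    return index


# Hypothetical pages used only for illustration.
PAGES = {
    "http://example.com/a": '<head><meta name="keywords" content="Gardening, Roses"></head>',
    "http://example.com/b": '<head><meta name="keywords" content="gardening, tulips"></head>',
}

if __name__ == "__main__":
    print(build_keyword_index(PAGES))
    print(build_keyword_index(PAGES, vocabulary={"gardening"}))
```

A search engine customized to search the keywords field would then look up a query term directly in the resulting index rather than in the full page text.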
A typical conventional search engine includes one or more web crawler processes that continually identify newly discovered web pages, most often by following hyperlinks from existing web pages to the newly discovered web pages. Upon discovery of a new web page, the search engine employs an indexer to process and index the content of the web page, such as its text, within a searchable database by producing an inverted index. Generally, an inverted index is an index that maps each word appearing in the indexed texts to the documents that contain that word. A searcher then processes user search requests against the inverted index. When a user operates his or her browser to visit the search engine web site, the search engine web page allows the user to enter one or more textual search keywords that represent content that the user is interested in searching for within the indexed content of web pages within the search engine database. The search engine uses the searcher to match the user-supplied keywords against the inverted index of web page content in its database and returns a web page to the user's browser listing the identity (typically a hyperlink to the page) of web pages within the World Wide Web that contain the user-supplied keywords. Popular conventional web search engines in use today include Google (accessible on the Internet at http://www.google.com/), Yahoo! (http://www.yahoo.com/) and many others.
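The crawl, index, and search phases described above can be sketched in a few lines, assuming a tiny in-memory "web" in place of real network access. The dictionary WEB, its URLs, and the names PageParser, crawl_and_index, and search are hypothetical and serve only to illustrate the technique; this is not the implementation of any particular search engine.

```python
import re
from collections import deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Extracts hyperlink targets and visible text from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


# Hypothetical in-memory web: URL -> HTML content.
WEB = {
    "http://example.com/": '<p>Welcome to the garden site</p>'
                           '<a href="http://example.com/roses">roses</a>',
    "http://example.com/roses": '<p>Roses need sun and water</p>'
                                '<a href="http://example.com/">home</a>'
                                '<a href="http://example.com/tulips">tulips</a>',
    "http://example.com/tulips": '<p>Tulips bloom in spring sun</p>',
}


def crawl_and_index(seed, fetch):
    """Crawl breadth-first from seed and build an inverted index: word -> set of URLs."""
    index = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        parser = PageParser()
        parser.feed(html)
        # Indexer: record every word of the page text in the inverted index.
        for word in re.findall(r"[a-z0-9]+", " ".join(parser.text).lower()):
            index.setdefault(word, set()).add(url)
        # Crawler: follow hyperlinks to pages not yet discovered.
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, keywords):
    """Searcher: return the URLs that contain every supplied keyword."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*postings) if postings else set()


if __name__ == "__main__":
    idx = crawl_and_index("http://example.com/", WEB.get)
    print(search(idx, ["sun", "water"]))
```

Because the index is keyed by word, answering a query requires only a lookup per keyword and a set intersection, rather than a scan of every stored page.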
When a user comes to the search engine and makes a query, typically by giving keywords, the engine looks up the index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text. Most search engines support the use of the Boolean operators AND, OR and NOT to further specify the search query. An advanced feature is proximity search, which allows users to specify the maximum distance permitted between keywords.
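Boolean operators and proximity search can be sketched over a positional inverted index, that is, an index that records where in each document a word occurs. The sample documents and the names build_positional_index, docs_with, and near are hypothetical, and distance is assumed here to be measured in word positions.

```python
import re

# Hypothetical documents: id -> text.
DOCS = {
    1: "red roses need full sun",
    2: "tulips need sun and water",
    3: "red tulips bloom in spring",
}


def build_positional_index(docs):
    """Map each word to {doc_id: [positions at which the word occurs]}."""
    index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index


def docs_with(index, term):
    """Return the set of document ids containing the term."""
    return set(index.get(term.lower(), {}))


def near(index, a, b, max_distance):
    """Return documents in which a and b occur within max_distance words of each other."""
    pos_a = index.get(a.lower(), {})
    pos_b = index.get(b.lower(), {})
    hits = set()
    for doc_id in pos_a.keys() & pos_b.keys():
        if any(abs(i - j) <= max_distance
               for i in pos_a[doc_id] for j in pos_b[doc_id]):
            hits.add(doc_id)
    return hits


if __name__ == "__main__":
    idx = build_positional_index(DOCS)
    print(docs_with(idx, "red") & docs_with(idx, "tulips"))  # red AND tulips
    print(docs_with(idx, "red") | docs_with(idx, "tulips"))  # red OR tulips
    print(docs_with(idx, "red") - docs_with(idx, "roses"))   # red NOT roses
    print(near(idx, "need", "sun", 1))                       # proximity search
```

In this sketch AND, OR, and NOT correspond to set intersection, union, and difference over the document sets, while proximity search additionally compares the stored word positions.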
The usefulness of a search engine depends on the relevance of the result set it gives back. While there may be millions of web pages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines employ methods to rank the results so as to provide the “best” results first. How a search engine decides which pages are the best matches, and in what order the results should be shown, varies widely from one engine to another.
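As one illustration of result ranking, the following sketch scores each document by how often it contains the query terms, weighting rarer terms more heavily (a simple term-frequency and inverse-document-frequency heuristic). This scoring formula is an assumption chosen for illustration only; as noted above, actual ranking methods vary widely from one engine to another. The sample documents and the names tokenize and rank are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical documents: id -> text.
DOCS = {
    "a": "roses roses roses and sun",
    "b": "roses and tulips",
    "c": "tulips in spring",
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(docs, query):
    """Return ids of matching documents, best match first.

    Score = sum over query terms of (term count in document) * log(1 + N / df),
    where N is the number of documents and df is the number containing the term.
    """
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    scores = {}
    for term in tokenize(query):
        df = sum(1 for counts in term_counts.values() if term in counts)
        if df == 0:
            continue  # term appears in no document
        idf = math.log(1 + n / df)
        for doc_id, counts in term_counts.items():
            if counts[term]:
                scores[doc_id] = scores.get(doc_id, 0.0) + counts[term] * idf
    # Sort by descending score; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    print(rank(DOCS, "roses"))
    print(rank(DOCS, "roses tulips"))
```

Documents that do not contain any query term receive no score and are omitted from the result list, so the output is both a filter and an ordering.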
Google is a registered trademark of Google, Inc. of Mountain View, Calif., USA. Yahoo! is a registered trademark of Yahoo!, Inc. of Sunnyvale, Calif., USA.