With advances in computer and Internet technology, search engines have become an important tool for users of Web clients (e.g., computers) to acquire information. A traditional search engine, such as Inktomi, Excite, Lycos, Infoseek, or FAST, normally comprises a router for transmitting and receiving message packets between the Internet and a Web crawler server, an index server, and a Web server. The search engine uses the Web crawler (also known as a Web spider or Web robot) to regularly access Web page resources located by URLs (Uniform Resource Locators), extracts the textual information and other related attributes of those Web pages, and stores that information so that the index server can process the retrieved data. The index server parses the documents and creates a document index by applying an indexing algorithm, which normally involves creating a priority-ordered index based on keywords or other attributes of each document.
The Web server comprises a search application for processing search requests submitted to the search engine. Generally, given a keyword of interest provided by a user, the search application uses the index server to query the pre-built index database and returns a keyword results page to the user, helping the user find and visit new URLs.
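As a rough illustration of the indexing and lookup steps described above, the following is a minimal Python sketch of a keyword (inverted) index. It assumes that the crawler has already produced a mapping from URL to extracted text, and it uses raw term counts as the priority for ordering results. The names `tokenize`, `build_index`, and `search` are hypothetical, and the scheme is far simpler than the indexing algorithm of any real search engine.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build a priority-ordered keyword index.

    docs maps each URL to the text the crawler extracted from that page.
    Returns a dict mapping each keyword to a list of (url, count) pairs,
    ordered by descending count (the assumed "priority"), ties broken by URL.
    """
    index = defaultdict(list)
    for url, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term].append((url, count))
    for postings in index.values():
        postings.sort(key=lambda p: (-p[1], p[0]))
    return dict(index)


def search(index, keyword):
    """Return the URLs matching a single keyword, in the index's priority order."""
    return [url for url, _ in index.get(keyword.lower(), [])]


if __name__ == "__main__":
    docs = {
        "http://example.com/a": "search engines index web pages",
        "http://example.com/b": "a web crawler visits web pages and web links",
        "http://example.com/c": "cooking recipes",
    }
    index = build_index(docs)
    print(search(index, "web"))  # page b mentions "web" three times, page a once
```

Here the index is built once, ahead of time, and each query is only a dictionary lookup. This mirrors the split between the index server and the search application. Real engines also normalize terms, weight them (for example, with TF-IDF), and combine multiple keywords.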
Search engines use various algorithms to create indices. A modern search engine may create an index based on both document contents and linkage information (e.g., Google's PageRank). When trying to find the documents most relevant to a user query, the search engine applies a search algorithm to the document index and returns the matching results. Generally, a search engine uses the same set of algorithms to rank all documents. Most importantly, these algorithms are designed and maintained by the search service provider itself (e.g., Google or Yahoo!). Web content owners, by contrast, can only provide Web pages and must leave it to the search engine to determine how the document index is created from the content of those pages.
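To show what ranking based on "linkage information" can mean, the following is a minimal Python sketch of the textbook PageRank power iteration. It is not Google's production algorithm. The damping factor of 0.85 and the fixed iteration count are assumed illustrative values. Spreading a dangling page's rank evenly over all pages is one common convention among several. The function name `pagerank` and the `links` input format are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch only).

    links maps each page to the list of pages it links to. Pages that only
    appear as link targets are included automatically. Returns a dict of
    page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives the "random jump" share (1 - d) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank equally among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Page a links to b and c, b links to c, and c links back to a.
    scores = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this small graph, page c ranks highest because it receives links from both other pages. Page b ranks lowest because it receives only half of page a's outgoing rank. The scores depend only on the link structure, which is chosen by the engine's algorithm rather than by any individual content owner.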