A “search engine” is an information retrieval system designed to help locate data stored on a computer system or on a network of computer systems such as the World Wide Web. The search engine allows a user to request content meeting specific criteria and generates a list of items that match those criteria. The list is often sorted according to some measure of the relevance of the search results.
As illustrated in FIG. 1, search engines 130 use search “indexes” 120 to operate quickly and efficiently. In operation, index generation logic 110 continually updates the index 120 using information gathered from Web servers 100-102 (or other types of servers). One well-known form of index generation logic 110 is a “Web crawler” (also known as a “Web spider” or “Web robot”), a program or automated script that browses the World Wide Web in a methodical, automated manner and extracts text and metadata from Web pages to generate the index 120.
An “inverted index” is a specific form of index 120 used by many popular search engines today, such as Yahoo® and Google®. As illustrated in FIG. 2, an inverted index is built from “tokens” 200, which represent text strings and other forms of information (e.g., XML tags, multimedia content) extracted from Web pages. Each token entry within the inverted index includes a listing of the Web pages in which the token appears. In FIG. 2, for example, Web pages 1, 4 and 6 include the token “Hawaii;” Web pages 1, 11 and 14 include the token “vacation;” and Web pages 22, 29, 32 and 40 include the token “MP3.” The Web page entries may be ordered under each token based on the relevance of the Web pages (e.g., from most to least relevant).
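By way of illustration only, the construction of such an inverted index might be sketched as follows. The function names `tokenize` and `build_inverted_index`, the sample page texts, and the choice to lowercase tokens are all assumptions made for this example and are not taken from FIG. 2. The page entries here are ordered by page identifier rather than by relevance, since no relevance measure is specified.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each token to the sorted list of page IDs in which it appears."""
    postings = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            postings[token].add(page_id)
    # Sort by page ID; a real engine might instead order entries by relevance.
    return {token: sorted(ids) for token, ids in postings.items()}


# Hypothetical page contents, chosen so that the "Hawaii" and "vacation"
# entries match the page numbers listed for those tokens in FIG. 2.
pages = {
    1: "Hawaii vacation deals",
    4: "Hawaii surf report",
    6: "Flights to Hawaii",
    11: "Family vacation ideas",
    14: "Budget vacation planning",
}

index = build_inverted_index(pages)
print(index["hawaii"])
print(index["vacation"])
```

With these sample pages, the entry for “hawaii” lists pages 1, 4 and 6, and the entry for “vacation” lists pages 1, 11 and 14.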
The inverted index dramatically improves the speed with which the search engine 130 performs searches. For example, rather than searching each individual Web page for a specified text string or group of strings submitted by clients 140, 141, the search engine 130 simply identifies tokens which are relevant to the search and provides the Web pages associated with those tokens.
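A minimal sketch of such a lookup, using the token entries of FIG. 2, is given below. The name `search`, the lowercased token keys, and the rule that a page must contain every query token (AND semantics) are assumptions made for illustration; the description above does not specify how multiple tokens are combined.

```python
# Token entries taken from the FIG. 2 example; keys are lowercased by assumption.
INDEX = {
    "hawaii": [1, 4, 6],
    "vacation": [1, 11, 14],
    "mp3": [22, 29, 32, 40],
}


def search(index, query):
    """Return the pages containing every token in the query (assumed AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return []
    result = None
    for token in tokens:
        # One dictionary lookup per token; no page text is scanned.
        pages = set(index.get(token, []))
        result = pages if result is None else result & pages
    return sorted(result)


print(search(INDEX, "Hawaii vacation"))
print(search(INDEX, "MP3"))
```

Under these assumptions, the query “Hawaii vacation” yields only page 1, the single page listed under both tokens, and the cost of the lookup depends on the number of query tokens and the length of their entries rather than on the number of Web pages indexed.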
Given the continually growing number of computers and portable data processing devices connected to the Internet, current search engines require a significant amount of computing power. Accordingly, what is needed is a more efficient strategy for performing searches using an index.