Most people searching the World Wide Web (WWW) use search engines that are designed to help locate information stored on Web sites. Most, if not all, search engines search the WWW for one or more words, phrases, and/or combinations of words or phrases (hereinafter “term” or “terms”), keep an index of the located terms and where they were located, and allow users to look for terms in the index.
To index the billions of Web pages that exist on the WWW, a search engine's “web crawler” locates and downloads Web pages, graphics, audio, video, or other data files (hereinafter “documents”). The search engine's indexing modules or engines then process the downloaded documents, creating an index of terms found in those documents, such as terms on a Web page, the name of the located document itself, or the path where the document is located. In some embodiments, the indexing modules may ignore insignificant terms and may include in the index information about where in the document each indexed term is located.
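The indexing step described above can be illustrated with a minimal sketch. It assumes the index is a simple in-memory mapping from each term to the documents and word positions where it occurs. The names `build_index` and `STOP_WORDS`, the sample documents, and the choice of stop words are hypothetical and serve only as an illustration, not as the indexing modules of any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical list of "insignificant terms" that the indexer skips.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def build_index(documents):
    """Map each indexed term to {document id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        # Split the document into lowercase word tokens.
        for position, term in enumerate(re.findall(r"\w+", text.lower())):
            if term in STOP_WORDS:
                continue  # insignificant terms are not indexed
            index[term].setdefault(doc_id, []).append(position)
    return index


# Illustrative documents standing in for pages fetched by a web crawler.
docs = {
    "doc1": "The flagship magazine of the IEEE Computer Society",
    "doc2": "A computer society newsletter",
}
index = build_index(docs)
```

Looking up `index["computer"]` in this sketch yields both documents together with the position of the term in each, which corresponds to the optional location information mentioned above.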
The index created by the search engine is primarily used to identify documents that contain search terms. To search for documents on a particular subject, a user enters or otherwise specifies a search query containing a term and submits the search query to the search engine. The search engine then searches its index to identify documents that contain the term specified by the search query. If the index contains a very large number of documents that satisfy the search query, for example, more than a thousand documents, the search engine may utilize various mechanisms to truncate the search or to otherwise limit the number of documents identified by the search. In any case, each document located by the search engine in response to the search query (excluding those documents which satisfy the query but are not included in the search results) is given a rank or score, otherwise known as the “query rank,” based on the perceived relevance of the document to the query. A search result listing the located documents, ordered by query rank, is then presented to the user. In other words, the documents with the best (e.g., highest) query ranks are presented first in the search result listing, followed by documents having lower query ranks. In addition, it should be understood that the search result listing generally includes a listing of documents that satisfy the search query, not the documents themselves. The search result listing will typically include for each listed document a title or other identifying information extracted from the document, a link (sometimes called a hyperlink or anchor tag) to the document, and a “snippet” of text to help the user decide whether to view the document. A snippet is a portion of text from a document that contains the search term. For example, “ . . . Computer-Flagship magazine of the IEEE Computer Society, where computing practitioners, . . . ” contains the search term “IEEE.”
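The query flow described above (index lookup, ranking, truncation, and snippet extraction) can be sketched as follows. This is a minimal illustration under stated assumptions: the query rank is taken to be the number of times the term occurs in a document, the result limit defaults to 1000, and the snippet is a fixed-width window around the first occurrence. The names `build_index`, `search`, `max_results`, and `context`, as well as the sample documents, are hypothetical; real search engines compute relevance in far more elaborate ways.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased term to {document id: [character offsets]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for match in re.finditer(r"\w+", text):
            term = match.group().lower()
            index[term].setdefault(doc_id, []).append(match.start())
    return index


def search(index, documents, term, max_results=1000, context=20):
    """Return a search result listing ordered by an assumed query rank."""
    postings = index.get(term.lower(), {})
    results = []
    for doc_id, offsets in postings.items():
        text = documents[doc_id]
        # Snippet: a window of text around the first occurrence of the term.
        start = max(0, offsets[0] - context)
        end = min(len(text), offsets[0] + len(term) + context)
        results.append({
            "document": doc_id,
            "query_rank": len(offsets),  # assumption: rank = occurrence count
            "snippet": ". . . " + text[start:end] + " . . .",
        })
    # Best (highest) query ranks first, then truncate the listing.
    results.sort(key=lambda r: r["query_rank"], reverse=True)
    return results[:max_results]


# Illustrative documents; the listing refers to them by id, not by content.
docs = {
    "page_a": "Computer, flagship magazine of the IEEE Computer Society",
    "page_b": "IEEE standards and IEEE conferences for computing practitioners",
    "page_c": "A gardening newsletter",
}
results = search(build_index(docs), docs, "IEEE")
```

In this sketch `page_b` is listed ahead of `page_a` because the term occurs in it twice, and `page_c` is omitted because it does not contain the term. Passing a smaller `max_results` shows how a listing may be truncated when many documents satisfy the query.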
Furthermore, as the WWW expands and the universe of documents grows, so does the size of the index. A large index generally takes longer to search than a smaller index. Accordingly, a system and method that increase search efficiency while returning the most currently indexed search results within an acceptable time period would be highly desirable.