It is well known in the prior art that computer systems can be used to index information stored as records of a database.
In recent years, a unique distributed database has emerged in the form of the World Wide Web (Web). The database records of the Web are pages accessible via the Internet. Tens of millions of pages are accessible to anyone having a communications link to the Internet. The pages are dispersed over millions of different computer systems all over the world. Users of the Internet constantly desire to locate specific pages containing information of interest.
Typically, the information of the pages is parsed into "words," and an entry is created in an index for each unique word. Each entry is associated with pointers to every place where that word occurs in the database.
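The word-level indexing described above can be illustrated with a minimal sketch. The function name `build_index`, the tokenizing rule (lowercase runs of letters and digits), and the representation of a pointer as a (page identifier, word position) pair are assumptions made for illustration only; they are not taken from any particular indexing system.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each unique word to pointers of the form (page_id, position).

    `pages` is a hypothetical mapping of page identifier -> page text.
    """
    index = defaultdict(list)
    for page_id, text in pages.items():
        # Assumed parsing rule: a "word" is a run of letters or digits.
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word].append((page_id, position))
    return index


pages = {
    "page1": "An octopus lives in the sea.",
    "page2": "The sea is deep.",
}
index = build_index(pages)

# One entry per unique word; each entry lists every place the word occurs.
print(index["sea"])  # [('page1', 5), ('page2', 1)]
```

In this sketch the two sample pages produce eight index entries, one for each unique word, and a word that occurs on both pages has a pointer for each occurrence.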
Users typically search the index by supplying a query. The query can consist of query terms and operators which relate the terms, for example, the logical AND and OR operators.
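As a rough illustration, the logical AND and OR operators can be evaluated as set intersection and set union over the pages listed in each term's index entry. The helper names `pages_with`, `query_and`, and `query_or`, and the sample index contents, are hypothetical.

```python
def pages_with(index, term):
    """Return the set of pages listed in the index entry for `term`."""
    return set(index.get(term.lower(), ()))


def query_and(index, *terms):
    """Pages that contain every term (logical AND)."""
    sets = [pages_with(index, t) for t in terms]
    return set.intersection(*sets) if sets else set()


def query_or(index, *terms):
    """Pages that contain at least one term (logical OR)."""
    return set().union(*(pages_with(index, t) for t in terms))


# Hypothetical index: word -> set of pages containing that word.
sample_index = {
    "octopus": {"page1", "page3"},
    "sea": {"page1", "page2"},
}

and_result = query_and(sample_index, "octopus", "sea")  # {'page1'}
or_result = query_or(sample_index, "octopus", "sea")    # all three pages
```

A term with no index entry yields an empty set, so an AND query containing it matches no pages while an OR query is unaffected.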
Many query grammars also allow users to specify terms of a query as a literal phrase. For example, the phrase can be "an octopus lives in the sea." Such a query requests all Web pages which contain the phrase exactly as specified. From time to time, a particular topic may be of interest to many Web users, so many different users may specify the same phrase in their queries.
However, indices typically have a separate entry for each word of the phrase, so in many cases locating qualifying pages still requires searching several different word entries; in the above example, six word entries. If the phrase includes commonly occurring words, this search can be time consuming.
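The cost described above can be made concrete with a minimal sketch of phrase matching against a word-level index. It assumes a positional index of the form word -> page -> set of word positions; the names `build_positional_index` and `phrase_search` and the sample pages are hypothetical, and the matching strategy shown is only one straightforward approach.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumed parsing rule: a "word" is a run of letters or digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(pages):
    """Build word -> page_id -> set of positions where the word occurs."""
    index = defaultdict(lambda: defaultdict(set))
    for page_id, text in pages.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][page_id].add(pos)
    return index


def phrase_search(index, phrase):
    """Return (matching pages, number of word entries consulted)."""
    words = tokenize(phrase)
    lookups = 0
    candidates = {}  # page_id -> start positions that still match
    for offset, word in enumerate(words):
        postings = index.get(word, {})  # one word-entry lookup
        lookups += 1
        if offset == 0:
            candidates = {p: set(pos) for p, pos in postings.items()}
        else:
            # Keep a start only if this word occurs `offset` words later.
            narrowed = {
                p: {s for s in starts if s + offset in postings.get(p, ())}
                for p, starts in candidates.items()
            }
            candidates = {p: s for p, s in narrowed.items() if s}
        if not candidates:
            break  # no page can match; stop early
    return set(candidates), lookups


pages = {
    "page1": "An octopus lives in the sea.",
    "page2": "In the sea lives an octopus.",
    "page3": "The sea is deep.",
}
index = build_positional_index(pages)
matches, lookups = phrase_search(index, "an octopus lives in the sea")
print(matches, lookups)  # {'page1'} 6
```

In this sketch the six-word phrase consults six separate word entries, and the entries for common words such as "in" and "the" would be very large in a real index, which is the source of the delay noted above.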
It is desired to optimize the index so that queries containing frequently used phrases can locate qualifying records in a minimal amount of time.