It is well known in the prior art that computer systems can be used to manage indices to the records of databases. Many techniques are known for parsing, indexing, and searching databases. However, managing extremely large databases presents special problems.
In recent years, a unique distributed database has emerged in the form of the World-Wide-Web (Web). The database records of the Web are in the form of pages accessible via the Internet. Here, tens of millions of pages are accessible by anyone having a communications link to the Internet. The pages are dispersed over millions of different computer systems all over the world. Users of the Internet constantly desire to locate specific pages containing information of interest. The information in the records or pages can be expressed in a number of different encoding modalities, such as hypertext, Java, video, PostScript, and so forth.
Creating a "full text" index for the Web can result in an index having many millions of entries. For example, common English words such as "the" and "an" may occur at hundreds of millions of locations. The number of unique "words" indexed can also reach the hundreds of millions.
It is also a problem to search such a large index in a timely fashion, particularly if the index is accessed by tens of millions of users each day. In many cases, the index structures must be stored on slow-to-access disk storage devices.
It is desired to provide a search mechanism that can operate on indices for large databases. In addition, it is desired that the index can be searched quickly using a minimal amount of data processing resources.