In the prior art, it has been well known that computer systems can be used to manage indices to records of databases. Many techniques are known to parse, index and search databases. However, managing extremely large databases presents special problems.
In recent years, a unique distributed database has emerged in the form of the World-Wide-Web (Web). The database records of the Web are in the form of pages accessible via the Internet. Here, tens of millions of pages are accessible by anyone having a communications link to the Internet. The pages are dispersed over millions of different computer systems all over the world. Users of the Internet constantly desire to locate specific pages containing information of interest.
A full text index for Web pages may have to track hundreds of millions of indexable items. Some of the commonly occurring items, such as the word "the", may appear at hundreds of different locations in the Web pages.
Therefore, it is desired to compress the index entries to minimize both the amount of storage and the time required to perform index searches.