In the prior art, it has been well known that computer systems can be used to manage indices to records of databases. Many techniques are known to index databases. However, managing extremely large databases presents special problems.
In recent years, a unique distributed database has emerged in the form of the World-Wide-Web (Web). The database records of the Web are in the form of pages accessible via the Internet. Here, tens of millions of pages are accessible by anyone having a communications link to the Internet.
The pages are dispersed over millions of different computer systems all over the world. Users of the Internet constantly desire to locate specific pages containing information of interest. The pages can be constructed using various formatting conventions, for example, ASCII text, PostScript files, HTML files, and Acrobat files. The pages can include links to multimedia information content other than text, such as audio, graphics, and moving pictures.
As a complication, the Web can be characterized as a database subject to unpredictable, random updates, insertions, and deletions, with constantly changing content. This means that new pages can "spontaneously" appear anywhere as new Web sites are created, and previously indexed pages can disappear as defunct sites cease operation.
It is a problem to allow the searching of the index by millions of users each day while the index is maintained to delete old entries reflecting defunct sites, and to add entries for newly created pages.
More particularly, it is a problem to add index entries for new pages, which may appear by the thousands, in a timely manner without using excessive disk storage to store intermediate data structures as the index is being reorganized.
Conventional indices typically provide ad hoc data structures to permit updating, e.g., the addition and deletion of entries, concurrent with searching. For example, a journal or "stop-press" file may store new entries. The design and support of different data structures, and of the processes which operate thereon, degrade performance. Periodically, when the typically inefficient journal file becomes too large, it must be merged with the primary index, perhaps precluding searches of the index during the update process.
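The conventional journal arrangement described above can be illustrated with a minimal sketch. It is not taken from any particular prior-art system; the class name `JournaledIndex`, the `journal_limit` threshold, and the in-memory lists standing in for disk files are all assumptions made for illustration. The sketch shows the two separate data structures, each with its own search path, and the periodic merge that rebuilds the primary index.

```python
from bisect import bisect_left


class JournaledIndex:
    """Hypothetical primary index plus a journal ("stop-press") of new entries."""

    def __init__(self, entries=(), journal_limit=4):
        self.primary = sorted(set(entries))  # sorted; searched by binary search
        self.journal = []                    # unsorted; searched by linear scan
        self.journal_limit = journal_limit   # assumed merge threshold

    def add(self, entry):
        # New entries go to the journal, not the primary index.
        self.journal.append(entry)
        if len(self.journal) >= self.journal_limit:
            self.merge()

    def contains(self, entry):
        # Every search must consult both structures.
        i = bisect_left(self.primary, entry)
        if i < len(self.primary) and self.primary[i] == entry:
            return True
        return entry in self.journal

    def merge(self):
        # Rebuilds the entire primary index; a real system might have to
        # suspend searching while this runs.
        self.primary = sorted(set(self.primary) | set(self.journal))
        self.journal = []


index = JournaledIndex(["apple", "pear"], journal_limit=3)
index.add("fig")
index.add("kiwi")
```

In this sketch, a third call to `add` reaches the assumed threshold and triggers `merge`, which rewrites the whole primary list. That full rewrite, together with the need to maintain two search paths, is the kind of cost the passage above attributes to journal-based designs.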
Therefore, it is desired to create an index structure which can store entries for a large database. The index structure should allow for concurrent maintenance and search operations.