In the prior art, it is well known that memories of computer systems can be used to store indices to records of databases. Many techniques are known for forming a data structure in a memory. However, creating an index for an extremely large database presents special problems.
In recent years, a unique distributed database has emerged in the form of the World-Wide-Web (Web). The database records of the Web are in the form of pages accessible via the Internet. Here, tens of millions of pages are accessible by anyone having a communications link to the Internet.
The pages are dispersed over millions of different computer systems all over the world. Users of the Internet constantly desire to locate specific pages containing information of interest. The pages can be expressed in any number of different languages and character sets, such as English, French, German, Spanish, Cyrillic, Katakana, and Mandarin. In addition, the pages can include specialized components, such as embedded "forms," executable programs, Java applets, and hypertext.
Moreover, the pages can be constructed using various formatting conventions, for example, ASCII text, PostScript files, HTML files, and Acrobat files. The pages can include links to multimedia information content other than text, such as audio, graphics, and moving pictures. As a further complication, the Web can be characterized as a database subject to unpredictable, random updates, insertions, and deletions, with a constantly changing morphology.
It is a problem to structure an index so that it can be searched concurrently by tens of thousands of users in a timely fashion. In a commercial environment, users may be charged for connect time. Therefore, it is important that searches be performed in a matter of seconds. As an additional problem, new records appear by the thousands, and incrementally updating an index is difficult, particularly if the index must remain continuously accessible for searching.
Conventional indices are commonly arranged in memories as distinct data structures having different formats. For example, literal information may be stored in a first format, numeric information in a second format, and attributes about the information in a third format. Each format may require a separate application interface and access procedure.
It is also a problem to minimize the amount of memory required to store the index. If compression techniques are used, then it becomes a further problem to minimize the amount of decoding required to search the compressed data structures.
Therefore, it is desired to design a memory storing a single unified data structure which indexes a large database.