1. Field of the Invention
This invention relates to the retrieval of information from a database and, more particularly, to the indexing of information for retrieval from a database in a manner that compresses the index so as to consume less storage memory.
2. Discussion of the Related Art
The purpose of an information retrieval (IR) system is to search a database and return information in response to a query. Hereinafter, the term "documents" will be used to refer to the returned information, though such information need not be documents in the word-processing sense; it may be any information, including web pages, numbers, alphanumerics, etc., or pointers, handles, or the like thereto.
Most high-precision IR systems in use today utilize a multi-pass strategy. First, initial relevance scoring is performed using the original query, and a list of hits is returned, each with a relevance score. Then, a second scoring pass is made using information found in the high-scoring documents.
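By way of illustration only, the multi-pass strategy can be sketched in Python as follows. The sketch assumes one common form of second pass, in which frequent terms from the top-ranked documents are added to the query before rescoring; the text above does not specify how the second pass uses the high-scoring documents. The names DOCS, first_pass, second_pass, top_k, and extra_terms are hypothetical, and the scoring is a simple term count rather than any particular relevance measure.

```python
from collections import Counter

# Hypothetical toy collection: document id -> text.
DOCS = {
    "d1": "index compression saves storage",
    "d2": "compression of index storage storage",
    "d3": "cooking recipes",
}


def first_pass(query_terms, docs):
    """Score each document by the number of query-term occurrences it contains."""
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def second_pass(query_terms, docs, hits, top_k=2, extra_terms=2):
    """Rescore with a query expanded by frequent terms from the top_k hits."""
    pool = Counter()
    for doc_id, _score in hits[:top_k]:
        pool.update(docs[doc_id].lower().split())
    for term in query_terms:
        pool.pop(term, None)  # do not re-add terms already in the query
    ranked_terms = sorted(pool.items(), key=lambda kv: (-kv[1], kv[0]))
    expanded = list(query_terms) + [term for term, _ in ranked_terms[:extra_terms]]
    return first_pass(expanded, docs), expanded


hits = first_pass(["index"], DOCS)
reranked, expanded_query = second_pass(["index"], DOCS, hits)
```

Note that the first pass looks up documents by term, while the second pass reads the terms of a given document, so the two passes access the collection in opposite directions.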
Because document databases can be huge, it is desirable to represent the databases in a way that minimizes media space. Commonly, internal data in a database is represented by indexes. The indexes for the two relevancy passes described above are usually different. The first relevancy pass usually uses what is known as an inverted index, meaning that a given term is associated with a list of documents containing the term. The second relevancy pass usually uses an index in which a given document is associated with a list of terms appearing in it. The result is that a two-pass system consumes roughly double the media space of a one-pass system. What is needed is a system that delivers the retrieval performance of the two-pass system without consuming as much media space.
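By way of illustration only, the two index types described above can be sketched in Python as follows. The names SAMPLE_DOCS, build_indexes, and entry_count are hypothetical, and whitespace tokenization is an assumption made for brevity. The sketch shows that both indexes record the same term-document pairs, which is why storing both roughly doubles the space required.

```python
from collections import defaultdict

# Hypothetical toy collection: document id -> text.
SAMPLE_DOCS = {
    "d1": "index compression",
    "d2": "index storage",
}


def build_indexes(docs):
    """Build a term -> documents index and a document -> terms index."""
    inverted = defaultdict(set)
    by_document = {}
    for doc_id, text in docs.items():
        terms = set(text.lower().split())
        by_document[doc_id] = sorted(terms)
        for term in terms:
            inverted[term].add(doc_id)
    inverted_sorted = {term: sorted(ids) for term, ids in inverted.items()}
    return inverted_sorted, by_document


def entry_count(index):
    """Count the (key, value) pairs stored in an index."""
    return sum(len(values) for values in index.values())


inverted_index, document_index = build_indexes(SAMPLE_DOCS)
```

In this sketch, inverted_index maps "index" to both documents, while document_index maps each document to its two terms; each structure stores four entries in total.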