Searching the content of large collections of documents or other types of files typically involves indexing the documents for later retrieval using a program such as the Finder program, which operates on Macintosh computers from Apple Computer, Inc. of Cupertino, Calif. Indexing the documents is usually accomplished through the generation of an inverted index. For example, an inverted index might contain, for a particular word, a list of references to the documents in which that word appears. The inverted index allows a user to search and retrieve documents quickly.
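By way of illustration only, the following minimal sketch shows one way an inverted index of the kind described above could be built and queried. The function names `build_inverted_index` and `search`, the file names, and the whitespace-based tokenization are assumptions made for this example and do not describe any particular product.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to the set of document IDs in which it appears.

    `documents` is a dict of document ID -> text (hypothetical input shape).
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Simplistic tokenization: lowercase and split on whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, word):
    """Return the sorted IDs of documents that contain `word`."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample collection.
documents = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog",
    "c.txt": "quick thinking",
}
index = build_inverted_index(documents)
matches = search(index, "quick")
```

A lookup touches only the entry for the queried word rather than scanning every document, which is why retrieval is fast. Every distinct word in the collection needs its own entry, however, so the index grows with both the vocabulary and the number of documents.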
However, given the large number of words and the large number of documents in which those words can appear, an inverted index can be extremely large. The size of an index presents many challenges in processing and storing the index, such as using the index to perform a search and updating the index when the content of documents changes or when new documents are created and old documents are deleted. For example, an inverted index can be implemented as a table of all referenced words and, for each word, a list of all documents that contain the word. When a document that has already been indexed changes, the search system must either delete all of the old, now-invalid references and add the new references, or provide some mechanism for quickly recognizing and ignoring stale references in future scans, together with a feature for pruning them out later when the database has become bloated.
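The second alternative, ignoring stale references at search time and pruning them later, can be sketched as follows. This is an illustrative assumption rather than a description of any existing system: the class name `LazyInvertedIndex`, its methods, and the use of a global generation counter to mark references as current or stale are all hypothetical.

```python
from collections import defaultdict


class LazyInvertedIndex:
    """Inverted index that tolerates stale references until compacted."""

    def __init__(self):
        # word -> list of (doc_id, generation) references
        self.postings = defaultdict(list)
        # doc_id -> generation of its latest indexed version;
        # a missing key means the document was deleted or never indexed.
        self.current = {}
        self._generation = 0

    def index_document(self, doc_id, text):
        """Index a new or changed document without removing old references."""
        self._generation += 1
        self.current[doc_id] = self._generation
        for word in set(text.lower().split()):
            self.postings[word].append((doc_id, self._generation))

    def delete_document(self, doc_id):
        """Mark a document as deleted; its references become stale."""
        self.current.pop(doc_id, None)

    def search(self, word):
        """Return matching documents, skipping stale references."""
        refs = self.postings.get(word.lower(), [])
        return sorted({d for d, g in refs if self.current.get(d) == g})

    def stale_count(self):
        """Count references that no longer match a current document version."""
        return sum(
            1
            for refs in self.postings.values()
            for d, g in refs
            if self.current.get(d) != g
        )

    def compact(self):
        """Prune stale references, e.g. once the index has become bloated."""
        for word in list(self.postings):
            live = [(d, g) for d, g in self.postings[word]
                    if self.current.get(d) == g]
            if live:
                self.postings[word] = live
            else:
                del self.postings[word]


idx = LazyInvertedIndex()
idx.index_document("a.txt", "red apple")
idx.index_document("a.txt", "green apple")  # the document changed
```

In this sketch an update is cheap because nothing is deleted at write time, but every search must filter out stale references, and the index keeps growing until `compact` is run. The first alternative, deleting the old references immediately, avoids that growth at the cost of locating every old reference on each change.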
Moreover, searching an inverted index is typically limited to searching for the words that were used to generate the inverted index.