A variety of mechanisms exist for searching vast numbers of documents, such as those available on the World Wide Web or in large enterprise systems. A search engine, or search engine program, is a widely used mechanism of this kind. Typically, a search engine provides a user interface that includes a query field. In response to a query that the user enters into the query field, for example, one or more keywords describing the desired information, the search engine attempts to locate, rank, sort, and then return search results for display. The search results can be a list of ranked documents that includes, for each document, a link to the document and an excerpt of text meant to summarize the document.
In order to locate, rank, sort, and return results in response to a user's query, the search engine typically has previously indexed the documents and the items contained in the documents, such as words, concepts, and images, so that these items can be matched to the query. Typically, an index is created that has an entry for each document, with each entry containing the items appearing in that document. This index, sometimes referred to as a forward index, is organized by document and therefore cannot easily be searched by item. For this reason, an inverted index is usually created based on the forward index. An inverted index is indexed by item and, for each item, lists the documents in which the item appears.
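By way of illustration only, the relationship between a forward index and an inverted index may be sketched as follows. The document identifiers, the sample words, and the function name `build_inverted_index` are hypothetical and chosen for this example; they do not describe any particular search engine.

```python
from collections import defaultdict

# Hypothetical forward index: document ID -> items (here, words) in that document.
forward_index = {
    "doc1": ["search", "engine", "index"],
    "doc2": ["inverted", "index"],
    "doc3": ["search", "query"],
}

def build_inverted_index(forward):
    """Invert a forward index so that each item maps to the documents containing it."""
    inverted = defaultdict(set)
    for doc_id, items in forward.items():
        for item in items:
            inverted[item].add(doc_id)
    # Sort the document IDs so the result is deterministic.
    return {item: sorted(doc_ids) for item, doc_ids in inverted.items()}

inverted_index = build_inverted_index(forward_index)
print(inverted_index["index"])   # ['doc1', 'doc2']
print(inverted_index["search"])  # ['doc1', 'doc3']
```

In this sketch, finding the documents for an item is a single dictionary lookup, whereas the forward index would have to be scanned entry by entry.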
Search engines typically take a user's query, parse it into words, and then match those words against the words contained in an inverted index. Some search engines convert the words into concepts and match the concepts to previously determined concepts contained in an inverted index. The inverted index provides the search engine with the documents in which the words or concepts appear. The search engine can then further process these documents to rank them and decide whether to return them to the user in a search result list.
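The matching step described above may be sketched, under stated assumptions, as follows. The sample index and the function name `match_query` are hypothetical. The sketch assumes that a document matches only if it contains every word of the query, which is one possible policy among several, and it omits the subsequent ranking step.

```python
# Hypothetical inverted index: word -> set of IDs of documents containing the word.
inverted_index = {
    "search": {"doc1", "doc3"},
    "engine": {"doc1"},
    "index": {"doc1", "doc2"},
}

def match_query(query, index):
    """Return the IDs of documents that contain every word of the query."""
    # Parse the query into lowercase words separated by whitespace.
    words = query.lower().split()
    if not words:
        return []
    # Look up each word; a word absent from the index matches no documents.
    postings = [index.get(word, set()) for word in words]
    # Assumption: all query words must appear in a document (AND semantics).
    return sorted(set.intersection(*postings))

print(match_query("Search index", inverted_index))    # ['doc1']
print(match_query("search missing", inverted_index))  # []
```

A search engine that matches concepts rather than words could use the same lookup structure, with concept identifiers as the keys.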
With the growing volume of information on the World Wide Web and in enterprise network systems, inverted indices are becoming extremely large and take up a great deal of memory space. Therefore, there is a need for methods and systems for compressing an inverted index that overcome the drawbacks of inverted indices used in conventional search engines as described above.