Conventionally, a search engine uses indices to locate results that match a search term received by the search engine. The search indices are specialized databases that store, among other things, the words included in a corpus of documents and location information associated with the documents. As new words are introduced to express thoughts and ideas, the search indices continue to expand. For instance, some search indices store millions of words. Thus, conventional search indices are very large databases.
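As a minimal sketch of the kind of index described above, the following Python fragment maps each word to the documents and positions where it occurs. The function name `build_index`, the sample documents, and the whitespace tokenization are illustrative assumptions, not details of any particular search engine.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to {doc_id: [positions]} for a dict of doc_id -> text.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            # Record the word's position under the document it appears in.
            index[word].setdefault(doc_id, []).append(pos)
    return dict(index)


# Assumed sample corpus of two short documents.
docs = {1: "the quick brown fox", 2: "the lazy dog and the fox"}
index = build_index(docs)
```

In this sketch, the dictionary keys correspond to the stored words, and each value holds the location information (document identifiers and word positions) for one word.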
To reduce the size of the search indices, compression algorithms are used to store the indexed data efficiently. A compressed index can store more words in the same space, which may improve the results located by the search engine. Conventional compression algorithms for search indices have focused on reducing the storage space required for the location information, i.e., positions of the words in the documents, document identifiers, or both. The location information is sometimes referred to as a posting list. The posting lists typically occupy a large part of the available storage space in search indices. Accordingly, these conventional compression techniques compress the posting lists to increase the available space for additional words and posting lists.
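One widely used way to compress a posting list, shown here only as an illustrative sketch, is to store the gaps between sorted document identifiers and then encode each gap in a variable number of bytes. The text above does not name a specific algorithm; gap encoding with variable-byte codes is an assumption chosen for illustration, and the function names are hypothetical.

```python
def encode_postings(doc_ids):
    """Gap-encode sorted doc IDs, then variable-byte encode each gap."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        # Emit 7 bits per byte, low-order group first.
        # A set high bit means more bytes follow for this gap.
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data):
    """Invert encode_postings: rebuild the sorted doc ID list."""
    doc_ids = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # continuation byte: keep accumulating
        else:
            prev += gap         # last byte of this gap: recover the doc ID
            doc_ids.append(prev)
            gap = 0
            shift = 0
    return doc_ids


# Assumed sample posting list (sorted document identifiers).
postings = [3, 7, 8, 300, 100000]
packed = encode_postings(postings)
```

Because small gaps fit in a single byte, a list of closely spaced identifiers takes far less space than storing each identifier as a fixed-width integer.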
Although compressed posting lists offer significant savings in storage space, additional storage space may be saved if the words in the indices are also compressed.
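To illustrate how the words themselves might be compressed, the following sketch uses front coding, a general dictionary-compression technique in which each word in a sorted list is stored as the length of the prefix it shares with the previous word plus the remaining suffix. This is offered only as an assumed example of word compression; it is not presented as the technique this document describes, and the function names are hypothetical.

```python
def front_encode(sorted_words):
    """Store each word as (shared_prefix_length, remaining_suffix)."""
    encoded = []
    prev = ""
    for word in sorted_words:
        # Count leading characters shared with the previous word.
        n = 0
        limit = min(len(prev), len(word))
        while n < limit and prev[n] == word[n]:
            n += 1
        encoded.append((n, word[n:]))
        prev = word
    return encoded


def front_decode(encoded):
    """Invert front_encode: rebuild the original word list."""
    words = []
    prev = ""
    for n, suffix in encoded:
        word = prev[:n] + suffix
        words.append(word)
        prev = word
    return words


# Assumed sample of lexicographically sorted index words.
words = ["compress", "compressed", "compression", "compute"]
coded = front_encode(words)
```

Sorted vocabularies tend to contain long runs of words with common prefixes, which is why storing only the differing suffix can reduce the space needed for the words.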