The present disclosure relates to data compression, and in a particular aspect, to a computer-implemented method for data compression and a corresponding computer system for executing the data compression method.
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
In computer science, an inverted index is an index data structure that stores a mapping from content (such as words or numbers) in a file to the locations of that content in the file. The file may be a database file, a document, or a set of documents. An inverted index provides fast text search in a file at the cost of increased processing when the file is added to a database. Inverted indexes are widely used data structures in document retrieval systems for large-scale search, such as searches performed by search engines.
There are two main types of inverted indexes. The first type is a record-level inverted index, which contains, for each word, a list of references to the documents that contain the word. The second type is a word-level inverted index (sometimes referred to simply as an “inverted list”), which additionally contains the position of each word within a document. The latter form provides additional functionality, such as phrase searching.
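By way of illustration only, the two index types may be sketched in Python as follows. The function names, the whitespace tokenization, and the two sample documents are assumptions made for this sketch and are not taken from the present disclosure.

```python
from collections import defaultdict


def build_record_level_index(docs):
    """Map each word to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def build_word_level_index(docs):
    """Map each word to {document ID: [positions of the word in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return {word: dict(postings) for word, postings in index.items()}


def phrase_search(word_index, first, second):
    """Return IDs of documents in which `second` immediately follows `first`."""
    hits = []
    first_postings = word_index.get(first, {})
    second_postings = word_index.get(second, {})
    for doc_id, positions in first_postings.items():
        following = set(second_postings.get(doc_id, []))
        if any(p + 1 in following for p in positions):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical sample documents, keyed by document ID.
docs = {1: "data compression saves space", 2: "compression of index data"}
record_index = build_record_level_index(docs)
word_index = build_word_level_index(docs)
```

In this sketch, the record-level index can report only which documents contain a given word, whereas the word-level index retains positions, which is what allows `phrase_search` to check that two words are adjacent.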
Inverted indexes can be compressed using techniques that variously prioritize a relatively high compression ratio, high compression speed, high decompression speed, and so on. These priorities often involve tradeoffs. For example, a relatively high compression ratio may come with relatively low decompression speed, and relatively high decompression speed may come with a relatively low compression ratio.
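As one commonly used example of such a tradeoff, and not as the method of the present disclosure, a sorted list of document identifiers may be gap-encoded and then stored with a variable-byte code. Byte-aligned codes of this kind are generally simple and fast to decode, but typically compress less tightly than bit-level codes. The function names and the sample postings list below are assumptions made for this sketch.

```python
def encode_varbyte(doc_ids):
    """Gap-encode a sorted list of document IDs, then variable-byte encode each gap.

    Each byte carries 7 payload bits, least significant group first;
    a set high bit means another byte of the same gap follows.
    """
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        gap = doc_id - previous
        previous = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_varbyte(data):
    """Invert encode_varbyte: rebuild the gaps and sum them back into document IDs."""
    doc_ids = []
    current = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes of this gap follow
        else:
            current += gap
            doc_ids.append(current)
            gap = 0
            shift = 0
    return doc_ids


# Hypothetical postings list of document IDs for a single word.
postings = [3, 7, 300, 100000]
encoded = encode_varbyte(postings)
```

For this sample list, the encoded form occupies 7 bytes, compared with 16 bytes if each identifier were stored as a fixed 32-bit integer.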