A vital component of many computer applications and systems is the management of text databases, including the storage, searching, and retrieval of text documents. Conventionally, at least a part of an index data structure of a text database is held in main memory while the associated texts are held on disks. Writing data to and reading data from disks limits the speed of text accesses. While RAM (random access memory) continues to increase in capacity, it remains considerably more expensive per unit of storage than disks. A number of techniques and approaches have been investigated to efficiently utilize RAM in an effort to migrate towards in-memory databases.
In some instances, various data compression techniques have been explored and developed in an effort to efficiently store and search data held in RAM.
Such data compression techniques may include: the use of numerical IDs for documents and for terms in documents, with dictionaries mapping the IDs to their respective documents or terms; the use of lists of numbers to represent the positions of terms in documents, where each number gives the number of term positions to be counted beyond the previous position (initially 0) before reaching the next position of a given term (called difference or delta coding of term positions); and Golomb coding, which uses, for example, an optimizable parameter to represent integers compactly.
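By way of illustration only, delta coding of term positions and Golomb coding of the resulting gaps might be sketched as follows. The function names, the sample positions, and the choice of the parameter m = 3 are assumptions made for this example and are not drawn from any particular system. The remainder is written in truncated binary, which is one common convention for Golomb codes.

```python
from typing import List


def delta_encode(positions: List[int]) -> List[int]:
    """Replace each position with its distance from the previous one (initially 0)."""
    gaps, previous = [], 0
    for position in positions:
        gaps.append(position - previous)
        previous = position
    return gaps


def delta_decode(gaps: List[int]) -> List[int]:
    """Rebuild absolute positions by accumulating the gaps."""
    positions, current = [], 0
    for gap in gaps:
        current += gap
        positions.append(current)
    return positions


def golomb_encode(n: int, m: int) -> str:
    """Encode a non-negative integer n with Golomb parameter m as a bit string."""
    if n < 0 or m < 1:
        raise ValueError("n must be >= 0 and m must be >= 1")
    q, r = divmod(n, m)
    bits = "1" * q + "0"              # quotient in unary, terminated by a 0
    b = (m - 1).bit_length()          # ceil(log2(m))
    cutoff = (1 << b) - m             # remainders below this use b - 1 bits
    width, value = (b - 1, r) if r < cutoff else (b, r + cutoff)
    if width > 0:
        bits += format(value, "0{}b".format(width))
    return bits


def golomb_decode(bits: str, m: int) -> List[int]:
    """Decode a concatenation of Golomb codewords produced with parameter m."""
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    values, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":         # read the unary quotient
            q += 1
            i += 1
        i += 1                        # skip the terminating 0
        r = 0
        if b > 0:
            if b > 1:                 # read the first b - 1 remainder bits
                r = int(bits[i:i + b - 1], 2)
                i += b - 1
            if r >= cutoff:           # long codeword: one more bit follows
                r = ((r << 1) | int(bits[i])) - cutoff
                i += 1
        values.append(q * m + r)
    return values


# Hypothetical positions of one term within one document.
positions = [3, 7, 8, 15]
gaps = delta_encode(positions)                        # [3, 4, 1, 7]
encoded = "".join(golomb_encode(g, 3) for g in gaps)
decoded = delta_decode(golomb_decode(encoded, 3))
```

In this sketch, small gaps produce short codewords, which is why delta coding is typically applied before Golomb coding; in practice the parameter m would be tuned to the distribution of the gaps.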
The foregoing data compression and mapping techniques may provide improvements in some instances for certain types of text database searching. However, methods and systems are desired to facilitate efficient in-memory document searching, retrieval, and reporting.