Full-text indexing systems, such as search engine indexing algorithms and document retrieval systems, typically use an inverted index (or “postings file”) as the index data structure: it stores a mapping from content (e.g., words or numbers) to its locations in a database file or document. Inverted indexes generally allow fast search operations, but adding a document to the database may require more processing, because each word in that document must be added to the corresponding postings list. An inverted index is formed by first building a forward index, which stores the list of words in each document, and then inverting it so that the result lists the documents containing each word. This speeds up query processing by eliminating the need to iterate sequentially through every document and every word in the forward index to find matching documents. Once the inverted index has been created, a query can be resolved by jumping directly to the entry for a word identifier through random-access operations on the inverted index.
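The forward-to-inverted construction described above can be sketched in Python as follows. This is a minimal illustration, not a production indexer. It assumes whitespace tokenization and stores only document IDs, with no positions or frequencies. The function names `build_forward_index`, `invert`, and `query` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_forward_index(docs: Dict[int, str]) -> Dict[int, List[str]]:
    """Forward index: document ID -> list of words in that document."""
    return {doc_id: text.lower().split() for doc_id, text in docs.items()}


def invert(forward: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Invert the forward index: word -> sorted postings list of document IDs."""
    postings: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, words in forward.items():
        for word in words:
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def query(inverted: Dict[str, List[int]], word: str) -> List[int]:
    """One dictionary lookup replaces a scan over every document and word."""
    return inverted.get(word.lower(), [])


docs = {1: "fast search engine", 2: "search index update", 3: "index merge"}
inverted = invert(build_forward_index(docs))
print(query(inverted, "search"))  # [1, 2]
print(query(inverted, "index"))   # [2, 3]
```

The query reads only the postings list for the requested word. That is the random-access lookup the paragraph contrasts with sequentially scanning the forward index.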
Applying updates in a full-text indexing environment is often challenging, because the inverted index is designed for fast queries rather than for updates. Blacklisting is a common technique for blocking access to deleted entries and is generally much more efficient than updating the index directly. A blacklisted object is deleted only virtually: it is still found during query processing, but it is filtered out before the query operation completes. Once the index is rebuilt (during normal merge operations), the physical deletion can be reflected in the index. This rebuild step is necessary because post-query filtering of blacklisted objects hurts query performance.
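Virtual deletion and the later physical deletion during a merge can be illustrated with a small sketch. It assumes a dictionary-based inverted index and an in-memory set of blacklisted document IDs. The names `query_with_blacklist` and `merge` are hypothetical, and a real merge would rewrite on-disk index segments rather than a dictionary.

```python
from typing import Dict, List, Set


def query_with_blacklist(inverted: Dict[str, List[int]], word: str,
                         blacklist: Set[int]) -> List[int]:
    # Virtual delete: blacklisted IDs are still in the postings list.
    # They are found by the lookup and then filtered out, which costs extra work per query.
    return [doc_id for doc_id in inverted.get(word, []) if doc_id not in blacklist]


def merge(inverted: Dict[str, List[int]], blacklist: Set[int]) -> Dict[str, List[int]]:
    # Rebuild: blacklisted IDs are dropped physically, so later queries
    # no longer pay the post-query filtering cost for them.
    merged: Dict[str, List[int]] = {}
    for word, ids in inverted.items():
        kept = [doc_id for doc_id in ids if doc_id not in blacklist]
        if kept:
            merged[word] = kept
    return merged


inverted = {"search": [1, 2, 4], "index": [2, 3]}
blacklist = {2}  # document 2 has been deleted
print(query_with_blacklist(inverted, "search", blacklist))  # [1, 4]
compacted = merge(inverted, blacklist)
print(compacted)  # {'search': [1, 4], 'index': [3]}
print(query_with_blacklist(compacted, "search", set()))  # [1, 4], with no filtering needed
```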
Such procedures contrast with the B-tree indexes of typical relational databases, which are designed to support low-latency direct updates. This ability to be updated efficiently allows relational B-tree indexes to be used in online transaction processing (OLTP) applications, among others, that need low-latency updates. The downside of this indexing method, however, is that it cannot provide the more valuable fuzzy search offered by inverted full-text indexes.
Unlike low-latency transactional database systems, typical full-text environments have very high latency in reflecting updates. Full-text database systems collect all changes and apply them to a new version of the index while the older, static version continues to be queried. As a result, updates take on the order of minutes to hours to be applied and become visible. These applications thus give up low-latency updates in exchange for the rich search quality required by information retrieval applications.
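The batch-update model above can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions. `BatchIndexer` and `publish` are hypothetical names. Each publish rebuilds the whole index rather than merging segments. The "static version" is simply the dictionary that `query` reads until `publish` replaces it.

```python
from typing import Dict, List, Optional, Tuple


class BatchIndexer:
    """Hypothetical sketch: readers see only the published version; writes queue until publish()."""

    def __init__(self) -> None:
        self.docs: Dict[int, str] = {}
        self.pending: List[Tuple[str, int, Optional[str]]] = []
        self.version = 0
        self.published: Dict[str, Tuple[int, ...]] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.pending.append(("add", doc_id, text))

    def delete(self, doc_id: int) -> None:
        self.pending.append(("delete", doc_id, None))

    def query(self, word: str) -> List[int]:
        # Served from the older static version; queued changes are invisible here.
        return list(self.published.get(word, ()))

    def publish(self) -> int:
        # Apply the whole batch, build a new index version, then replace the reference.
        for op, doc_id, text in self.pending:
            if op == "add":
                self.docs[doc_id] = text
            else:
                self.docs.pop(doc_id, None)
        self.pending.clear()
        postings: Dict[str, List[int]] = {}
        for doc_id in sorted(self.docs):
            for word in set(self.docs[doc_id].split()):
                postings.setdefault(word, []).append(doc_id)
        self.published = {w: tuple(ids) for w, ids in postings.items()}
        self.version += 1
        return self.version


idx = BatchIndexer()
idx.add(1, "search engine")
print(idx.query("search"))  # [] (the add is queued, not yet visible)
idx.publish()
print(idx.query("search"))  # [1]
idx.delete(1)
print(idx.query("search"))  # [1] (the delete waits for the next publish)
idx.publish()
print(idx.query("search"))  # []
```

The gap between `add` or `delete` and the next `publish` is the update latency the paragraph describes. In a real system that gap is the time taken to build the new version.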
A blacklist bitmap is a bitmap in which each bit corresponds to an entry in the index and is set when that entry has been blacklisted (deleted). Constructing a transactional query view involves not only determining the set of indexes to use but also obtaining an up-to-date version of the blacklist bitmap that reflects recent changes. The blacklist bitmap must be re-created after another transaction completes so that it reflects the current changes to the system. In many cases, the blacklist bitmap can simply be re-created from the persistent blacklist structures. However, when a massive number of updates occur, this process can become prohibitively expensive, because the list of blacklisted items must be scanned to create the new blacklist bitmap.
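A blacklist bitmap and its re-creation from a persisted list of blacklisted IDs might look like the following sketch. It assumes that entry IDs are dense integers in `[0, num_entries)` and uses a `bytearray` with one bit per entry. The helper names are hypothetical. The loop in `build_blacklist_bitmap` is the scan whose cost grows with the number of blacklisted items.

```python
from typing import Iterable, List


def build_blacklist_bitmap(blacklisted_ids: Iterable[int], num_entries: int) -> bytearray:
    """Re-create the bitmap by scanning every persisted blacklisted ID (linear in the list length)."""
    bitmap = bytearray((num_entries + 7) // 8)  # one bit per index entry, all initially clear
    for entry_id in blacklisted_ids:
        bitmap[entry_id >> 3] |= 1 << (entry_id & 7)
    return bitmap


def is_blacklisted(bitmap: bytearray, entry_id: int) -> bool:
    return bool(bitmap[entry_id >> 3] & (1 << (entry_id & 7)))


def filter_results(doc_ids: List[int], bitmap: bytearray) -> List[int]:
    """Post-query filter: drop any hit whose bit is set."""
    return [d for d in doc_ids if not is_blacklisted(bitmap, d)]


persistent_blacklist = [3, 17, 42]  # hypothetical persisted blacklist contents
bitmap = build_blacklist_bitmap(persistent_blacklist, num_entries=64)
print(filter_results([1, 3, 5, 17, 42, 50], bitmap))  # [1, 5, 50]
```

Checking a hit against the bitmap is a constant-time bit test, but building the bitmap requires visiting every blacklisted entry. That build cost is what becomes expensive under heavy update load.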
A re-created blacklist bitmap can be shared among users of different transactions as long as their transactional views are essentially the same. However, read transactions start and end at different times, and write transactions can commit while those read transactions are still in progress. As a result, multiple concurrent read transactions may each require a different blacklist bitmap. Creating these distinct blacklist bitmap views can make queries take up to about a minute, because hundreds of thousands of blacklisted entries are processed to create each view.
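The following sketch illustrates why concurrent readers can end up with different bitmaps. It rests on three assumptions. First, each write transaction commits with an increasing version number. Second, the persisted blacklist records the commit version of each deletion. Third, readers that start at the same version share one cached bitmap. `BlacklistViewCache` and its methods are hypothetical. The class models the costly baseline described above, not an improved method, and cache eviction is omitted.

```python
from typing import Dict, List, Tuple


class BlacklistViewCache:
    """Hypothetical sketch: one bitmap per snapshot version, shared by readers at that version."""

    def __init__(self, num_entries: int) -> None:
        self.num_entries = num_entries
        self.log: List[Tuple[int, int]] = []  # persisted blacklist: (commit_version, entry_id)
        self.committed_version = 0
        self.cache: Dict[int, bytearray] = {}
        self.rebuilds = 0  # number of full blacklist scans performed

    def commit_deletes(self, entry_ids: List[int]) -> int:
        # A write transaction commits and blacklists the given entries.
        self.committed_version += 1
        self.log.extend((self.committed_version, e) for e in entry_ids)
        return self.committed_version

    def begin_read(self) -> bytearray:
        # A reader needs the bitmap for the latest committed version at its start time.
        v = self.committed_version
        if v not in self.cache:
            self.rebuilds += 1
            bitmap = bytearray((self.num_entries + 7) // 8)
            for commit_version, entry_id in self.log:  # full scan of the blacklist
                if commit_version <= v:
                    bitmap[entry_id >> 3] |= 1 << (entry_id & 7)
            self.cache[v] = bitmap
        return self.cache[v]


cache = BlacklistViewCache(num_entries=1024)
cache.commit_deletes([3, 17])
r1 = cache.begin_read()
r2 = cache.begin_read()      # same view as r1, so the bitmap is shared
cache.commit_deletes([42])   # a write commits while r1 and r2 are still open
r3 = cache.begin_read()      # a new view, so a second full scan is needed
print(r1 is r2, r1 is r3, cache.rebuilds)  # True False 2
```

Readers `r1` and `r2` share a bitmap, but `r3` starts after another commit and triggers a second full scan. With hundreds of thousands of blacklisted entries and many distinct read start times, these repeated scans are the overhead the paragraph describes.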
What is needed, therefore, is an improved method of reducing the overhead of maintaining the transactional index view when a large number of updates occur.