A database index is a data structure that is associated with a table, a group of tables, or database files. The database index allows faster retrieval of database objects. For example, an index on a column of a table allows for retrieval of rows from the table without having to scan each row and compare the values in the column.
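As a rough illustration of the difference between scanning and indexed retrieval, the following minimal sketch compares the two on an in-memory list of rows. The table contents and the names `full_scan`, `build_index`, and `indexed_lookup` are hypothetical and chosen only for this example; a real database index is typically a disk-based structure such as a B-tree rather than a Python dictionary.

```python
# Hypothetical table: each row is a dict with an "id" and a "city" column.
rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Lima"},
    {"id": 3, "city": "Oslo"},
]


def full_scan(rows, city):
    """Without an index: compare the column value in every row."""
    return [row["id"] for row in rows if row["city"] == city]


def build_index(rows, column):
    """Map each distinct column value to the positions of the rows holding it."""
    index = {}
    for position, row in enumerate(rows):
        index.setdefault(row[column], []).append(position)
    return index


def indexed_lookup(rows, index, value):
    """With an index: jump directly to the matching row positions."""
    return [rows[position]["id"] for position in index.get(value, [])]


city_index = build_index(rows, "city")
```

Both `full_scan(rows, "Oslo")` and `indexed_lookup(rows, city_index, "Oslo")` return the same row IDs, but the indexed lookup touches only the matching rows.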
One type of database index is an inverted index. An inverted index maps content, such as a keyword, to its locations in a database file or document. For example, an inverted index may map a keyword to a set of one or more documents that contain the keyword.
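The keyword-to-document mapping can be sketched as follows. This is a minimal illustration only: the sample documents, the whitespace tokenization, and the name `build_inverted_index` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical documents keyed by document ID.
documents = {
    1: "database index lookup",
    2: "inverted index",
    3: "database transaction",
}


def build_inverted_index(documents):
    """Map each keyword to the set of document IDs that contain it."""
    inverted = defaultdict(set)
    for doc_id, text in documents.items():
        # Naive tokenization: lowercase and split on whitespace.
        for keyword in text.lower().split():
            inverted[keyword].add(doc_id)
    return inverted


inverted_index = build_inverted_index(documents)
```

Here `inverted_index["index"]` holds the IDs of documents 1 and 2, so a keyword query can be answered without reading the text of any document.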
To increase query efficiency, location values may be packed into as few rows as possible. That is, a single row may store a plurality of location values for a keyword, rather than storing each location value in a separate row. For example, if a plurality of documents each contain a particular keyword, a row in the inverted index table may store a list indicating the document IDs for the plurality of documents.
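The difference between the two row layouts can be sketched as below. The posting data and the function name `pack` are hypothetical; the sketch assumes each location value is a document ID.

```python
# Hypothetical postings: one (keyword, document ID) pair per occurrence.
postings = [("index", 1), ("index", 2), ("index", 7), ("lock", 2)]

# Unpacked layout: one row per location value.
unpacked_rows = list(postings)


def pack(postings):
    """Packed layout: one row per keyword, holding a list of document IDs."""
    packed = {}
    for keyword, doc_id in postings:
        packed.setdefault(keyword, []).append(doc_id)
    return [(keyword, sorted(doc_ids)) for keyword, doc_ids in sorted(packed.items())]


packed_rows = pack(postings)
```

In this example the unpacked layout needs four rows while the packed layout needs two, so a query for a keyword reads a single row instead of one row per matching document.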
When a database object or file is changed, added, or removed by a database transaction, the inverted index needs to be updated to reflect the changes. For example, if a document is added that contains a keyword, then the corresponding row in the inverted index table is updated to include the document ID. However, because the location values for a keyword are packed into the same row or rows, the row for the keyword is locked while it is updated. If a second transaction affects the same keyword, the second transaction must wait for the lock to be released before it can update the row. This waiting slows down the processing of database transactions.
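The contention can be modeled with a small sketch in which a per-keyword lock stands in for the database row lock. This is an analogy under stated assumptions, not how any particular database implements locking; `row_locks`, `index_rows`, and `add_document` are hypothetical names.

```python
import threading

# One packed row per keyword, and one lock per row (a stand-in for a row lock).
index_rows = {"database": [1, 2]}
row_locks = {"database": threading.Lock()}


def add_document(keyword, doc_id):
    """Simulate a transaction adding a document ID to a keyword's packed row."""
    # A second transaction touching the same keyword blocks here until the
    # first one releases the lock, which serializes the two updates.
    with row_locks[keyword]:
        index_rows[keyword] = index_rows[keyword] + [doc_id]


# Two concurrent "transactions" that both affect the same keyword.
transactions = [
    threading.Thread(target=add_document, args=("database", doc_id))
    for doc_id in (3, 4)
]
for transaction in transactions:
    transaction.start()
for transaction in transactions:
    transaction.join()
```

Both document IDs end up in the row, but only because the second update waited for the first; the order in which they are appended depends on thread scheduling.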
To avoid this contention, a separate row may be created in the index table for each keyword that is updated by a database transaction. Since each database transaction has its own rows, a database transaction does not have to wait for any other database transactions to complete before writing to its rows. However, as the number of database transactions increases, the number of different rows for a keyword also increases. The index table becomes fragmented, and using the index table for queries becomes less efficient.
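The fragmentation effect can be sketched as follows, assuming each index row records the keyword, the ID of the transaction that wrote it, and that transaction's document IDs. The row layout and the names `commit` and `lookup` are hypothetical.

```python
# Each row is (keyword, transaction ID, document IDs written by that transaction).
index_table = []


def commit(txn_id, keyword_to_docs):
    """Write one new row per keyword; no existing row is modified or locked."""
    for keyword, doc_ids in keyword_to_docs.items():
        index_table.append((keyword, txn_id, list(doc_ids)))


def lookup(keyword):
    """Return the matching document IDs and the number of rows that were read."""
    matching = [row for row in index_table if row[0] == keyword]
    doc_ids = sorted({doc_id for _, _, ids in matching for doc_id in ids})
    return doc_ids, len(matching)


commit(1, {"index": [10]})
commit(2, {"index": [11], "lock": [11]})
commit(3, {"index": [12]})
```

After three transactions, `lookup("index")` has to read and combine three separate rows to produce one result, and the row count grows with every further transaction that touches the keyword.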
A staging table may be used to store rows from database transactions. The staging table collects the fragmented rows and is periodically merged into the main index table. For each row of the staging table, a row in the index table is either updated or added. However, because a row in the index table cannot be modified by more than one process at the same time, merging the staging table into the index table is processed serially. If the work for merging the staging table is divided between multiple slave processes and performed in parallel, there is a risk that two slave processes will attempt to modify the same row at the same time.
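A serial merge of this kind can be sketched as below, assuming the main index table holds one packed row per keyword and each staging row holds a keyword with a list of document IDs. The data and the name `merge_serially` are hypothetical.

```python
# Main index table: one packed row per keyword.
main_index = {"index": [1, 2], "lock": [2]}

# Staging table: fragmented rows collected from separate transactions.
staging_rows = [("index", [5]), ("merge", [6]), ("index", [7]), ("lock", [5])]


def merge_serially(main_index, staging_rows):
    """Apply staging rows one at a time, updating or adding a main index row."""
    for keyword, doc_ids in staging_rows:
        merged = set(main_index.get(keyword, [])) | set(doc_ids)
        main_index[keyword] = sorted(merged)
    # The staging table is emptied once its rows have been merged.
    staging_rows.clear()
    return main_index


merge_serially(main_index, staging_rows)
```

Note that the keyword "index" appears in two staging rows. If those two rows were handed to different processes, both would try to modify the same main index row, which is the conflict that serial processing avoids.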
One solution is to write a custom program that manipulates kernel threads in order to avoid contention. However, this is a low-level solution that is not portable across different database systems. Therefore, there is a need for an efficient method for merging index tables.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.