The various embodiments of the invention relate generally to maintaining indexes in a database management system. Modern database management systems, in particular relational databases, use indexes for a faster access to large database tables. The indexes are typically stored separately from the table storing the content and need to be maintained for table modification operations such as row inserts, deletes and those updates that change the value of the columns that make up the index key.
The maintenance of an index is typically performed automatically and synchronously with a table change by the database management system (DBMS). Index maintenance in relational database systems is a resource intensive task because each index that exists on a given table has to be adjusted for each and every insert, update, and delete (IUD) operation. Such maintenance is also needed when the index is rarely, or even not at all, used for any queries or other database operations. Performance overhead may occur but is not desired.
All modern database management systems implement some variation of B-trees as index structures, in particular B*-trees or B+-trees. These structures associate each leaf with index keys and row identifier (RID), which may be comprised of a few bytes containing the page number on which the row may be placed and a position of the row on the page. The tree structure of indexes grows and shrinks when values are inserted or removed.
During insertion, a leaf page into which a value is to be inserted may be searched. This may start at the root page and may follow the path of the sort order down to the leaf page into which the value belongs. If the leaf page is already filled, the new value cannot be inserted. In such a case, a new page may be allocated to the index, a part of the overfull index page may be moved to the new page, and additionally the parent node may be adjusted by adding a new guiding value to it to refer to the new page. It may happen that the parent node is already full and would need to be split as was just described. This splitting may propagate through the whole tree up to the root node. Aside from the actual operation to search for the leaf page and inserting the new entry there, page splitting may incur an even higher overhead. Here, the terms node and page may be used synonymously because the implementations of B-trees typically map a B-tree node to a database page.
Removing a value from an index also requires a search for the respective index entry, again starting from the root node. If the number of index entries on the page falls below a certain threshold, the page may be merged/melted with its sibling pages. Such a merge results in the removal of a guiding value in the parent node, which implies that the merging may propagate up to the root node as well.
Several techniques are used to reduce the costs on index maintenance as it is roughly outlined here. For example, database systems may just mark index entries or index data structure entries as being deleted without removing them and performing any merge operations. The underlying assumption is that a new entry will soon occupy the space of the index entry.
U.S. Patent Application Publication No. 2006/0074977 discloses techniques by which updates may be incorporated in database indexes without causing deadlocks of user transactions. In one embodiment, referred to as immediate-incorporate, updates are incorporated in the index at the time of occurrence of a data manipulation language command execution. In a particular embodiment, R-tree updates are incorporated as part of a system transaction. The system transactions are part of update changes to the index but do not make them visible to other transactions. At commit time, the changes are made visible to other transactions.
In another document, U.S. Pat. No. 5,806,058, it is disclosed that associated records for each index key values, included in each index are counted, and when the number of associated data records has reached a predetermined number, the database is directly accessed without passing through an index record having the index key value. As one concrete method thereof, the index key value is deleted from the index portion of the database or the index file.
Thus, there may be a need for an improved architecture for maintaining indexes in a database management system such that performance requirements are easily met while managing large data volumes. This may be equivalent to a need for reducing index maintenance overhead during a transaction that performs an IUD operation while still maintaining consistency of indexes.