Modern relational database systems are typically vast data repositories that are measured in terabytes (TB) or even hundreds of terabytes. Individual tables of hundreds of gigabytes (GB) and more are no longer rare cases. Retrieving required information from such a table cannot be efficiently performed by a sequential scanning of the entire table even though increasing processor memory provides very large caches to keep physical disk access at a minimum. Indexing is the most common technique to address the challenge of fast information access and it has been used for a long time. An index is based on the values of data in one or more columns of a table accompanied with pointers to the data. Accessing data through an index enables fast retrieval in the case of a query with matching selection condition. As the queries typically come with different selection conditions, there is a need to define multiple indexes or indices on the same table. Indices or indexes may also be used for other purposes such as enforcing uniqueness.
The indexes are typically stored separately from the table storing the content and need to be maintained for table modification operations such as row inserts, deletes and those updates that change the value of the columns that make up the index key.
The maintenance of an index is typically performed automatically by the database management system (DBMS). For example, for each inserted row, the DBMS updates every index that exists for the respective table. Thus, an index update consists of adding an entry pointing to the inserted row in the related table. The entry is added into the index at the appropriate position that is determined by the index tree structure and the index key value. The DBMS performs the updates of the indexes synchronously with the corresponding table row operation: it is an atomic operation that ensures that the table modification is visible to queries to the table at the same time irrespective of accessing the table directly or via an index. One can say that such an index is current, i.e. in synchronization, or shortly in sync, with the associated table.
Only in exceptional cases, it is possible to delegate an index maintenance function to a special user, namely a database administrator. That is, if the user knows the exact time when a particular query that needs a specific index access is executed, he can initialize building the index just in time for the incoming query. This could be the case for, e.g., an end of the month payroll calculation. It also requires a careful planning of maintenance operations because building an index on large tables can take a very long time (hours and, in extreme cases, even days).
While having appropriate indexes is crucial for efficient query execution, index maintenance results in performance degradation of almost all types of table content modification operations like inserts, updates and deletes. Every additional index of a database table adds more to the elapsed time of these operations. The elapsed time increase is typically very large because index maintenance often involves bringing relevant index pages (blocks) from a disk, a hard disk or other external storage, which is proportionally very slow if compared to main memory. This significantly contributes to one of the most common performance challenges: fitting a given data modification workload in a required time window.
If an index is used by queries that run concurrently with data modification operations, then that index needs to be current, i.e., maintained synchronously with data modification operations. But, that is not the case for all indexes: some of them are needed only periodically, when the corresponding queries are in the system. As mentioned earlier, the only available option for this situation is not to let the DBMS to maintain these indices, but to create them manually before the corresponding query workload. However, this method may be static, manual-work intensive and, in some cases not usable. For example, the times when a query is executed may be unpredictable, or the periods between two consecutive query executions may be too short to allow building the required index.
The document US 2008/0154994A1 discloses a database management system with two index types: current and aged indexes. The indexes statuses are determined in the context of business processes, e.g., paid or unpaid invoices. The indexes statuses may also be determined based on a time since a last access to a specific table. More detailed status levels are possible, e.g., based on the time an index has been last accessed.
In another document, US 2008/0275840A1, a statistics update process for a relational database management system is described. A data dictionary maintains bookkeeping information, typically in the form of meta-data that are used by processes of the database management system. The metadata also contain statistics for generating execution plans for several tasks in the database system. E.g., a stale flag and a modified count may be maintained by the statistics. The stale flag indicates if the statistics for a respective column or index are out of date.
In a similar way, reference is made to stale statistics in document US 2006/0294058A1. The statistics may be updated if, e.g., 20% of the tables have been updated since the statistics were created. In this case the statistics may be called stale. In case of a query to a table with an old index, a new index will be created.
Thus, there may be a need for an improved architecture for maintaining indexes in a database management system such that performance requirements are easily met while managing large data volumes.