1. Technical Field
The present invention relates to an improved database management system. In particular, the present invention relates to selecting clustering indexes in a database management system. Specifically, the present invention relates to automatic recommendation and selection of clustering indexes between multidimensional and record identifier (RID) based clustering indexes.
2. Description of Related Art
Data clustering is a storage methodology in which like or similar data records are grouped together. Data in a database may be clustered by use of a clustering index or multidimensional clustering. Multidimensional clustering (MDC) allows data to be ordered simultaneously along different dimensions.
The choice of which type of clustering should be used in a particular database design is a complex task. Various types of analysis need to be performed to determine which type of clustering design should be used in a database system. A database design specifies the physical and logical database data layout and definition of its schema including tables, indexes, etc.
One type of analysis performed involves using database workload queries and access patterns. In particular, a query optimizer is used to analyze the workload queries and consider query processing techniques, such as block index scan, block index only scan, block index Anding, block index ORing, and generates a query access plan. More information regarding this analysis may be obtained from “Efficient Query Processing for Multi-Dimensionally Clustered Tables in DB2”, A Very Large Data Base Endowment (VLDB) 2003 proceeding by Bhattacharjee, B., Paddmanabhan, S., Malkemus, T., Lai, T., Cranston, L., and Huras, M., which is incorporated herein by reference.
Another type of analysis is the selection of clustering indexes that help to reduce query execution time and improve database system performance. Several techniques have been used for index selection. Automated index wizards have been used to simplify the index creation process. Some index wizards recommend clustered and non-clustered indexes. Examples of these types of systems include, for example, IBM Design Advisor, which is a product available from International Business Machines Corporation, and SQL Server Index Wizard, which is a product available from Microsoft Corporation. Some index wizards only recommend non-clustered indexes. One example of this type of approach is, for example, Oracle SQL Access Advisor, a product available from Oracle Corporation.
A clustering index is an indexing system in which values are maintained in sequence on the data pages. When a clustering index is present, the database manager maintains data rows in a database table in the same order as the column(s) in the index. The database manager may process queries that involve range predicates, such as ‘BETWEEN’, ‘<’, and ‘>’, ‘GROUP BY’, ‘DISTINCT’, or ‘ORDER BY’ (on the ordered column) and equality predicates, such as ‘WHERE’ clause, more efficiently by reducing the amount of I/O, since all similar values are grouped together for faster retrieval. Thus, by leveraging clustering indexes, database workloads that involve these kinds of queries may benefit immensely.
The process of deciding whether to create a clustering index and on which table columns currently involves investigating the data access patterns and the types of queries that are involved.
However, this type of investigation is a tedious and time consuming process for a database administrator to administer. The approach for this type of process is often a trial and error process because the database administrator has to constantly monitor the data access patterns and determine the columns that are most frequently referenced by their running queries. This type of approach wastes valuable human and system process time.
Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions to automatically create, recommend, and select clustering indexes.