1. Field of the Invention
The present invention relates to the proper selection of access structures in a database system which are vital for achieving high performance. In particular, the present invention relates to a method and system for the selection of access structures in a manner which is fast, efficient, scalable and of high quality.
2. Description of the Related Art
In a typical relational database system users store, update, and retrieve database data by interacting with information retrieval applications. Information retrieval applications make use of database access structures when performing content-based searches on the database data in the database. Database access structures are auxiliary structures maintained by the database system which provide organization and reference to the data in a database system to permit a user to find particular items of data in the database, determine relationships among the data in the database, and also permit relationships between the data in a database system and data not included in the database system to be determined. The primary reason that access structures are widely deployed is they improve the performance of the user's queries. Examples of database access structures include indexes and materialized views. Among indexes, there are a number of different types supported by database systems such as R-trees, quadtrees, B-trees, bitmap and functional indexes. Materialized views are a more recent addition to database systems. They are database objects which contain the results of a query, which is called the defining query of the materialized view. The defining query, which may itself contain references to other database objects such as tables, views and even other materialized views.
Thus there is an abundance of options available among database access structures which can be used to provide better performance for end-user queries. The proper selection of access structures from among all these choices is vital for optimal performance of a database system.
Generally, modern database systems provide tools for monitoring and improving the performance of the system. The Access Advisor is one such tool, which can be used to assist in the selection of access structures. The target audience for the Access Advisor is the database administrator (DBA), a person whose job it is to keep the database running smoothly and troubleshoot performance problems that may arise from time to time.
The primary input to the Access Advisor is a set of queries, which the DBA wishes to tune for performance. These queries represent a workload to the database system, and may come from a variety of sources, such as real queries run by users or from a synthetic benchmark or queries which simulate the actual queries submitted to the database system. In addition to the workload, the DBA may wish to place constraints on the nature of the access structures which may be chosen by the tool. Constraints may include, but are not limited to, the storage-space available for access structures and the maximal number of indexes on a table. The output of the Access Advisor is a set of recommendations. A recommendation includes a set of actions. An action represents an actionable piece of advice provided by the Access Advisor to the DBA to achieve desirable performance subject to the constraints placed on the solution. An action can include the creation of a new access structure, the retention of an existing access structure, or the deletion (“dropping”) of an existing access structure. The output of the Access Advisor also includes a cost-based analysis for the generated recommendations. This analysis demonstrates the benefits and improvements of adopting the recommendations for the specified workload. Each recommendation may benefit one or more queries in the workload. The greater the number of queries that can benefit from a recommendation, the greater its usefulness. Actions may also be shared among multiple recommendations, thereby enhancing their usefulness.
The usefulness of recommendations and actions for a given query in the workload is quantitatively analyzed using a metric known as the query-cost. The component in the database system responsible for computing the cost is the Optimizer, hence these numbers are also referred to as optimizer-costs. The two costs of a query which are of significance to the Access Advisor are the pre-cost and post-cost. A query's pre-cost is the cost of computing the result of the query with an initial configuration, which is defined as the state of the database system at the time the Access Advisor is run. A query's post-cost is the cost of computing the result of the query using a recommendation. The post-cost is dependent on the recommendation chosen for a query. During its analysis the Access Advisor will consider several alternative (or candidate) recommendations for each query, each having a unique post-cost for the query. The difference between the pre-cost and the post-cost of a recommendation for a given query is the performance improvement for the query using that recommendation, It may appear that recommendations which offer the highest performance improvement are always to be preferred, and the Access Advisor has merely to choose those with this characteristic. However, it is not so simple, because in addition to the performance improvement enabled by the recommendations, the Access Advisor must also consider the cost of deploying the access structures comprising the recommendations. The cost of access structures is essentially the cost of creating them and the cost of maintaining them when the underlying data in the database system changes. Thus the problem of access structure selection essentially entails trading off the query performance improvement which can arise from new access structures with the creation and maintenance costs of the structures themselves. Since recommendations may overlap and the benefits of a recommendation is only realized when all the actions of a recommendation are performed, recommendations which are widely useful across queries and which share access structures are more likely to be useful, and the task of the Access Advisor becomes one of balancing the competing interests of improving the performance of individual queries versus the costs of access structures introduced specifically for each query.
The selection of an optimal set of recommendations from all the candidate recommendations for a given workload and set of constraints is a daunting one, and essentially intractable, especially, when a large number of queries are present in the workload. The complexity of the problem is further increased when constraints placed on their solution is considered. Two constraints of practical significance are selecting access-structures whose combined storage requirements are less than a value specified by the DBA. This value is denoted by STORAGE_LIMIT. Secondly the DBA may specify a limit on the maximum number of indexes per table, which is denoted by IPT_LIMIT. The term configuration is used to denote a set of candidate recommendations, and the candidate recommendations are themselves called basic-configurations. A configuration that satisfies a specified constraint is deemed to be feasible. An optimal set of recommendations (“the optimal configuration”) is the lowest-cost feasible configuration This is known to be essentially impossible to find with certainty in reasonable length of time. The term reasonably-optimal configuration is used to denote a configuration which is likely, but not known with absolutely certainty to be an optimal solution or fairly close to it.
There is a need for a method of selecting a reasonably-optimal configuration that is fast, efficient, scalable and produces high quality results. There is a need for the method to execute quickly. There is a need for the method to consume minimal computational resources during execution. There is a need for the method to deal with large-size inputs, and the quality of results is measured by how close the results are to the optimal.