The present invention relates generally to the field of databases, and more particularly to evaluating multiple access paths for optimization.
Relational database management systems (RDBMS) manage relational databases that store data by organizing the data into predefined data categories in the form of related tables also known as relations. Relationships between tables in relational databases are identified through the use of primary and foreign keys. A primary key is a special relational database column, or combination of columns, designed to uniquely identify all table records. The foreign key, while also a column or group of columns in a relational database, references the primary key of another table thus providing a cross-reference link between the data in the tables. The remaining table structure includes columns, also referred to as fields, denoting one or more predefined data categories and rows, also referred to as a record that contains a unique instance of the data respective to the columns. As the relational database is a collection of data items organized as a set of formally-described tables, minimal assumptions are included regarding how the data is related, or how the data will be extracted from the database. Therefore, due to the minimal assumptions, the data in a relational database can be accessed or reassembled in many different ways without reorganizing the relational database tables in order to meet the specifications of a user.
The data within a relational database may be accessed by the user through queries (i.e., requests for information). In a relational database query, the query specifies both tables and conditions that link the tables. The query results are generated by accessing relevant data within the relational database and manipulating the data in a way that yields the requested information. To process the query, the data may be acquired by the relational database management system by accessing the relational database in different ways through different data structures and in different orders. Due to the variations, the time expended to process the query depends upon the selected query plan or access path. Query plans identify an ordered set of steps to access data in the relational database whereas an access path refers to the path (e.g., query plan) chosen by the system to retrieve the data after executing a structured query request.
Query optimizers attempt to determine the most efficient way to execute a given query by considering possible query plans. Due to the structure of a relational database, more than one query plan may be possible. Query optimizers form a set of query plans by examining the possible access paths (e.g., primary index access, secondary index access, full file scan) and various relational table join techniques (e.g., merge join, hash join, product join) that may be employed to access the data requested by the query. Cost-based query optimizers evaluate the resource footprint for the set of query plans and utilize the cost assessment as the basis for plan selection. The cost is a number that represents the estimated resource usage for a query plan. Query optimizers determine the cost of query plans by using a mathematical model of query execution costs that rely on estimates of the cardinality or number of tuples (e.g., number of elements of the set) flowing through each edge in a query plan. Query optimizers utilize the costs to estimate a runtime cost of evaluating the query in terms of the number of I/O operations required, CPU path length, amount of disk buffer space, disk storage service time, and interconnect usage between units of parallelism, and other factors determined from the data dictionary. The query optimizer then selects the query plan with the lowest associated cost to utilize to acquire the data requested by the query.