1. Field of the Invention
This invention relates in general to computer-implemented database systems, and, in particular, to improving an access path selection for Structured Query Language with variables in a relational database management system.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into tables, which consist of rows and columns of data. The rows are formally called tuples. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. The tables are typically stored on random access storage devices (RASD) such as magnetic or optical disk drives for semi-permanent storage.
RDBMS software using a Structured Query Language (SQL) interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO). The SQL interface allows users to formulate relational operations on the tables either interactively, in batch files, or embedded in host languages, such as C and COBOL. SQL allows the user to manipulate the data.
The definitions for SQL provide that a RDBMS should respond to a particular query with a particular set of data given a specified database content, but the method that the RDBMS uses to actually find the required information in the tables on the disk drives is left up to the RDBMS. Typically, there will be more than one method that can be used by the RDBMS to access the required data. The RDBMS will optimize the method used to find the data requested in a query in order to minimize the computer time used and, therefore, the cost of performing the query.
The RDBMS software determines an access path for an SQL statement during a bind process. SQL compilation is called a BIND process, and the output of the BIND process is a plan, which is a compiled run-time structure used for executing the SQL statement. The plan includes access paths, which are the paths the RDBMS uses to get to the data that SQL statements request. During the BIND process, an optimizer of the RDBMS software selects an access path for a SQL statement. The access path is the key to determining how well an SQL statement performs.
Most RDBMS such as DB2, Informix, Oracle and MS SQL Server implement so called cost-based optimizer. The optimizer is a component that, based on the application request specified in form of SQL statement, selects, i.e., generates an optimal access path to the requested data. The access path details are stored in a form of an executable code either on disk (for static SQL) or in the processors cache (for dynamic SQL). This code is referenced during subsequent statement executions. This way the RDBMS saves significant processor resources because, in general, the optimization process is many times (for a typical Online Transaction Processing (OLTP) workload, hundreds of times) more expensive that the statements execution itself.
In order to select the optimal access path, the cost-based optimizers (unlike rule-based ones) take into account the current status of some environmental variables. The most important of these variables is the statistics such as cardinality, size and frequency distribution of the database objects that need to be accessed. While the cardinality (the number of distinct values a database object has) and its size can always be available to the optimizer, the frequency distributions of table columns' values are often unknown. The reason is that the so-called host variables regularly change with each statements execution.
U.S. Pat. No. 6,006,220 by Donald J. Haderle et al., assigned to IBM Corporation, Armonk, N.Y., US, filed Sep. 30, 1997, issued Dec. 21, 1999, “Determining the optimal access path for a query at execution time using an actual value for each variable in a query for estimating a filter factor” states that, currently, when an SQL statement includes a variable, the default filter factor, also known as selectivity, selected for the SQL statement is typically not optimal, and, therefore, the access path that is determined is not necessarily optimal. Therefore, a method is suggested of executing a query in a computer, the query being performed by the computer to retrieve data from a database stored on a data storage device connected to the computer, the method comprising the step of: determining an optimal access path for the query at execution time using an actual value for each variable in the query to estimate a filter factor, wherein each variable is a special register.