1. Field of the Invention
The present invention is directed to tuning query execution performance of a database system through query optimization with memory I/O (input/output) awareness.
2. Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical relational database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about the underlying hardware-level details.
One purpose of a database system is to answer queries requesting information from the database. A query may be defined as a logical expression over the data and the data relationships set forth in the database, and execution of a query results in the identification of a subset of the database. In operation, for instance, a request for information from a relational DBMS is typically issued by a client system as one or more Structured Query Language or “SQL” queries for retrieving particular data (e.g., a list of all employees earning more than $25,000) from database tables on a server. In response to this request, the database system typically returns the names of those employees earning more than $25,000, where “employees” is a table defined to include information about employees of a particular organization. The syntax of SQL is well documented, see e.g., “Information Technology—Database languages—SQL”, published by the American National Standards Institute as American National Standard ANSI/ISO/IEC 9075: 1992, the disclosure of which is hereby incorporated by reference.
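By way of illustration only, the employee query described above can be sketched with Python's standard sqlite3 module. The table layout, the sample rows, and the function name employees_earning_more_than are assumptions made for this sketch and are not taken from any particular database system.

```python
import sqlite3


def employees_earning_more_than(threshold):
    """Return the names of employees whose salary exceeds `threshold`."""
    # Hypothetical in-memory "employees" table with illustrative fields.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (name TEXT, home_address TEXT, salary REAL)"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [
            ("Alice", "1 Main St", 30000),
            ("Bob", "2 Oak Ave", 25000),
            ("Carol", "3 Elm Rd", 42000),
        ],
    )
    # The SQL text states WHAT is wanted; it does not say HOW to find it.
    rows = conn.execute(
        "SELECT name FROM employees WHERE salary > ? ORDER BY name",
        (threshold,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(employees_earning_more_than(25000))
```

In this sketch, an employee earning exactly $25,000 is excluded, since the predicate selects only salaries strictly greater than the threshold.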
SQL queries express what results are requested but do not state how the results should be obtained. In other words, the query itself does not tell how the query should be evaluated by the DBMS. Rather, a component of the DBMS called the optimizer determines the “plan” or the best method of accessing the data to implement the SQL query. The query optimizer is responsible for transforming a SQL request into an access plan composed of specific implementations of the algebraic operators selection, projection, join, and so forth. The role of a query optimizer in a relational DBMS is to find an adequate execution plan from a search space of many semantically equivalent alternatives.
Most modern query optimizers for relational database management systems (RDBMS) determine the best query execution plan for executing a SQL query by mathematically modeling the execution cost for each plan and choosing the cheapest valid plan. An example of a learning optimizer is described in U.S. Pat. No. 6,763,359, which provides an approach for selection of a query execution plan that utilizes statistics collected during query execution that are fed back to an optimizer to allow it to learn of a modeling mistake in a query execution plan.
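The cost-based selection just described can be sketched, under simplifying assumptions, as picking the minimum estimated cost among the valid candidate plans. The Plan class, its fields, and the cost figures below are hypothetical and serve only to illustrate the idea; real optimizers derive costs from statistics and enumerate a far larger search space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A hypothetical candidate execution plan with a modeled cost."""
    name: str
    estimated_cost: float
    valid: bool = True


def choose_plan(plans):
    """Return the cheapest plan among those marked valid."""
    candidates = [p for p in plans if p.valid]
    if not candidates:
        raise ValueError("no valid plan available")
    return min(candidates, key=lambda p: p.estimated_cost)


if __name__ == "__main__":
    plans = [
        Plan("nested_loop_join", 1200.0),
        Plan("merge_join", 800.0),
        # Cheapest on paper, but excluded because it is not valid.
        Plan("hash_join", 500.0, valid=False),
    ]
    print(choose_plan(plans).name)
```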
Though this basic approach to query optimization has not changed dramatically over the years, much of the rest of the environment in which databases operate has. For example, processors have become far faster and memories have become many times larger. The ability to execute increasingly complex queries over very large database (VLDB) environments has grown at a higher rate than the ability to optimize such complex queries. Attention to query optimization is therefore placing more emphasis on optimizers that model the dynamic nature of the databases.
Since physical I/O (PIO) is a costly operation to execute, it naturally carries an important weight in classical DBMS cost models, which assume that the data is disk-resident and does not fit in the available main memory. However, this assumption no longer holds with the advent of cheap, large main memories. Plans that are considered cheap might not actually be so if the base tables (or parts of them) are instead available in main memory. Further, optimizers that ignore the contents of the buffer pool while optimizing queries can identify sub-optimal plans. While the aforementioned U.S. Pat. No. 6,763,359 generally mentions that sizes of buffer pools can be adjusted in response to a detected lack of buffer resources for certain queries as a way of self-tuning system configuration, it neither provides a manner of determining the information necessary to perform such tuning nor provides an approach for evaluating and utilizing PIO data that would allow the reoptimization of queries by the optimizer.
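The effect of buffer pool contents on plan choice can be illustrated with a deliberately simplified cost model. The sketch below is an assumption made for illustration and is not the method of the present invention or of any cited reference: the weights LIO_COST and PIO_COST, the object names, and the page counts are all hypothetical. Each plan is charged a logical I/O cost for every page it touches, plus a physical I/O cost for the fraction of those pages not already in memory.

```python
# Hypothetical per-page cost weights; real systems use their own values.
LIO_COST = 2.0   # cost of a logical (in-memory) page access
PIO_COST = 25.0  # additional cost of a physical read from disk


def plan_cost(accesses, cached_fraction):
    """Estimate a plan's cost from the pages it touches.

    accesses: list of (object_name, pages_touched) pairs.
    cached_fraction: dict mapping object_name to the fraction of its
        pages assumed resident in the buffer pool (default 0.0).
    """
    total = 0.0
    for obj, pages in accesses:
        hit = cached_fraction.get(obj, 0.0)
        total += pages * LIO_COST + pages * (1.0 - hit) * PIO_COST
    return total


def cheapest_plan(plans, cached_fraction):
    """Return the name of the plan with the lowest estimated cost."""
    return min(plans, key=lambda name: plan_cost(plans[name], cached_fraction))


if __name__ == "__main__":
    # Two hypothetical alternatives for the same query.
    plans = {
        "scan_employees": [("employees", 10000)],
        "probe_history_index": [("salary_history_idx", 1500)],
    }
    # Classical assumption: nothing is in memory.
    print(cheapest_plan(plans, {}))
    # Buffer-aware view: the employees table is fully cached.
    print(cheapest_plan(plans, {"employees": 1.0}))
```

Under the disk-resident assumption the index probe appears cheaper, whereas once the scanned table is taken to be fully cached the scan becomes the cheaper choice, which is the kind of reversal the paragraph above describes.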
Accordingly, a need exists for an ability to provide better awareness of buffer pool and memory I/O usage in query optimization. The present invention provides a solution for these and other needs.