Advent of a global communications network such as the Internet has facilitated exchange of enormous amounts of information. Additionally, costs associated with storage and maintenance of such information has declined, resulting in massive data storage structures.
Accordingly, it is important to store such data in a manageable manner that facilitates user friendly and quick data searches and retrieval. In general, a common approach is to store electronic data in a database. A database functions as an organized collection of information, wherein data is structured such that a computer program can quickly search and select desired pieces of data, for example. Commonly, data within a database is organized via one or more tables, and the tables are arranged as an array of rows and columns.
Moreover, such tables can comprise a set of records, wherein a record includes a set of fields. Records are commonly indexed as rows within a table and the record fields are typically indexed as columns, such that a row/column pair of indices can reference particular datum within a table. For example, a row can store a complete data record relating to a sales transaction, a person, or a project. Likewise, columns of the table can define discrete portions of the rows that have the same general data format, wherein the columns can define fields of the records.
In general, each individual piece of data, standing alone, is not very informative. Database applications allow the user to compare, sort, order, merge, separate and interconnect the data, so that useful information can be generated from the data. Moreover, capacity and versatility of databases have grown incredibly to allow virtually endless storage capacity utilizing databases.
Queries in database systems are posed in high level, declarative (non-procedural) languages that are translated into a procedural execution plan. Many relational database management systems employ query optimizers in order to choose a query plan for a given query. The purpose of query optimization is to explore the manners in which this declarative request can be translated into procedural plans and to select the most efficient plan among those explored. The desired query execution plan can consist of a series of primitive database operators, and is typically selected according to a least estimated execution cost.
Query execution in Structured Query Language (SQL) is an integral part of system performance in terms of efficiency, time, accuracy, and cost. The way in which a query is executed can avoid or create problems regarding the quantity of space that needs to be sacrificed to execute the query and the amount of time required to retrieve the query result.
Moreover, in Relational Database Management System (RDBMS), users issue queries against the data that is stored in software constructs. When a user issues a query, the system compiles the query by producing a physical plan that can be executed against the database. For any such query, there are a variety of different physical plans that can be appropriately used. However, each plan is different-some plans will execute the query with a low cost, while other plans will execute the query with a high cost. The step of optimizing a query execution process by costing various plans is inherently expensive.
Therefore, for a class of queries that are similar to one another, one could use the same physical plan to execute those queries and costing could be performed only once for the whole class of queries. In SQL, these queries are identified as parameterized queries, where a user can submit a query with constants missing—those constants are not provided until runtime. In this situation, just one plan will be generated and stored in the plan cache for every query that belongs in that class of queries, effectively avoiding the cost of optimizing more than once for these queries. However, depending on the constant selected, there can be a dramatically wide range of plans corresponding to different costs.
One approach in plan selection is to optimize the first query submission with constants in place and use the selected plan for all subsequent queries in that cost. Another approach in plan selection is to provide the optimizer with a hint when submitting the parameterized query so that optimization will use the suggested constant to generate a plan.
While such approaches ensure that an optimal plan is selected for at least one constant, different plans have different levels of optimality for different constants. Selecting a single plan will almost always be suboptimal for at least some other parameter setting that does not correspond to the initial optimization approach. Thus, a plan that may perform well for some parameter values may perform poorly for other parameter values. In addition, there are parameterized queries for which no plan achieves close to optimal performance.