In database systems, a query is generated, automatically or manually, to access, retrieve, and process stored data in accordance with the application program interface protocol for the database. For a relational database, the standard protocol is the Structured Query Language (SQL); SQL statements are used both for interactive queries that retrieve data from the database and for gathering data and statistics. How efficiently a query is carried out depends partly on the size and complexity of the database's data structure scheme and partly on the query logic used.
Conventionally, query optimizers can be used on any database, such as a relational database provided by Oracle™, a company headquartered in Redwood Shores, Calif. Such query optimizers generally work as follows. For each table, column, or index, aggregate statistics are gathered, typically periodically or on demand by a database administrator (“DBA”). The gathered statistics typically include the total number of rows, the average row size, the total number of distinct values in a column or index (an index can span multiple columns), and histograms of column values (which place ranges of values into buckets), among others. The optimizer then uses these statistics to choose among a set of possible data access paths.
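To make the statistics-driven process concrete, the following is a minimal Python sketch, not any vendor's actual optimizer. It assumes a single numeric column, an equi-width histogram, a uniform-distribution assumption for equality predicates, and a toy cost model; the names `ColumnStats`, `gather_stats`, `equality_selectivity`, `range_selectivity`, and `choose_access_path`, as well as all cost constants, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ColumnStats:
    """Aggregate statistics for one numeric column (hypothetical structure)."""
    num_rows: int
    num_distinct: int
    bucket_bounds: list[float]  # equi-width histogram boundaries (len = buckets + 1)
    bucket_counts: list[int]    # number of rows falling in each bucket


def gather_stats(values: Sequence[float], num_buckets: int = 4) -> ColumnStats:
    """Collect the row count, distinct-value count, and an equi-width histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1.0  # avoid zero width when all values are equal
    bounds = [lo + i * width for i in range(num_buckets + 1)]
    counts = [0] * num_buckets
    for v in values:
        # Clamp so the maximum value falls into the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return ColumnStats(len(values), len(set(values)), bounds, counts)


def equality_selectivity(stats: ColumnStats) -> float:
    """Fraction of rows expected for `col = value`, assuming a uniform spread."""
    return 1.0 / stats.num_distinct


def range_selectivity(stats: ColumnStats, upper: float) -> float:
    """Estimate the fraction of rows with value < upper from the histogram."""
    total = 0.0
    for i, count in enumerate(stats.bucket_counts):
        lo, hi = stats.bucket_bounds[i], stats.bucket_bounds[i + 1]
        if upper >= hi:
            total += count  # the whole bucket qualifies
        elif upper > lo:
            # Partial bucket: assume values are uniform within the bucket.
            total += count * (upper - lo) / (hi - lo)
    return total / stats.num_rows


def choose_access_path(stats: ColumnStats, selectivity: float) -> str:
    """Pick an access path with a toy cost model (constants are illustrative only)."""
    expected_rows = selectivity * stats.num_rows
    full_scan_cost = stats.num_rows * 1.0   # read every row sequentially
    index_cost = 3.0 + expected_rows * 4.0  # index probe plus a random fetch per row
    return "index scan" if index_cost < full_scan_cost else "full table scan"
```

For example, with `stats = gather_stats([float(v) for v in range(1000)])`, an equality predicate is estimated to match one row, so `choose_access_path(stats, equality_selectivity(stats))` favors the index, whereas `range_selectivity(stats, 999.0)` covers every row and the same function falls back to a full table scan.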
However, such conventional query optimizers can fail when data is not distributed uniformly throughout the database, because aggregate statistics leave the optimizer unaware that the data in specific columns, or for specific values within a column, may have characteristics quite different from the overall average.
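The failure mode can be illustrated with a small hypothetical example. The sketch below assumes an order-status column in which one value dominates, estimates matching rows from only the row count and distinct-value count (as aggregate statistics would), and applies an illustrative 5% selectivity threshold for using an index; the data, the threshold, and the function names are assumptions for illustration only.

```python
from collections import Counter


def uniform_estimate(column: list[str]) -> float:
    """Rows expected for `col = value` using only row count and distinct count."""
    return len(column) / len(set(column))


def actual_count(column: list[str], value: str) -> int:
    """True number of matching rows, which aggregate statistics do not record."""
    return Counter(column)[value]


def choose_path(fraction: float, threshold: float = 0.05) -> str:
    """Hypothetical rule of thumb: use an index only for highly selective predicates."""
    return "index scan" if fraction < threshold else "full table scan"


# Hypothetical skewed column: most orders share a single status value.
statuses = ["shipped"] * 9000 + ["pending"] * 600 + ["returned"] * 300 + ["lost"] * 100

est = uniform_estimate(statuses)
for value in ("shipped", "lost"):
    actual = actual_count(statuses, value)
    print(f"{value}: estimated {est:.0f} rows, actual {actual}")
    print(f"  plan from estimate: {choose_path(est / len(statuses))}; "
          f"plan from actual: {choose_path(actual / len(statuses))}")
```

Because the uniform estimate assigns every value a quarter of the table, both predicates receive a full table scan, although the rare value `lost` (1% of rows) would be better served by an index.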
In the case of table joins, the optimizer's decisions may be even more important: deciding which table to retrieve first can have a profound impact on overall query performance. Here again, because it relies on system-wide aggregate statistics, the optimizer might choose a poor or inefficient query plan when confronted with data that does not conform to the “normal” average of the entire database as determined from the gathered statistics.
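The join-order problem can be sketched the same way. In this hypothetical example, each table carries one equality filter, the optimizer estimates the filtered cardinality as row count divided by the number of distinct values, and a toy nested-loop cost charges one index probe into the inner table per row of the driving (outer) table. The table names, figures, and cost function are assumptions; a real optimizer's cost model is considerably richer.

```python
from dataclasses import dataclass


@dataclass
class FilteredTable:
    """A table with one equality filter (all figures hypothetical)."""
    name: str
    num_rows: int
    filter_ndv: int       # distinct values in the filtered column
    actual_matches: int   # true rows passing the filter (unknown to the optimizer)

    def estimated_matches(self) -> float:
        # Aggregate statistics only: assume values are spread uniformly.
        return self.num_rows / self.filter_ndv


def nested_loop_cost(outer_rows: float, probe_cost: float = 2.0) -> float:
    """Toy cost: one index probe into the inner table per outer row."""
    return outer_rows * probe_cost


def pick_driving_table(a: FilteredTable, b: FilteredTable, use_actual: bool = False) -> str:
    """Return the name of the table whose filtered rows should drive the join."""
    def rows(t: FilteredTable) -> float:
        return t.actual_matches if use_actual else t.estimated_matches()

    return a.name if nested_loop_cost(rows(a)) <= nested_loop_cost(rows(b)) else b.name


orders = FilteredTable("orders", num_rows=10_000, filter_ndv=10, actual_matches=50)
customers = FilteredTable("customers", num_rows=2_000, filter_ndv=4, actual_matches=500)

print(pick_driving_table(orders, customers))                   # customers
print(pick_driving_table(orders, customers, use_actual=True))  # orders
```

Under the aggregate estimates (1,000 filtered orders versus 500 filtered customers), `customers` appears smaller and is chosen to drive the join, whereas the actual filtered counts (50 versus 500) make `orders` the cheaper driving table.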
Accordingly, it is desirable to provide systems and methods for improving database queries which overcome the above and other problems.