1. Field of the Invention
The present invention generally relates to data processing and, more particularly, to query analysis.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways.
A DBMS is structured to accept commands to store, retrieve and delete data using, for example, high-level query languages such as the Structured Query Language (SQL). The term “query” denominates a set of commands for retrieving data from a stored database. These queries may come from users, application programs, or remote systems (clients or peers). The query language requires the return of a particular data set in response to a particular query, but the query does not specify the method of query execution employed by the DBMS. The method of query execution is typically called a query execution plan, an execution plan, an access plan, or just a “plan”. There are typically many different useful execution plans for any particular query, each of which returns the required data set. For large databases, the execution plan selected by the DBMS to execute a query must return the required data at a reasonable cost in time and hardware resources. In general, the overall optimization process includes four broad stages. These are (1) casting the user query into some internal representation, (2) converting to canonical form, (3) choosing prospective implementation procedures, and (4) generating executable plans and choosing the cheapest of the plans.
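By way of illustration only, the final stage described above (choosing the cheapest of several candidate plans that all return the same data set) may be sketched as follows. The `Plan` type, the plan names, and the cost figures are hypothetical and are not drawn from any particular DBMS; an actual optimizer estimates cost from statistics about the stored data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate execution plan (hypothetical structure for illustration)."""
    name: str
    estimated_cost: float  # assumed abstract cost units, not a real DBMS metric


def choose_cheapest(plans):
    """Return the candidate plan with the lowest estimated cost."""
    if not plans:
        raise ValueError("at least one candidate plan is required")
    return min(plans, key=lambda plan: plan.estimated_cost)


# Illustrative candidates; every plan is assumed to return the same data set.
candidates = [
    Plan("full_table_scan", 1200.0),
    Plan("index_lookup", 35.0),
    Plan("hash_join_then_filter", 410.0),
]

best = choose_cheapest(candidates)
print(best.name)
```

The sketch deliberately omits the earlier stages (internal representation, canonical form, and generation of candidate procedures), which depend on the particular query language and DBMS.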
Optimization, and execution generally, can be a resource-intensive and time-consuming process. Further, the larger the database, the longer the time needed to execute the query. From the end user's standpoint, the undesirable impact of query execution overhead is increased when a plurality of queries is executed. In many data mining and data query scenarios, it is often the case that the end user does not know, at the outset, the precise data they are after. Nor does the user appreciate the performance implications of running a particular query. In this scenario, the user typically issues a query, examines the results, modifies the query based on analysis of the results and then runs the modified query. In cases where the data being queried is very extensive and complex, this can be a very time- and resource-intensive process, given the duplicative processing that takes place each time the user submits a new query.
In order to prevent an excessive drain on resources, many databases are configured with query governors. A query governor prevents the execution of large and resource-intensive queries by referencing a defined threshold. If the cost of executing a query exceeds the threshold, the query is not executed. However, the provision of a query governor does not address the issue faced by users (particularly novices) who do not understand the connection between a given query and the time and resources required to execute the query. Further, a query governor does not provide users any insight into what aspect of the query led to the query being rejected by the governor.
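By way of illustration only, the threshold check performed by a query governor of the kind described above may be sketched as follows. The function name, the `GovernorDecision` type, and the cost units are hypothetical; how a given DBMS estimates cost and configures its threshold is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernorDecision:
    """Outcome of a governor check (hypothetical structure for illustration)."""
    allowed: bool
    estimated_cost: float
    threshold: float


def govern(estimated_cost, threshold):
    """Allow a query only if its estimated cost does not exceed the threshold."""
    return GovernorDecision(
        allowed=estimated_cost <= threshold,
        estimated_cost=estimated_cost,
        threshold=threshold,
    )


# Illustrative values in assumed abstract cost units.
small_query = govern(estimated_cost=35.0, threshold=500.0)
large_query = govern(estimated_cost=1200.0, threshold=500.0)
print(small_query.allowed, large_query.allowed)
```

Note that the decision in this sketch carries only the cost and the threshold. It conveys nothing about which aspect of the query produced the cost, which is the shortcoming identified above.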
Therefore, there is a need for providing users with information about the queries they construct, in a manner that facilitates construction of efficient and effective queries.