1. Field of the Invention
The present invention generally relates to query execution management and, more particularly, to managing execution of queries against database samples.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A relational database management system (RDBMS) is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data.
A DBMS is structured to accept commands to store, retrieve and delete data using, for example, high-level query languages such as the Structured Query Language (SQL). The term “query” denotes a set of commands for retrieving data from a stored database. These queries may come from users, application programs, or remote systems (clients or peers). The query language requires the return of a particular data set in response to a particular query, but the query does not specify the method of query execution employed by the DBMS. This method is typically called a query execution plan, an execution plan, an access plan, or just a “plan”. There are typically many different useful execution plans for any particular query, each of which returns the required data set. Selecting a particular plan from the plurality of possible plans is the job of an “optimizer”. For large databases, the execution plan selected by the optimizer to execute a query must provide the required data at a reasonable cost in time and hardware resources.
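The separation between a query and its execution plan can be illustrated with a minimal sketch. It assumes SQLite (through Python's standard `sqlite3` module) as a convenient stand-in for a DBMS; the `orders` table, its columns, and the index name `idx_orders_customer` are hypothetical and chosen only for illustration. The same SQL text is run before and after an index is created: the optimizer is expected to pick a different plan, while the returned data set stays the same. The exact wording of the plan text varies between SQLite versions, so it is only compared, not quoted.

```python
import sqlite3

# Hypothetical table, used only to illustrate query vs. plan.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

# The query states WHAT data is wanted, not HOW to retrieve it.
QUERY = "SELECT total FROM orders WHERE customer_id = 42"


def plan_and_rows(connection):
    """Return the optimizer's plan description and the (sorted) result rows."""
    # The last column of each EXPLAIN QUERY PLAN row is the textual detail.
    plan = [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + QUERY)]
    rows = sorted(connection.execute(QUERY).fetchall())
    return plan, rows


plan_before, rows_before = plan_and_rows(conn)

# Adding an index gives the optimizer another candidate plan to choose from.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after, rows_after = plan_and_rows(conn)

print("plan without index:", plan_before)
print("plan with index:   ", plan_after)
print("same result set:   ", rows_before == rows_after)
```

The query text never changes in this sketch; only the physical design does, which is why the choice of plan is left to the optimizer rather than to the author of the query.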
Optimization, and execution generally, can be a resource-intensive and time-consuming process. Further, the larger the database, the longer the time needed to execute the query. From the end user's standpoint, the undesirable impact of query execution overhead is increased when a plurality of queries is executed. In many data mining and data query scenarios, the end user often does not know, at the outset, the precise data they are after. Nor does the user appreciate the performance implications of running a particular query. In this scenario, the user typically issues a query, examines the results, modifies the query based on analysis of the results and then runs the modified query. In cases where the data being queried is very extensive and complex, this can be a very time- and resource-intensive process, given the duplicative processing that takes place each time the user submits a new query.
Given the increasing demands by users for faster execution, various efforts are being directed to returning results to users more quickly. One known approach is to utilize sampling. In general, sampling refers to the execution of queries against a subset of the entire database. In this way, query results can be identified and returned more quickly, since the queries are only being executed against a portion of all available data. One of the disadvantages of sampling is that the result often reflects an approximation rather than an exact answer. However, in many situations, the approximations are adequate for practical purposes. Thus, sampling techniques may provide users an adequate approximate response in a fraction of the time needed to get the exact answer.
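The trade-off described above can be sketched as follows. This is an illustrative example only, not a description of any particular DBMS sampling facility: it assumes SQLite via Python's `sqlite3` module, a hypothetical `sales` table filled with synthetic values, and a deliberately simple sampling rule (every 20th row by `id`) that yields a 5% sample. The same aggregate query is run against the full table and against the sample.

```python
import random
import sqlite3

# Synthetic data in a hypothetical table; values are seeded for repeatability.
rng = random.Random(0)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO sales (id, amount) VALUES (?, ?)",
    [(i, rng.uniform(0.0, 100.0)) for i in range(10_000)],
)

# Materialize a 5% sample: every 20th row by id (an assumed, simplistic rule).
conn.execute("CREATE TABLE sales_sample AS SELECT * FROM sales WHERE id % 20 = 0")


def average_amount(table):
    """Run the same aggregate query against the named table."""
    # Table names cannot be bound as parameters; both names are fixed above.
    (avg,) = conn.execute(f"SELECT AVG(amount) FROM {table}").fetchone()
    return avg


(sample_rows,) = conn.execute("SELECT COUNT(*) FROM sales_sample").fetchone()
exact = average_amount("sales")          # scans all 10,000 rows
approx = average_amount("sales_sample")  # scans only the sampled rows

print(f"rows scanned for the estimate: {sample_rows}")
print(f"exact average:  {exact:.2f}")
print(f"sample average: {approx:.2f}")
```

The sampled query touches one twentieth of the rows and returns an estimate rather than the exact value; whether the estimate is close enough depends on the data and on what the user needs the answer for.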
Despite the utility of known sampling techniques, these techniques do not provide sufficient flexibility to accommodate certain situations. For example, query execution against a sample of a database is typically performed in single iterations. In other words, when a user submits a query for execution, the RDBMS will select a sample of the database, execute the query and then return the results to the user. The user then inspects the results and may decide to execute the query against a different sample of the database. This process may go on over multiple iterations until the user is satisfied with the result, and each iteration must be initiated manually, which makes the process time-consuming.
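The single-iteration workflow described above can be sketched as follows. The sketch is hypothetical throughout: the `sales` table, the helper `run_against_new_sample`, the 5% sample fraction, and the seeds are all assumptions made for illustration, with SQLite (via Python's `sqlite3` module) standing in for the RDBMS. Each call represents one manual resubmission by the user against a freshly chosen sample.

```python
import random
import sqlite3

# Hypothetical table with seeded synthetic values.
data_rng = random.Random(1)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO sales (id, amount) VALUES (?, ?)",
    [(i, data_rng.uniform(0.0, 100.0)) for i in range(5000)],
)


def run_against_new_sample(connection, fraction, seed):
    """One iteration: select a sample, execute the query, return the result."""
    ids = [row[0] for row in connection.execute("SELECT id FROM sales ORDER BY id")]
    chosen = random.Random(seed).sample(ids, int(len(ids) * fraction))

    # Rebuild the temporary table holding the ids of the current sample.
    connection.execute("DROP TABLE IF EXISTS temp.sample_ids")
    connection.execute("CREATE TEMP TABLE sample_ids (id INTEGER PRIMARY KEY)")
    connection.executemany(
        "INSERT INTO sample_ids (id) VALUES (?)", [(i,) for i in chosen]
    )

    (avg,) = connection.execute(
        "SELECT AVG(s.amount) FROM sales AS s JOIN sample_ids AS t ON s.id = t.id"
    ).fetchone()
    return avg


# The user reruns the same query by hand, once per sample, and compares results.
results = [run_against_new_sample(conn, 0.05, seed) for seed in (10, 11, 12)]
for iteration, value in enumerate(results, start=1):
    print(f"iteration {iteration}: average amount = {value:.2f}")
```

Nothing in this loop decides when to stop or which sample to try next; in the workflow described above, those decisions rest with the user after inspecting each result.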
Therefore, there is a need for query execution sampling techniques providing additional functionality and flexibility.