1. Field of the Invention
The present invention generally relates to query execution management in a data processing system and, more particularly, to optimizing execution of queries against one or more databases in a data processing system.
2. Description of the Related Art
Complex computing systems may be used to support a variety of applications. One common use is the maintenance of databases, from which information may be obtained. Databases are computerized information storage and retrieval systems. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. Another type of database is a distributed database that can be dispersed or replicated among different points in a network.
Regardless of the particular architecture, a requesting entity (e.g., an application or the operating system) demands access to a specified database by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that operate to read, change and add specified records in the database. These requests are made using high-level query languages such as the Structured Query Language (SQL) in the case of a relational database. Illustratively, SQL is used to make interactive queries for getting information from and updating a database such as International Business Machines' (IBM) DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. The term “query” denotes a set of commands for retrieving data from a stored database. Queries take the form of a command language that lets programmers and programs select, insert and update data, find out the location of data in a database, and so forth.
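By way of illustration only, the kinds of SQL requests described above (adding, changing and reading records) can be sketched as follows. The sketch uses Python's standard sqlite3 module with an in-memory database; the table name `claims`, its columns, the sample values and the function name `run_demo` are hypothetical and are not taken from any of the products named above.

```python
import sqlite3


def run_demo():
    """Issue simple insert, update and select requests against a relational database."""
    # In-memory database: an assumption made so the sketch is self-contained.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE claims (id INTEGER PRIMARY KEY, holder TEXT, amount REAL)"
    )

    # Add specified records.
    conn.execute("INSERT INTO claims (holder, amount) VALUES (?, ?)", ("Smith", 1200.0))
    conn.execute("INSERT INTO claims (holder, amount) VALUES (?, ?)", ("Jones", 450.0))

    # Change a specified record.
    conn.execute("UPDATE claims SET amount = ? WHERE holder = ?", (500.0, "Jones"))

    # Read the records that match a condition.
    rows = conn.execute(
        "SELECT holder, amount FROM claims WHERE amount >= ? ORDER BY holder",
        (500.0,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the requesting entity is the Python program itself, and each call to `execute` stands for one database access request.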
Queries and, consequently, query workload can consume significant system resources, particularly processor resources. The system resource consumption of a query against one or more databases depends on the complexity of the query and the searched database(s).

For instance, assume a large data warehouse that runs in a production environment for its intended use in the day-by-day business of a company. The production environment may support query requests submitted to the large data warehouse. The data warehouse can include large sets of data for data mining environments in fields such as insurance claims, property listings and the medical industry. The large sets of data can be organized, for instance, in a relational database schema in large distributed databases arranged in a distributed environment. In the distributed environment, a given query may be executed against multiple databases. Accordingly, executing queries against the large data warehouse may lead to searches that consume significant system resources. Furthermore, significant amounts of time can be required for completion of such queries. Some queries can literally take hours for data gathering, retrieving and sorting.

Moreover, such queries can be executed repeatedly against the large data warehouse. For instance, assume a researcher who frequently performs an identical search for data in the data warehouse by repeatedly issuing the same query. The first execution of the query would result in an initial query result. The second and all subsequent executions will in all likelihood be an attempt to find new data in the data warehouse that matches the original query. Data can be added to or removed from the data warehouse by update, delete or insert operations on the data in the data warehouse. Assume now that the query consumes significant system resources and that no data is added to or removed from the data warehouse between different executions of the query. Nevertheless, each time the query is re-issued, it will be executed against the large data warehouse and consume the significant system resources.

Therefore, one difficulty when dealing with query requests against large data warehouses in distributed data environments is ensuring an acceptable turnaround. Another difficulty is ensuring successful completion of a query. This applies especially to queries that are long running and involve multiple databases in a distributed data environment as well as to queries that are repeatedly executed and consume significant system resources.
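The repeated-query scenario described above can be sketched, purely for illustration, as follows. The sketch uses Python's standard sqlite3 module; the tiny in-memory `listings` table stands in for a large data warehouse, and the table, its contents, the query text and the function names `build_warehouse` and `reissue` are all hypothetical. It shows only the problem described above: the same query is fully executed on every submission even though the underlying data does not change between executions.

```python
import sqlite3

# Hypothetical query that a researcher issues repeatedly.
QUERY = "SELECT id, city, price FROM listings WHERE city = ? ORDER BY id"


def build_warehouse():
    """Create a small in-memory stand-in for a large data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE listings (id INTEGER PRIMARY KEY, city TEXT, price INTEGER)"
    )
    conn.executemany(
        "INSERT INTO listings (city, price) VALUES (?, ?)",
        [("Austin", 300000), ("Boston", 550000), ("Austin", 420000)],
    )
    conn.commit()
    return conn


def reissue(conn, times):
    """Execute the same query `times` times and collect every result."""
    results = []
    for _ in range(times):
        # Each submission is executed against the database again,
        # regardless of whether the data has changed since the last run.
        results.append(conn.execute(QUERY, ("Austin",)).fetchall())
    return results


if __name__ == "__main__":
    conn = build_warehouse()
    changes_before = conn.total_changes  # rows modified so far on this connection
    results = reissue(conn, 3)
    # No rows were modified between runs, so every result is identical.
    print(conn.total_changes == changes_before, results[0] == results[-1])
    conn.close()
```

With a query that takes hours rather than microseconds, each of these identical executions would consume the same significant system resources.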
Accordingly, a number of techniques have been employed to deal with these difficulties. For instance, additional hardware may be allocated to ensure that queries are satisfied with adequate response times. Moreover, large database applications have query optimizers which construct search strategies. An optimizer is an application program which is intended to construct a near-optimal search strategy for a given set of search parameters, according to known characteristics of the database, the system on which the search strategy will be executed, and/or optional user-specified optimization goals. Not all strategies are equal, and various factors may affect the choice of an optimum search strategy. In general, however, such search strategies merely allow for an improved use of available hardware/software components to execute respective queries.
A major drawback of these approaches is that they generally lead to less than optimal utilization of computing resources. Moreover, they usually require extra computing resources that are not fully utilized, except during peak query workload periods. Furthermore, these approaches are not suited to optimizing the execution of queries that are repeatedly issued against databases.
Therefore, there is a need for effective query execution management in a data processing system for optimizing execution of queries that are repeatedly issued against one or more databases in the data processing system.