1. Technical Field
The description generally relates to workload management and, more particularly, to systems and methods for resource-adaptive workload management.
2. Description of the Related Art
Workload management for Relational Database Management Systems (RDBMS) has been extensively studied in the database literature. Every major commercial RDBMS comes with workload management tools. However, due to a lack of information about the progress of queries running in the RDBMS, the existing tools cannot always make intelligent decisions.
For example, consider the following scheduled maintenance problem. At time 0, planning begins for the scheduled maintenance (e.g., deciding the maintenance start time t, which queries should have their execution blocked, and so forth). The maintenance may involve installing some new software on the computer, or adding several new data server nodes to the parallel RDBMS. The maintenance is scheduled to begin at time t. A common practice is to perform the following two operations (designated O1 and O2):
O1: Starting from time 0, new queries are no longer allowed to enter into the RDBMS for execution.
O2: The existing queries in the RDBMS are kept running until time t. At time t, we abort those queries that have not finished execution. This is lost work, since those queries need to be rerun after the RDBMS is restarted.
One issue to be addressed in the preceding scenario is how can the maintenance time t be chosen to minimize the amount of lost work without over-delaying the maintenance. In general, the state-of-the-art RDBM workload management tools do not know which queries can finish by time t. As a result, they cannot provide much aid in suggesting a “good” maintenance time. Rather, the database administrator (DBA) needs to guess an arbitrary maintenance time that he or she thinks is appropriate. However, if it can be estimated which queries can finish by time t, then more intelligent decisions can be made to prepare for the scheduled maintenance. For example, in this case, the operation O2 can be replaced with the following two operations (designated O2′ and O3):
O2′: The queries that cannot finish by time t are estimated. These queries are aborted at time 0. Note that aborting the queries will reduce the load on the RDBMS and, thus, change the estimate about which queries cannot finish by time t.
O3: Other queries in the RDBMS are kept running. Suppose at time t, some of these queries have not finished execution. Such a scenario is possible if our estimation has errors. Then, at time t, the queries that have not yet finished are either aborted or allowed to run to completion. The appropriate action depends on both the application requirement and the estimate of how soon those queries are going to finish subsequent to time t.
Compared to operation O2, operations O2′ and O3 have the following advantages: (1) even for the same maintenance time t, by aborting some “hopeless” queries at time 0, more queries can be allowed to finish by time t; (2) the amount of lost work can be controlled by adjusting the maintenance time t.
As a second example, suppose that for some reason, the DBA needs to speed up the execution of a target query Q. The DBA decides to do this by choosing one running query (the victim query) and blocking its execution. In this case, a common approach is to choose the victim query to be the heaviest resource consumer. However, if it happens that this victim query will finish quickly, then blocking the execution of this query will not speed up the execution of the target query Q as much as blocking some other query that has a longer remaining execution time. In this example, if the remaining execution times of the running queries are estimated, the choosing of a victim query that is about to finish can be avoided.
From the above discussion, it can be seen that it is desirable to achieve the following two goals: (1) give the workload management tool more information about the remaining execution time of the running queries; and (2) let the workload management tool use this information to make more intelligent decisions.
Regarding the first goal, recently progress indicators for SQL queries in RDBMSs have been proposed in the following, all of which are incorporated by reference herein in their entireties: Chaudhuri et al., “Estimating Progress of Execution for SQL Queries”, Special Interest Group on Management Of Data (SIGMOD) Conf., June 2004, pp. 803-814; Luo et al., “Toward a Progress Indicator for Database Queries”, SIGMOD Conf., June 2004, pp. 791-802; and Luo et al., “Increasing the Accuracy and Coverage of SQL Progress Indicators”, Proc. 2005 Int. Conf. on Data Engineering (ICDE′05), Tokyo, Japan, April 2005, pp. 853-864. In each of the preceding references, for an SQL query, a progress indicator keeps track of the work completed and continuously estimates the remaining query execution time. Suppose progress indicators are integrated into the workload management tool. Then at any time, for each query running in the RDBMS, the workload management tool can estimate the amount of remaining work and the remaining query execution time this is the extra information about the state of the query (relating to the first goal above).
It is to be noted that the progress indicator described in the above-referenced article entitled “Estimating Progress of Execution for SQL Queries” does not try to predict the remaining query execution time.
The progress indicators in the above-referenced articles entitled “Toward a Progress Indicator for Database Queries” and “Increasing the Accuracy and Coverage of SQL Progress Indicators” estimate the remaining query execution time in the following way. For a query Q, at any time, the progress indicator uses a query execution plan and an intermediate result generated thus far to estimate the remaining cost c that is measured in U's. Each U represents one unit of work that is equal to that required to process one page of bytes. The progress indicator also monitors the current query execution speed s. The remaining query execution time t is estimated as t=c/s.
From the above description, it can be seen that in estimating the remaining execution time of a query Q, the above-referenced articles entitled “Toward a Progress Indicator for Database Queries” and “Improving the Accuracy and Coverage of SQL Progress Indicators” do not consider the predicted remaining execution time for concurrently running queries.
Accordingly, with respect to the first goal, it would be desirable and highly advantageous to have an apparatus and method for estimating remaining query execution time by at least considering the predicted remaining execution time for concurrently running queries.
Moreover, with respect to the second goal, it would be desirable and highly advantageous to have a system and method for enabling a workload management tool to use additional information about the remaining execution time of running queries to render more intelligent decisions regarding managing the workload.
It is to be noted that transaction scheduling has been studied in a real-time database system. See, for example, Abbott et al., “Scheduling Real-time Transactions: a Performance Evaluation”, Very Large Databases (VLDB) Conf., pp. 1-12, 1988; and Abbott et al., “Scheduling Real-Time Transactions with Disk Resident Data”, VLDB Conf., pp. 385-396, 1989. Regarding the preceding transaction scheduling articles, the main concern is to meet deadlines rather than to maximize resource utilization. Moreover, most real-time systems are memory resident and the jobs there can be finished in a short amount of time (say, less than a few seconds). Hence, they need special time-cognizant protocols (e.g., to handle critical sections). Further, many real-time systems use hard deadlines. As a result, the jobs there are usually pre-defined (i.e., “canned” jobs). Also, almost all jobs there have deadlines.
Accordingly, it would be desirable and highly advantageous to have an apparatus and method that maximizes resource utilization, does not require special time-cognizant protocols, and is capable of handling non-fixed deadlines.