Business Intelligence (BI) database systems are typically used to process extremely complex queries operating upon massive amounts of data. Processing of complex queries is a fundamental task of modern enterprises which rely upon the collection, integration, and analysis of an expanding sea of information.
Analysts predict that data warehouses will grow in size and complexity, and find that Business Intelligence (BI) is at a point in evolution between manual and automated techniques. Traditionally, enterprises have used BI for offline, strategic decision-making with a small number of expert users analyzing historical data and decision-making cycles last weeks or months. As enterprises become more automated, real-time, and data driven, the industry is evolving toward adaptive, operational BI systems that support online, operational decision-making at all levels in the enterprise using high quality information that is delivered in real-time to analytic applications and integrated into the enterprise's business processes.
For example, an on-line retailer may want to analyze a user's real-time click stream data and up-to-the-minute inventory to offer dynamically priced product bundles. A bank's goals generally include detection and reaction in real-time to fraudulent transactions. A logistics provider would like to dynamically reconfigure shipping routes in response to weather conditions. The worldwide business analytics market, which includes data warehouse platform software, BI tools, and analytics applications, is expected to have steady growth in the coming years.
Workload management is particularly useful to data warehouses, for the same reasons that workload management is especially challenging in the data warehouse context. BI queries exhibit a huge variance in response times. Most queries are known to execute in under a minute, but some small number of queries may require hours of execution time. According to conventional wisdom, presence of even a few poorly optimized queries can significantly impact the performance of a data warehouse system by dominating resources that could otherwise be used by properly functioning queries. Accurate estimation of the duration of a long-running query is not straightforward. Although customers may have service level agreements (SLAs) that set job deadlines and specify financial consequences for failing to meet the deadlines, some deadlines are not necessarily explicit linking of SLAs to deadlines may be difficult or impossible. Furthermore, prediction of runtime characteristics of a database query can be extremely difficult because resource contention with other queries running in the system concurrently can cause a query to spend significantly more time waiting on one or more system resources than is spent actually using the resources. Such contention is also extremely difficult to predict.
Query execution has traditionally been monitored using query progress indicators, parameters formed as part of workload execution management, and query runtime statistics. Query progress indicators are tools that attempt to quantify as a fraction the work that a running query has completed over the amount of work the query is likely to complete in total. Most commonly, progress indicators are based on usage of intermediate result size as a direct reflection of progress. For example, statistics including output cardinality (count) and average tuple size can be collected to determine progress. Some techniques distinguish between classes of physical operators based on how effective estimation methods are for different classes. Specifically, accuracy of estimation can be improved when excluding physical operators that perform nested iteration. Query progress indicators have several disadvantages. First, query progress indicators rely on accurate counts of the tuples processed by various operators and thus require developers to instrument the database core engine in order to count the tuples input and emitted from every operator. Second, different types of operators process tuples at different rates so that tuple-count based progress indicators require a model of how the time spent to process a query is divided among various types of operators. Thus, per-operator models of tuple processing rates and a model of mutual interactions among the per-operator models within processing of a single query are required for analysis. Third, query progress indicators do not reflect measures of work that are independent of tuples processed. For example, analysis of a workload management system may depend on whether a given query has already completed the disk-I/O portion of work, information that is not available through usage of a tuple-count based query progress indicator.
Workload execution management is implemented in many commercial database systems and includes various techniques and systems for dealing with problem queries. For example, HP-UX Workload Manager, IBM Query Patroller for DB2, SQLServer Query Governor, Teradata's Dynamic Workload Manager, and Oracle's Database Resource Manager all include functionality to control or address queries that exceed a limit on estimated row counts, processing time, or place a limit on the number of join operations that can appear in a plan. IBM's Query Patroller for DB2 and Oracle's Database Resource Manager enable a system administrator to define usergroups to which a static priority and a share of system resources for each group are assigned. The higher the priority of a group, the more resources are allocated. However, the static prioritization is not associated with response time requirements or service level agreement (SLA) conformance. Similarly, SQLServer Query Governor prevents queries with estimated query costs that exceed a user-set upper cost limit from starting, as opposed to stopping the queries after reaching a predefined limit. These commercial attempts at managing long-running queries have required one or more of the following: (1) absolute limits on resource usage (for example, not admitting a query or stopping a query that exceeds a limit on estimated row counts, processing time, or placing a limit on the number of join operations that can appear in a plan), and (2) capability to obtain statistics such as actual input and output cardinalities. Obtaining such statistics can be prohibitively expensive, placing a great load on a running system.