The present invention relates to a query management subsystem wherein queries are submitted to a parallel database, and more particularly relates to resubmitting queries to a parallel database in the event of a system failure or the reconfiguration of the parallel database wherein the parallel database operates in a degraded state.
Query Management Subsystems (QMS) are known in which large parallel systems comprising hundreds of computers execute complex queries in a data warehousing environment implemented on a parallel cluster of computers. Each query can potentially take hours to complete. It is imperative to manage and schedule the workload effectively and to guarantee the completion of a query, even in the event of brief system outages or recovery actions. These events tend to occur more frequently in a large parallel cluster. Even more important is the ability to change dynamically the policies enforced on the queries if the parallel system is functioning in a degraded state or if it is desired to prioritize certain types of queries.
Most sizable data warehouses are built from large parallel computers, where every processing node (computer) in the parallel system works on a piece of the total database for each incoming query. This imposes a database structure that is segmented across the entire parallel computer. Thus, in a system of N nodes, the query speedup can approach a factor of N (that is, the elapsed time can approach 1/N of the single-node time), compared to a query run on just one node. This performance is critical and germane to why the parallel architecture is employed. In addition, the types of queries run on a data warehouse are different from transactions usually run on an On Line Transaction Processing (OLTP) system. The data warehousing queries are complex and long-running and, because of the database structure, require the participation of multiple nodes. Thus, since all parallel database nodes are generally used for all warehousing queries, should any one of the nodes fail, all queries running at the instant of the failure will be aborted. Furthermore, these warehousing queries will have to be resubmitted after the parallel database system is made available.
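The scaling argument above can be made concrete with a small calculation. The following is a minimal sketch only: the function names are hypothetical, the even segmentation of the database is idealized, and the assumption that nodes fail independently with the same per-query probability is an illustrative simplification that is not stated in the text.

```python
def ideal_parallel_hours(single_node_hours: float, nodes: int) -> float:
    """Ideal elapsed time when the database is evenly segmented over `nodes`.

    Assumes perfect scaling, so the time approaches 1/N of the single-node time.
    """
    return single_node_hours / nodes


def prob_any_node_fails(per_node_failure_prob: float, nodes: int) -> float:
    """Probability that at least one node fails while a query is running.

    Assumes (for illustration only) independent, identical node failures.
    Because every node takes part in every warehousing query, a single
    failure aborts all running queries.
    """
    return 1.0 - (1.0 - per_node_failure_prob) ** nodes
```

Under these assumptions, adding nodes shortens the ideal run time but raises the chance that some node fails during a query, which is the exposure that a resubmittal mechanism is intended to cover.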
U.S. Pat. No. 5,247,664 issued Sep. 21, 1993 to Thompson et al. for FAULT-TOLERANT DISTRIBUTED DATABASE SYSTEM AND METHOD FOR THE MANAGEMENT OF CORRECTABLE SUBTRANSACTION FAULTS BY THE GLOBAL TRANSACTION SOURCE NODE, discloses a fault-tolerant method and system for processing global transactions in a distributed database system. If a fault occurs in the distributed database system, a transaction management system will suspend the processing of a transaction and renew it when the fault is remedied.
U.S. Pat. No. 5,495,606 issued Feb. 27, 1996 to Borden et al. for SYSTEM FOR PARALLEL PROCESSING OF COMPLEX READ-ONLY DATABASE QUERIES USING MASTER AND SLAVE CONTROL PROCESSOR COMPLEXES, discloses a parallel query processing system comprising a front end processor, a query processing complex attached to the front end processor, and a database on a data repository attached by a first path to the front end processor, and by one or more additional paths, to one or more slave processors within the query processing complex. An external operations command processor within the front end processor quiesces database managers in the slave processors to permit maintenance operations by the front end processor, and restarts the slave processors in read-only mode after maintenance operation completion.
U.S. Pat. No. 5,590,319 issued Dec. 31, 1996 to Cohen et al. for QUERY PROCESSOR FOR PARALLEL PROCESSING IN HOMOGENOUS AND HETEROGENEOUS DATABASES, discloses a query processor for parallel processing which translates an input query which references data stored in one or more homogenous or heterogeneous databases into a plurality of parallel output queries each of which is directed to a single one of the databases or a partition thereof.
U.S. Pat. No. 5,675,791 issued Oct. 7, 1997 to Bhide et al. for METHOD AND SYSTEM FOR DATABASE LOAD BALANCING, discloses a method and system for partitioning a database and for balancing the processing load among processing nodes in a data processing system.
U.S. Pat. No. 5,437,032 issued Jul. 25, 1995 to Wolf et al. for TASK SCHEDULER FOR A MULTIPROCESSOR SYSTEM, discloses a task scheduler for use in a multiprocessor, multitasking system in which a plurality of processor complexes, each containing one or more processors, concurrently execute tasks into which jobs such as database queries are divided.
U.S. Pat. No. 5,613,106 issued Mar. 18, 1997 to Thurman et al. for METHOD FOR PROCESSING AND STORING A TRANSACTION IN A DISTRIBUTED DATABASE SYSTEM, discloses a transaction, consisting of a compilation of changes made to one or more data objects of a database, being transferred to a primary transaction engine of a primary database for processing. If one of the transactions is not successfully processed, the system takes corrective action and optionally notifies the user.
U.S. Pat. No. 5,742,806 issued Apr. 21, 1998 to Reiner et al. for APPARATUS AND METHOD FOR DECOMPOSING DATABASE QUERIES FOR DATABASE MANAGEMENT SYSTEM INCLUDING MULTIPROCESSOR DIGITAL DATA PROCESSING SYSTEM, discloses a system for database query processing by means of “query decomposition” which intercepts database queries prior to processing by a database management system. The system decomposes at least selected queries to generate multiple subqueries for application, in parallel, to the database management system, in lieu of the intercepted query. Responses by the database management system to the subqueries are assembled by the system to generate a final response.
U.S. Pat. No. 5,692,174 issued Nov. 25, 1997 to Bireley et al. for QUERY PARALLELISM IN A SHARED DATA DBMS SYSTEM, discloses a system and method for a computer system having a plurality of database management systems providing a coordinating and assisting function. Each coordinating database management system receives a query from a user application, decomposes the query into multiple parallel tasks, and allocates the parallel tasks to all of the database management systems in the system. Each assisting database management system receives one or more parallel tasks from a coordinating database management system, executes the parallel task and returns the results to the coordinating database management system. The disclosed system dynamically disables a parallel mode on the coordinating database management systems and the assisting database management systems.
U.S. Pat. No. 5,857,180 issued Jan. 5, 1999 to Hallmark et al. for METHOD AND APPARATUS FOR IMPLEMENTING PARALLEL OPERATIONS IN A DATABASE MANAGEMENT SYSTEM, discloses a system and method that locate transaction and recovery information at one location and eliminate the need for read-locks and two-phase commits in a parallel processing database management system.
The present invention is an enhancement for the management and recovery of transactional workloads (herein referred to as queries) in a data warehousing environment that is implemented on a parallel cluster of computers. These large parallel systems are comprised of hundreds of computers that execute complex queries which potentially take hours to complete. It is imperative to manage and schedule the workload effectively, as well as guarantee its completion, even in the event of a brief system outage or recovery action, which tends to occur more frequently in a large parallel cluster. It is also important to allow dynamic changes of policies to be enforced on queries, or to prioritize certain types of queries, if the parallel system is functioning in a degraded state.
The preferred data warehouse implementation is one that views the long-running queries as batch jobs with a Query Management Subsystem (QMS) that can accommodate and manage all incoming query workloads. In the present invention, a Query Resubmittal Mechanism (QRM) is part of the QMS and guarantees the completion of all submitted queries. The QRM of the present invention gives the applications and users the perception that the parallel data warehouse database system is never unavailable.
It is a primary object of the present invention to provide a QMS that includes a QRM which will provide the ability to manage a durable work queue of queries running on the system.
It is another object to provide a QRM with the ability to manage a dynamic limit of concurrent queries allowed on the system, beyond which subsequent queries submitted will be queued.
It is another object to provide a QRM with the ability to detect a system outage and/or reconfiguration.
It is another object to provide a QRM with the ability to retain queries aborted as a result of a system outage and/or reconfiguration.
It is another object to provide a QRM with the ability to submit and resubmit queries after the system is detected to be online.
It is another object to provide a QRM with the ability to readjust the limit of active queries when the system is operating in a degraded state.
It is another object to provide a QRM which itself is not a single point of failure.
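The queueing, retention, and resubmittal behavior named in the objects above can be sketched as follows. This is an illustrative in-memory model only, not the disclosed implementation: the class name, method names, and fields are hypothetical, the queue is not actually durable, outage detection is reduced to explicit method calls, and the object concerning a single point of failure is not addressed.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QueryResubmitter:
    """Hypothetical model of a work queue with a dynamic concurrency limit."""

    max_active: int                                 # dynamic limit of concurrent queries
    pending: deque = field(default_factory=deque)   # stand-in for the durable work queue
    active: list = field(default_factory=list)      # queries currently running
    online: bool = True                             # last known state of the database

    def submit(self, query_id: str) -> None:
        """Queue a query; it starts at once only if the limit allows."""
        self.pending.append(query_id)
        self._dispatch()

    def complete(self, query_id: str) -> None:
        """Record a finished query and start the next queued one, if any."""
        if query_id in self.active:
            self.active.remove(query_id)
        self._dispatch()

    def on_outage(self) -> None:
        """Retain aborted queries at the head of the queue, in original order."""
        self.online = False
        self.pending.extendleft(reversed(self.active))
        self.active.clear()

    def on_online(self, new_limit: Optional[int] = None) -> None:
        """Resume dispatch, optionally with a lower limit for a degraded system."""
        self.online = True
        if new_limit is not None:
            self.max_active = new_limit
        self._dispatch()

    def _dispatch(self) -> None:
        """Start queued queries while the system is online and under the limit."""
        while self.online and self.pending and len(self.active) < self.max_active:
            self.active.append(self.pending.popleft())
```

In this sketch, queries submitted during an outage simply accumulate in the queue, and a call such as `on_online(new_limit=1)` models resubmittal against a system that has come back in a degraded state.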