The present invention relates generally to electronic computers and more particularly to a method of controlling the allocation of resources such as processors and memory buffers in a parallel processor computer in executing a task such as evaluating a query of a database.
A typical parallel processor computer system has a plurality of resources such as processors, memory buffers and the like. These resources can operate simultaneously, thereby greatly improving the performance of the computer when executing a task which has a plurality of subtasks that can be executed independently of each other.
Executing a task usually involves executing a number of subtasks, each of which in turn may have several parts. In a computer having only one processor, each step in executing each part of each subtask is performed sequentially. In a parallel processor computer, several such steps can be performed simultaneously, but typically the computer does not have enough resources to satisfy all of the subtasks at once. Resolving conflicting demands by the various subtasks for access to such resources has been a problem in the design of parallel processor computer systems, especially in the context of using such computer systems to evaluate complicated queries of a database.
Various kinds of parallel-processor database computer architectures have been proposed. See, for example, Ozkarahan, Database Machines and Database Management, Prentice-Hall, 1986; DeWitt et al., "A Single-User Performance Evaluation of the Teradata Database Machine", Technical Report No. DB-081-87, MCC, Mar. 5, 1987; and The Tandem Performance Group, "A Benchmark of Non-Stop SQL on the Debit Credit Transaction", ACM SIGMOD, Chicago, 1988. A principal design goal is to maximize parallelism in such computers. Carey et al., "Parallelism and Concurrency Control Performance in Distributed Database Machines", ACM SIGMOD, Portland, Oreg., 1989.
Most of the proposed architectures for parallel-processor computers use a "shared-nothing" approach; that is, a collection of independent processors each having its own memory and disk are connected via a high-speed communication network. Copeland et al., "Data Placement in Bubba", ACM SIGMOD, Chicago, 1988; DeWitt et al., "A Performance Analysis of the Gamma Database Machine", ACM SIGMOD, Chicago, 1988; and DeWitt et al., "A Single-User Performance Evaluation of the Teradata Database Machine", Technical Report No. DB-081-87, MCC, Mar. 5, 1987. In such an architecture, communication and synchronization overhead are critical factors in overall query performance. Ghandeharizadeh et al., "A Multiuser Performance Analysis of Alternative Declustering Strategies", Proceedings of the Sixth International Conference on Data Engineering, Los Angeles, 1990. "Shared-nothing" computers are particularly well suited to evaluate queries that can be partitioned into independent subproblems, each of which can be executed in parallel with the others.
In contrast, in a "shared-everything" multiprocessor computer there are no communication delays and synchronization can be accomplished using high-performance, low-level techniques. Shared-memory multiprocessor architectures have been described in, for example, Sequent Computer et al., "Combining the Benefits of Relational Database Technology and Parallel Computing", Technical Seminar, San Francisco, Sep. 28, 1988. Computers of this kind are well adapted to evaluate queries that can be partitioned into temporally overlapping subproblems which can share the available computational resources. However, resource contention becomes a major source of performance degradation as many processors attempt to simultaneously access disks and share a limited pool of memory buffers. Stonebraker, "The Case for Shared Nothing", Database Engineering, 9(1), 1986.
There is a continuing need for a way to optimize query execution in a shared-everything computer so as to make the most effective use of the various resources of the computer. Six key issues in developing optimization techniques for multiprocessor computers are discussed in von Bultzingsloewen, "Optimizing SQL Queries for Parallel Execution", Database Query Optimization, Proceedings of the ODBF Workshop, Portland, Oreg., 1989. The problem of allocating main memory buffers in uniprocessor query optimization has also been studied. Chou et al., "An Evaluation of Buffer Management Strategies for Relational Database Systems", Proceedings of the 11th International Conference on Very Large Database Systems, 1985; Effelsberg et al., "Principles of Database Buffer Management", ACM TODS, 9(4), pages 560-595, 1984; Sacco et al., "Buffer Management in Relational Database Systems", ACM TODS, 11(4), pages 473-498, 1986. One approach to integrating buffer management with query optimization has been to apply traditional integer programming techniques with queueing analysis to analytically estimate the optimal execution strategy. Cornell et al., "Integration of buffer management and query optimization in relational database environment", Proceedings of the Fifteenth International Conference on Very Large Data Bases, Amsterdam, 1989.
Traditional query optimization proceeds in three stages. Jarke et al., "Query Optimization in Database Systems", ACM Computing Surveys, 16(2), pages 111-152, 1984; Elmasri et al., Fundamentals of Database Systems, Benjamin/Cummings, Redwood City, Calif. 1989. First, a query in a high-level language such as SQL is translated into an internal representation. This representation describes the equivalent relational algebraic operations to be performed along with any dependencies needed to restrict the overall order of execution. Second, the internal query representation is transformed using heuristic rules (based on theorems in the relational algebra) into an equivalent query that in most cases can be evaluated more efficiently. Third, an execution plan is generated by assigning computational algorithms to the relational operators. This is done by estimating the execution costs associated with each assignment of algorithms to operators and selecting the overall minimum cost assignment for execution. Most practical optimizers restrict the number of assignments that are considered in order to reduce the complexity of this step.
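By way of illustration only, the third stage described above can be sketched in Python as follows. The sketch is not taken from any of the cited references. The operator names, algorithm names, and cost figures are hypothetical, and the sketch assumes that the cost of a plan is simply the sum of independent per-operator cost estimates, which a practical optimizer would not necessarily assume. The `max_assignments` parameter is likewise a hypothetical stand-in for the restriction that practical optimizers place on the number of assignments considered.

```python
from itertools import islice, product

# Hypothetical cost table: operator -> {algorithm: estimated cost}.
# All names and numbers are illustrative assumptions.
COSTS = {
    "select": {"full_scan": 100.0, "index_scan": 12.0},
    "join": {"nested_loop": 900.0, "sort_merge": 240.0, "hash_join": 150.0},
    "project": {"sort_dedup": 60.0, "hash_dedup": 45.0},
}


def cheapest_plan(costs, max_assignments=None):
    """Return (plan, total_cost) for the cheapest assignment examined.

    Every combination of one algorithm per operator is enumerated.
    If max_assignments is given, only that many combinations are examined,
    so the result may not be the global minimum.
    """
    operators = list(costs)
    # Each candidate is a tuple of (algorithm, cost) pairs, one per operator.
    candidates = product(*(costs[op].items() for op in operators))
    if max_assignments is not None:
        candidates = islice(candidates, max_assignments)

    best_plan, best_cost = None, float("inf")
    for combo in candidates:
        # Assumption: plan cost is the sum of independent operator costs.
        total = sum(cost for _, cost in combo)
        if total < best_cost:
            best_plan = {op: alg for op, (alg, _) in zip(operators, combo)}
            best_cost = total
    return best_plan, best_cost


if __name__ == "__main__":
    plan, cost = cheapest_plan(COSTS)
    print(plan, cost)
```

Exhaustive enumeration grows multiplicatively with the number of operators and candidate algorithms, which is why the number of assignments considered is ordinarily restricted. Truncating the enumeration, as the sketch does, is only one simple way to model that restriction.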
There are two major difficulties in using such traditional optimization techniques in a shared-everything, multiprocessor environment. First, the standard algorithms for computing relational operators are inherently sequential and cannot take advantage of the parallelism supported by the computer. Second, there is no straightforward way to assign processors and other computational resources to distinct subsets of the overall query computation.
Accordingly, it will be apparent that there remains a need for an efficient way to optimize queries in multiprocessor computer systems, especially "shared-everything" computer systems. Stated another way, there is a need for a way to control the allocation of resources in a parallel processor computer system when executing complicated tasks such as evaluating database queries.