1. Field of the Invention
The present invention relates to databases, and more particularly, to algorithms for more efficiently processing database queries using parallelism.
2. Background Art
Parallel database systems improve query performance by using multiple processors and storage devices. They choose and execute query plans specially constructed to utilize parallel resources. For example, the “Volcano” query processing system uses various data flow operators in query plans that are organized as trees. See Graefe, Goetz, “Encapsulation of Parallelism in the Volcano Query Processing System,” SIGMOD Conference (1990), pp. 102-111. Each operator in a Volcano query tree has the same interface. Thus, an operator in a node of a tree does not need to know anything about its parent or child nodes. Parallel plans in Volcano are simply trees that contain special parallel operators. The “exchange” operator is one example. It acts as a boundary between threads or processes, handles the startup of these threads or processes, and buffers rows as they are passed between them.
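The uniform operator interface and the role of an exchange operator can be illustrated with a minimal sketch. This is not the Volcano implementation. The class names (Operator, Scan, Filter, Exchange), the open/next/close method names, the buffer size, and the use of Python threads are all assumptions made for illustration. The sketch also assumes that no row is ever the value None, since None is used to signal the end of a stream.

```python
import queue
import threading


class Operator:
    """Hypothetical uniform interface shared by every node of a plan tree."""

    def open(self):
        raise NotImplementedError

    def next(self):
        """Return the next row, or None when the stream is exhausted."""
        raise NotImplementedError

    def close(self):
        raise NotImplementedError


class Scan(Operator):
    """Leaf operator that produces rows from an in-memory iterable."""

    def __init__(self, rows):
        self._rows = rows
        self._it = None

    def open(self):
        self._it = iter(self._rows)

    def next(self):
        return next(self._it, None)

    def close(self):
        self._it = None


class Filter(Operator):
    """Passes through only the rows that satisfy a predicate."""

    def __init__(self, child, predicate):
        self._child = child
        self._predicate = predicate

    def open(self):
        self._child.open()

    def next(self):
        while True:
            row = self._child.next()
            if row is None or self._predicate(row):
                return row

    def close(self):
        self._child.close()


_DONE = object()  # sentinel placed in the buffer when the child is exhausted


class Exchange(Operator):
    """Thread boundary: starts a producer thread and buffers its rows."""

    def __init__(self, child, buffer_size=4):
        self._child = child
        self._buffer = queue.Queue(maxsize=buffer_size)
        self._thread = None
        self._done = False

    def _produce(self):
        # Runs in the producer thread; the child is opened, drained, and
        # closed entirely on this side of the boundary.
        self._child.open()
        try:
            while (row := self._child.next()) is not None:
                self._buffer.put(row)
        finally:
            self._child.close()
            self._buffer.put(_DONE)

    def open(self):
        self._thread = threading.Thread(target=self._produce, daemon=True)
        self._thread.start()

    def next(self):
        if self._done:
            return None
        row = self._buffer.get()
        if row is _DONE:
            self._done = True
            return None
        return row

    def close(self):
        # Drain any unread rows so the producer cannot block on a full buffer.
        while self.next() is not None:
            pass
        self._thread.join()


def run(plan):
    """Pull every row from the root of a plan tree."""
    plan.open()
    try:
        rows = []
        while (row := plan.next()) is not None:
            rows.append(row)
        return rows
    finally:
        plan.close()
```

In this sketch, Filter is written exactly as it would be for a serial plan. Inserting Exchange between Filter and Scan moves the scan into its own thread without changing either neighbor, which is the sense in which an operator need not know anything about its parent or child nodes.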
The Volcano model is a framework for implementing parallel query evaluation in databases. It does not address the issue of how to generate parallel query plans for particular queries. There are many different ways to design parallel operators, and many different ways to construct plans that use these operators. Given a query and a set of parallel operators, it is difficult to generate a parallel query plan that makes effective use of multiple processors or disks. Thus, what is needed is a way to more easily generate parallel query plans that improve query performance.
One difficulty with constructing parallel query plans is arranging for all available processors to remain busy throughout the execution of the query. See S. Manegold, J. Obermaier, and F. Waas, “Load Balanced Query Evaluation in Shared-Everything Environments”, European Conference on Parallel Processing, pp. 1117-1124, 1997, for one solution to this problem. The described solution applies only to a plan containing a single type of operator (hash join), organized into a specific type of tree (right deep). It also applies to only one execution phase of the hash join (the probe phase). What is needed is a method for obtaining the load balancing benefits of this approach with more operator types, when these operators are organized into arbitrary tree configurations.