Databases are used to store information for an innumerable number of applications, including various commercial, industrial, technical, scientific and educational applications. Many databases are relational databases, which organize information into formally-defined tables consisting of rows (i.e., records) and columns (i.e., fields), and which are typically accessed using a standardized language such as Structured Query Language (SQL).
Database management systems (DBMS's) are the computer programs that are used to access the information stored in the databases and to process searches, or queries, against the databases. To execute the query, many DBMS's perform query optimization, in which multiple execution plans or access plans for satisfying the database query are examined by a query optimizer to determine the most efficient way to execute the query. One type of optimization that may be utilized is parallel query execution.
Parallel query execution may include parallel processing of multiple queries. In particular, multiple queries may be concurrently executed by various processors and/or computers in a database management system. When a large number of users attempt to access the database, the parallel processing of multiple queries often decreases wait times for individual users and improves overall database throughput. Furthermore, parallel query execution may include parallel processing of an individual query, in which various sub-operations in the query are concurrently executed. Furthermore, parallel query execution may include the processing of a portion of an individual query in serial or in another non-parallel manner while another portion of the individual query is processed in parallel.
To implement parallelism, multiple threads and a buffer (e.g., a temporary buffer) are typically utilized. Threads generally represent independent paths of execution for a database query that are capable of being processed in parallel. A temporary buffer is often a temporary object or unsorted data structure that is utilized to store intermediate rows during execution of a database query. A temporary buffer is often viewed as a serialization point between parallel and nonparallel portions of a query. A buffer pool structure may be utilized to provide the parallel threads with data to consume. In some designs, a buffer pool structure may be organized with other components that are utilized during execution of into an execution tree. In such designs, the data for the threads may be produced from a subtree of the execution tree, which includes the buffer and functions as a feeding tree.
One difficulty associated with implementing parallel query execution in many designs is determining the optimal position to place a buffer in an execution tree. For example, there may be multiple suitable positions for the buffer, and to select a position, the query optimizer may perform costing comparisons to pick the best location based on the total cost to insert the buffer. Once the buffer position for the buffer is chosen, the buffer position is added to the execution plan and the query is executed with the buffer positioned at the chosen position in the execution plan. A buffer can operate as a serialization point in an execution plan, and as a result, improper placement of the buffer can result in sub-optimal performance. If, for example, the rate at which results are produced and added to the buffer is greater than the rate at which the results are consumed, the buffer can overflow and stall upstream query processing. On the other hand, if the rate at which results are produced and added to the buffer is greater than the rate at which the results are consumed, downstream operations can become starved for results.
Special tools and instrumentalities may be utilized to monitor the consumption of data by the threads during execution of the database query (i.e., runtime) to determine how well a buffer is working at a selected buffer position. However, even if a problem is detected, the DBMS is typically “stuck” with the selected buffer position during the current execution. Some conventional techniques may note that the selected buffer position is less optimal so that the next time the query is received that buffer position is not selected by the optimizer during optimization of that query, but generally the DBMS is stuck with the selected buffer position during the current execution of the database query.
Less optimal buffer positions are often selected because buffer positions are selected by the query optimizer based on cost estimation and estimates may be based on errors. As such, a more optimal buffer position may have been neglected due to the incorrect cost estimation, often resulting in data that may be consumed too quickly or too slowly by the threads, a waste in resources, and a decrease in performance.
Therefore, a need exists in the art for an improved manner of implementing parallelism in the execution of database queries, and in particular, for an improved manner of optimizing the locations of buffers used in processing database queries.