Databases are used to store information for innumerable applications, including various commercial, industrial, technical, scientific and educational applications. As the reliance on information increases, both the volume of information stored in most databases and the number of users wishing to access that information likewise increase. As the volume of information in a database and the number of users wishing to access the database increase, the amount of computing resources required to manage such a database increases as well.
Database management systems (DBMS's), which are the computer programs that are used to access the information stored in databases, therefore often require tremendous resources to handle the heavy workloads placed on such systems. As such, significant resources have been devoted to increasing the performance of database management systems with respect to processing searches, or queries, to databases.
Improvements to both computer hardware and software have improved the capacities of conventional database management systems. For example, in the hardware realm, increases in microprocessor performance, coupled with improved memory management systems, have improved the number of queries that a particular microprocessor can perform in a given unit of time. Furthermore, the use of multiple microprocessors and/or multiple networked computers has further increased the capacities of many database management systems.
From a software standpoint, the use of relational databases, which organize information into formally-defined tables, and which are typically accessed using a standardized language such as Structured Query Language (SQL), has substantially improved processing efficiency, as well as substantially simplified the creation, organization, and extension of information within a database. Furthermore, significant development efforts have been directed toward query “optimization”, whereby the execution of particular searches, or queries, is optimized in an automated manner to minimize the amount of resources required to execute each query. In addition, a reduced reliance on runtime interpretation of queries in favor of increased usage of directly-executable program code has improved query engine performance.
Through the incorporation of various hardware and software improvements, many high performance database management systems are able to handle hundreds or even thousands of queries each second, even on databases containing millions or billions of records. However, further increases in information volume and workload are inevitable, so continued advancements in database management systems are still required.
For example, one manner of improving database performance is through the use of parallelism, e.g., by utilizing multiple microprocessors and/or multiple computers to handle a database's management and query execution functionalities. In many instances, such parallelism is limited to parallel processing of multiple queries, i.e., so that multiple queries are concurrently executed by various processors and/or computers in a database management system. Particularly where a large number of users are attempting to access a database at the same time, the parallel processing of multiple queries often decreases wait times for individual users and improves overall database throughput.
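By way of illustration only, the parallel processing of multiple queries described above may be sketched as follows. The in-memory table `RECORDS`, the helper `run_query`, and the user names are hypothetical and are introduced solely for this example. Each query is executed sequentially as a whole, and only the separate queries run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory table standing in for a database (an assumption).
RECORDS = [{"id": i, "dept": i % 4, "salary": 1000 + i} for i in range(1, 1001)]

def run_query(predicate):
    """Execute one whole query sequentially and return the matching ids."""
    return [r["id"] for r in RECORDS if predicate(r)]

# Three independent queries, as might be submitted by three different users.
queries = {
    "alice": lambda r: r["dept"] == 0 and r["id"] <= 8,
    "bob": lambda r: r["salary"] > 1997,
    "carol": lambda r: r["id"] == 42,
}

# Each query is handed to its own worker; no query is split internally.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {user: pool.submit(run_query, q) for user, q in queries.items()}
    results = {user: f.result() for user, f in futures.items()}

print(results)
```

Because the queries do not depend on one another, no coordination between workers is needed beyond collecting each result.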
However, in other instances, it may be desirable to execute individual queries using parallel processing, so that various sub-operations in the queries are concurrently executed. As a result of utilizing parallelism when executing individual queries, substantially faster execution of individual queries may be obtained.
Implementing parallelism within individual queries, however, is often more problematic than simply executing different queries in parallel, given that many operations within a query are interdependent, i.e., many later operations depend upon the results of earlier operations. Thus, parallelism has to date found only limited applicability in the execution of individual database queries.
One difficulty associated with implementing parallelism within queries arises because, in many circumstances, ranges of records cannot readily be divided into discrete subranges. For example, an index probe of a table is not readily adaptable to being broken up into sub-operations because a compacted, space-efficient index data structure is typically not well suited for linear decomposition.
This is in contrast to other types of operations, such as scan probes, which, due to their sequential nature, could be implemented in parallel with much less difficulty, typically just by breaking up the range of records in the search space for the probe into multiple, discrete subranges, and handling those subranges in different threads. Thus, given a scan probe that steps through a table of 100,000 records, separate execution threads could implement such a scan probe by operating upon discrete subranges such as records 1-10,000, 10,001-20,000, etc. Since the subranges would not overlap, each record would be processed by exactly one thread, and the uniqueness of the overall result set culled from the results of all of the threads would be ensured.
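A minimal sketch of such a subrange-based scan probe is given below, assuming a table of 100,000 records held in memory and divided into ten equal subranges. The helper names `split_range`, `scan_subrange`, and `parallel_scan` are hypothetical and do not correspond to any particular database management system. The sketch also uses Python threads for brevity, which illustrate the decomposition but do not by themselves guarantee processor-level parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

def split_range(total, parts):
    """Split record numbers 1..total into contiguous, non-overlapping subranges."""
    size, extra = divmod(total, parts)
    ranges, start = [], 1
    for i in range(parts):
        # The first `extra` subranges take one additional record each.
        end = start + size + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

def scan_subrange(table, first, last, predicate):
    """Sequentially scan records first..last (1-based, inclusive)."""
    return [rec for rec in table[first - 1:last] if predicate(rec)]

def parallel_scan(table, predicate, workers=10):
    """Scan the table with one thread per subrange and merge the results."""
    ranges = split_range(len(table), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda r: scan_subrange(table, r[0], r[1], predicate), ranges
        )
        # Subranges are disjoint, so concatenation cannot introduce duplicates.
        return [rec for part in parts for rec in part]

table = list(range(1, 100_001))  # hypothetical records numbered 1..100,000
matches = parallel_scan(table, lambda rec: rec % 25_000 == 0)
print(matches)
```

Because every record number falls within exactly one subrange, the merged result is the same as that of a single sequential scan.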
Therefore, a significant need exists in the art for a manner of implementing parallelism in the execution of individual database queries, and in particular, for a manner of implementing parallelism in individual database queries that incorporate operations that are not readily divisible into discrete subranges.